AWS Big Data Engineer Resume
Souderton, PA
SUMMARY
- Over 10 years of IT experience as a developer, designer, and QA engineer, with cross-platform integration experience spanning the Hadoop ecosystem, Java, and functional test automation
- Hands-on experience installing, configuring, and architecting Hadoop and Hortonworks clusters and services: HDFS, MapReduce, YARN, Pig, Hive, HBase, Spark, Sqoop, Flume, and Oozie
- 2+ years of experience on the AWS cloud platform.
- 2+ years of experience working with Spark.
- Expertise in Spark Streaming (Lambda architecture), Spark SQL, and tuning and debugging Spark clusters running on Mesos.
- Expertise in machine learning with Spark MLlib using Python (a minimal sketch follows this summary).
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, Redshift, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other AWS services.
- Skilled at selecting appropriate AWS services to design and deploy applications based on given requirements.
- Set up and managed a CDN on Amazon CloudFront to improve site performance.
- Expertise working with MongoDB and Apache Cassandra.
- Expertise in Java, J2EE, JavaScript, HTML, and JSP.
- Solid programming knowledge of Scala, Python, C#, and Ruby.
- Hands-on experience integrating REST APIs with cloud environments to access resources.
- Experience working with Teradata and preparing its data for batch processing on distributed computing frameworks.
- Good working experience with Hadoop data warehousing tools such as Hive and Pig, and with moving data onto the cluster using Sqoop.
- Developed Oozie workflows to schedule multiple Hive and Pig jobs that run independently based on time and data availability.
- Good knowledge of High-Availability, Fault Tolerance, Scalability, Database Concepts, System and Software Architecture, Security and IT Infrastructure.
- Led onshore and offshore service delivery functions to ensure end-to-end ownership of incidents and service requests.
- Mentored junior developers and kept them up to date with current technologies such as Hadoop, Spark, Spark SQL, and Presto.
- All of the projects I have worked on are open-source projects and have been tracked using JIRA.
- Experience with Agile methodologies, including Scrum.
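A minimal sketch of the MLlib-with-Python work mentioned above, assuming a hypothetical training dataset on S3 with numeric columns `f1`, `f2` and a binary `label`; the path and column names are placeholders, not project specifics.

```python
# Hypothetical example: train a logistic regression model with Spark MLlib (pyspark.ml)
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Placeholder input: CSV with columns f1, f2, label
df = spark.read.csv("s3://example-bucket/training/events.csv",
                    header=True, inferSchema=True)

# MLlib expects a single vector column of features
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df).select("features", "label")

model = LogisticRegression(maxIter=20).fit(train)
print("Training AUC:", model.summary.areaUnderROC)

spark.stop()
```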
TECHNICAL SKILLS
- Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
- NoSQL Databases: HBase, Cassandra, MongoDB
- Scripting Languages: Python, Scala, Ruby, Bash, UNIX shell, and JavaScript
- Programming Languages: Java, SQL, C#, HTML5, CSS3
- Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
- Frameworks: MVC, Struts, Spring, Hibernate
- Operating Systems: Linux, Unix, Mac, Windows
- Web Technologies: HTML, DHTML, XML
- Web/Application servers: Apache Tomcat, WebLogic, JBoss
- Databases: Data Warehouse, SQL Server, MySQL, MongoDB, Oracle
- IDE: Eclipse, IntelliJ IDEA
PROFESSIONAL EXPERIENCE
AWS BIG DATA ENGINEER
Confidential, Souderton, PA
Responsibilities:
- Created S3 buckets, enforced policies on IAM roles, and customized the JSON policy templates (see the sketch below)
- Launched Amazon EC2 instances (Linux/Ubuntu) on AWS and configured the launched instances
- Managed Amazon Redshift clusters, including launching clusters and specifying node types
- Used AWS Elastic Beanstalk for deploying and scaling web applications and services developed in Java
- Owned end-to-end deployment for projects on AWS, including Python scripting for automation, scaling, and build promotion from staging to production
- Hands-on with Git/GitHub for code check-ins, checkouts, and branching
- Implemented and maintained monitoring and alerting of production and enterprise servers/storage using Amazon CloudWatch
- Built continuous integration environments using Jenkins and Puppet.
- Experienced in various AWS services including VPC, EC2, S3, RDS, Redshift, DynamoDB, Lambda, SNS, and SQS.
- Designed and deployed multiple applications using much of the AWS stack (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto scaling via AWS CloudFormation.
- Experienced in installing, configuring, troubleshooting, and performance tuning WebLogic, Apache, IIS, and Tomcat.
- Wrote shell scripts for end-to-end build and deployment automation and ran Ansible scripts to provision dev servers.
- Created Docker containers from Docker images to test, ship, and run applications.
- Leveraged AWS services such as EC2, Auto Scaling, and VPC to build secure, highly scalable, and flexible systems that handled expected and unexpected load bursts.
Environment: Python, UNIX, VMware, Shell, Perl, IAM, S3, EBS, EC2, CloudWatch, CloudFormation, Puppet, Docker, Jenkins, Spark, Kafka.
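A minimal boto3 sketch of the S3/IAM work above: create a bucket and enforce a customized JSON policy on an existing IAM role. The bucket name, role name, and policy contents are hypothetical placeholders, not the project's actual configuration.

```python
# Hypothetical example: create an S3 bucket and attach a customized JSON policy
# to an existing IAM role as an inline policy.
import json
import boto3

BUCKET = "example-data-lake-bucket"   # placeholder bucket name
ROLE = "example-etl-role"             # placeholder IAM role name

s3 = boto3.client("s3", region_name="us-east-1")
iam = boto3.client("iam")

# Create the bucket (no LocationConstraint needed in us-east-1)
s3.create_bucket(Bucket=BUCKET)

# Customized JSON policy template scoped to the new bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}

# Enforce the policy on the IAM role
iam.put_role_policy(
    RoleName=ROLE,
    PolicyName="example-s3-access",
    PolicyDocument=json.dumps(policy),
)
```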
CLOUD BIG DATA ENGINEER
Confidential, New York, NY
Responsibilities:
- Installed Kafka on virtual machines and created topics for different users
- Installed ZooKeeper, Kafka brokers, Schema Registry, and Control Center on multiple machines.
- Developed the SSL security layer, set up ACL/SSL security for users, and assigned multiple topics to them
- Worked on the Hadoop cluster and used Hive as the data querying tool to store and retrieve data.
- Migrated multiple applications and automated infrastructure creation for the new applications using CloudFormation.
- Used AWS Application Discovery Service to analyze the existing infrastructure.
- Designed and implemented public- and private-facing websites on Amazon Web Services.
- Migrated the application onto the AWS cloud.
- Created and configured Redshift clusters.
- Configured an EMR cluster and used Hive scripts to process data stored in S3
- Created data pipelines and configured EMR clusters to offload the data to Redshift.
- Infrastructure as code: automated infrastructure creation using AWS CloudFormation.
- Responsible for security, including opening ports on security groups and network ACLs and building peering connections, NAT instances, and VPN connections.
- Wrote various Lambda functions in Python and Java to automate tasks.
- Used SSM Run Command to run shell scripts on Linux instances (e.g., for server startup) and invoked the command from Lambda (see the sketch below).
Environment: EC2, Load Balancing, Auto Scaling, Route53, VPC, IAM, RDS, CloudFormation, Puppet, Spark, Hive, Kafka.
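A minimal sketch of a Python Lambda handler that triggers SSM Run Command on a Linux instance, as described in the last two bullets above; the instance ID and startup script path are hypothetical placeholders.

```python
# Hypothetical example: Lambda handler that runs a shell command on an EC2
# instance via SSM Run Command (AWS-RunShellScript document).
import boto3

ssm = boto3.client("ssm")

def handler(event, context):
    # Instance ID(s) could also come from the triggering event; this is a placeholder.
    instance_ids = event.get("instance_ids", ["i-0123456789abcdef0"])

    response = ssm.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Comment="Server startup triggered from Lambda",
        Parameters={"commands": ["sudo /opt/app/startup.sh"]},  # placeholder script
    )
    return {"command_id": response["Command"]["CommandId"]}
```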
HADOOP DEVELOPER
Confidential, New York, NY
Responsibilities:
- Developed NiFi workflows to automate the data movement between different Hadoop systems.
- Configured, deployed, and maintained multi-node dev and test Kafka clusters.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Datasets/Spark SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Implemented Spark jobs in Scala with Spark SQL for faster testing and processing of data.
- Imported large datasets from DB2 into Hive tables using Sqoop
- Implemented Apache Pig scripts to load data from and store data into Hive.
- Partitioned and bucketed Hive tables and compressed data with Snappy to load Avro Hive tables into Parquet Hive tables (see the sketch at the end of this section)
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL
- Worked with and gained deep familiarity with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
- Responsible for implementing the ETL process through Kafka-Spark-HBase integration per the requirements of the customer-facing API
- Worked on batch and real-time data processing with Spark Streaming using the Lambda architecture
- Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMSs into Hive
- Utilized the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of Java MapReduce
- Responsible for extracting and integrating data from different sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig, and Hive.
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn.
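A minimal sketch of the Avro-to-Parquet Hive table load described above, written here in PySpark rather than the Scala used on the project; the database, table, and partition column names are hypothetical placeholders.

```python
# Hypothetical example: rewrite an Avro-backed Hive table as a partitioned,
# Snappy-compressed Parquet Hive table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("avro-to-parquet-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Source: existing Avro-backed Hive table (placeholder name)
events = spark.table("raw_db.events_avro")

# Target: partitioned Parquet Hive table with Snappy compression (placeholder names)
(events.write
       .mode("overwrite")
       .format("parquet")
       .option("compression", "snappy")
       .partitionBy("event_date")
       .saveAsTable("curated_db.events_parquet"))

spark.stop()
```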