
AWS Big Data Engineer Resume


Souderton, PA

SUMMARY

  • Over 10 years of IT experience as a Developer, Designer, and QA Engineer, with cross-platform integration experience using the Hadoop ecosystem, Java, and functional automation
  • Hands-on experience installing, configuring, and architecting Hadoop and Hortonworks clusters and services: HDFS, MapReduce, YARN, Pig, Hive, HBase, Spark, Sqoop, Flume, and Oozie
  • 2+ years of experience with the AWS cloud platform.
  • 2+ years of experience working with Spark.
  • Expertise in Spark Streaming (Lambda architecture), Spark SQL, and tuning and debugging Spark clusters on Mesos (see the sketch after this list).
  • Expertise in machine learning with Spark MLlib using Python.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, Redshift, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other AWS services.
  • Selected appropriate AWS services to design and deploy applications based on given requirements.
  • Set up and managed a CDN on Amazon CloudFront to improve site performance.
  • Expertise working with MongoDB and Apache Cassandra.
  • Expertise in Java, J2EE, JavaScript, HTML, and JSP.
  • Solid programming knowledge of Scala, Python, C#, and Ruby.
  • Hands-on experience integrating REST APIs with cloud environments to access resources.
  • Experience working with Teradata and preparing its data for batch processing with distributed computing.
  • Good working experience with Hadoop data warehousing tools such as Hive and Pig, including moving data onto the cluster for them using Sqoop.
  • Developed Oozie workflows to schedule multiple Hive and Pig jobs that run independently based on time and data availability.
  • Good knowledge of High-Availability, Fault Tolerance, Scalability, Database Concepts, System and Software Architecture, Security and IT Infrastructure.
  • Led onshore and offshore service delivery functions to ensure end-to-end ownership of incidents and service requests.
  • Mentored junior developers and kept them current on cutting-edge technologies such as Hadoop, Spark, Spark SQL, and Presto.
  • All the projects I have worked on are open-source projects and have been tracked using JIRA.
  • Experience with Agile (Scrum) methodology.
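
For illustration, a minimal PySpark sketch of the kind of Spark Streaming and Spark SQL work summarized above. It assumes Spark 2.x with PySpark installed and uses a local socket source purely as a placeholder input; the host, port, and application name are illustrative rather than taken from any of the projects below.

```python
# Minimal PySpark sketch: a streaming word-count aggregation with Spark SQL.
# Assumes Spark 2.x+ with PySpark installed; host/port are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read a text stream from a local socket (e.g. `nc -lk 9999` for testing).
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Write the running aggregation to the console until interrupted.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```

For local testing, `nc -lk 9999` can feed lines into the socket while the query prints the running counts to the console.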

TECHNICAL SKILLS

  • Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
  • NoSQL Databases: HBase, Cassandra, MongoDB
  • Scripting Languages: Python, Scala, Ruby, Bash, UNIX shell, JavaScript
  • Programming Languages: Java, SQL, C#, HTML5, CSS3
  • Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL
  • Frameworks: MVC, Struts, Spring, Hibernate
  • Operating Systems: Linux, Unix, Mac, Windows
  • Web Technologies: HTML, DHTML, XML
  • Web/Application servers: Apache Tomcat, WebLogic, JBoss
  • Databases: Teradata, SQL Server, MySQL, MongoDB, Oracle
  • IDE: Eclipse, IntelliJ IDEA

PROFESSIONAL EXPERIENCE

AWS BIG DATA ENGINEER

Confidential, Souderton, PA

Responsibilities:

  • Created S3 buckets, enforced policies on IAM roles, and customized the JSON policy templates (see the sketch after this list)
  • Launched Amazon EC2 instances (Linux/Ubuntu) on AWS and configured the launched instances
  • Managed Amazon Redshift clusters, including launching clusters and specifying node types
  • Used AWS Elastic Beanstalk for deploying and scaling web applications and services developed in Java
  • Owned end-to-end deployment for projects on AWS, including Python scripting for automation, scalability, and build promotions from staging to production
  • Hands-on with Git/GitHub for code check-ins, checkouts, and branching
  • Implemented and maintained monitoring and alerting of production and enterprise servers/storage using AWS CloudWatch
  • Built continuous integration environments using Jenkins and Puppet.
  • Experienced in various AWS services including VPC, EC2, S3, RDS, Redshift, DynamoDB, Lambda, SNS, and SQS.
  • Designed and deployed multiple applications utilizing much of the AWS stack, including EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, and IAM, focusing on high availability, fault tolerance, and auto scaling via AWS CloudFormation.
  • Experienced in installing, configuring, troubleshooting, and performance tuning WebLogic, Apache, IIS, and Tomcat.
  • Wrote shell scripts for end-to-end build and deployment automation; ran Ansible scripts to provision dev servers.
  • Created Docker containers from Docker images to test, ship, and run applications.
  • Leveraged AWS cloud services such as EC2, auto-scaling and VPC to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts.
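
As a hedged illustration of the S3/IAM work in the first bullet above, the following boto3 sketch creates a bucket and attaches a customized JSON policy template to a role. The bucket name, role name, policy name, and region are hypothetical placeholders, and it assumes AWS credentials are already configured.

```python
# Minimal boto3 sketch: create an S3 bucket and attach an inline JSON policy
# to an IAM role. All names are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")
iam = boto3.client("iam")

# Create the bucket (outside us-east-1 a LocationConstraint must be supplied).
s3.create_bucket(
    Bucket="example-data-lake-bucket",
    CreateBucketConfiguration={"LocationConstraint": "us-east-2"},
)

# JSON policy template restricting the role to read-only access on the bucket.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-data-lake-bucket",
            "arn:aws:s3:::example-data-lake-bucket/*",
        ],
    }],
}

# Attach the customized template as an inline policy on the IAM role.
iam.put_role_policy(
    RoleName="example-app-role",
    PolicyName="example-s3-read-only",
    PolicyDocument=json.dumps(policy_document),
)
```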

Environment: Python, UNIX, VMware, Shell, Perl, IAM, S3, EBS, EC2, CloudWatch, CloudFormation, Puppet, Docker, Jenkins, Spark, Kafka.

CLOUD BIG DATA ENGINEER

Confidential, New York, NY

Responsibilities:

  • Installed Kafka on virtual machines and created topics for different users
  • Installed ZooKeeper, Kafka brokers, Schema Registry, and Control Center on multiple machines.
  • Developed the SSL security layer, set up ACL/SSL security for users, and assigned multiple topics
  • Worked on the Hadoop cluster and data-querying tools such as Hive to store and retrieve data.
  • Worked on migrating multiple applications and automating the infrastructure creation using CloudFormation for the new applications.
  • Used AWS Application Discovery Service to analyze the existing infrastructure.
  • Designed and implemented public- and private-facing websites on Amazon Web Services.
  • Migrated the application onto the AWS cloud.
  • Created and configured Redshift clusters.
  • Configured EMR clusters and used Hive scripts to process the data stored in S3
  • Created Data Pipelines and configured EMR clusters to offload the data to Redshift.
  • Infrastructure as code: automated infrastructure creation using AWS CloudFormation.
  • Responsible for security, including opening ports on security groups and network ACLs, and building peering connections, NAT instances, and VPN connections.
  • Wrote various Lambda functions in Python and Java to automate tasks.
  • Used SSM Run Command to run shell scripts on Linux instances (e.g., for server startup) and invoked the command from Lambda (see the sketch after this list).
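
A minimal sketch of the Lambda-to-SSM pattern described in the last bullet above, assuming the target instances run the SSM agent and the Lambda execution role has ssm:SendCommand permission. The instance IDs and startup script path are hypothetical placeholders.

```python
# Minimal sketch: a Lambda handler that invokes SSM Run Command to execute a
# shell script on Linux instances. Instance IDs and the script path are
# hypothetical placeholders.
import boto3

ssm = boto3.client("ssm")

def lambda_handler(event, context):
    # Run the startup script on the target instances via AWS-RunShellScript.
    response = ssm.send_command(
        InstanceIds=event.get("instance_ids", ["i-0123456789abcdef0"]),
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["sudo /opt/app/bin/server-startup.sh"]},
        Comment="Server startup triggered from Lambda",
    )
    # Return the command ID so the caller can poll for status if needed.
    return {"CommandId": response["Command"]["CommandId"]}
```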

Environment: EC2, Elastic Load Balancing, Auto Scaling, Route 53, VPC, IAM, RDS, CloudFormation, Puppet, Spark, Hive, Kafka.

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Developed NiFi workflows to automate the data movement between different Hadoop systems.
  • Configured, deployed, and maintained multi-node dev and test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Imported large datasets from DB2 into Hive tables using Sqoop
  • Implemented Apache Pig scripts to load data from and store data into Hive.
  • Partitioned and bucketed Hive tables and used Snappy compression to load data from Avro Hive tables into Parquet Hive tables (see the sketch after this list)
  • Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL
  • Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Responsible for implementing the ETL process through Kafka-Spark-HBase integration per the requirements of the customer-facing API
  • Worked on batch processing and real-time data processing with Spark Streaming using a Lambda architecture
  • Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMSs to Hive
  • Utilized the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of MapReduce in Java
  • Responsible for data extraction and integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig, and Hive.
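
For illustration, a minimal PySpark sketch of the Avro-to-Parquet Hive conversion mentioned above (partitioned tables with Snappy compression). The database, table, and partition column names are hypothetical placeholders; it assumes Spark with Hive support enabled and is written against the Spark 2.x SparkSession API rather than Spark 1.6.

```python
# Minimal PySpark sketch: load an Avro-backed Hive table and rewrite it as a
# partitioned, Snappy-compressed Parquet Hive table. Table and column names
# are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("AvroToParquet")
         .enableHiveSupport()
         .getOrCreate())

# Snappy is the default Parquet codec, but set it explicitly for clarity.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

# Read the source Avro Hive table.
events = spark.table("staging.events_avro")

# Write it back as a Parquet Hive table partitioned by event date.
(events.write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("event_date")
    .saveAsTable("warehouse.events_parquet"))
```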

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn.
