
AWS Data Engineer Resume


MA

PROFESSIONAL SUMMARY:

  • Around 9 years of experience in systems analysis, design, and development in the fields of Java, Data Warehousing, Hadoop Ecosystem, AWS Cloud Data Engineering, Data Visualization, Reporting, and Data Quality Solutions.
  • Good experience with Amazon Web Services such as S3, IAM, EC2, EMR, Kinesis, VPC, DynamoDB, Redshift, Amazon RDS, Lambda, Athena, Glue, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, and other services of the AWS family.
  • Hands-on experience with data analytics services such as Athena, Glue, the Glue Data Catalog, and QuickSight.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
  • Experience in developing the Hadoop based applications using HDFS, MapReduce, Spark, Hive, Sqoop, HBase and Oozie.
  • Hands-on experience architecting legacy data migration projects from on-premises to the AWS Cloud.
  • Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
  • Experience in building and optimizing AWS data pipelines, architectures, and data sets.
  • Hands-on experience with tools such as Hive for data analysis, Sqoop for data ingestion, and Oozie for scheduling.
  • Experience in scheduling and configuring Oozie, including writing Oozie workflows and coordinators.
  • Worked on different file formats such as JSON, XML, CSV, ORC, and Parquet; experience processing both structured and semi-structured data in these formats.
  • Worked on Apache Spark performing actions and transformations on RDDs, DataFrames, and Datasets using Spark SQL and Spark Streaming contexts.
  • Good experience in Spark Core, Spark SQL, and Spark Streaming.
  • Good experience in writing Python Lambda functions and calling APIs.
  • Good knowledge in Kafka and Flume.
  • Experience in Java and Java EE (J2EE) technologies; proficient in Core Java, Servlets, JSP, EJB, JDBC, XML, Spring, Struts, Hibernate, and RESTful web services.
  • Proven knowledge of standards-compliant, cross-browser compatible HTML, CSS, JavaScript, and Ajax.
  • Good experience with different SDLC models including Waterfall, V-Model, and Agile.
  • Involved in daily stand-ups, sprint planning, and review meetings in the Agile model.

TECHNICAL SKILLS:

Programming Languages: Java 1.4/1.5/1.6, Python

Hadoop/Big Data: HDP, HDFS, Sqoop, Hive, Pig, HBase, MapReduce, Spark, Oozie.

AWS Cloud Technologies: IAM, S3, EC2, VPC, EMR, Glue, DynamoDB, RDS, Redshift, CloudWatch, CloudTrail, CloudFormation, Kinesis, Lambda, Athena, EBS, DMS, Elasticsearch, SQS, SNS, KMS, QuickSight, ELB, Auto Scaling

Java/Web Technologies: XML, XSL, XSLT, EJB 2.0/3.0, Struts 1.x/2.x, Spring 2.5, Hibernate 3.2, Ajax

Scripting Languages: JavaScript, Python, Shell Script.

Web Servers: Apache Tomcat 4.1/5.0

Databases: Oracle (PL/SQL, SQL), DB2, Netezza

Tools: CVS, CodeCommit, GitHub, Apache Log4j, TOAD, ANT, Maven, JUnit, JMock, Mockito, REST HTTP Client, JMeter, Cucumber, Jenkins, Aginity.

ETL Tools: Informatica, DataStage.

IDEs: Eclipse, IBM RAD 7.5

PROFESSIONAL EXPERIENCE:

AWS Data Engineer

Confidential, MA

Responsibilities:

  • Designed and set up an enterprise data lake to support various use cases, including storing, processing, analytics, and reporting of voluminous, rapidly changing data, using various AWS services.
  • Used various AWS services including S3, EC2, AWS Glue, Athena, Redshift, EMR, SNS, SQS, DMS, and Kinesis.
  • Extracted data from multiple source systems (S3, Redshift, RDS) and created multiple tables/databases in the Glue Data Catalog using Glue crawlers.
  • Created AWS Glue crawlers for crawling the source data in S3 and RDS.
  • Created multiple Glue ETL jobs in Glue Studio, processed the data with different transformations, and loaded it into S3, Redshift, and RDS.
  • Created multiple recipes in Glue DataBrew and used them in various Glue ETL jobs.
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (Parquet/text files) into Amazon Redshift.
  • Used the AWS Glue Data Catalog with crawlers to catalog data in S3 and performed SQL queries using AWS Athena.
  • Wrote PySpark jobs in AWS Glue to merge data from multiple tables and utilized crawlers to populate the AWS Glue Data Catalog with metadata table definitions (see the first sketch after this list).
  • Used AWS Glue for transformations and AWS Lambda to automate the process.
  • Used AWS EMR to transform and move large amounts of data into and out of AWS S3.
  • Created monitors, alarms, notifications and logs for Lambda functions, Glue Jobs using CloudWatch.
  • Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift, and S3.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
  • Extensively used Athena to run queries on data processed by Glue ETL jobs, then used QuickSight to generate reports for business intelligence.
  • Used DMS to migrate tables from homogeneous and heterogeneous databases from on-premises to the AWS Cloud.
  • Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics applications to capture and process streaming data, with output to S3, DynamoDB, and Redshift for storage and analysis.
  • Created Lambda functions to run AWS Glue jobs based on S3 events (see the second sketch after this list).
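A minimal sketch of the kind of Glue PySpark merge job described above is shown below. It reads two crawler-populated catalog tables, joins them on a shared key, and writes the merged result to S3 as Parquet; the database, table, bucket, and column names are illustrative placeholders, not the actual project values.

# Minimal sketch of a Glue PySpark job that joins two catalog tables and writes
# the merged result to S3 in Parquet. Database, table, and path names are
# illustrative placeholders.
import sys
from awsglue.transforms import Join
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read both source tables from the Glue Data Catalog (populated by crawlers)
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders")        # hypothetical names
customers = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="customers")     # hypothetical names

# Merge the two tables on the shared key
merged = Join.apply(orders, customers, "customer_id", "customer_id")

# Write the merged data set back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=merged,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders_merged/"},
    format="parquet",
)
job.commit()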
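A second minimal sketch illustrates the S3-event-driven trigger from the last bullet: a Lambda handler that starts a Glue job through Boto3 when a new object lands in the bucket. The job name and argument key are assumed for illustration.

# Minimal sketch of a Lambda handler that starts a Glue job when an object
# lands in S3. The Glue job name is a placeholder; error handling is minimal.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Pull the bucket/key of the object that triggered the S3 event notification
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Kick off the Glue ETL job, passing the new object location as a job argument
    response = glue.start_job_run(
        JobName="process-landing-data",                  # hypothetical job name
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"JobRunId": response["JobRunId"]}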

Environment: AWS Glue, S3, IAM, EC2, RDS, Redshift, Lambda, Boto3, DynamoDB, Apache Spark, Kinesis, Athena, Hive, Sqoop, Python.

AWS Data Engineer

Confidential, GA

Responsibilities:

  • Responsible for provisioning key AWS Cloud services and configuring them for scalability, flexibility, and cost optimization.
  • Create VPCs, private and public subnets, and NAT gateways in a multi-region, multi-zone infrastructure landscape to manage worldwide operations.
  • Manage Amazon Web Services (AWS) infrastructure with orchestration tools such as CFT, Terraform, and Jenkins pipelines.
  • Create Terraform scripts to automate deployment of EC2 instances, S3, EFS, EBS, IAM roles, snapshots, and Jenkins servers.
  • Build cloud data stores in S3 with logical layers for raw, curated, and transformed data management.
  • Create data ingestion modules using AWS Glue for loading data in various layers in S3 and reporting using Athena and QuickSight.
  • Create and manage bucket policies and lifecycle rules for S3 storage per organizational and compliance guidelines.
  • Create parameters and SSM documents using AWS Systems Manager.
  • Established CI/CD tools such as Jenkins and Git Bucket for code repository, build, and deployment of the Python code base.
  • Build Glue jobs for technical data cleansing such as deduplication, NULL-value imputation, and removal of redundant columns, as well as Glue jobs for standard data transformations (date/string and math operations) and business transformations required by business users (see the first sketch after this list).
  • Used the Kinesis family (Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics) to collect, process, and analyze streaming data (see the second sketch after this list).
  • Create Athena data sources on S3 buckets for ad hoc querying and business dashboarding using QuickSight and Tableau reporting tools.
  • Copy fact/dimension and aggregate output from S3 to Redshift for historical data analysis using Tableau and QuickSight.
  • Use Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline.
  • Use the PyCharm IDE for Python/PySpark development and Git for version control and repository management.
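The first sketch below illustrates, under assumed table, column, and bucket names, the kind of Glue cleansing job described above: deduplication, NULL-value imputation, dropping a redundant column, and a simple date transformation before writing the curated layer to S3.

# Minimal sketch of a Glue PySpark cleansing step: deduplication, NULL-value
# imputation, and dropping a redundant column. Names are illustrative placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import functions as F

glue_context = GlueContext(SparkContext.getOrCreate())

# Load the raw layer from the Glue Data Catalog and work on it as a Spark DataFrame
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="customer_events").toDF()   # hypothetical names

cleansed = (
    raw.dropDuplicates(["event_id"])                      # deduplicate on business key
       .fillna({"country": "UNKNOWN", "amount": 0})       # impute NULL values
       .drop("legacy_flag")                               # remove a redundant column
       .withColumn("event_date", F.to_date("event_ts"))   # standard date transform
)

# Write the curated layer back to S3 as Parquet, partitioned by date
cleansed.write.mode("overwrite").partitionBy("event_date") \
        .parquet("s3://example-curated-bucket/customer_events/")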
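The second sketch is a minimal Boto3 producer for Kinesis Data Streams of the sort used for streaming collection; the stream name and event fields are assumptions for illustration.

# Minimal sketch of publishing JSON records to a Kinesis Data Stream with Boto3;
# the stream name and payload fields are illustrative placeholders.
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_clickstream(events):
    """Send a batch of click events to the (hypothetical) clickstream-events stream."""
    records = [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event["user_id"]),   # spreads load across shards
        }
        for event in events
    ]
    return kinesis.put_records(StreamName="clickstream-events", Records=records)

# Example usage
publish_clickstream([{"user_id": 42, "page": "/home", "ts": "2021-01-01T00:00:00Z"}])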

Environment: AWS - EC2, VPC, S3, EBS, ELB, CloudWatch, CloudFormation, ASG, Lambda, AWS CLI, GIT, Glue, Athena and QuickSight, Python and PySpark, Shell scripting, Jenkins.

AWS DATA ENGINEER

Confidential, MN

Responsibilities:

  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda function and configured it to receive events from the S3 bucket (see the first sketch after this list).
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
  • Wrote code that optimizes the performance of AWS services used by application teams and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).
  • Created AWS Lambda functions using Python for deployment management in AWS; designed and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
  • Created different AWS Lambda functions and API Gateways to submit data via API Gateway, accessible through Lambda functions.
  • Responsible for building CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM, and CloudWatch services, integrated with Service Catalog.
  • Performed regular monitoring activities on Unix/Linux servers (log verification, server CPU usage, memory checks, load checks, disk space verification) to ensure application availability and performance using CloudWatch and AWS X-Ray. Implemented the AWS X-Ray service inside Confidential, allowing development teams to visually detect node and edge latency distribution directly from the service map.
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/Parquet/text files) into Amazon Redshift.
  • Utilized Python libraries such as Boto3 and NumPy for AWS work.
  • Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
  • Created external tables with partitions using Hive, AWS Athena and Redshift.
  • Developed the PySpark code for AWS Glue jobs and for EMR.
  • Good understanding of other AWS services such as S3, EC2, IAM, and RDS; experience with orchestration and data pipeline services such as AWS Step Functions, Data Pipeline, and Glue.
  • Experience in writing SAM templates to deploy serverless applications on the AWS Cloud.
  • Hands-on experience working with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, IAM, etc.
  • Designed and developed ETL jobs in AWS Glue to extract data from S3 objects and load it into a data mart in Redshift.
  • Responsible for logical and physical data modeling for various data sources on Redshift.
  • Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
  • Integrated Lambda with SQS and DynamoDB using Step Functions to iterate through lists of messages and update their status in a DynamoDB table (see the second sketch after this list).
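The first sketch below illustrates the serverless API Gateway / Lambda / DynamoDB pattern from the first bullet, assuming a hypothetical table name and item shape.

# Minimal sketch of an API Gateway-backed Lambda handler that writes the request
# payload to a DynamoDB table. Table name and item shape are placeholders.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")          # hypothetical table name

def lambda_handler(event, context):
    # API Gateway (proxy integration) delivers the request body as a JSON string
    body = json.loads(event.get("body") or "{}")

    # Persist the item keyed by order_id
    table.put_item(Item={
        "order_id": body["order_id"],
        "status": body.get("status", "NEW"),
    })

    return {
        "statusCode": 200,
        "body": json.dumps({"message": "stored", "order_id": body["order_id"]}),
    }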
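The second sketch illustrates the SQS-to-DynamoDB integration from the last bullet: the handler iterates over a batch of SQS messages and updates a status attribute in DynamoDB. Table, key, and message field names are placeholders.

# Minimal sketch of a Lambda handler for an SQS batch that updates a status
# attribute in DynamoDB for each message. Names are illustrative placeholders.
import json
import boto3

table = boto3.resource("dynamodb").Table("message_status")   # hypothetical table

def lambda_handler(event, context):
    for record in event["Records"]:          # one entry per SQS message
        message = json.loads(record["body"])
        table.update_item(
            Key={"message_id": message["message_id"]},
            UpdateExpression="SET #s = :status",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":status": message.get("status", "PROCESSED")},
        )
    return {"processed": len(event["Records"])}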

Environment: AWS EC2, S3, EBS, ELB, EMR, Lambda, RDS, SNS, SQS, VPC, IAM, CloudFormation, CloudWatch, ELK Stack, Bitbucket, Python, Shell Scripting, GIT, Jira, Unix/Linux, AWS X-Ray, DynamoDB, Kinesis.

Hadoop -AWS Developer

Confidential

Responsibilities:

  • Participated in requirements gathering and was actively involved in developing the requirements into technical specifications.
  • Used Spring XD for data ingestion into HDFS.
  • Involved in development of MapReduce jobs using various APIs such as Mapper, Reducer, RecordReader, InputFormat, etc.
  • Extensively used HDFS for storing the data.
  • Worked on Hive to create external and internal tables and performed analysis on the data.
  • Used HiveQL for analyzing and validating the data.
  • Created Hive load queries for loading the data from HDFS.
  • Used Sqoop to export data from Hive to Netezza and to import data from Netezza into Hive.
  • Used Informatica with bulk load to load the data into the final tables.
  • Created Sqoop jobs for importing and exporting data from/to Netezza.
  • Used Oozie for scheduling this entire process.
  • Worked on AWS POC for transferring data from local file system to S3.
  • Hands-on experience in creating EMR clusters and developing Glue jobs.
  • Wrote Oozie workflows and job.properties files for managing the Oozie jobs; configured all MapReduce, Hive, and Sqoop jobs in Oozie workflows.
  • Scheduled the Oozie jobs using coordinators; wrote workflow.xml, job.properties, and coordinator.xml for scheduling the Oozie jobs.
  • Created Kinesis streams for live streaming of the data.
  • Developed mappings in Informatica and loaded the data into the target tables.
  • Wrote Oozie classes for moving and deleting files.
  • Configured the JARs in the Oozie workflows.
  • Validated Hadoop jobs such as MapReduce and Oozie using the CLI, and handled jobs in Hue as well.
  • Deployed these Hadoop applications into the development, stage, and production environments.
  • Created Databases and tables in Netezza for storing the data from Hive.
  • Used Spark for aggregating data from Netezza.
  • Extensively used Spark, creating RDDs and Hive SQL queries for aggregating the data (see the sketch after this list).
  • Used PuTTY to connect to the Hadoop cluster and run the different jobs from the CLI.
  • Gave demos to the client on this application; used Agile methodology, participated in daily stand-ups, and was actively involved in client review meetings and Sprint/PSI demos.
  • Involved in PSI planning meetings and discussions, providing input based on the requirements.
  • Wrote unit test cases using JUnit, Mockito, and MRUnit; also wrote integration tests and ATDD tests using Cucumber and deployed them to Jenkins.
  • Extensively used Maven as the build tool for building the application (JAR) and wrote the pom.xml.
  • Used SVN for version control, Log4j for logging, and Eclipse Luna for writing MapReduce code.
  • Validated the data in Netezza tables as well as in the downstream applications in DB2.
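The sketch below illustrates the Spark aggregation step referenced above, assuming placeholder database, table, and column names: it reads a Hive table through a Hive-enabled SparkSession, aggregates it with Spark SQL, and writes the result back to Hive ahead of the Sqoop export to Netezza.

# Minimal sketch of a Spark SQL aggregation over a Hive table.
# Database, table, and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("daily-aggregation")
         .enableHiveSupport()          # lets Spark SQL see the Hive metastore
         .getOrCreate())

# Aggregate transaction amounts per account and day from a Hive table
daily_totals = spark.sql("""
    SELECT account_id,
           txn_date,
           SUM(amount)  AS total_amount,
           COUNT(*)     AS txn_count
    FROM   staging_db.transactions
    GROUP  BY account_id, txn_date
""")

# Persist the aggregates to a Hive table for export downstream
daily_totals.write.mode("overwrite").saveAsTable("curated_db.daily_account_totals")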

Environment: Hadoop, MapReduce, Java, Spark, Hive, Sqoop, Oozie, HDFS, Netezza, AWS, EMR, Glue, S3, Informatica 9.1, DB2, Oracle 11g, SQL, Windows XP, Agile Scrum, MRUnit, Mockito, Apache Log4j, Spring XD, Subversion, TOAD, Cucumber, Maven, Jenkins, FileZilla, PuTTY, Eclipse Luna, Aginity.

Java Developer

Confidential

Responsibilities:

  • Used Agile methodology for developing the project.
  • Developed controllers based on Spring MVC and wrote application context files.
  • Developed new web service classes to call the applications; developed RESTful web services using the Jersey API.
  • Wrote unit test cases using JMock and used Log4j for debugging the application.
  • Used Basic and Custom authentication for developing the service classes.
  • Involved in bug fixing of the application. Used QC for defect Tracking.
  • Extensively used Hibernate to connect to the database and HQL (Hibernate Query Language) to communicate with it.
  • Used RAD 7.5 as the IDE for developing the application.
  • Created various XSDs while developing the web services.
  • Involved in design discussions in the Agile process and created high-level design documents for the APIs.
  • Used Java 1.6 and Tomcat for developing this application.
  • Created various JSPs and HTML pages for documentation of the application, used by dealers and other consumers to learn about the services.
  • Worked on different releases of this application; versioned the services per release to avoid production issues.
  • Created Hibernate mapping files and POJOs for developing this application.
  • Used Spring annotations for developing this application.
  • Used CVS and CPS for the repository and for building the application.
  • Extensively used JMock for mocking Java classes in unit testing; wrote JUnit tests for 100% coverage to avoid bugs.
  • Tested RESTful web services using a REST client.
  • Used Apache POI to write data to Excel.
  • Participated in sprint reviews with clients and took their input to improve the application development.
  • Joined daily stand-ups to give updates on my work and discuss issues.
  • Gave demos to different teams on how to consume the services I developed.
  • Worked closely with DBAs to access views and different tables in the database.
  • Used SQL Developer to access database views.

Environment: Java, Core Java, Java EE (J2EE), RESTful Web Services, Jersey, Spring IoC, Spring MVC, JAXB, HTML, XML, CSS, JavaScript, AJAX, jQuery, Excel, Apache POI, Tomcat, Oracle 11g, SQL, Windows XP, Agile Scrum, Linux, JMock, Apache Log4j, CVS, RAD 7.5, SQL Developer.
