
AWS Data Engineer Resume


California

SUMMARY

  • 9+ years of IT experience in analysis, design, and development, including 5 years in Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, with programming languages including Java, PySpark, and Python.
  • 4 years of experience in a Data Warehouse / ETL Developer role.
  • Hands-on experience across the Hadoop ecosystem, with extensive experience in Big Data technologies including HDFS, MapReduce, YARN, Sqoop, HBase, Hive, Oozie, Impala, Pig, ZooKeeper, Flume, and Spark.
  • Experience with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2.
  • Expertise in deploying cloud-based services with Amazon Web Services (Database, Migration, Compute, IAM, Storage, Analytics, Network & Content Delivery, Lambda and Application Integration).
  • Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained Hadoop clusters on AWS EMR. Hands-on experience with Amazon RDS, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Redshift, DynamoDB, and other services of the AWS family.
  • Designed and developed logical and physical data models that utilize concepts such as Star Schema, Snowflake Schema, and Slowly Changing Dimensions.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL and in optimizing Hive queries.
  • Design and develop Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a brief sketch follows this list).
  • Hands-on experience with Spark and Hive job performance tuning.
  • Experienced in Data Modeling and Data Analysis using Dimensional and Relational Data Modeling, Star Schema/Snowflake Modeling, FACT and Dimension tables, and Physical and Logical Data Modeling.
  • Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as controlling and granting database access and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Extensive knowledge of and experience with real-time data streaming techniques such as Kafka and Spark Streaming.
  • Experience importing and exporting data between HDFS and RDBMSs with Sqoop, migrating data according to client requirements.
  • Knowledge of database architecture for OLAP and OLTP applications, database design, data migration, and data warehousing concepts with an emphasis on ETL.
  • Progressive involvement in the Software Development Life Cycle (SDLC), Git, Agile methodology, and the Scrum process. Strong business sense and the ability to communicate data insights to both technical and nontechnical clients.
  • Skilled in systems analysis, E-R/dimensional data modeling, and designing and implementing RDBMS-specific features. Regular user of JIRA and other internal issue trackers for project development.
  • Comfortable working in a fast-paced, multi-tasking environment, both independently and in a collaborative team. Takes on challenging projects and works through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.
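
The following is a minimal, illustrative PySpark sketch of the multi-format extraction and aggregation pattern referenced above. It is not taken from any project listed here; the S3 paths, bucket, and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical example: read JSON and Parquet sources, join them, and
    # aggregate usage metrics per customer. Paths and columns are placeholders.
    spark = SparkSession.builder.appName("usage-insights").getOrCreate()

    events = spark.read.json("s3://example-bucket/raw/events/")            # JSON source
    customers = spark.read.parquet("s3://example-bucket/dim/customers/")   # Parquet source

    usage = (
        events.join(customers, on="customer_id", how="inner")
              .groupBy("customer_id", "plan_type")
              .agg(
                  F.count("*").alias("event_count"),
                  F.sum("bytes_used").alias("total_bytes"),
              )
    )

    # The same aggregation can be exposed to Spark-SQL as a temporary view.
    usage.createOrReplaceTempView("usage_summary")
    top_users = spark.sql(
        "SELECT customer_id, total_bytes FROM usage_summary "
        "ORDER BY total_bytes DESC LIMIT 100"
    )

    top_users.write.mode("overwrite").parquet("s3://example-bucket/curated/top_users/")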

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark.

Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Snowflake.

Programming Languages: Python, PySpark, Shell scripting, Perl scripting, SQL, Java.

Cloud: AWS (EC2, EMR, Lambda, IAM, S3, Athena, Glue, Kinesis, CloudWatch, RDS, Redshift), Azure (Data Factory, Data Lake, Databricks, Logic Apps)

Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.

Version Control: SVN, Git, GitHub, Maven

Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, LINUX, OS

Visualization/ Reporting: Tableau, ggplot2, matplotlib

Database Modeling: Dimensional Modeling, ER Modeling

Machine Learning Techniques: Linear & Logistic Regression, Decision Trees, Clustering.

PROFESSIONAL EXPERIENCE

Confidential

AWS Data Engineer

Responsibilities:

  • Implemented solutions utilizing advanced AWS components (EMR, EC2, etc.) integrated with Big Data/Hadoop frameworks such as ZooKeeper, YARN, Spark, PySpark, and NiFi.
  • Extensively used AWS Athena to ingest structured data from S3 into systems such as Redshift and to generate reports.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kinesis in near real time.
  • Used AWS Redshift, S3, Redshift Spectrum, and Athena to query large amounts of data stored in S3, creating a virtual data lake without a separate ETL process.
  • Performed end-to-end architecture and implementation assessments of various AWS services, including Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.
  • Implemented and developed Hive bucketing and partitioning.
  • Implemented Kafka and Spark Structured Streaming for real-time data ingestion (see the streaming sketch after this list).
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Created PySpark scripts to perform data manipulation and aggregation, loading results into DataFrames and ultimately into S3 as part of the migration process.
  • Developed Apache Presto and Apache Drill setups on an AWS EMR (Elastic MapReduce) cluster to combine multiple data sources such as MySQL and Hive, making it possible to compare operations such as joins and inserts across sources from a single platform.
  • Migrated data from the Amazon Redshift data warehouse to Snowflake.
  • Involved in migrating a quality-monitoring tool from AWS EC2 to AWS Lambda and built logical datasets to administer quality monitoring on Snowflake warehouses.
  • Worked on importing and exporting data from Snowflake, Oracle, and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
  • Used AWS CodeCommit repositories to store programming logic and scripts and make them available to new clusters.
  • Involved in a POC for data extraction, aggregation, and consolidation within AWS Glue using PySpark (a Glue job sketch follows the environment line below).
  • Involved in designing and deploying multiple applications utilizing almost the entire AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, and IAM), focusing on high availability, fault tolerance, and auto scaling via AWS CloudFormation.
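
Illustrative sketch of the Kafka plus Spark Structured Streaming ingestion mentioned above, assuming the spark-sql-kafka connector is on the classpath. The broker address, topic, schema, and S3 paths are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    # Hypothetical sketch: consume a Kafka topic with Structured Streaming and
    # land it on S3 as partitioned Parquet. Broker, topic, schema, and paths
    # are placeholders.
    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    schema = StructType([
        StructField("user_id", StringType()),
        StructField("event_type", StringType()),
        StructField("ts", LongType()),
    ])

    raw = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "user-events")
             .option("startingOffsets", "latest")
             .load()
    )

    events = (
        raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
           .select("e.*")
           .withColumn("event_date", F.to_date(F.from_unixtime("ts")))
    )

    query = (
        events.writeStream.format("parquet")
              .option("path", "s3://example-bucket/streaming/user-events/")
              .option("checkpointLocation", "s3://example-bucket/checkpoints/user-events/")
              .partitionBy("event_date")
              .trigger(processingTime="1 minute")
              .start()
    )
    query.awaitTermination()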

Environment: Amazon Web Services, EMR, Amazon S3, EC2, Amazon Redshift, PySpark, Snowflake, YARN, Spark, and Hive.
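
A minimal AWS Glue PySpark job skeleton along the lines of the Glue POC bullet above. The catalog database, table, and S3 output path are hypothetical placeholders; the awsglue libraries are assumed to be provided by the Glue job runtime.

    import sys
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from awsglue.dynamicframe import DynamicFrame

    # Hypothetical Glue job: read a catalog table, aggregate, write Parquet to S3.
    # Database, table, and bucket names are placeholders.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="orders"
    ).toDF()

    daily = (
        orders.groupBy("order_date", "region")
              .agg(F.sum("amount").alias("daily_revenue"))
    )

    glue_context.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(daily, glue_context, "daily"),
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/daily_revenue/"},
        format="parquet",
    )
    job.commit()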

Confidential, California

Azure Data Engineer

Responsibilities:

  • Analyzed, developed, and constructed modern data solutions that enable data visualization using Azure PaaS services.
  • Determine the impact of the new implementation on existing business processes by understanding the present status of the application in production.
  • Worked on migrating data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Performed hands-on Spark and Hive job performance tuning.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
  • Undertook data analysis and collaborated with the downstream analytics team to shape the data according to their requirements.
  • Used Azure Key Vault as a central repository for secrets and referenced those secrets in Azure Data Factory and in Databricks notebooks.
  • Used Azure ML to build, test, and deploy predictive analytics solutions based on data.
  • Helped individual teams set up their repositories in Bitbucket, maintain their code, and configure jobs that make use of the CI/CD environment.
  • Applied technical knowledge to architect solutions that meet business and IT needs, created roadmaps, and ensured the long-term technical viability of new deployments, infusing key analytics and AI technologies where appropriate (e.g., Azure Machine Learning, Machine Learning Server, Bot Framework, Azure Cognitive Services, Azure Databricks).
  • Extensively involved in analysis, design, and modeling; worked on snowflake schemas, data modeling and data elements, source-to-target mappings, the interface matrix, and design elements.
  • Performed data quality issue analysis using SnowSQL while building analytical warehouses on Snowflake.
  • Designed and built a Data Discovery Platform for a large system integrator using Azure HDInsight components.
  • Used Azure Data Factory and Data Catalog to ingest and maintain data sources; security on HDInsight was enabled using Azure Active Directory.
  • Performed data quality analyses and applied business rules in all layers of the data extraction, transformation, and loading process.
  • Integrated data storage solutions with Spark, especially Azure Data Lake Storage and Blob Storage.
  • Created Databricks job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python (a Databricks sketch follows this list).
  • Used the Snowflake cloud data warehouse to integrate data from multiple source systems, including nested JSON-formatted data, into Snowflake tables.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks and Azure SQL DB.
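
Illustrative Databricks notebook sketch combining the Key Vault-backed secret scope and SQL Server extraction described above. The secret scope, secret names, JDBC URL, and ADLS path are hypothetical placeholders; dbutils and spark are provided by the Databricks runtime, and the Key Vault-backed scope is assumed to exist already.

    from pyspark.sql import functions as F

    # Hypothetical Databricks notebook cell: pull credentials from a Key Vault-
    # backed secret scope, read a table from Azure SQL over JDBC, and land it in
    # ADLS as Parquet. Scope, secret names, server, database, and paths are
    # placeholders.
    jdbc_user = dbutils.secrets.get(scope="kv-scope", key="sql-user")
    jdbc_pass = dbutils.secrets.get(scope="kv-scope", key="sql-password")

    jdbc_url = (
        "jdbc:sqlserver://example-server.database.windows.net:1433;"
        "database=salesdb"
    )

    orders = (
        spark.read.format("jdbc")
             .option("url", jdbc_url)
             .option("dbtable", "dbo.Orders")
             .option("user", jdbc_user)
             .option("password", jdbc_pass)
             .load()
    )

    cleaned = orders.withColumn("load_date", F.current_date())

    (
        cleaned.write.mode("overwrite")
               .partitionBy("load_date")
               .parquet("abfss://curated@examplestorage.dfs.core.windows.net/orders/")
    )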

Environment: Azure Data Factory (V2), Azure Databricks, PySpark, Snowflake, Azure SQL, Azure Data Lake, Azure Blob Storage, and Azure ML.

Confidential

Big Data Developer

Responsibilities:

  • Responsible for developing prototypes to the selected solutions and implementing complex big data projects with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data using multiple platforms.
  • Ran multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Imported and exported data between HDFS and RDBMSs with Sqoop, migrating data according to client requirements.
  • Implemented Python scripts to call REST APIs, perform transformations, and load the data into Hive (a brief sketch follows this list).
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Used reporting tools such as Tableau connected to Hive to generate daily data reports.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka.
  • Imported data from various sources into the HBase cluster using Kafka Connect; worked on creating HBase data models from the existing data model.
  • Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
  • Worked with the MDM systems team on technical aspects and report generation.
  • Developed Spark code using PySpark and Spark-SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
  • Worked on developing ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark-SQL, DataFrames, and pair RDDs.
  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
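
Illustrative PySpark sketch of the REST-to-Hive loading pattern referenced above, writing to a partitioned, Snappy-compressed Parquet table. The endpoint URL, payload shape, and table name are hypothetical placeholders.

    import requests
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical sketch: call a REST endpoint, load the JSON payload into a
    # DataFrame, and append it to a partitioned, Snappy-compressed Parquet Hive
    # table. The URL, payload shape, and table name are placeholders.
    spark = (
        SparkSession.builder.appName("rest-to-hive")
        .enableHiveSupport()
        .getOrCreate()
    )

    resp = requests.get("https://api.example.com/v1/devices", timeout=30)
    resp.raise_for_status()
    records = resp.json()  # assumed to be a list of flat JSON objects

    df = (
        spark.createDataFrame(records)
             .withColumn("load_date", F.current_date())
    )

    (
        df.write.mode("append")
          .format("parquet")
          .option("compression", "snappy")
          .partitionBy("load_date")
          .saveAsTable("curated.devices")
    )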

Environment: Spark (RDDs and DataFrames), Kafka, multiple file formats, PySpark, Spark UDFs, Oracle SQL, and Hive.

Confidential

ETL Developer

Responsibilities:

  • Worked as a SQL Server Analyst / Developer / DBA using SQL Server 2012, 2014, and 2016.
  • Created jobs, SQL Mail Agent notifications, and alerts, and scheduled DTS/SSIS packages.
  • Managed and updated the Erwin models (logical/physical data models) for the Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB according to user requirements.
  • Performed source control and environment-specific script deployment tracking using TFS.
  • Exported the current data models from Erwin to PDF and published them to SharePoint for various users.
  • Developed, administered, and managed the corresponding databases: Consolidated Data Store, Reference Database (the source for codes/values of the legacy source systems), and Actuarial Data Mart.
  • Wrote triggers, stored procedures, and functions in Transact-SQL (T-SQL), and created and maintained physical structures.
  • Worked with dynamic management views and system views for indexing and other performance problems; analyzed legacy data and imported it into the Reference DB once verified and published by the DMRB (Data Management Review Board); liaised with the ETL team to convert ETL requirements into standard database solutions to boost performance.
  • Created ETL mappings using Informatica PowerCenter; performed day-to-day maintenance of databases, staging, and database backups/restores.
  • Deployed scripts in different environments according to configuration management and playbook requirements; created and managed files/filegroups and table/index associations; performed query tuning and performance tuning.
  • Tracked and closed defects using Quality Center; maintained users, roles, and permissions.
  • Performed capacity planning and assessed disk requirements.

Environment: SQL Server 2008/2012 Enterprise Edition, SSRS, SSIS, T-SQL, Windows Server 2003, PerformancePoint Server 2007, C/C++, Oracle 10g, PL/SQL, Visual SourceSafe, Visual Studio 2010
