AWS Data Engineer Resume
Seattle, WA
SUMMARY
- 7+ years of experience in Data Warehousing, with exposure to design, modeling, development, testing, maintenance, and customer support across multiple domains.
- Extensive experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, big data technologies (Hadoop and Apache Spark), and Databricks.
- Experience in designing and implementing cloud architecture on Microsoft Azure.
- Excellent knowledge of integrating Azure Data Factory V1/V2 with a variety of data sources and processing the data using pipelines, pipeline parameters, activities, activity parameters, and manual/window-based/event-based job scheduling.
- Hands-on experience in developing Logic App workflows for event-based data movement, file operations on Data Lake, Blob Storage, and SFTP/FTP servers, and retrieving/manipulating data in Azure SQL Server.
- Implemented Azure Active Directory Service for authentication of Azure Data Factory.
- Extensively worked on AWS services such as EC2, S3, EMR, RDS (Aurora), Athena, Lambda, Step Functions, Glue Data Catalog, SNS, and Redshift.
- Worked on Data Warehouse design, implementation, and support (SQL Server, Azure SQL DB, Azure SQL Data Warehouse).
- Experience in implementing ETL and ELT solutions using large data sets.
- Experience in creating database objects such as tables, constraints, indexes, views, indexed views, stored procedures, UDFs, and triggers on Microsoft SQL Server.
- Strong experience in writing and tuning complex SQL queries, including joins, correlated subqueries, and scalar subqueries.
- Identified, designed, and implemented process improvements by automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
- Experienced and involved in all phases of the SDLC: requirement gathering, analysis, design, coding, code reviews, configuration control, QA, and deployment.
- Experience in Agile/SCRUM methodology.
- Designed and developed an automation framework using Python and shell scripting.
- Proficient in converting Hive/SQL queries into Spark transformations using DataFrames and Datasets (see the sketch after this list).
- Experience working with Git and Bitbucket version control systems.
- Used Flume sinks to drain data from Flume channels and deposit it into NoSQL databases such as MongoDB.
- Knowledge of integrated development environments such as Eclipse, NetBeans, IntelliJ, and STS.
- Involved in loading data from UNIX file systems and FTP into HDFS.
- Hands-on experience with visualization tools such as Tableau and Power BI.
- Good knowledge of data modeling and data analytics tools, with exposure to different big data platforms.
- Wrote complex HiveQL queries for data extraction from Hive tables and developed Hive User Defined Functions (UDFs) as required.
- Knowledge of job workflow scheduling and coordination tools/services such as Oozie, ZooKeeper, Airflow, and Apache NiFi.
- Strong experience in and knowledge of real-time data analytics using Spark Streaming and Flume.
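As an illustration of the Hive/SQL-to-DataFrame conversion mentioned above, the following is a minimal PySpark sketch; the database, table, and column names (sales.orders, order_date, amount, region) are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hive-to-dataframe").enableHiveSupport().getOrCreate()

# HiveQL form of the query.
sql_df = spark.sql("""
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM sales.orders
    WHERE order_date >= '2021-01-01'
    GROUP BY region
""")

# The same logic expressed with the DataFrame API.
df = (
    spark.table("sales.orders")
    .filter(F.col("order_date") >= "2021-01-01")
    .groupBy("region")
    .agg(F.count("*").alias("order_count"), F.sum("amount").alias("revenue"))
)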
TECHNICAL SKILLS
Hadoop/Big Data Ecosystem: Apache Spark, HDFS, MapReduce, Hive, Kafka, Sqoop
Programming & Scripting: Python, PySpark, SQL, Scala
NoSQL Databases: MongoDB, DynamoDB
SQL Databases: MS SQL Server, MySQL, Oracle, PostgreSQL
Cloud Computing: AWS, Azure
Operating Systems: Ubuntu (Linux), macOS, Windows 10/8
Reporting: Power BI, Tableau
Version Control: Git, GitHub, SVN
Methodologies: Agile/ Scrum, Rational Unified Process and Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Seattle, WA
AWS Data Engineer
Responsibilities:
- Created an end-to-end data pipeline covering data ingestion, curation, and provisioning using AWS cloud services.
- Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from both batch and streaming sources.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Ingested data into S3 buckets from different sources, including MySQL, Oracle, MongoDB, and SFTP.
- Proficient in working with AWS services such as S3, EC2, EMR, Redshift, Athena, Glue, DynamoDB, RDS, and IAM.
- Worked with big data services and concepts such as Spark RDDs, the DataFrame, Dataset, and Data Source APIs, Spark SQL, and Spark Streaming.
- Created EMR clusters on EC2 instances, developed PySpark applications to perform data transformations on them, and stored the results in Redshift.
- Configured AWS Identity and Access Management (IAM) Groups and Users for improved login authentication
- Used AWS Lambda with Python scripts to automate loading data in multiple formats (Parquet, JSON, Avro, CSV) from AWS S3 into AWS Redshift (a Lambda sketch follows this list).
- Created an event-driven AWS Glue ETL pipeline using a Lambda function, reading the data from the S3 bucket and storing it in Redshift on a daily basis.
- Developed Python scripts using the Boto3 library to configure AWS services such as Glue, EC2, S3, and DynamoDB.
- Tuned Spark applications to set the batch interval time, the correct level of parallelism, and memory usage.
- Used Spark Streaming APIs to perform transformations and actions on data arriving from Kafka in real time and persisted it to AWS S3.
- Developed a Kafka consumer in Python for consuming data from Kafka topics (a consumer sketch also follows this list).
- Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering.
- Extracted data from HDFS using Hive, performed data analysis using PySpark and Redshift for feature selection, and created nonparametric models in Spark.
- Used SQL DDL, DQL, and DML commands for creating, selecting, and modifying tables.
- Used AWS Redshift, S3, and Athena to query large amounts of data stored on S3 and create a virtual data lake without having to go through the ETL process.
- Wrote several MapReduce jobs using PySpark and NumPy, and used Jenkins for continuous integration.
- Involved in writing custom MapReduce programs using the Java API for data processing.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources with file formats such as ORC, Parquet, and text files into AWS S3.
- Built import and export jobs to copy data to and from HDFS using Sqoop, and developed Spark SQL code for faster testing and processing of data.
- Developed Sqoop and Kafka jobs to load data from RDBMS into HDFS and Hive.
- Developed applications in the Linux environment and am familiar with its commands; worked with the Jenkins continuous integration tool and deployed the project through Jenkins using the Git version control system.
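A minimal sketch of the Lambda-driven S3-to-Redshift load referenced above, using the Boto3 Redshift Data API; the target table, IAM role, and environment variable names are hypothetical placeholders.

import os
import boto3

redshift = boto3.client("redshift-data")

def handler(event, context):
    # Triggered by an S3 ObjectCreated event; issue a Redshift COPY for the new object.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    copy_sql = (
        f"COPY staging.events FROM 's3://{bucket}/{key}' "   # hypothetical target table
        f"IAM_ROLE '{os.environ['REDSHIFT_COPY_ROLE_ARN']}' "
        f"FORMAT AS PARQUET"
    )

    redshift.execute_statement(
        ClusterIdentifier=os.environ["REDSHIFT_CLUSTER_ID"],
        Database=os.environ["REDSHIFT_DATABASE"],
        DbUser=os.environ["REDSHIFT_DB_USER"],
        Sql=copy_sql,
    )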
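And a minimal sketch of a Python Kafka consumer (using the kafka-python package); the topic, group id, and broker address are hypothetical placeholders.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "web-logs",                                   # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    group_id="log-aggregator",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # Each record is a decoded web-log event; downstream handling
    # (e.g. batching records to S3) would go here.
    print(message.topic, message.partition, message.offset, message.value)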
Environment: Spark, Spark-Streaming, PySpark, Spark SQL, AWS EMR, S3, EC2, Redshift, Athena, Lambda, Glue, DynamoDB, MapReduce, Java, HDFS, Hive, Pig, Apache Kafka, Python, Shell scripting, Linux, MySQL, NoSQL, SOLR, Jenkins, Oracle, Git, Airflow, Tableau, Power BI.
Confidential, Plano, TX
Azure Data Engineer
Responsibilities:
- Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems, both relational and unstructured, to meet business functional requirements.
- Designed and developed batch and real-time processing solutions using ADF and Databricks clusters.
- Created numerous pipelines in Azure Data Factory v2 to get data from disparate source systems using activities such as Copy, ForEach, Databricks, and transformation activities.
- Maintained and supported optimal pipelines and complex data transformations and manipulations using ADF and PySpark with Databricks (a Databricks sketch follows this list).
- Automated jobs using different ADF triggers such as event, schedule, and tumbling window triggers.
- Created and provisioned Databricks clusters, notebooks, and jobs, and configured autoscaling.
- Implemented Azure self-hosted integration runtime to access data on private networks.
- Used Azure Logic Apps to develop workflows which can send alerts/notifications on different jobs in Azure.
- Experienced in developing an audit, balance, and control framework using SQL DB audit tables to control the ingestion, transformation, and load process in Azure.
- Created linked services to connect external resources to ADF.
- Worked with complex SQL views and stored procedures in large databases across various servers.
- Ensured that developed solutions were formally documented and signed off by the business.
- Worked with team members on technical issue resolution, troubleshooting, and project risk and issue identification and management.
- Worked on cost estimation, billing, and implementation of services on the cloud.
- Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).
- Designed custom-built input adapters using Spark, Hive, and Sqoop to ingest data from Snowflake, MS SQL, and MongoDB into HDFS for analysis.
- Loaded data from web servers and Teradata using Sqoop, Flume, and the Spark Streaming API.
- Used Flume sinks to write directly to indexers deployed on the cluster, allowing indexing during ingestion.
- Migrated from Oozie to Apache Airflow; involved in developing Oozie and Airflow workflows for daily incremental loads, getting data from source databases (MongoDB, MS SQL).
- Developed MapReduce jobs in Scala, compiling the program code into JVM bytecode for data processing.
- Proficient in building interactive Power BI dashboards and reports based on business requirements.
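A minimal Databricks PySpark sketch of the kind of batch transformation described above, assuming the notebook-provided spark session and access to ADLS Gen2; the storage account, container, and column names are hypothetical placeholders.

from pyspark.sql import functions as F

# Hypothetical ADLS Gen2 paths (raw and curated zones).
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/orders/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/sales/orders_daily/"

orders = (
    spark.read.format("csv")        # `spark` is provided by the Databricks notebook
    .option("header", "true")
    .option("inferSchema", "true")
    .load(raw_path)
)

daily = (
    orders.withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)

daily.write.mode("overwrite").partitionBy("order_date").format("delta").save(curated_path)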
Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Snowflake, MongoDB, Teradata, Ambari, Flume, VSTS, Tableau, Power BI, Azure DevOps, Ranger, Azure AD, Git, Blob Storage, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark v2.0.2, Airflow, Hive, Sqoop, HBase.
Confidential, Raleigh, NC
Data Engineer
Responsibilities:
- Experienced working with big data, data visualization, Python development, SQL, and UNIX.
- Expertise in quantitative analysis, data mining, and the presentation of data to see beyond the numbers and understand trends and insights.
- Handled a high volume of day-to-day Informatica workflow migrations.
- Reviewed Informatica ETL design documents and worked closely with development to ensure correct standards were followed.
- Wrote Python scripts to build ETL pipelines and Directed Acyclic Graph (DAG) workflows using Airflow and Apache NiFi; tasks are distributed to Celery workers to manage communication between multiple services (an Airflow sketch follows this list).
- Monitored the Spark cluster using Log Analytics and the Ambari Web UI; transitioned log storage from Cassandra to Azure SQL Data Warehouse and improved query performance.
- Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).
- Designed custom-built input adapters using Spark, Hive, and Sqoop to ingest data from Snowflake, MS SQL, and MongoDB into HDFS for analysis.
- Loaded data from web servers and Teradata using Sqoop, Flume, and the Spark Streaming API.
- Used Flume sinks to write directly to indexers deployed on the cluster, allowing indexing during ingestion.
- Migrated from Oozie to Apache Airflow; involved in developing Oozie and Airflow workflows for daily incremental loads, getting data from source databases (MongoDB, MS SQL).
- Identified and documented data quality limitations that jeopardize the work of internal and external data analysts; wrote standard SQL queries to perform data validation, created Excel summary reports (pivot tables and charts), and gathered analytical data to develop functional requirements using data modeling and ETL tools.
- Performed ETL data cleansing, integration, and transformation using Hive and PySpark; responsible for managing data from disparate sources.
- Used Spark optimization techniques such as caching/refreshing tables, broadcast variables, coalesce/repartition, increasing memory overhead limits, handling parallelism, and modifying the Spark default configuration variables for performance tuning.
- Read data from sources such as CSV files, Excel, HTML pages, and SQL databases, performed data analysis, and wrote results back to targets such as CSV files, Excel, and databases.
- Developed and handled business logic through backend Python code.
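A minimal Airflow DAG sketch of the daily ETL workflow described above; the DAG id, task names, and the extract/load callables are hypothetical placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull the day's records from the source system (placeholder logic).
    print("extracting for", context["ds"])


def load(**context):
    # Load the curated records into the warehouse (placeholder logic).
    print("loading for", context["ds"])


with DAG(
    dag_id="daily_incremental_load",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task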
Environment: Python, UNIX, SQL, ETL, Informatica, Spark, HTML, Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Snowflake, MongoDB, Teradata, Ambari, Flume, VSTS, Tableau, Power BI, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark v2.0.2, Airflow, Hive, Sqoop, HBase.
Confidential
Big Data Engineer
Responsibilities:
- Transformed data using Merge Join, Derived Column, Conditional Split, Lookup, Union All, Sort, and Slowly Changing Dimension transformations.
- Created conditional data flows using Script Components, expressions, and variables.
- Developed packages configurable across application environments through project parameters, environment variables, and package-level variables.
- Created database objects such as SQL views, synonyms, functions, and stored procedures for use by the application and reporting teams.
- Involved in debugging complex SSIS packages, SQL objects, and SQL job workflows.
- Utilized TFS for source control and for task and bug management.
- Involved in gathering and analyzing business requirements from end users and internal business analysts to develop strategies for ETL processes.
- Performed data analysis for complex business issues to provide recommendations for resolving business problems.
- Experience in performance tuning: identified and fixed bottlenecks and tuned complex mappings for better performance.
- Involved in requirements analysis and legacy system data analysis to design and implement ETL jobs using Microsoft SQL Server Integration Services (SSIS)
- Analyzed the data sources from Oracle and SQL Server for design, development, testing, and production rollover of reporting and analysis projects within x Desktop.
- Partnered with Development and Quality Assurance teams to ensure product quality was always intact.
- Resolved production support issues using SQL and PL/SQL to prevent any interference with the business.
- Led a team of 3 interns to develop prototypes of predictive models in MS Excel to forecast the data team's future infrastructure costs (software, hardware, development, and operational) for the growing data, based on historical data (an illustrative sketch follows this list).
- Estimated a 5% improvement in operational efficiency from implementing predictive analytics to estimate future infrastructure costs (software, hardware, development, and operational).
- Analyzed supply chain management revenue reports to understand the business profits generated from existing suppliers and sub-suppliers for circuit breaker components.
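An illustrative Python analogue of the cost-forecast prototype (the original model was built in MS Excel); the workbook name, sheet, and column names are hypothetical placeholders.

import numpy as np
import pandas as pd

# Historical costs exported from the Excel workbook (hypothetical file/columns:
# "year" and "total_cost" covering software, hardware, development, operational).
history = pd.read_excel("infrastructure_costs.xlsx", sheet_name="history")

# Fit a simple linear trend to the yearly totals.
slope, intercept = np.polyfit(history["year"], history["total_cost"], deg=1)

next_year = int(history["year"].max()) + 1
forecast = slope * next_year + intercept
print(f"Projected infrastructure cost for {next_year}: {forecast:,.0f}")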
Environment: Python, SQL, ETL, Oracle, SSIS, PL/SQL, MS Excel