
AWS Data Engineer Resume


Bellevue, WA

SUMMARY

  • A dynamic professional with around 8 years of experience as a Big Data Engineer, ETL Developer, and Java Developer, covering the planning, design, and implementation of data models for enterprise-level applications.
  • Excellent understanding of technologies and frameworks that hold very large volumes of data and run in a highly distributed fashion on Cloudera and Hortonworks Hadoop distributions and Amazon AWS.
  • Experience in large-scale application development using the Big Data ecosystem - Hadoop (HDFS, MapReduce, YARN), Spark, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, Zookeeper, Ambari, Flume, Apache NiFi, AWS, Azure, Google Cloud Platform.
  • Sound experience with AWS services like Amazon EC2, S3, EMR, Amazon RDS, VPC, Amazon Elastic Load Balancing, IAM, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and Lambda to trigger resources.
  • Performed analytics and cloud migration from on-premises systems to the AWS Cloud with AWS EMR, S3, and DynamoDB.
  • Worked on ETL migration by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena (see the sketch at the end of this summary).
  • Detailed knowledge of AWS ElastiCache (Memcached and Redis) and NoSQL databases such as HBase and DynamoDB, as well as database performance tuning, data modeling, disaster recovery, backup, and building data pipelines.
  • Designed and developed data pipelines to ingest transactional data into S3 data lakes using BDA, Kinesis Streams, Lambda, and Glue.
  • Developed test programs manipulating data directly from/into HBase tables for testing/analysis purposes.
  • Implemented a microservices architecture with API Gateway, Lambda, and DynamoDB, and deployed applications/infrastructure using core AWS services - EC2, S3, RDS, EBS, DynamoDB, SNS, SQS.
  • Experience in creating and managing reporting and analytics infrastructure for internal business clients using AWS services including Athena, Redshift, Spectrum, EMR, and QuickSight.
  • Extensive experience developing and implementing cloud architecture on Microsoft Azure.
  • Created, monitored, and restored Azure SQL databases. Migrated Microsoft SQL Server to Azure SQL Database.
  • Experience with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Big Data technologies (Apache Spark), and Databricks.
  • Experience in building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, while controlling and granting database access.
  • Created a connection from Azure to an on-premises data center using Azure ExpressRoute for single and multi-subscription setups.
  • Experience in OLTP/OLAP system study, analysis, and E-R modeling, as well as developing database schemas such as the Star schema and Snowflake schema, which are utilized in relational, dimensional, and multidimensional modeling.
  • Extensive knowledge of all phases of data acquisition and data warehousing, and data modeling analysis using Star and Snowflake schemas for fact and dimension tables.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Experience creating Web-Services with the Python programming language.
  • Involved in migration of the legacy applications to cloud platform using DevOps tools like GitHub, Jenkins, JIRA, Docker, and Slack.
  • Experience in designing interactive dashboards, reports, performing ad-hoc analysis, and visualizations using Tableau, Power BI, Arcadia, and Matplotlib.
  • Worked with Spark to improve the speed and optimization of current Hadoop algorithms utilizing Spark Context, Spark-SQL, Data Frame, Pair RDD, and Spark YARN.
  • Worked with the MapReduce programming paradigm and the Hadoop Distributed File System.
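A minimal sketch of the serverless Lambda-to-Glue-Catalog pattern referenced above: a Lambda function, triggered by an S3 put event, registers the new partition in the Glue Data Catalog so it can be queried from Athena. The database, table, and key layout (dt=YYYY-MM-DD prefixes) are illustrative assumptions, not details taken from this resume.

    import boto3

    glue = boto3.client("glue")

    DATABASE = "sales_db"      # hypothetical Glue database
    TABLE = "transactions"     # hypothetical table partitioned by dt

    def lambda_handler(event, context):
        """Triggered by S3 put events; adds each new dt= partition to the Glue Catalog."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]    # e.g. raw/transactions/dt=2023-01-15/part-0000.parquet
            if "dt=" not in key:
                continue
            dt = key.split("dt=")[1].split("/")[0]  # partition value taken from the object key

            # Reuse the table's storage descriptor, pointed at the new partition location.
            table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)["Table"]
            sd = dict(table["StorageDescriptor"])
            sd["Location"] = f"s3://{bucket}/raw/transactions/dt={dt}/"

            try:
                glue.create_partition(
                    DatabaseName=DATABASE,
                    TableName=TABLE,
                    PartitionInput={"Values": [dt], "StorageDescriptor": sd},
                )
            except glue.exceptions.AlreadyExistsException:
                pass  # partition already registered; nothing to do
        return {"status": "ok"}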

TECHNICAL SKILLS

Big Data Ecosystems: HDFS, YARN, MapReduce, Spark, Kafka, Hive, Airflow, StreamSets, Sqoop, HBase, Oozie, ZooKeeper, Nifi, Ranger, Ambari

Scripting Language: Python, Scala, PowerShell Scripting, Pig Latin, HiveQL.

Cloud Environment: Amazon Web Services (AWS), Microsoft Azure

NoSQL Database: Cassandra, Redis, MongoDB, Neo4j

Database: MySQL, Oracle, Teradata, MS SQL SERVER, PostgreSQL, DB2, Snowflake

Version Control: Git, SVN, Bitbucket

ETL/Reporting Tools: Tableau, Microsoft Excel, Informatica, Power BI, R, Google Data Studio

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

Others: Machine learning, NLP, Spring Boot, Jupyter Notebook, Terraform, Docker, Kubernetes, Jenkins, Ansible, Splunk, Jira

PROFESSIONAL EXPERIENCE:

Confidential, Bellevue, WA

AWS Data Engineer

Responsibilities:

  • Extensively used AWS Athena to query structured data in S3 for loading into other systems such as Redshift or for generating reports.
  • Built use cases in Snowflake by bringing in data from various sources using Attunity.
  • Migrated the Hadoop solution to the Snowflake platform.
  • Worked with Spark to improve the speed and optimization of Hadoop's current algorithms.
  • Set up an RDBMS on Amazon EC2 as part of a Proof of Concept (POC).
  • Migrated an existing on-premises application to AWS; AWS services such as EC2 and S3 were used for dataset processing and storage.
  • Used the Spark Streaming APIs to perform on-the-fly transformations and actions for the common learner data model, which receives data from Kinesis in real time.
  • Performed end-to-end architecture and implementation evaluations of different AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.
  • Used Hive as the primary query engine on EMR and created external table schemas for the data being processed.
  • Configured Apache Presto and Apache Drill on an AWS EMR (Elastic MapReduce) cluster to integrate different databases such as MySQL and Hive.
  • This allowed comparing the results of operations such as joins and inserts across multiple data sources controlled from a single platform.
  • Worked with the Attunity data integration tool to bring data from different sources into Snowflake.
  • Wrote Python scripts with pandas to connect to Snowflake and perform ETL.
  • Developed an end-to-end data lake solution in Hadoop and Snowflake.
  • Developed views and templates with Python and Django's view controller.
  • Created an AWS RDS (Relational Database Service) instance to act as the Hive metastore, and integrated metadata from 20 EMR clusters into a single RDS instance, preventing metadata loss even when an EMR cluster was terminated.
  • Involved in data extraction, transformation, and loading (ETL) from source to target systems using Informatica PowerCenter.
  • Used Informatica PowerCenter ETL for loading data from Oracle and flat files into the target database.
  • Designed and maintained Informatica PowerCenter mappings for extraction, transformation, and loading between Oracle and Teradata.
  • Hands-on experience in tuning mappings and identifying and resolving performance bottlenecks at various levels such as sources, targets, mappings, and sessions.
  • Developed and implemented ETL pipelines on S3 parquet files in a data lake using AWS Glue.
  • Developed a CloudFormation template in JSON format for content delivery with cross-region replication using Amazon Virtual Private Cloud.
  • Implemented SQLAlchemy, a Python library that provides full access to SQL.
  • Worked with Python OpenStack APIs and used Python scripts to update content in the database and manipulate files.
  • Developed PySpark scripts to perform ETL as AWS Glue jobs, where data is extracted from S3 using a crawler that builds a Data Catalog table to store the metadata (see the sketch after this list).
  • Designed and developed an entire module in Python and deployed it in AWS Glue using the PySpark library.
  • Extensive experience with the Teradata database; most work was ELT, with transformations and optimizations done in Teradata. Created pipelines to load data into the EDW.
  • Worked on moving the code of a quality monitoring application from AWS EC2 to AWS Lambda, as well as building logical datasets to administer quality monitoring on Snowflake warehouses.
  • Worked on creating HDFS workloads on Kubernetes clusters to mimic the production workload for development purposes.
  • Worked with CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Jenkins, Bitbucket Pipelines, and Elastic Beanstalk.
  • Worked on ETL migration by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
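A short, hedged sketch of the kind of Glue PySpark ETL job described in the PySpark/Glue bullet above: it reads a crawled table from the Glue Data Catalog, applies a simple mapping, and writes Parquet back to S3. Database, table, and bucket names are illustrative assumptions.

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Source table assumed to have been created by a Glue crawler over raw S3 data.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="transactions_raw"
    )

    # Rename/cast a few columns; the mapping itself is purely illustrative.
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[
            ("txn_id", "string", "transaction_id", "string"),
            ("amount", "double", "amount", "double"),
            ("txn_ts", "string", "transaction_ts", "timestamp"),
        ],
    )

    # Curated output lands in S3 as Parquet, queryable from Athena.
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/transactions/"},
        format="parquet",
    )
    job.commit()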

Environment: Python, Databricks, PySpark, Kafka, GitLab, PyCharm, AWS S3, Delta Lake, Snowflake, Cloudera CDH 5.9.16, Hive, Impala, Kubernetes, Flume, Apache NiFi, Java, Shell scripting, SQL, Sqoop, Oozie, Oracle, SQL Server, HBase, Power BI, Agile Methodology.

Confidential, Eden Prairie, MN

Azure Data Engineer

Responsibilities:

  • Performed migration of several databases, application servers, and web servers from on-premises environments to the Microsoft Azure cloud environment.
  • Used Azure Data Factory extensively for ingesting data from disparate source systems.
  • Used Azure Data Factory as an orchestration tool for integrating data from upstream to downstream systems.
  • Designed and implemented pipelines in Azure Synapse/ADF to extract, transform, and load data from several sources including Azure SQL, Azure SQL Data Warehouse, etc.
  • Hands-on experience designing and building data models and data pipelines with a data warehouse and data lake focus.
  • Experienced in managing Azure Data Lake Storage (ADLS) and Databricks Delta Lake, with an understanding of how to integrate them with other Azure services.
  • Experienced in creating a Data Lake Storage Gen2 storage account and a file system.
  • Integrated data storage options with Spark, notably Azure Data Lake Storage and Blob Storage.
  • Used the Copy Activity in Azure Data Factory to copy data among data stores located on-premises and in the cloud.
  • Created Python notebooks on Azure Databricks for processing datasets and loading them into Azure SQL databases (see the sketch after this list).
  • Worked on advanced analytics using Python with modules like Pandas, NumPy, and Diplomata for data extraction and manipulation.
  • Deployed containerized Airflow on K8s for job orchestration (Docker, Jenkins, Airflow,Python).
  • Built a trigger-based mechanism to reduce the cost of different resources like Web Jobs and Data Factories using Azure Logic Apps and Functions.
  • Worked on creating a star schema for drilling data. Created PySpark procedures, functions, and packages to load data.
  • Development level experience in Microsoft Azure providing data movement and scheduling functionality to cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
  • Hands-on experience in creating pipelines in Azure Data Factory V2 using activities like Move & Transform, Copy, Filter, ForEach, Get Metadata, Lookup, Databricks, etc.
  • Developed code to parse data formats like Parquet, JSON, etc., fetching data from the Data Lake and other sources to aggregate data.
  • Successfully executed a proof of concept for Azure implementation, with the wider objective of transferring on-premises servers and data to the cloud.
  • Created dynamic pipelines, datasets, and linked services in Azure Data Factory (ADF) for data movement and data transformations.
  • Used Informatica PowerCenter ETL for loading data from Oracle and flat files into the target database.
  • Developed mapping spreadsheets that will provide the Data Warehouse Development (ETL) team with source to target Data Mapping.
  • Native integration with Azure Active Directory (Azure AD) and other Azure services enables building modern data warehouse, machine learning, and real-time analytics solutions.
  • Used Hive queries to analyze huge data sets of structured, unstructured, and semi-structured data.
  • Used structured data in Hive to enhance performance using sophisticated techniques including bucketing, partitioning and optimizing self joins.
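A minimal sketch of the Databricks notebook pattern referenced above: reading Parquet from ADLS Gen2 and loading it into an Azure SQL database over JDBC. The storage account, container, server, table, and credential handling are illustrative assumptions; in practice the credentials would come from a Databricks secret scope.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

    # Read raw data from an assumed ADLS Gen2 path (abfss://container@account).
    raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/sales/")

    # Light clean-up before loading (illustrative).
    cleaned = raw.dropDuplicates(["order_id"]).withColumn("load_date", F.current_date())

    # Write to Azure SQL Database via the built-in JDBC connector.
    jdbc_url = "jdbc:sqlserver://example-server.database.windows.net:1433;database=salesdb"
    (
        cleaned.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "dbo.Sales")
        .option("user", "etl_user")           # assumption: fetched from a secret scope
        .option("password", "<from-secret>")  # assumption: fetched from a secret scope
        .mode("append")
        .save()
    )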

Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, Databricks, Python, Kubernetes, Azure SQL Server, Azure Data Warehouse.

Confidential, Bentonville, AR

Data Engineer

Responsibilities:

  • Developed distributed high-performance systems with Spark and Scala.
  • Created Scala apps for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
  • Performed T-SQL tuning and query optimization for SSIS packages.
  • Developed distributed algorithms for detecting and successfully processing data trends.
  • Extensive experience working with various enterprise Hadoop distributions from Cloudera (CDH4/CDH5) and Hortonworks, and good knowledge of the MapR distribution.
  • Created dataflows between SQL Server and Hadoop clusters using Apache NiFi.
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Created Hive tables on HDFS to store the data processed by Apache Spark on the Hadoop cluster in Parquet format.
  • Created an SSIS package to import data from SQL tables into various Excel sheets.
  • Used Spark SQL to pre-process, clean, and combine big data sets (see the sketch after this list).
  • Performed data validation using Redshift and built pipelines capable of processing more than 100TB per day.
  • Developed the SQL server database system to optimize performance.
  • Performed migration of databases from conventional data warehouses to Spark clusters.
  • Performed frequent cleaning and integrity tests to ensure that the data warehouse was loaded only with high-quality entries.
  • Developed SQL queries to extract data from existing sources and validate format correctness.
  • Created automated tools and dashboards for collecting and displaying dynamic data.
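A brief sketch of the Spark SQL pre-processing and joining referenced in the Spark SQL bullet above; paths, table names, and columns are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("clean-and-combine").getOrCreate()

    orders = spark.read.parquet("hdfs:///data/raw/orders/")        # assumed locations
    customers = spark.read.parquet("hdfs:///data/raw/customers/")

    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    # Drop malformed rows, deduplicate, and combine the two sources with Spark SQL.
    combined = spark.sql("""
        SELECT c.customer_id,
               c.region,
               o.order_id,
               CAST(o.amount AS DOUBLE) AS amount
        FROM (SELECT DISTINCT * FROM orders WHERE order_id IS NOT NULL) o
        JOIN customers c
          ON o.customer_id = c.customer_id
    """)

    combined.write.mode("overwrite").parquet("hdfs:///data/curated/orders_enriched/")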

Environment: T-SQL, MongoDB, HDFS, Scala, Spark SQL, Relational Databases, Redshift, SSIS, SQL, Linux, Data Validation, MS Excel.

SQL BI Developer

Confidential - Philadelphia, PA

Responsibilities:

  • Planned, defined, and designed data models based on business requirements and provided documentation.
  • Gathered data, documented it for further reference, and designed the database using the Erwin data modeler.
  • Created SSIS packages to load data from flat files and SQL Server 2008 R2 to SQL Server 2012 using Lookup, Fuzzy Lookup, Derived Column, Conditional Split, Term Extraction, Aggregate, Pivot Transformation, and Slowly Changing Dimension.
  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data to Data Warehouse and Data Mart Databases.
  • Modified the existing SSIS packages to meet the changes specified by the users.
  • Scheduled jobs for executing the stored SSIS packages, which were developed to update the database on a daily basis, using SQL Server Agent.
  • Involved in creating alerts for successful or unsuccessful completion of scheduled jobs.
  • Worked on setting up a logging environment to track the status of events such as errors raised and task failures.
  • Generated parameterized reports, sub-reports, and tabular reports using SSRS 2012.
  • Designed, developed, and deployed reports in the MS SQL Server environment using SSRS 2012.
  • Generated sub-reports, conditional reports, drilldown reports, drill-through reports, and parameterized reports using SSRS 2012.
  • Created reports that retrieve data using stored procedures that accept parameters (see the sketch after this list).
  • Scheduled the monthly/weekly/daily reports to run automatically onto the dashboard.
  • Responsible for Monitoring and Tuning Report performance.
  • Created various database objects like stored procedures, functions, tables, and views.
  • Configured the report viewing security for various users at various levels using Report Manager
  • Creation of database objects like tables, views, materialized views, procedures, and packages using Oracle tools like Toad, PL/SQL Developer, and SQL*Plus.
  • Scheduled report and exported in PDF format.
  • Fine-tuned SQL queries for maximum efficiency and performance using SQL Profiler and Database Engine Tuning Advisor.
  • Provided documentation for all kinds of reports and SSIS packages.
  • Designed and implemented stored procedures and Triggers for automating tasks.
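A small sketch of calling one of the parameterized reporting stored procedures described above from Python with pyodbc; the server, database, procedure, and result column names are illustrative assumptions.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=reporting-sql;DATABASE=ReportsDW;Trusted_Connection=yes;"
    )
    cur = conn.cursor()

    # Parameterized call: the report accepts a date range, as in the SSRS reports above.
    cur.execute(
        "EXEC dbo.usp_SalesByRegion @StartDate = ?, @EndDate = ?",
        ("2012-01-01", "2012-03-31"),
    )
    for row in cur.fetchall():
        print(row.Region, row.TotalSales)

    cur.close()
    conn.close()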

Environment: MS SQL Server 2012/2008, SSIS, T-SQL, PL/SQL, SSRS, BIDS, Win 2008 Advanced Server, VSS, SQL Profiler, Database Engine Tuning Advisor, DTS, VB script, Erwin v7.3.

SQL Developer

Confidential

Responsibilities:

  • Involved in the life cycle of the project, i.e., requirements gathering, design, development, testing and maintenance of the database.
  • Created Database Objects like Tables, Stored Procedures, Views, Clustered and Non-Clustered indexes, Triggers, Rules, Defaults, User defined data types and functions.
  • Fine-tuned stored procedures, SQL queries, and user-defined functions using execution plans for better performance.
  • Created and scheduled SQL jobs to run SSIS packages daily, using MS SQL Server Integration Services (SSIS).
  • Performed query optimization and tuning, debugging and maintenance of stored procedures.
  • Performed database creation, assigned database security, and applied standard data modeling techniques.
  • Performed troubleshooting operations on the production servers.
  • Monitored, tuned, and analyzed database performance and allocated server resources to achieve optimum database performance.
  • Created staging databases and import tables in MS SQL Server.
  • Loaded data into the systems using loader scripts, cursors, and stored procedures (see the sketch after this list).
  • Tested the data in the test environment, performed client validation, and resolved issues.
  • Developed reports with SSRS on SQL Server 2008.
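A hedged sketch of a loader script in the spirit of the staging/import work above: bulk-inserting rows from a flat-file extract into a SQL Server staging table with pyodbc. File, table, and column names are illustrative assumptions.

    import csv
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=staging-sql;DATABASE=StagingDB;Trusted_Connection=yes;"
    )
    cur = conn.cursor()
    cur.fast_executemany = True  # batch the parameterized inserts for speed

    with open("customer_extract.csv", newline="") as fh:
        rows = [(r["CustomerID"], r["Name"], r["City"]) for r in csv.DictReader(fh)]

    cur.execute("TRUNCATE TABLE dbo.Stg_Customer")  # full reload of the staging table
    cur.executemany(
        "INSERT INTO dbo.Stg_Customer (CustomerID, Name, City) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    conn.close()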

Environment: SQL Server 2008 R2, SSIS, PowerShell, SQL Profiler, Database Engine Tuning Advisor, DTS, VB script, MS Access, T-SQL

Network Engineer

Confidential

Responsibilities:

  • Handling network devices such as Switches (Cisco Catalyst 2900 and 3500 series), Routers (Cisco 2600, 2800 and 7300 series), Firewalls, Load balancers etc.
  • Experienced in configuring Virtual Device Context in Nexus 7010.
  • Configured IP, RIP, EIGRP, OSPF, and BGP on routers.
  • Implemented voice VLANs, UDP, SIP, and RTP, and provided QoS using DSCP and IP Precedence.
  • Used DHCP to automatically assign reusable IP addresses to DHCP clients.
  • Implementation and configuration of F5 Big-IP LTM-6400 load balancers.
  • Configured and extended VLANs from one network segment to another between different vendor switches (Cisco, Juniper).
  • Experienced in working with the Cisco Nexus 2148 Fabric Extender and Nexus 5000 series to provide a flexible access solution for data center access architecture.
  • Applied Security rules embedded with forwarding using ACI Tech.
  • Designed and implemented a campus switch network with Cisco Layer 3 switches (3750, 4500, and 6500) in a multi-VLAN environment with inter-VLAN routing, HSRP, ISL trunking, and EtherChannel (see the sketch after this list).
  • Designed MPLS VPN and QoS for the architecture using Cisco multilayer switches.
  • Maintained Security policy by monitoring PIX firewalls (515 and 520).
  • Worked extensively on Palo Alto firewalls, Cisco firewalls, Cisco PIX (506E/515E/525), and ASA 5500 (5510/5540) Series.
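A minimal sketch, in Python with Netmiko (an assumption; the resume does not name an automation tool), of pushing a small VLAN/SVI change with HSRP to a Cisco Layer 3 switch in the spirit of the campus switching work above. Host, credentials, and addressing are illustrative.

    from netmiko import ConnectHandler

    switch = {
        "device_type": "cisco_ios",
        "host": "10.0.0.10",      # assumed management address
        "username": "netadmin",
        "password": "********",
    }

    commands = [
        "vlan 120",
        "name VOICE",
        "interface Vlan120",
        "ip address 10.1.20.2 255.255.255.0",
        "standby 120 ip 10.1.20.1",   # HSRP virtual gateway for the new SVI
        "no shutdown",
    ]

    with ConnectHandler(**switch) as conn:
        print(conn.send_config_set(commands))
        conn.save_config()            # copy running-config to startup-config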

Environment: Cisco Catalyst switches, Juniper switches, Cisco PIX, Routing Protocols
