
Sr Aws Data Engineer Resume


Raleigh, NC

SUMMARY

  • Around 8 years of experience in the IT industry, including 6+ years as a Data Engineer designing, developing and deploying big data applications using the Hadoop ecosystem, Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform
  • A certified, highly skilled engineer with strong analytical and problem-solving experience and a solid understanding of business requirements
  • Strong experience with AWS tools such as EC2, Kinesis, S3, EMR, RDS, Athena, Glue, Elasticsearch, Lambda, Redshift, ECS and Airflow
  • Deployed applications with Terraform Cloud from Amazon S3 buckets and used AWS Lambda functions written in Python to launch EMR clusters (see the sketch after this summary)
  • Hands-on experience configuring multi-node clusters on AWS EC2
  • Strong experience in designing, building and deploying AWS components (EC2 and S3)
  • In-depth knowledge of the AWS stack (Redshift, Lambda, RDS, S3, EC2) for creating analytics data pipelines
  • Experience in building a centralized data warehouse on the AWS platform using MySQL on RDS and S3
  • Strong experience with Microsoft Azure tools such as Azure Databricks, Azure Data Lake, Azure Blob Storage, Azure Data Factory, Azure SQL DB, Cosmos DB and Azure DevOps
  • Experience in creating Azure Data Factory pipelines, data transformations and Key Vault integrations in Data Factory
  • Experience in maintaining a cloud data warehouse on Azure Synapse Analytics and in loading data from Azure Blob Storage into Azure Synapse Analytics
  • Involved in creating pipelines and data flows using Azure Databricks and PySpark
  • Experience with Azure DevOps for building and deploying applications with Azure Repos and Azure Boards, and worked on Azure DevOps services for CI/CD pipelines along with Docker and Jenkins
  • Experience in creating ETL pipelines using AWS Glue, Athena and Redshift, extracting source data from S3
  • Experience with Azure data storage services for ETL processes using PySpark and Spark SQL
  • Hands-on exposure to overall big data architecture and frameworks, including storage management, data warehousing and ETL automation
  • Strong knowledge of big data technologies and of storing, querying and processing data with big data tools
  • Experience in application development using the Hadoop ecosystem (Hadoop Distributed File System (HDFS), MapReduce, YARN), Spark, Hive, Impala, Sqoop, Airflow, Oozie, Kafka and Flume on AWS and Azure platforms
  • Strong experience with Hadoop distributions such as Cloudera, Amazon EMR and Azure HDInsight
  • Hands-on experience installing, configuring and developing big data infrastructure using Hadoop clusters on both AWS and Azure cloud platforms
  • Extensive knowledge of workflow design and job scheduling on HDFS using Oozie, and of data ingestion using Flume
  • Experience in using the Spark API to analyze Hive data on Hadoop clusters with YARN
  • Strong experience using HiveQL, Oozie and HBase within the Cloudera distribution
  • Strong experience implementing big data pipelines for batch and real-time processing using Spark, Sqoop, Kafka and Flume, and in using Impala, Spark and Hive to implement end-to-end pipelines
  • Worked on data imports/exports using Sqoop from Teradata and relational database management systems
  • Designed and implemented a data pipeline framework for data ingestion into Snowflake
  • Experience in creating database objects in Snowflake and extensive knowledge of role-based access controls, data sharing and query performance tuning in Snowflake
  • Used Python scripts for loading data into Snowflake, including large, complex data sets
  • Proficient in SQL databases and NoSQL databases (HBase, Cassandra and MongoDB), with experience integrating NoSQL databases with Hadoop clusters
  • Involved in creating ETL pipelines using SnowSQL and Python tools
  • Experience in using Informatica for ETL processes and in streamlining the interface for executing data pipelines
  • Knowledge in creating ETL jobs in Talend to push existing data into the data warehouse system
  • Experience implementing Spark with Spark SQL and Python to enable faster data processing, and strong knowledge of Apache Spark for real-time analytics
  • Implemented end-to-end pipelines, used the Lambda architecture for serverless pipelines and created automated processes for data pipelines to Snowflake with S3
  • Experience in migrating from SQL Server to Snowflake, configuring Snowflake environments and creating staging tables and snowflake schema dimensions for reporting
  • Strong experience in data migration services for Snowflake and Informatica from existing data sources
  • Developed automation scripts using Python for integration testing and used NumPy for data extraction
  • Strong knowledge of GitHub repositories and pull requests for extracting and automating data for CI/CD pipelines with Jenkins and Docker, and extensive knowledge of GitLab for automating CI/CD scripts
  • Strong knowledge of SQL queries and extracts, with experience installing SQL Server and maintaining APX servers and SQL databases
  • Experience with Agile and Scrum methodologies and good collaboration and communication skills with business teams
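
A minimal sketch of the Lambda-to-EMR pattern noted above, written in Python with boto3 (assumed to be available in the Lambda runtime); the cluster name, EMR release, instance types, IAM roles and S3 log path are illustrative placeholders, not values from the original projects:

    # Hypothetical Lambda handler that launches a transient EMR cluster.
    import boto3

    emr = boto3.client("emr")

    def lambda_handler(event, context):
        response = emr.run_job_flow(
            Name="nightly-spark-cluster",                 # placeholder cluster name
            ReleaseLabel="emr-6.9.0",                     # assumed EMR release
            LogUri="s3://example-bucket/emr-logs/",       # placeholder log bucket
            Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
            Instances={
                "InstanceGroups": [
                    {"Name": "Master", "InstanceRole": "MASTER",
                     "InstanceType": "m5.xlarge", "InstanceCount": 1},
                    {"Name": "Core", "InstanceRole": "CORE",
                     "InstanceType": "m5.xlarge", "InstanceCount": 2},
                ],
                "KeepJobFlowAliveWhenNoSteps": False,     # terminate after steps finish
            },
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
        )
        return {"cluster_id": response["JobFlowId"]}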

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, Hive, Oozie, Sqoop, Apache Spark, Informatica, Kafka, Flume, HDFS, YARN, HBase, Impala

AWS: EC2, Kinesis, S3, EMR, RDS, Glue, Elasticsearch, Lambda, Redshift, ECS

Azure: Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, Cosmos DB, Azure DevOps

ETL tools: Snowflake, Informatica, Talend, Tableau, Power BI

NoSQL databases: HBase, Cassandra, DynamoDB, MongoDB

Monitoring and Reporting: Power BI, Tableau, Orange BI

Hadoop Distribution: Hortonworks, Cloudera, Amazon EMR, Azure HDInsight

Programming languages: Scala, Python, SQL, HiveQL, PowerShell, Java

Operating systems: Linux, Unix, Mac OS, Windows 7, Windows 8 and Windows 10

Version control: GIT

Databases: Oracle SQL, MySQL, Teradata

Cloud computing: Amazon Web services and Microsoft Azure

PROFESSIONAL EXPERIENCE

Sr AWS Data Engineer

Confidential, Raleigh NC

Responsibilities:

  • Developed data loading strategies using the Hadoop cluster (HDFS, Hive, AWS Kinesis)
  • Designed and implemented cloud data architecture using AWS tools
  • Built AWS big data pipelines using DynamoDB, S3, AWS Glue and Amazon Athena
  • Used AWS Glue to create batch pipelines and real-time processing jobs
  • Extracted data from Amazon RDS using migration tools like AWS Glue and loaded it into Amazon S3 in JSON format
  • Used JSON schemas to define table and column mappings from S3 data to Redshift
  • Involved in the Amazon Redshift integration process
  • Worked on AWS Redshift to consolidate multiple data warehouses into a single warehouse
  • Designed and developed ETL jobs to extract data from a NetSuite replica and load it into AWS Redshift
  • Analysed data stored in S3 buckets using SQL and PySpark, and stored the processed data in AWS Redshift using Spark components
  • Implemented Workload Management (WLM) in AWS Redshift to prioritize dashboard queries over complex queries and enhance the reporting interface
  • Implemented the Lambda architecture to create a combination of batch and real-time data pipelines using Airflow
  • Good experience working with DAGs in Airflow
  • Developed Airflow operators using Python to interact with services like EMR, Athena, S3, DynamoDB, Snowflake and Hive
  • Owned an Apache Airflow server for scheduling distributed computing jobs
  • Involved in parallel and sequential execution of Spark jobs in Airflow
  • Built a centralized data warehouse on the AWS platform using MySQL on RDS and S3
  • Expert in data ingestion using tools like Kinesis, S3 and Airflow for the EMR cluster
  • Launched Redshift clusters by creating IAM roles
  • Created ETL pipelines using AWS Glue, Athena and Redshift, extracting source data from S3
  • Experience in using the AWS stack (Redshift, Lambda, RDS, S3, EC2) to create data pipelines for analytics
  • Built servers using Amazon EC2
  • Designed AWS data pipelines using AWS resources such as Lambda, S3 and EMR
  • Worked on creating data pipelines with Airflow to schedule PySpark jobs for performing incremental loads (see the DAG sketch after this list)
  • Deployed applications with Terraform Cloud from Amazon S3 buckets and used AWS Lambda functions written in Python to launch EMR clusters
  • Used Python to write ETL scripts, including the conversion of JSON files
  • Used Python within EC2 to remediate S3 storage buckets based on access requirements and compliance
  • Used the Lambda architecture for serverless pipelines and created automated processes for data pipelines with S3
  • Worked with Spark SQL, created RDDs using PySpark on HDFS and used PySpark for data extraction within AWS Glue
  • Performed ELT operations using PySpark, Spark SQL and Python on large (petabyte-scale) data clusters
  • Implemented Spark applications in Scala to improve application performance
  • Implemented Spark with Spark SQL and Python to enable faster data processing
  • Used Impala, Spark and Hive to implement end-to-end data pipelines
  • Implemented CI/CD containers using Docker and Jenkins for code builds and AWS ECS for code deployment
  • Worked in the Agile methodology and collaborated with project team members to accelerate the project's progress
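
A minimal Airflow DAG sketch for the scheduled PySpark incremental loads mentioned above; the DAG id, schedule and the placeholder callable are hypothetical, and in the real pipeline the task would submit a PySpark job (for example an EMR step) rather than printing:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_incremental_load(**context):
        # Placeholder: a real task would trigger spark-submit or add an EMR step
        # for the execution date so only new records are processed.
        print(f"Running incremental load for {context['ds']}")

    with DAG(
        dag_id="incremental_load_example",            # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        PythonOperator(
            task_id="run_incremental_load",
            python_callable=run_incremental_load,
        )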

Azure Data Engineer

Confidential

Responsibilities:

  • Worked on data migration from on-prem to cloud databases (Snowflake to Azure)
  • Used SQL, Azure data factory and PowerShell for data migration process
  • Involved in data warehouse implementations using the Azure ecosystem, including Azure Data Warehouse, Azure Data Lake Storage (ADLS) and Azure Data Factory v2
  • Designed and managed Azure Data Factory pipelines and pulled data from SQL Server and Google Cloud
  • Created data sets for developing the Azure Data Factory pipelines and maintained the architectural responsibilities
  • Extensive knowledge of data transformations and Key Vault integration in Azure Data Factory
  • Deployed Data Factory pipelines to orchestrate data into the Azure SQL database
  • Used Azure's ETL service (Azure Data Factory) for data ingestion from Cloudera Hadoop's HDFS to Azure Data Lake Storage
  • Used the Cosmos activity to process the data pipeline in Azure Data Factory
  • Designed the transformation process in Azure Data Lake Storage (ADLS)
  • Briefly used Ansible playbooks to deploy code pipelines for Power BI within Azure Data Lake Storage
  • Experience in maintaining a cloud data warehouse on Azure Synapse Analytics
  • Hands-on exposure to Azure Blob Storage for loading data into Azure Synapse Analytics
  • Orchestrated all data pipelines using Airflow to interact with Azure Services.
  • Maintained the data pipeline architecture in the Azure cloud using Data Factory and Databricks
  • Created Apache Parquet files using the Databricks storage layer for audit history
  • Used Databricks and PySpark for creating pipelines and complex data flows (see the PySpark sketch after this list)
  • Experience with Azure data storage services for ETL processes using PySpark and Spark SQL
  • Migrated the ETL logic using Azure pipelines to meet the business requirements
  • Used Azure DevOps to build and deploy applications with Azure Repos and Azure Boards
  • Used Azure DevOps services to build CI/CD pipelines for managing applications
  • Worked on setting up and connecting SQL servers to Azure databases
  • Used Git for version control and for tracking code merges
  • Worked in the Agile methodology, with experience using Jira and Confluence for tickets and issues
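
A minimal PySpark sketch of the kind of Databricks pipeline referenced above, assuming ADLS Gen2 access is already configured on the cluster; the storage account, container names and column names are illustrative placeholders:

    # Read raw JSON from ADLS, apply a simple transformation, write curated Parquet.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("adls_curation_example").getOrCreate()

    raw = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/events/")

    curated = (
        raw.filter(F.col("event_type").isNotNull())           # drop incomplete rows
           .withColumn("event_date", F.to_date("event_ts"))   # derive a partition column
    )

    (curated.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("abfss://curated@examplestorage.dfs.core.windows.net/events/"))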

Big Data Engineer

Confidential

Responsibilities:

  • Experience in working with structured data by importing and exporting between DynamoDB and HDFS/Hive using Sqoop
  • Involved in data migration from existing data platforms to Hadoop and built a data warehouse within Hadoop clusters using Hive, Oozie and Sqoop
  • Used Flume to generate data cluster files and loaded the data into relational database management systems using Sqoop
  • Implemented the Lambda architecture to create a combination of batch and real-time data pipelines using Airflow
  • Good experience working with DAGs in Airflow
  • Developed Airflow operators using Python to interact with services like EMR, DynamoDB, Snowflake and Hive
  • Involved in the migration process from Teradata and SQL Server to Snowflake
  • Designed and implemented a data pipeline framework for data ingestion into Snowflake
  • Experience in creating database objects in Snowflake
  • Extensive knowledge in role-based access controls, data sharing, query performance tuning in Snowflake
  • Used Snowpipe to load and transform data from external sources into Snowflake
  • Used HiveQL for structured data and wrote custom UDFs while optimizing Hive queries
  • Created staging tables in Snowflake and worked with snowflake schema dimensions for reporting purposes
  • Worked on the snowflake schema and performed data quality analysis using SnowSQL
  • Strong understanding of the Time Travel concept and of data sharing in Snowflake
  • Improved query performance through micro-partitions in Snowflake
  • Experience in building, creating and configuring Snowflake environments for overall data processing
  • Hands-on exposure to performing technical data analysis for data warehousing initiatives
  • Used SnowSQL and Python tools for developing ETL pipelines in data warehouse systems
  • Used Python scripts for loading data into Snowflake, including large, complex data sets (see the sketch after this list)
  • Created a Python script as a Cassandra REST API and used it to load the data into Hive
  • Experience in using Spark API to analyze the Hive Data with Hadoop cluster and YARN
  • Created AWS resources such as EC2 and SNS using Terraform scripts
  • Hands-on exposure to configuring multi-node clusters on AWS EC2
  • Created daily background jobs using AWS S3 load, unload, the load generator and grid variables
  • Used COPY statements from S3 to create data pipelines for data loading and transformation
  • Used GitHub requests (push, pull and merge) for CI/CD scripts during the migration process to Snowflake
  • Used GitLab to automate CI/CD scripts and schedule background jobs
  • Used UNIX scripting for the data ingestion process and prepared the data before loading it into the staging area
  • Experience in performing backup and restoration of databases
  • Involved in software version upgrades and monthly patches for maintaining the systems
  • Debugged QA issues and fixed the defects based on the Change Requests
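
A minimal sketch of the Python-based Snowflake loading referenced above, assuming the snowflake-connector-python package and an existing external stage over S3; the account, credentials, warehouse, stage and table names are placeholders:

    # Run a COPY INTO from an external S3 stage into a Snowflake staging table.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",        # placeholder Snowflake account
        user="example_user",
        password="***",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STG",
    )

    copy_sql = """
        COPY INTO STG.ORDERS_STG
        FROM @S3_ORDERS_STAGE/orders/      -- assumed external stage over S3
        FILE_FORMAT = (TYPE = 'JSON')
        ON_ERROR = 'ABORT_STATEMENT'
    """

    cur = conn.cursor()
    try:
        cur.execute(copy_sql)
        print(cur.fetchall())              # COPY INTO returns per-file load results
    finally:
        cur.close()
        conn.close()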

Data Engineer

Confidential

Responsibilities:

  • Assisted the team to work on the installation and configuration of Hadoop clusters
  • Used MapReduce jobs to load data sets into HBase and used Hive optimization to improve performance
  • Used Sqoop to import data from relational database management systems into HDFS
  • Involved in developing the data cleaning process using HiveQL and MapReduce
  • Maintained HBase tables using Hive queries for the data storage process
  • Used Oozie to schedule HBase jobs
  • Used Oozie to build complex data transformations
  • Worked on the Cloudera distribution and integrated Hadoop clusters with the Cloudera distribution system
  • Maintained data sources using Tableau
  • Integrated Tableau with existing databases like MySQL to run background jobs
  • Used Tableau to create financial dashboards based on sales (profit/loss) and revenues
  • Created SQL queries to extract and generate product data reports using different parameters and attributes
  • Responsible for maintaining SQL Server databases and performing data validation for complex SQL queries
  • Worked on the ETL process using SQL to populate data from the database servers
  • Created schema flows for the ETL process based on the business requirements for data enrichment
  • Used Python and SQL for developing ETL pipelines and loaded the use cases to HDFS (see the sketch after this list)
  • Created ETL jobs in Talend to push existing data into the data warehouse system
  • Used Informatica PowerCenter for the ETL process from third-party source systems to existing databases
  • Involved in data warehouse optimization using Informatica and Cloudera with Hadoop clusters for curated data
  • Extracted data from Oracle and SQL Server using Informatica and analysed the data for the transformation process
  • Assisted the team in streamlining Informatica's interface to execute data pipelines for data load, extraction and data cleansing using the Hadoop cluster
  • Developed automation scripts using Python for integration and functional testing
  • Extracted data using NumPy modules in Python
  • Created data patterns to understand customer behaviour on product purchases and used data clustering tools on raw data
  • Implemented different methodologies like Type 1 and Type 2 for the ODS tables
  • Used GitHub for version control to pull and push repository files to the local servers
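
A minimal sketch of the Python/SQL ETL-to-HDFS flow referenced above; the pymysql driver, connection details, query and HDFS path are assumptions for illustration, and the HDFS write simply shells out to the standard Hadoop CLI:

    # Extract rows from a relational source, stage them as CSV, push to HDFS.
    import csv
    import subprocess

    import pymysql   # assumed driver; any DB-API driver would work the same way

    conn = pymysql.connect(host="source-db.example.com", user="etl_user",
                           password="***", database="sales")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT order_id, customer_id, amount FROM orders")
            with open("/tmp/orders.csv", "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["order_id", "customer_id", "amount"])
                writer.writerows(cur.fetchall())
    finally:
        conn.close()

    # Load the extract into HDFS so downstream Hive tables can read it.
    subprocess.run(["hdfs", "dfs", "-put", "-f", "/tmp/orders.csv",
                    "/data/staging/orders/"], check=True)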

Junior SQL Database Administrator

Confidential

Responsibilities:

  • Experience in admin responsibilities for SQL Server across various cluster environments using built-in tools (Query Store, SQL Server Profiler)
  • Participated in change management processes and created users in the database system as per requirements
  • Responsible for providing access for different role groups across various departments
  • Created user logins and integrated single sign-on with the existing IT framework (see the sketch after this list)
  • Managed permissions and access for the overall organizational hierarchy and allocated privileges to managers and teams across the company's retail store employees
  • Assisted the teams to migrate databases by importing, exporting and database mirroring
  • Created SQL tables for reporting and reconciliation for HR, payroll and learning & development teams
  • Integrated SQL Server with the APX tool to ensure data security compliance requirements were met
  • Loaded data from external sources to SQL server database
  • Maintained APX servers for generating reports and assigned access for Management heads, sales heads and retail store managers based on their reporting hierarchy
  • Assisted the team to migrate HR & payroll systems from Resource link to Oracle R12 HRMS
  • Managed Datasets and data clusters within the database to create/modify and generate reports
  • Involved in the database recovery and backup process for the organizations and provided support to users to troubleshoot issues
  • Worked on reporting using the Orange BI tool for analytics and integrated the reporting data with Tableau
  • Oversaw Unix/Linux issues within the network and helped with the troubleshooting process
  • Applied and monitored the data patches during version upgrades and new installations
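
A minimal sketch of the login and role provisioning described in this section, run from Python with pyodbc against SQL Server; the server, database, domain account and role are illustrative placeholders:

    # Create a Windows (single sign-on) login, map it to a database user,
    # and grant read access through a fixed database role.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sql-prod.example.com;DATABASE=HRReporting;Trusted_Connection=yes;",
        autocommit=True,
    )
    cur = conn.cursor()

    cur.execute("CREATE LOGIN [CORP\\retail.manager] FROM WINDOWS;")
    cur.execute("CREATE USER [CORP\\retail.manager] FOR LOGIN [CORP\\retail.manager];")
    cur.execute("ALTER ROLE db_datareader ADD MEMBER [CORP\\retail.manager];")

    conn.close()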
