Sr AWS Data Engineer Resume
Raleigh, NC
SUMMARY
- Around 8 years of experience in the IT industry, including 6+ years as a Data Engineer designing, developing and deploying big data applications using the Hadoop ecosystem, Amazon Web Services (AWS), Microsoft Azure and Google Cloud
- A certified, highly skilled engineer with strong analytical and problem-solving experience and a solid understanding of business requirements
- Strong experience with AWS tools such as EC2, Kinesis, S3, EMR, RDS, Athena, Glue, Elasticsearch, Lambda, Redshift, ECS & Airflow
- Deployed applications using Terraform Cloud from an Amazon S3 bucket and used AWS Lambda with Python code to start EMR clusters
- Hands-on exposure to configuring multi-node clusters on AWS EC2
- Strong experience in designing, building and deploying the AWS components (EC2 and S3)
- In-Depth knowledge on AWS stack: Redshift, Lambda, RDS, S3, EC2 to create data pipelines for analytics
- Experience in building a centralized data warehouse on the AWS platform using a MySQL database on RDS and S3
- Strong experience with Microsoft Azure tools such as Azure Databricks, Azure Data Lake, Azure Blob Storage, Azure Data Factory, SQL DB, Cosmos DB and Azure DevOps
- Experience in creating Azure Data Factory pipelines, data transformations and key vault integrations in Data Factory
- Experience in maintaining a cloud data warehouse on Azure Synapse Analytics and in using Azure Blob Storage to load data into Azure Synapse Analytics
- Involved in creating pipelines and data flows using Azure Databricks and PySpark
- Experience in Azure DevOps for building and deploying applications with Azure Repos and Azure boards and worked on Azure DevOps service for CI/CD pipelines along with Docker and Jenkins
- Experience in creating ETL pipelines using AWS Glue, Athena and Redshift with data extracted from S3
- Experience in Azure data storage services for the ETL process using PySpark and Spark SQL
- Hands on exposure on the overall big data architecture and frameworks that includes storage management, data warehouse and automating ETL processes
- Strong knowledge of big data technologies and of storing, querying and processing data using big data tools
- Experience in application development using the Hadoop ecosystem (Hadoop Distributed File System (HDFS), MapReduce, Yarn), Spark, Hive, Impala, Sqoop, Airflow, Oozie, Kafka and Flume on AWS and Azure platforms
- Strong experience with Hadoop distributions such as Cloudera, Amazon EMR and Azure HDInsight
- Hands on experience in installation, configuration and developing the big data infrastructure using the Hadoop clusters on both AWS and Azure cloud platforms
- Extensive knowledge in using Oozie for workflow design and job scheduling on HDFS, and Flume for data ingestion
- Experience in using Spark API to analyze the Hive Data with Hadoop cluster and YARN
- Strong experience using HiveQL, Oozie and HBase within the Cloudera distribution system
- Strong experience in implementing big data pipelines for batch and real-time processing using Spark, Sqoop, Kafka and Flume, and experience using Impala, Spark and Hive to implement end-to-end pipelines
- Worked on data imports/exports using Sqoop from Teradata and relational database management systems
- Designed and implemented a data pipeline framework for data ingestion into Snowflake
- Experience in creating database objects in Snowflake and extensive knowledge of role-based access control, data sharing and query performance tuning in Snowflake
- Used Python scripts for loading data into Snowflake, including large, complex data sets
- Proficient in SQL databases and NoSQL databases (HBase, Cassandra and MongoDB), with experience integrating NoSQL databases with Hadoop clusters
- Involved in creating ETL pipelines using SnowSQL and Python tools
- Experience in using Informatica for the ETL process and streamlining the interface for executing data pipelines
- Knowledge in creating ETL jobs in Talend to push existing data into the data warehouse system
- Experience implementing Spark with Spark SQL and Python to enable faster data processing, with strong knowledge of and expertise in Apache Spark for real-time analytics
- Implemented end-to-end pipelines, used the Lambda architecture for serverless pipelines and created automated processes for data pipelines to Snowflake with S3
- Experience in the migration process from SQL Server to Snowflake, in configuring Snowflake environments and in creating staging tables and snowflake-schema dimensions for reporting
- Strong experience in data migration services for Snowflake & Informatica from existing data sources
- Developed automation scripts using Python for integration testing and used NumPy for data extraction
- Strong knowledge of GitHub repositories and GitHub pull requests for extracting and automating data for CI/CD pipelines with Jenkins and Docker, and extensive knowledge of GitLab for automating CI/CD scripts
- Strong knowledge of SQL queries and extracts, experience in installing SQL Server, and experience in maintaining APX servers and SQL databases
- Experience with Agile and Scrum methodologies and good collaboration/communication skills with the business team
TECHNICAL SKILLS
Big Data Technologies: Hadoop, MapReduce, Hive, Oozie, Sqoop, Apache Spark, Informatica, Kafka, Flume, HDFS, Yarn, HBase, Impala
AWS: EC2, Kinesis, S3, EMR, RDS, Glue, Elasticsearch, Lambda, Redshift, ECS
Azure: Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, Cosmos DB, Azure DevOps
ETL tools: Snowflake, Informatica, Talend
NoSQL databases: HBase, Cassandra, DynamoDB, MongoDB
Monitoring and Reporting: Power BI, Tableau, Orange BI
Hadoop Distributions: Hortonworks, Cloudera, Amazon EMR, Azure HDInsight
Programming languages: Scala, Python, SQL, HiveQL, PowerShell, Java
Operating systems: Linux, Unix, Mac OS, Windows 7, Windows 8 and Windows 10
Version control: GIT
Databases: Oracle, MySQL, Teradata
Cloud computing: Amazon Web Services and Microsoft Azure
PROFESSIONAL EXPERIENCE
Sr AWS Data Engineer
Confidential, Raleigh NC
Responsibilities:
- Developed data loading strategies using the Hadoop cluster (HDFS, Hive, AWS Kinesis)
- Designed and implemented cloud data architecture using the AWS tools
- Built AWS big data pipeline using DynamoDB, S3, AWS glue and Amazon Athena.
- Used AWS Glue to create batch pipelines and real-time processing jobs
- Extracted data from Amazon RDS using migration tools like AWS Glue and loaded the data to Amazon S3 in JSON format
- Used a JSON schema to define table and column mapping from S3 data to Redshift.
- Involved in the integration process of Amazon Redshift
- Worked on AWS Redshift to consolidate all data warehouses into one data warehouse.
- Designed and developed ETL jobs to extract data from Netsuite replica and load it in AWS Redshift
- Analysed data stored in S3 buckets using SQL and PySpark and stored the processed data into AWS Redshift using Spark components
- Implemented Workload Management (WLM) in AWS Redshift to prioritize dashboard queries over complex queries in order to enhance the reporting interface.
- Implemented Lambda architecture for creating a combination of batch and real-time data pipelines using Airflow
- Good experience working with DAGs in Airflow
- Developed Airflow operators using Python to interact with services like EMR, Athena, S3, DynamoDB, Snowflake and Hive
- Owned an Apache Airflow server for scheduling distributed computing jobs.
- Involved in parallel and sequential execution of spark jobs in Airflow.
- Built a centralized data warehouse on the AWS platform using MySQL database on RDS and S3
- Expert in data ingestion using tools like Kinesis, S3 and Airflow for the EMR cluster
- Launched Redshift clusters by creating IAM roles
- Created ETL pipelines using AWS Glue, Athena and Redshift with data extracted from S3
- Experience in using the AWS stack (Redshift, Lambda, RDS, S3, EC2) to create data pipelines for analytics
- Built servers using Amazon EC2
- Designed AWS data pipelines using the AWS resources such as Lambda, S3 and EMR
- Worked on creating data pipelines with Airflow to schedule PySpark jobs for performing incremental loads.
- Deployed applications using Terraform Cloud from an Amazon S3 bucket and used AWS Lambda with Python code to start EMR clusters
- Used Python to write ETL scripts, including the conversion of JSON files
- Used Python within EC2 to remediate S3 storage buckets based on access requirements and compliance
- Used the Lambda architecture for the serverless pipelines and created the automated processes for data pipelines with S3
- Worked with Spark SQL, creating RDDs using PySpark on HDFS, and used them for data extraction within AWS Glue
- Performed ELT operations using PySpark, Spark SQL and Python on large (petabyte-scale) data clusters
- Implemented Spark applications in Scala to improve application performance
- Implemented Spark using Spark SQL and Python to enable faster data processing
- Used Impala, Spark and Hive to implement end to end data pipelines
- Implemented CI/CD containers using Docker and Jenkins for code builds and AWS ECS for code deployment.
- Worked in the Agile methodology and collaborated with the project team members to accelerate the project's progress.
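As an illustration of the Lambda-to-EMR pattern described above, a minimal sketch using boto3; the cluster name, log bucket, release label and instance types are illustrative placeholders, not values from any actual deployment:

```python
import json

def build_emr_config(cluster_name, log_uri, core_count=2):
    """Build the run_job_flow request for a transient EMR cluster.

    All names and sizes here are hypothetical examples.
    """
    return {
        "Name": cluster_name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.9.0",
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
            ],
            # Terminate the cluster automatically once its steps finish
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def lambda_handler(event, context):
    """Lambda entry point: start an EMR cluster when triggered (e.g. by S3)."""
    import boto3  # available in the AWS Lambda Python runtime
    emr = boto3.client("emr")
    config = build_emr_config("etl-cluster", "s3://example-logs/emr/")
    response = emr.run_job_flow(**config)
    return {"statusCode": 200, "body": json.dumps(response["JobFlowId"])}
```

Separating the config builder from the handler keeps the request body unit-testable without AWS credentials.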
Azure Data Engineer
Confidential
Responsibilities:
- Worked on data migration from on-prem to cloud databases (Snowflake to Azure)
- Used SQL, Azure Data Factory and PowerShell for the data migration process
- Involved in data warehouse implementations using the Azure ecosystem, including Azure Data Warehouse, Azure Data Lake Storage (ADLS) and Azure Data Factory v2
- Designed and managed Azure Data Factory pipelines and pulled data from SQL Server and Google Cloud
- Created data sets for developing the Azure Data Factory pipelines and maintained the architectural responsibilities.
- Extensive knowledge of data transformations and key vaults in Azure Data Factory
- Deployed Data Factory pipelines in order to orchestrate data to the Azure SQL database
- Used Azure's ETL service (Azure Data Factory) for data ingestion from Cloudera Hadoop HDFS to Azure Data Lake Storage
- Used the Cosmos activity to process the data pipeline in Azure Data factory
- Designed the transformation process in Azure Data Lake (ADLS)
- Briefly used an Ansible playbook for deploying a code pipeline for Power BI within Azure Data Lake Storage
- Experience in maintaining a cloud data warehouse on Azure Synapse Analytics
- Hands-on exposure to Azure Blob Storage for data loading into Azure Synapse Analytics
- Orchestrated all data pipelines using Airflow to interact with Azure Services.
- Maintained the data pipeline architecture in the Azure cloud using Data Factory and Databricks
- Created Apache Parquet files by using the Databricks storage layer for audit history
- Used Databricks and PySpark for creating pipelines and complex data flows
- Experience in Azure data storage services for the ETL process using PySpark and Spark SQL
- Migrated the ETL logic using Azure pipelines to meet the business requirements
- Used Azure DevOps to build and deploy applications with Azure Repos and Azure Boards
- Used Azure DevOps services for building CI/CD pipelines for managing applications
- Worked on setting up and connecting SQL servers to Azure databases
- Used Git for version control and for tracking code merges
- Worked in the Agile methodology and have experience using Jira and Confluence for tickets & issues
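To illustrate the Data Factory pipeline work above, a minimal sketch of a pipeline definition with a single Copy activity in the JSON shape ADF expects; the pipeline, dataset and activity names are hypothetical:

```python
import json

def adf_copy_pipeline(pipeline_name, source_dataset, sink_dataset):
    """Sketch of an Azure Data Factory pipeline definition containing one
    Copy activity from a SQL Server dataset to a Parquet sink.
    All names here are illustrative, not from any real environment."""
    return {
        "name": pipeline_name,
        "properties": {
            "activities": [
                {
                    "name": "CopySqlToDataLake",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset,
                                "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset,
                                 "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "SqlServerSource"},
                        "sink": {"type": "ParquetSink"},
                    },
                }
            ]
        },
    }

# Emit the JSON that would be deployed to the Data Factory instance
pipeline = adf_copy_pipeline("CopyOnPremToADLS", "SqlServerTable", "AdlsParquet")
print(json.dumps(pipeline, indent=2))
```

Generating pipeline definitions from code like this makes them easy to version in Azure Repos and deploy through a CI/CD pipeline.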
Big Data Engineer
Confidential
Responsibilities:
- Experience working with structured data by importing and exporting between DynamoDB and HDFS/Hive using Sqoop
- Involved in data migration from existing data platforms to Hadoop and built a data warehouse within the Hadoop ecosystem using Hive, Oozie and Sqoop
- Used Flume to generate data cluster files and loaded the data to relational database management systems using Sqoop
- Implemented Lambda architecture for creating a combination of batch and real-time data pipelines using Airflow
- Good experience working with DAGs in Airflow
- Developed Airflow operators using Python to interact with services like EMR, DynamoDB, Snowflake and Hive
- Involved in the migration process from Teradata and SQL Server to Snowflake
- Designed and implemented a data pipeline framework for data ingestion into Snowflake
- Experience in creating database objects in Snowflake
- Extensive knowledge of role-based access control, data sharing and query performance tuning in Snowflake
- Used Snowpipe to load and transform data from external sources into Snowflake
- Used HiveQL for structured data and wrote custom UDFs, optimizing Hive queries
- Created staging tables in Snowflake and worked with snowflake-schema dimensions for reporting purposes
- Worked on Snowflake schema and performed data quality analysis using SnowSQL
- Strong understanding of Snowflake's Time Travel concept and of data sharing
- Tuned query performance using micro-partitions in Snowflake
- Experience in building, creating and configuring Snowflake environments for overall data processing
- Hands on exposure on performing technical data analysis for data warehousing initiatives
- Used SnowSQL and Python tools for developing ETL pipelines in data warehouse systems
- Used Python scripts for loading data into Snowflake, including large, complex data sets
- Created a Python script as a Cassandra REST API and used the script to load the data into Hive
- Experience in using Spark API to analyze the Hive Data with Hadoop cluster and YARN
- Created AWS resources such as EC2 and SNS for Terraform scripts
- Hands on exposure on configuring Multi-node clusters using AWS on EC2
- Created daily background jobs using AWS S3 load, unload, load generator and grid variables
- Used copy statements from S3 to create data pipelines for Data load and data transform
- Used GitHub operations (push, pull & merge requests) for CI/CD scripts during the migration process to Snowflake
- Used GitLab to automate CI/CD scripts and schedule background jobs
- Used UNIX scripting for the data ingestion process and prepared the data before loading it into the staging area
- Experience in performing backup and restoration of databases
- Involved in software version upgrades and monthly patches for maintaining the systems
- Debugged QA issues and fixed the defects based on the Change Requests
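The S3-to-Snowflake copy statements mentioned above follow a standard pattern; a minimal sketch that composes such a statement for an external stage load (the table, stage and file-format values are hypothetical examples):

```python
def copy_into_stage_sql(table, stage, file_format="JSON"):
    """Compose a Snowflake COPY INTO statement that loads files from an
    external S3 stage into a staging table. Skips files that fail to
    parse rather than aborting the whole load."""
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  FILE_FORMAT = (TYPE = '{file_format}')\n"
        f"  ON_ERROR = 'SKIP_FILE';"
    )

# A pipeline step would execute this through the Snowflake connector, e.g.:
#   cursor.execute(copy_into_stage_sql("stg_orders", "s3_raw_stage"))
print(copy_into_stage_sql("stg_orders", "s3_raw_stage"))
```

Building the statement in Python makes it easy to parameterize per table when scheduling many loads from GitLab CI or a cron-style background job.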
Data Engineer
Confidential
Responsibilities:
- Assisted the team to work on the installation and configuration of Hadoop clusters
- Used MapReduce jobs to load data sets into HBase and used Hive optimization to improve performance
- Used Sqoop to import data from relational database management systems to HDFS
- Involved in developing data cleaning process by using HiveQL and MapReduce
- Maintained HBase tables using Hive queries for the data storage process
- Used Oozie to schedule HBase jobs
- Used Oozie to build complex data transformations
- Worked on the Cloudera distribution and integrated Hadoop clusters into the Cloudera distribution system
- Maintained data sources using Tableau
- Integrated Tableau with existing databases like MySQL to run background jobs
- Used Tableau to create financial dashboards based on sales (profit/loss) and revenues
- Created SQL queries to extract and generate product data reports using different parameters and attributes
- Responsible for maintaining SQL server databases and performing data validation for complex SQL queries
- Worked on the ETL process using SQL in order to populate data from the database servers
- Created Schema flows for the ETL process based on the business requirements for data enrichment
- Used Python and SQL for developing ETL pipelines and loaded the use cases to HDFS
- Created ETL jobs in Talend to push existing data into the data warehouse system
- Used Informatica Power center for the ETL process from third party source systems to existing databases
- Involved in data warehouse optimization using Informatica and Cloudera with the Hadoop cluster for curated data
- Extracted data from Oracle and SQL servers using Informatica and analysed the data for the transformation process
- Assisted the team in streamlining Informatica's interface to execute data pipelines for data load, extraction and data cleansing using the Hadoop cluster
- Developed automation scripts using python for integration and functional testing
- Extracted data using NumPy modules in Python
- Created data patterns to understand customers' behaviour on product purchases and used data clustering tools to create raw data
- Implemented different methodologies such as Type 1 and Type 2 for the ODS tables
- Used Github for version control to pull and push repository files to the local servers
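The Type 1 and Type 2 methodologies mentioned above can be sketched in plain Python; the dimension row and its column names (`start_date`, `end_date`, `is_current`) are illustrative, not from any actual schema:

```python
from datetime import date

def scd_type1_update(dim_row, new_attrs):
    """Type 1: overwrite changed attributes in place; prior values are lost."""
    dim_row.update(new_attrs)
    return [dim_row]

def scd_type2_update(dim_row, new_attrs, effective=None):
    """Type 2: close out the current row and append a new versioned row,
    preserving the full change history."""
    effective = effective or date.today()
    closed = {**dim_row, "end_date": effective, "is_current": False}
    current = {**dim_row, **new_attrs,
               "start_date": effective, "end_date": None, "is_current": True}
    return [closed, current]

# Hypothetical customer dimension row
row = {"customer_id": 42, "city": "Raleigh",
       "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}
history = scd_type2_update(row, {"city": "Durham"}, effective=date(2021, 6, 1))
# history holds the closed Raleigh row plus the new current Durham row
```

Type 1 suits corrections where history is irrelevant; Type 2 suits ODS tables that feed point-in-time reporting.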
Junior SQL database administrator
Confidential
Responsibilities:
- Experience in admin responsibilities for SQL Server across various cluster environments using inbuilt tools (Query Store, SQL Server Profiler)
- Been part of change management processes and created users in the database system as per the requirements
- Responsible for providing access for different role groups for various departments
- Created user logins and integrated single sign-on with the existing IT framework.
- Managed permissions and access for overall organizational hierarchy and allocated privileges based on the managers and teams across the company’s retail store employees
- Assisted the teams to migrate databases by importing, exporting and database mirroring
- Created SQL tables for reporting and reconciliation for HR, payroll and learning & development teams
- Integrated the SQL server with the APX tool to ensure data security compliance was maintained
- Loaded data from external sources to SQL server database
- Maintained APX servers for generating reports and assigned access for Management heads, sales heads and retail store managers based on their reporting hierarchy
- Assisted the team in migrating HR & payroll systems from Resource link to Oracle R12 HRMS
- Managed Datasets and data clusters within the database to create/modify and generate reports
- Involved in the database recovery and backup process for the organization and provided support to users to troubleshoot issues
- Worked on reporting using Orange BI tool for analytics and integrated the reporting data to Tableau
- Oversaw Unix/Linux issues within the network and helped with the troubleshooting process
- Applied and monitored the data patches during version upgrades and new installations
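The login-provisioning and role-assignment work above follows a repeatable T-SQL pattern; a minimal sketch that generates the statements a DBA would run, with the login, database and role names as hypothetical examples:

```python
def provision_user_sql(login, database, roles):
    """Generate the T-SQL to create a Windows-authenticated login, map it
    to a database user, and grant role memberships. All names passed in
    are illustrative placeholders."""
    stmts = [
        f"CREATE LOGIN [{login}] FROM WINDOWS;",   # single sign-on via AD
        f"USE [{database}];",
        f"CREATE USER [{login}] FOR LOGIN [{login}];",
    ]
    stmts += [f"ALTER ROLE [{role}] ADD MEMBER [{login}];" for role in roles]
    return "\n".join(stmts)

# Example: read-only access for an HR reporting user
print(provision_user_sql("CORP\\jdoe", "HRReporting", ["db_datareader"]))
```

Scripting the grants this way keeps access allocation for different role groups consistent and auditable under version control.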