Sr AWS Data Engineer Resume
Raleigh, NC
SUMMARY
- Around 8 years of experience in the IT industry, including 6+ years as a Data Engineer designing, developing and deploying big data applications using the Hadoop ecosystem, Amazon Web Services (AWS), Microsoft Azure and Google Cloud
- A certified, highly skilled engineer with strong analytical and problem-solving experience and a solid understanding of business requirements
- Strong experience with AWS tools such as EC2, Kinesis, S3, EMR, RDS, Athena, Glue, Elasticsearch, Lambda, Redshift, ECS & Airflow
- Deployed applications using Terraform Cloud from an Amazon S3 bucket and used AWS Lambda with Python code to start EMR clusters
- Hands-on exposure to configuring multi-node clusters on AWS EC2
- Strong experience in designing, building and deploying the AWS components (EC2 and S3)
- In-Depth knowledge on AWS stack: Redshift, Lambda, RDS, S3, EC2 to create data pipelines for analytics
- Experience in building a centralized data warehouse on the AWS platform using a MySQL database on RDS and S3
- Strong experience with Microsoft Azure tools such as Azure Databricks, Azure Data Lake, Azure Blob Storage, Azure Data Factory, SQL DB, Cosmos DB and Azure DevOps
- Experience in creating Azure Data Factory pipelines, data transformations and key vault integrations in Data Factory
- Experience in maintaining a cloud data warehouse on Azure Synapse Analytics and in using Azure Blob Storage to load data into Azure Synapse Analytics
- Involved in creating pipelines and data flows using Azure Databricks and PySpark
- Experience in Azure DevOps for building and deploying applications with Azure Repos and Azure boards and worked on Azure DevOps service for CI/CD pipelines along with Docker and Jenkins
- Experience in creating ETL pipelines using AWS Glue, Athena and Redshift with data extracted from S3
- Experience in Azure data storage services for the ETL process using PySpark and Spark SQL
- Hands on exposure on the overall big data architecture and frameworks that includes storage management, data warehouse and automating ETL processes
- Strong knowledge of big data technologies and of storing, querying and processing data using big data tools
- Experience in application development using the Hadoop ecosystem (Hadoop Distributed File System (HDFS), MapReduce, Yarn), Spark, Hive, Impala, Sqoop, Airflow, Oozie, Kafka and Flume on AWS and Azure platforms
- Strong experience with Hadoop distributions such as Cloudera, Amazon EMR and Azure HDInsight
- Hands on experience in installation, configuration and developing the big data infrastructure using the Hadoop clusters on both AWS and Azure cloud platforms
- Extensive knowledge in using Oozie for workflow design and job scheduling on HDFS, and Flume for data ingestion
- Experience in using Spark API to analyze the Hive Data with Hadoop cluster and YARN
- Strong experience using HiveQL, Oozie and HBase within the Cloudera distribution system
- Strong experience in implementing big data pipelines for batch and real-time processing using Spark, Sqoop, Kafka and Flume, and experience using Impala, Spark and Hive to implement end-to-end pipelines
- Worked on data imports/exports using Sqoop from Teradata and relational database management systems
- Designed and implemented a data pipeline framework for data ingestion into Snowflake
- Experience in creating database objects in Snowflake and extensive knowledge of role-based access control, data sharing and query performance tuning in Snowflake
- Used Python scripts for loading data into Snowflake, including large, complex data sets
- Proficient in SQL databases and NoSQL databases (HBase, Cassandra and MongoDB), with experience integrating NoSQL databases with Hadoop clusters
- Involved in creating ETL pipelines using SnowSQL and Python tools
- Experience in using Informatica for the ETL process and streamlining the interface for executing data pipelines
- Knowledge in creating ETL jobs in Talend to push existing data into the data warehouse system
- Experience implementing Spark with Spark SQL and Python to enable faster data processing, with strong knowledge of and expertise in Apache Spark for real-time analytics
- Implemented end-to-end pipelines, used the Lambda architecture for serverless pipelines and created automated processes for data pipelines to Snowflake with S3
- Experience in the migration process from SQL Server to Snowflake, in configuring Snowflake environments and in creating staging tables and snowflake-schema dimensions for reporting
- Strong experience in data migration services for Snowflake & Informatica from existing data sources
- Developed automation scripts using Python for integration testing and used NumPy for data extraction
- Strong knowledge of GitHub repositories and GitHub pull requests for extracting and automating data for CI/CD pipelines with Jenkins and Docker, and extensive knowledge of GitLab for automating CI/CD scripts
- Strong knowledge of SQL queries and extracts, experience in installing SQL Server, and experience in maintaining APX servers and SQL databases
- Experience with Agile and Scrum methodologies and good collaboration/communication skills with the business team
TECHNICAL SKILLS
Big Data Technologies: Hadoop, MapReduce, Hive, Oozie, Sqoop, Apache Spark, Informatica, Kafka, Flume, HDFS, Yarn, HBase, Impala
AWS: EC2, Kinesis, S3, EMR, RDS, Glue, Elasticsearch, Lambda, Redshift, ECS
Azure: Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, Cosmos DB, Azure DevOps
ETL tools: Snowflake, Informatica, Talend
NoSQL databases: HBase, Cassandra, DynamoDB, MongoDB
Monitoring and Reporting: Power BI, Tableau, Orange BI
Hadoop Distributions: Hortonworks, Cloudera, Amazon EMR, Azure HDInsight
Programming languages: Scala, Python, SQL, HiveQL, PowerShell, Java
Operating systems: Linux, Unix, Mac OS, Windows 7, Windows 8 and Windows 10
Version control: GIT
Databases: Oracle, MySQL, Teradata
Cloud computing: Amazon Web Services and Microsoft Azure
PROFESSIONAL EXPERIENCE
Sr AWS Data Engineer
Confidential, Raleigh NC
Responsibilities:
- Developed data loading strategies using the Hadoop cluster (HDFS, Hive, AWS Kinesis)
- Designed and implemented cloud data architecture using the AWS tools
- Built AWS big data pipeline using DynamoDB, S3, AWS glue and Amazon Athena.
- Used AWS Glue to create batch pipelines and real-time processing jobs
- Extracted data from Amazon RDS using migration tools like AWS Glue and loaded the data to Amazon S3 in JSON format
- Used a JSON schema to define table and column mapping from S3 data to Redshift.
- Involved in the integration process of Amazon Redshift
- Worked on AWS Redshift to consolidate all data warehouses into one data warehouse.
- Designed and developed ETL jobs to extract data from Netsuite replica and load it in AWS Redshift
- Analysed data stored in S3 buckets using SQL and PySpark and stored the processed data into AWS Redshift using Spark components
- Implemented Workload Management (WLM) in AWS Redshift to prioritize dashboard queries over complex queries in order to enhance the reporting interface.
- Implemented Lambda architecture for creating a combination of batch and real-time data pipelines using Airflow
- Good experience working with DAGs in Airflow
- Developed Airflow operators using Python to interact with services like EMR, Athena, S3, DynamoDB, Snowflake and Hive
- Owned an Apache Airflow server for scheduling distributed computing jobs.
- Involved in parallel and sequential execution of spark jobs in Airflow.
- Built a centralized data warehouse on the AWS platform using MySQL database on RDS and S3
- Expert in data ingestion using tools like Kinesis, S3 and Airflow for the EMR cluster
- Launched Redshift clusters by creating IAM roles
- Created ETL pipelines using AWS Glue, Athena and Redshift with data extracted from S3
- Experience in using the AWS stack (Redshift, Lambda, RDS, S3, EC2) to create data pipelines for analytics
- Built servers using Amazon EC2
- Designed AWS data pipelines using the AWS resources such as Lambda, S3 and EMR
- Worked on creating data pipelines with Airflow to schedule PySpark jobs for performing incremental loads.
- Deployed applications using Terraform Cloud from an Amazon S3 bucket and used AWS Lambda with Python code to start EMR clusters
- Used Python to write ETL scripts, including the conversion of JSON files
- Used Python within EC2 to remediate S3 storage buckets based on access requirements and compliance
- Used the Lambda architecture for the serverless pipelines and created the automated processes for data pipelines with S3
- Worked with Spark SQL, creating RDDs using PySpark on HDFS, and used them for data extraction within AWS Glue
- Performed ELT operations using PySpark, Spark SQL and Python on large (petabyte-scale) data clusters
- Implemented Spark applications in Scala to improve application performance
- Implemented Spark using Spark SQL and Python to enable faster data processing
- Used Impala, Spark and Hive to implement end to end data pipelines
- Implemented CI/CD containers using Docker and Jenkins for code builds and AWS ECS for code deployment.
- Worked in the Agile methodology and collaborated with the project team members to accelerate the project's progress.
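As an illustration of the Lambda-to-EMR pattern described above, a minimal sketch using boto3; the cluster name, log bucket, release label and instance types are illustrative placeholders, not values from any actual deployment:

```python
import json

def build_emr_config(cluster_name, log_uri, core_count=2):
    """Build the run_job_flow request for a transient EMR cluster.

    All names and sizes here are hypothetical examples.
    """
    return {
        "Name": cluster_name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.9.0",
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
            ],
            # Terminate the cluster automatically once its steps finish
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def lambda_handler(event, context):
    """Lambda entry point: start an EMR cluster when triggered (e.g. by S3)."""
    import boto3  # available in the AWS Lambda Python runtime
    emr = boto3.client("emr")
    config = build_emr_config("etl-cluster", "s3://example-logs/emr/")
    response = emr.run_job_flow(**config)
    return {"statusCode": 200, "body": json.dumps(response["JobFlowId"])}
```

Separating the config builder from the handler keeps the request body unit-testable without AWS credentials.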
Azure Data Engineer
Confidential
Responsibilities:
- Worked on data migration from on-prem to cloud databases (Snowflake to Azure)
- Used SQL, Azure Data Factory and PowerShell for the data migration process
- Involved in data warehouse implementations using the Azure ecosystem, including Azure Data Warehouse, Azure Data Lake Storage (ADLS) and Azure Data Factory v2
- Designed and managed Azure Data Factory pipelines and pulled data from SQL Server and Google Cloud
- Created data sets for developing the Azure Data Factory pipelines and maintained the architectural responsibilities.
- Extensive knowledge of data transformations and key vaults in Azure Data Factory
- Deployed Data Factory pipelines in order to orchestrate data to the Azure SQL database
- Used Azure's ETL service (Azure Data Factory) for data ingestion from Cloudera Hadoop HDFS to Azure Data Lake Storage
- Used the Cosmos activity to process the data pipeline in Azure Data factory
- Designed the transformation process in Azure Data Lake (ADLS)
- Briefly used an Ansible playbook for deploying a code pipeline for Power BI within Azure Data Lake Storage
- Experience in maintaining a cloud data warehouse on Azure Synapse Analytics
- Hands-on exposure to Azure Blob Storage for data loading into Azure Synapse Analytics
- Orchestrated all data pipelines using Airflow to interact with Azure Services.
- Maintained the data pipeline architecture in the Azure cloud using Data Factory and Databricks
- Created Apache Parquet files by using the Databricks storage layer for audit history
- Used Databricks and PySpark for creating pipelines and complex data flows
- Experience in Azure data storage services for the ETL process using PySpark and Spark SQL
- Migrated the ETL logic using Azure pipelines to meet the business requirements
- Used Azure DevOps to build and deploy applications with Azure Repos and Azure Boards
- Used Azure DevOps services for building CI/CD pipelines for managing applications
- Worked on setting up and connecting SQL servers to Azure databases
- Used Git for version control and for tracking code merges
- Worked in the Agile methodology and have experience using Jira and Confluence for tickets & issues
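To illustrate the Data Factory pipeline work above, a minimal sketch of a pipeline definition with a single Copy activity in the JSON shape ADF expects; the pipeline, dataset and activity names are hypothetical:

```python
import json

def adf_copy_pipeline(pipeline_name, source_dataset, sink_dataset):
    """Sketch of an Azure Data Factory pipeline definition containing one
    Copy activity from a SQL Server dataset to a Parquet sink.
    All names here are illustrative, not from any real environment."""
    return {
        "name": pipeline_name,
        "properties": {
            "activities": [
                {
                    "name": "CopySqlToDataLake",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset,
                                "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset,
                                 "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "SqlServerSource"},
                        "sink": {"type": "ParquetSink"},
                    },
                }
            ]
        },
    }

# Emit the JSON that would be deployed to the Data Factory instance
pipeline = adf_copy_pipeline("CopyOnPremToADLS", "SqlServerTable", "AdlsParquet")
print(json.dumps(pipeline, indent=2))
```

Generating pipeline definitions from code like this makes them easy to version in Azure Repos and deploy through a CI/CD pipeline.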
Big Data Engineer
Confidential
Responsibilities:
- Experience working with structured data by importing and exporting between DynamoDB and HDFS/Hive using Sqoop
- Involved in data migration from existing data platforms to Hadoop and built a data warehouse within the Hadoop ecosystem using Hive, Oozie and Sqoop
- Used Flume to generate data cluster files and loaded the data to relational database management systems using Sqoop
- Implemented Lambda architecture for creating a combination of batch and real-time data pipelines using Airflow
- Good experience working with DAGs in Airflow
- Developed Airflow operators using Python to interact with services like EMR, DynamoDB, Snowflake and Hive
- Involved in the migration process from Teradata and SQL Server to Snowflake
- Designed and implemented a data pipeline framework for data ingestion into Snowflake
- Experience in creating database objects in Snowflake
- Extensive knowledge of role-based access control, data sharing and query performance tuning in Snowflake
- Used Snowpipe to load and transform data from external sources into Snowflake
- Used HiveQL for structured data and wrote custom UDFs, optimizing Hive queries
- Created staging tables in Snowflake and worked with snowflake-schema dimensions for reporting purposes
- Worked on Snowflake schema and performed data quality analysis using SnowSQL
- Strong understanding of Snowflake's Time Travel concept and of data sharing
- Tuned query performance using micro-partitions in Snowflake
- Experience in building, creating and configuring Snowflake environments for overall data processing
- Hands on exposure on performing technical data analysis for data warehousing initiatives
- Used SnowSQL and Python tools for developing ETL pipelines in data warehouse systems
- Used Python scripts for loading data into Snowflake, including large, complex data sets
- Created a Python script as a Cassandra REST API and used the script to load the data into Hive
- Experience in using Spark API to analyze the Hive Data with Hadoop cluster and YARN
- Created AWS resources such as EC2 and SNS for Terraform scripts
- Hands on exposure on configuring Multi-node clusters using AWS on EC2
- Created daily background jobs using AWS S3 load, unload, load generator and grid variables
- Used copy statements from S3 to create data pipelines for Data load and data transform
- Used GitHub operations (push, pull & merge requests) for CI/CD scripts during the migration process to Snowflake
- Used GitLab to automate CI/CD scripts and schedule background jobs
- Used UNIX scripting for the data ingestion process and prepared the data before loading it into the staging area
- Experience in performing backup and restoration of databases
- Involved in software version upgrades and monthly patches for maintaining the systems
- Debugged QA issues and fixed the defects based on the Change Requests
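The S3-to-Snowflake copy statements mentioned above follow a standard pattern; a minimal sketch that composes such a statement for an external stage load (the table, stage and file-format values are hypothetical examples):

```python
def copy_into_stage_sql(table, stage, file_format="JSON"):
    """Compose a Snowflake COPY INTO statement that loads files from an
    external S3 stage into a staging table. Skips files that fail to
    parse rather than aborting the whole load."""
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  FILE_FORMAT = (TYPE = '{file_format}')\n"
        f"  ON_ERROR = 'SKIP_FILE';"
    )

# A pipeline step would execute this through the Snowflake connector, e.g.:
#   cursor.execute(copy_into_stage_sql("stg_orders", "s3_raw_stage"))
print(copy_into_stage_sql("stg_orders", "s3_raw_stage"))
```

Building the statement in Python makes it easy to parameterize per table when scheduling many loads from GitLab CI or a cron-style background job.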
Data Engineer
Confidential
Responsibilities:
- Assisted the team to work on the installation and configuration of Hadoop clusters
- Used MapReduce jobs to load data sets into HBase and used Hive optimization to improve performance
- Used Sqoop to import data from relational database management systems to HDFS
- Involved in developing data cleaning process by using HiveQL and MapReduce
- Maintained HBase tables using Hive queries for the data storage process
- Used Oozie to schedule HBase jobs
- Used Oozie to build complex data transformations
- Worked on the Cloudera distribution and integrated Hadoop clusters into the Cloudera distribution system
- Maintained data sources using Tableau
- Integrated Tableau with existing databases like MySQL to run background jobs
- Used Tableau to create financial dashboards based on sales (profit/loss) and revenues
- Created SQL queries to extract and generate product data reports using different parameters and attributes
- Responsible for maintaining SQL server databases and performing data validation for complex SQL queries
- Worked on the ETL process using SQL in order to populate data from the database servers
- Created Schema flows for the ETL process based on the business requirements for data enrichment
- Used Python and SQL for developing ETL pipelines and loaded the use cases to HDFS
- Created ETL jobs in Talend to push existing data into the data warehouse system
- Used Informatica Power center for the ETL process from third party source systems to existing databases
- Involved in data warehouse optimization using Informatica and Cloudera with the Hadoop cluster for curated data
- Extracted data from Oracle and SQL servers using Informatica and analysed the data for the transformation process
- Assisted the team in streamlining Informatica's interface to execute data pipelines for data load, extraction and data cleansing using the Hadoop cluster
- Developed automation scripts using python for integration and functional testing
- Extracted data using NumPy modules in Python
- Created data patterns to understand customers' behaviour on product purchases and used data clustering tools to create raw data
- Implemented different methodologies such as Type 1 and Type 2 for the ODS tables
- Used Github for version control to pull and push repository files to the local servers
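The Type 1 and Type 2 methodologies mentioned above can be sketched in plain Python; the dimension row and its column names (`start_date`, `end_date`, `is_current`) are illustrative, not from any actual schema:

```python
from datetime import date

def scd_type1_update(dim_row, new_attrs):
    """Type 1: overwrite changed attributes in place; prior values are lost."""
    dim_row.update(new_attrs)
    return [dim_row]

def scd_type2_update(dim_row, new_attrs, effective=None):
    """Type 2: close out the current row and append a new versioned row,
    preserving the full change history."""
    effective = effective or date.today()
    closed = {**dim_row, "end_date": effective, "is_current": False}
    current = {**dim_row, **new_attrs,
               "start_date": effective, "end_date": None, "is_current": True}
    return [closed, current]

# Hypothetical customer dimension row
row = {"customer_id": 42, "city": "Raleigh",
       "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}
history = scd_type2_update(row, {"city": "Durham"}, effective=date(2021, 6, 1))
# history holds the closed Raleigh row plus the new current Durham row
```

Type 1 suits corrections where history is irrelevant; Type 2 suits ODS tables that feed point-in-time reporting.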
Junior SQL database administrator
Confidential
Responsibilities:
- Experience in admin responsibilities for SQL Server across various cluster environments using inbuilt tools (Query Store, SQL Server Profiler)
- Been part of change management processes and created users in the database system as per the requirements
- Responsible for providing access for different role groups for various departments
- Created user logins and integrated single sign-on with the existing IT framework.
- Managed permissions and access for overall organizational hierarchy and allocated privileges based on the managers and teams across the company’s retail store employees
- Assisted the teams to migrate databases by importing, exporting and database mirroring
- Created SQL tables for reporting and reconciliation for HR, payroll and learning & development teams
- Integrated the SQL server with the APX tool to ensure data security compliance was maintained
- Loaded data from external sources to SQL server database
- Maintained APX servers for generating reports and assigned access for Management heads, sales heads and retail store managers based on their reporting hierarchy
- Assisted the team in migrating HR & payroll systems from Resource link to Oracle R12 HRMS
- Managed Datasets and data clusters within the database to create/modify and generate reports
- Involved in the database recovery and backup process for the organization and provided support to users to troubleshoot issues
- Worked on reporting using Orange BI tool for analytics and integrated the reporting data to Tableau
- Oversaw Unix/Linux issues within the network and helped with the troubleshooting process
- Applied and monitored the data patches during version upgrades and new installations
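The login-provisioning and role-assignment work above follows a repeatable T-SQL pattern; a minimal sketch that generates the statements a DBA would run, with the login, database and role names as hypothetical examples:

```python
def provision_user_sql(login, database, roles):
    """Generate the T-SQL to create a Windows-authenticated login, map it
    to a database user, and grant role memberships. All names passed in
    are illustrative placeholders."""
    stmts = [
        f"CREATE LOGIN [{login}] FROM WINDOWS;",   # single sign-on via AD
        f"USE [{database}];",
        f"CREATE USER [{login}] FOR LOGIN [{login}];",
    ]
    stmts += [f"ALTER ROLE [{role}] ADD MEMBER [{login}];" for role in roles]
    return "\n".join(stmts)

# Example: read-only access for an HR reporting user
print(provision_user_sql("CORP\\jdoe", "HRReporting", ["db_datareader"]))
```

Scripting the grants this way keeps access allocation for different role groups consistent and auditable under version control.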