Sr. AWS Data Engineer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- Data Engineer with around 8 years of experience in cloud (AWS, Azure) data warehousing, data engineering, feature engineering, Hadoop big data, ETL/ELT, and Business Intelligence. As a cloud data architect and engineer, specializes in AWS and Azure frameworks, Cloudera, the Hadoop ecosystem, Spark/PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, tools such as Tableau, Airflow, DBT, and Presto/Athena, and data DevOps frameworks/pipelines, with programming skills in Python.
- Experience in building data solutions using SQL Server, MSBI, and the AWS and Azure clouds.
- Worked on Azure Cloud: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), and Databricks.
- In-depth understanding of the strategy and practical implementation of AWS-specific technologies including EC2, EBS, S3, VPC, RDS, SES, ELB, EMR, ECS, CloudFront, CloudFormation, ElastiCache, CloudWatch, Redshift, Lambda, SNS, DynamoDB, and Kinesis.
- Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
- Experience in Hadoop ecosystem components such as Hive, HDFS, Sqoop, Spark, Kafka, and Pig.
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as ResourceManager, NodeManager, NameNode, DataNode, MapReduce concepts, and HDFS.
- Expertise in data migration, data profiling, data ingestion, data cleansing, transformation, data import, and data export using multiple ETL tools such as Informatica PowerCenter.
- Hands-on with Spark RDDs, the DataFrame API, the Dataset API, the Data Source API, Spark SQL, and Spark Streaming.
- Adept at using BI tools such as Power BI and QlikView to enhance reporting capabilities and develop BI applications in accordance with client requirements. Knowledge of data modeling, data analysis, and query optimization in MS SQL Server.
- Extensive experience in developing complex stored procedures, functions, triggers, views, cursors, indexes, CTEs, joins, and subqueries with T-SQL.
- Experienced in managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
- Good experience in Tableau analytics; built dashboards for clients using Tableau.
- Experience in end-to-end design and deployment of rich graphic visualizations with drill-down, drop-down menu, and parameter options using Tableau.
- Proficient in SQL databases (MS SQL Server, MySQL, Oracle, PostgreSQL) as well as MongoDB.
- Expert in various Azure services such as Compute (Web Roles, Worker Roles), Caching, Azure SQL, NoSQL, Storage, and Network services, Azure Active Directory (AD), API Management, Scheduling, Azure Auto Scaling, Azure Cloud Shell, ARM, and PowerShell automation.
TECHNICAL SKILLS:
AWS Services: S3, EC2, EMR, Redshift, RDS, Lambda, Kinesis, SNS, SQS, AMI, IAM, CloudFormation
Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, Snowflake, Spark components
Databases: Oracle, Microsoft SQL Server, MySQL, DB2, Teradata
Programming Languages: Java, Scala, Impala, Python.
Web Servers: Apache Tomcat, WebLogic.
IDE: Eclipse, Dreamweaver
NoSQL Databases: HBase, Cassandra, MongoDB
Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC.
Currently Exploring: Apache Flink, Drill, Tachyon.
Cloud Services: AWS, Azure, Azure Data Factory / ETL / ELT / SSIS, Azure Data Lake Storage, Azure Databricks
ETL Tools: Talend Open Studio & Talend Enterprise Platform
Reporting and ETL Tools: Tableau, Power BI, AWS GLUE, SSIS, SSRS, Informatica, Data Stage
PROFESSIONAL EXPERIENCE:
Confidential: Charlotte, NC
Sr. AWS Data Engineer
Responsibilities:
- Developed a cloud migration strategy and implemented best practices for moving from on-premises to the cloud using AWS services such as AWS Database Migration Service and AWS Server Migration Service.
- Responsible for setting up and building AWS infrastructure using resources such as VPC, EC2, S3, DynamoDB, IAM, EBS, Route 53, SNS, SES, SQS, CloudWatch, CloudTrail, Security Groups, Auto Scaling, and RDS with CloudFormation templates.
- Backed up AWS PostgreSQL to S3 in a daily job run on EMR using DataFrames.
- Implemented new tools such as Kubernetes with Docker to assist with auto-scaling and continuous integration (CI), uploading Docker images to the registry so services are deployable through Kubernetes; used the Kubernetes dashboard to monitor and manage the services.
- Implemented AWS Lambda to run code without managing servers, triggering runs from S3 and SNS events (a minimal sketch appears after this list).
- Implemented data warehouse solutions in AWS Redshift; worked on various projects to migrate data from source databases to AWS Redshift, RDS, ELB, EMR, DynamoDB, and S3.
- Wrote scripts from scratch to create AWS infrastructure in languages such as Bash and Python; created Lambda functions to upload code and to check for changes in S3 and DynamoDB tables.
- Developed a Python module that de-normalizes data from RDBMS to JSON documents, saving 35 hours as part of the migration.
- Created a Python program to handle PL/SQL constructs such as cursors and loops that are not supported by Snowflake (an illustrative sketch appears at the end of this section).
- Created job chains with Jenkins Job Builder, parameterized triggers, and target host deployments; utilized many Jenkins plugins and the Jenkins API.
- Orchestrated and migrated CI/CD processes using CloudFormation, Terraform, and Packer templates, and containerized the infrastructure using Docker, which was set up in OpenShift, AWS, and VPCs.
- Automated extraction of metadata and lineage from tools using Python scripts, saving 70+ hours of manual effort.
- Created dashboards on the Snowflake cost model and usage in QlikView.
- Worked on Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN.
- Managed AWS EC2 instances, utilizing S3 and Glacier for data archiving and long-term backup, along with UAT environments and infrastructure servers for Git.
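A minimal sketch of the S3/SNS-triggered Lambda pattern referenced above, assuming a hypothetical DynamoDB audit table name; the event parsing follows the standard S3 and SNS notification layouts rather than the actual production function.

```python
import json
import boto3

# Hypothetical DynamoDB table used to record object changes (illustrative name).
AUDIT_TABLE = "s3-object-audit"

dynamodb = boto3.resource("dynamodb")


def lambda_handler(event, context):
    """Triggered by S3 event notifications, delivered directly or via SNS."""
    table = dynamodb.Table(AUDIT_TABLE)
    for record in event.get("Records", []):
        # When the event arrives through SNS, the S3 payload is nested in the message body.
        if "Sns" in record:
            record = json.loads(record["Sns"]["Message"])["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Record the change so downstream jobs can pick it up.
        table.put_item(Item={"object_key": key, "bucket": bucket,
                             "event_name": record["eventName"]})
    return {"status": "ok"}
```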
Environment: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, Cloudera, Scala IDE (Eclipse), Scala, Linux shell scripting, HDFS, Python, Snowflake, QlikView, JSON, OpenShift, AWS Glacier
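An illustrative sketch of replacing a PL/SQL cursor loop with Python driving Snowflake through the snowflake-connector-python package; the connection parameters, table names, and per-row logic are placeholders, not the actual production code.

```python
import snowflake.connector

# Placeholder credentials and identifiers; real values would come from a secrets store.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Replaces a PL/SQL cursor: fetch the driving rows, loop in Python,
    # and issue the per-row DML the original stored procedure performed.
    cur.execute("SELECT account_id, balance FROM staging_accounts")
    for account_id, balance in cur.fetchall():
        status = "OVERDRAWN" if balance < 0 else "OK"
        cur.execute(
            "UPDATE accounts SET status = %s WHERE account_id = %s",
            (status, account_id),
        )
    conn.commit()
finally:
    conn.close()
```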
Confidential: Dallas, TX
Data Engineer
Responsibilities:
- Designed and built Spark/PySpark-based ETL pipelines to migrate credit card transaction, account, and customer data into the enterprise Hadoop data lake. Developed strategies for handling large datasets using partitioning, Spark SQL, broadcast joins, and performance tuning (see the sketch after this list).
- Built and implemented performant data pipelines using Apache Spark on AWS EMR. Performed maintenance of data integration programs into Hadoop and RDBMS environments from both structured and semi-structured data source systems.
- Performed tuning of existing Hive queries and UDFs to analyze the data. Used Pig to analyze datasets and perform transformations according to requirements.
- Supervised data profiling and data validation to ensure data accuracy between the source and target systems. Performed job scheduling and monitoring using AutoSys and quality testing using the ALM tool.
- Worked on building Tableau Desktop reports and dashboards to report on customer data.
- Built and published customized interactive Tableau reports and dashboards along with data refresh scheduling using Tableau Desktop.
- Used the Snowflake data warehouse to consume data from the C3 platform.
- Involved in wiring S3 event notifications to an SNS topic, an SQS queue, and a Lambda function that sends a message to the Slack channel.
- Transformed Teradata scripts and stored procedures to SQL and Python running on Snowflake's cloud platform.
- Deployed and monitored scalable infrastructure on Amazon Web Services (AWS), managed servers and configuration using Ansible, created instances in AWS, and migrated data to AWS from the data center.
- Automated extraction of metadata and lineage from tools using Python scripts, saving 70+ hours of manual effort.
- Analyzed the system requirement specifications and interacted with the client during requirements gathering.
- Provided daily reports to the Development Manager and participated in both the design and development phases. Utilized the Agile methodology and Scrum process.
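A minimal PySpark sketch of the partitioning and broadcast-join approach described in the first bullet above; the S3 paths and column names are illustrative placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("card-txn-etl").getOrCreate()

# Large fact data and a small dimension table (placeholder paths).
txns = spark.read.parquet("s3://bucket/raw/card_transactions/")
accounts = spark.read.parquet("s3://bucket/raw/accounts/")

# Broadcast the small dimension so the join avoids shuffling the large fact table.
enriched = txns.join(broadcast(accounts), on="account_id", how="left")

# Write partitioned by transaction date so downstream queries can prune partitions.
(enriched
    .repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://bucket/curated/card_transactions/"))
```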
Environment: AWS, Hadoop, Python, PySpark, SQL, Snowflake, Databricks/Delta Lake, AWS S3, AWS Athena, and AWS EMR.
Confidential: New Jersey NJ
Hadoop Data Developer
Responsibilities:
- Programmed in Python and Scala with the Hadoop framework, utilizing Cloudera Hadoop ecosystem projects (HDFS, Spark, Sqoop, Hive, HBase, Oozie, Impala, ZooKeeper, etc.).
- Involved in developing Spark applications using Scala and Python for data transformation, cleansing, and validation using the Spark API.
- Worked with all the Spark APIs (RDD, DataFrame, Data Source, and Dataset) to transform the data.
- Worked on both batch processing and streaming data sources. Used Spark Streaming and Kafka for streaming data processing.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing (a hedged sketch appears after this list).
- Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, and Kafka.
- Worked on Apache NiFi to automate data movement between RDBMS and HDFS.
- Created shell scripts to handle various jobs such as MapReduce, Hive, Pig, and Spark based on the requirements.
- Used Hive techniques such as bucketing and partitioning when creating tables.
- Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
- Worked on AWS to aggregate clean files in Amazon S3 and on Amazon EC2 clusters to deploy files into buckets.
- Designed and architected solutions to load multipart files which can't rely on a scheduled run and must be event-driven, leveraging AWS SNS.
- Involved in data modeling using star schema and snowflake schema.
- Used AWS EMR to create Hadoop and Spark clusters, which are used for submitting and executing Scala and Python applications in production.
- Responsible for developing a data pipeline with Amazon AWS to extract the data from weblogs and store it in HDFS.
- Migrated the data from AWS S3 to HDFS using Kafka.
- Integrated Kubernetes with network, storage, and security to provide comprehensive infrastructure, and orchestrated Kubernetes containers across multiple hosts.
- Implemented Jenkins and built pipelines to drive all microservice builds out to the Docker registry and deploy them to Kubernetes.
- Worked with NoSQL databases such as HBase and Cassandra to retrieve and load data for real-time processing using REST APIs.
- Worked on creating data models for Cassandra from the existing Oracle data model.
- Responsible for transforming and loading large sets of structured, semi-structured, and unstructured data.
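A hedged sketch of a Kafka-to-Spark streaming consumer, shown here with the Structured Streaming Kafka source as a stand-in for the Spark Streaming script described above; the brokers, topic, and output/checkpoint paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-weblog-stream").getOrCreate()

# Consume a Kafka topic (placeholder brokers/topic) as a streaming DataFrame.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "weblogs")
       .load())

# Kafka delivers the value as bytes; cast it to a string for downstream parsing.
events = raw.select(col("value").cast("string").alias("payload"))

# Push each micro-batch to HDFS; the checkpoint directory tracks Kafka offsets.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/weblogs/")
         .option("checkpointLocation", "hdfs:///checkpoints/weblogs/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```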
Environment: Hadoop 2.7.7, HDFS 2.7.7, Apache Hive 2.3, Apache Kafka 0.8.2.x, Apache Spark 2.3, Spark SQL, Spark Streaming, ZooKeeper, Pig, Oozie, Java 8, Python 3, S3, EMR, EC2, Redshift, Cassandra, NiFi, Talend, HBase
Client: Genems
Confidential
Hadoop Developer/Data Engineer
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Used Sqoop to import data from Relational Databases like MySQL, Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
- Worked on Kafka to import real-time weblogs and ingested the data to Spark Streaming.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
- Worked on Hive to implement web interfacing and stored the data in Hive tables.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- When working with the open-source Apache distribution, Hadoop admins must manually set up all the configuration files (core-site, hdfs-site, yarn-site, and mapred-site); with popular Hadoop distributions such as Hortonworks, Cloudera, or MapR, the configuration files are set up at startup and the Hadoop admin need not configure them manually.
- Experienced with SparkContext, Spark SQL, and Spark on YARN.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Developed traits, case classes, etc., in Scala.
- Developed Spark scripts using Scala shell commands as per the business requirement.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Experienced in loading the real-time data to a NoSQL database like Cassandra.
- Well versed in data manipulation and compaction in Cassandra.
- Experience in retrieving data from the Cassandra cluster by running queries in CQL (Cassandra Query Language).
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Deployed the project on Amazon EMR with S3 connectivity for backup storage.
- Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
- Configured workflow steps that involve Hadoop actions using Oozie.
- Used Python for pattern matching in build logs to format warnings and errors (a brief sketch appears after this list).
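A brief sketch of the kind of build-log pattern matching described above, assuming a simple "LEVEL: message" log layout; the pattern and file handling are illustrative rather than the actual scripts.

```python
import re
import sys
from collections import Counter

# Matches lines such as "2021-06-01 12:00:00 WARNING: deprecated API used"
# (assumed log layout; the pattern would be adjusted to the real build-log format).
LOG_PATTERN = re.compile(r"^(?P<ts>\S+ \S+)\s+(?P<level>WARNING|ERROR):\s+(?P<msg>.*)$")


def summarize(log_path):
    counts = Counter()
    with open(log_path) as fh:
        for line in fh:
            match = LOG_PATTERN.match(line.rstrip())
            if match:
                level, msg = match.group("level"), match.group("msg")
                counts[level] += 1
                print(f"[{level}] {msg}")
    print(f"Totals: {counts.get('ERROR', 0)} errors, {counts.get('WARNING', 0)} warnings")


if __name__ == "__main__":
    summarize(sys.argv[1])
```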
Environment: Hadoop, MapReduce, Hive, Spark, Oracle, GitHub, Tableau, UNIX, Cloudera, Kafka, Sqoop, Scala, NiFi, HBase, Amazon EC2, S3, Python