Senior Data Engineer Resume
Atlanta, GA
SUMMARY
- 7+ years of professional IT experience as a Data Engineer, with expertise in database development, ETL development, data modeling, report development, and Big Data technologies.
- Experience with Microsoft Azure cloud technologies including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Analysis Services, PolyBase, Azure Cosmos DB (NoSQL), Azure Key Vault, Azure DevOps, Azure HDInsight, and Big Data technologies such as Hadoop, Apache Spark, and Azure Databricks.
- Experience developing Spark applications using Spark SQL in Databricks to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.
- Expertise in building RDD transformations, actions, DataFrames, and case classes for the required input data, as well as converting RDDs to DataFrames (see the sketch after this list).
- Hands-on Bash scripting experience, including building data pipelines on Unix/Linux systems.
- Familiar with file formats including Avro, Parquet, CSV, XML, JSON, and Delta.
- Strong in data warehousing concepts, star and snowflake schema methodologies, and understanding business processes/requirements.
- Experienced in designing, building, deploying, and utilizing most of the AWS stack (including EC2, S3, and EMR), with a focus on scalability, high availability, fault tolerance, and auto-scaling.
- Working knowledge of Tableau, including report creation.
- Expertise in cloud migration of existing applications and data feeds.
- Experienced in projects using JIRA, Maven, and Jenkins build and testing tools.
- Experience with workflow management tools such as Airflow, Databricks Workflows, and Azure Data Factory.
- Involved in the entire software development life cycle (SDLC), using both Agile and Waterfall methodologies.
- Excellent communication and interpersonal skills, and the ability to operate effectively in cross-functional teams.
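
The bullets above mention building Spark SQL applications over multiple file formats and converting RDDs to DataFrames; the following is a minimal PySpark sketch of that pattern. All paths and column names are illustrative assumptions, and the Delta write assumes a Databricks-style environment where Delta Lake is available.

    # Minimal illustrative sketch; paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count

    spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

    # Extract: Spark readers handle multiple file formats uniformly.
    events_json = spark.read.json("/data/raw/events_json/")
    events_parquet = spark.read.parquet("/data/raw/events_parquet/")

    # Transform: union the sources and aggregate usage per customer.
    usage = (
        events_json.select("customer_id", "event_type")
        .unionByName(events_parquet.select("customer_id", "event_type"))
        .groupBy("customer_id", "event_type")
        .agg(count("*").alias("event_count"))
    )

    # An RDD of tuples can be promoted to a DataFrame with named columns.
    rdd = spark.sparkContext.parallelize([("c1", "login"), ("c2", "purchase")])
    df_from_rdd = rdd.toDF(["customer_id", "event_type"])

    # Load: write the aggregate out as Delta (assumes Delta Lake is installed).
    usage.write.mode("overwrite").format("delta").save("/data/curated/usage")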
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, Apache Spark, MapReduce, Kafka, Pig, Sqoop, Databricks Delta, Oozie
Languages: Python, Java, C++, C, SQL, Scala
Databases: MySQL, SQL Server, AWS Redshift, AWS DynamoDB, PostgreSQL, MongoDB, Apache HBase, Google Cloud BigQuery
Cloud: Microsoft Azure, AWS, Google Cloud, Snowflake
Cloud Stack: Microsoft Azure (Data Lake, Databricks, Data Storage, Data Factory), AWS (S3, EC2, EMR, Redshift, IAM, CloudWatch, QuickSight)
CI/CD and ETL Tools: Kubernetes, Docker, Jenkins, Informatica
Scheduling Tools: Airflow, Databricks Workflows, Azure Data Factory
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Senior Data Engineer
Responsibilities:
- Developed multiple optimized PySpark applications using Azure Databricks.
- Developed ETL solutions using SSIS, Azure Data Factory, and Azure Databricks.
- Responsible for Data Ingestion, Data Cleansing, Data Standardization and Data Transformation.
- Built scalable data pipelines on the Azure cloud platform using a variety of tools.
- Converted Hive queries into Spark transformations using Spark RDDs, Python, and Scala (sketched below).
- Wrote complex SQL statements using joins, subqueries, and correlated subqueries.
- Developed data pipelines using Azure Data Factory that process Cosmos DB activity data.
- Created HDInsight clusters and storage accounts for running jobs.
- Migrated the data from Redshift data warehouse to Snowflake database.
- Developed Spark scripts using Scala, Python, and shell commands.
- Developed real-time ingestion pipelines from Azure Event Hubs into downstream tools.
- Built ETL solutions using Hive and Spark with Python and Scala.
- Optimized applications built with tools such as Spark and Hive.
- Automated jobs across clusters using the Airflow scheduler (see the DAG sketch below).
- Set up monitoring for Hadoop production jobs using the ELK stack.
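
A minimal sketch of the Hive-to-Spark conversion mentioned above; the table, columns, and filter are hypothetical examples, not from the actual project.

    # Hypothetical example: rewriting a Hive query as Spark transformations.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import sum as sum_

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Original Hive query:
    #   SELECT region, SUM(amount) AS total
    #   FROM sales
    #   WHERE year = 2020
    #   GROUP BY region;

    # Equivalent Spark DataFrame transformations:
    totals = (
        spark.table("sales")
        .filter("year = 2020")
        .groupBy("region")
        .agg(sum_("amount").alias("total"))
    )
    totals.show()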
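And a bare-bones Airflow DAG of the kind used for the job automation above; the DAG id, schedule, and spark-submit command are placeholders.

    # Placeholder Airflow DAG; id, schedule, and command are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_spark_ingest",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            bash_command="spark-submit /jobs/ingest.py",
        )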
Confidential
Data Engineer
Responsibilities:
- Gathered requirements from users to develop data pipelines from different sources into Hadoop.
- Designed and maintained data governance and security for data platforms on the AWS cloud.
- Created on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark (see the sketch after this list).
- Used Terraform to provision cloud resources; built a proof of concept on AWS Glue.
- Experience moving high- and low-volume data objects from Teradata and Hadoop to Snowflake.
- Developed a reusable framework, to be leveraged for future migrations, that automates ETL from RDBMS systems to the data lake using Spark Data Sources and Hive data objects.
- Created Databricks notebooks using Python (PySpark), Scala, and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from Raw to Stage and Curated zones.
- Integrated day-to-day with the database administrators (DBAs) for DB2, SQL Server, and Oracle, and with AWS Cloud teams, to ensure database tables, columns, and metadata were successfully implemented in the DEV, QUAL, and PROD environments in AWS Cloud (Aurora and Snowflake).
- Deployed instances, provisioned EC2 and S3 buckets, and configured security groups and the Hadoop ecosystem for Cloudera on AWS.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used SSIS, Python scripts, and Spark applications for ETL operations to create data flow pipelines, transforming data from legacy tables into Hive and S3 buckets for handoff to business users and data scientists to build analytics on.
- Expertise in migrating existing applications, Informatica data feeds, and ETL pipelines to Hadoop, Snowflake, and AWS.
- Created a Hadoop process to load 1 billion records within one hour, matching the required latency and catching up the backlog for one of Honeywell's key datasets.
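
A minimal sketch of the Lambda-plus-Glue pattern mentioned above for creating on-demand tables over S3 files. The crawler name and event wiring are illustrative assumptions; in practice the handler would be attached to an S3 "ObjectCreated" notification.

    # Hypothetical Lambda handler: when new files land in S3, start a Glue
    # crawler so the objects become queryable as on-demand tables.
    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Log the S3 objects that triggered this invocation.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"New object s3://{bucket}/{key}; starting crawler")

        # The crawler scans the S3 prefix and creates/updates Glue tables.
        glue.start_crawler(Name="s3-on-demand-tables")  # name is a placeholder
        return {"status": "crawler started"}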
Confidential
Data Engineer
Responsibilities:
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
- Proposed architectures with Azure cost/spend in mind and developed recommendations to right-size data infrastructure.
- Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, and Azure SQL.
- Developed Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
- Collaborated with application architects and DevOps teams.
- Stored Parquet data in Hive with daily date partitions for further queries (see the sketch after this entry).
- Executed Oozie workflows to run multiple Hive and Pig jobs.
Environment: Hadoop, Hive, Spark 1.6, Scala 2.10, Sqoop, Oozie, AutoSys, HBase, Pig.
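
A minimal sketch of the partitioned Parquet-to-Hive load mentioned above, written against the modern SparkSession API (a Spark 1.6 deployment would use HiveContext instead); the database, table, path, and date are hypothetical.

    # Illustrative sketch: store Parquet data in a Hive table with daily
    # date partitions; names and paths are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Read the day's Parquet drop and tag it with its partition date.
    daily = spark.read.parquet("/data/incoming/2020-06-01/").withColumn(
        "load_date", lit("2020-06-01")
    )

    # Append into a Hive table partitioned by load_date, so each day's
    # run lands in its own partition for efficient later queries.
    (
        daily.write.mode("append")
        .format("parquet")
        .partitionBy("load_date")
        .saveAsTable("analytics.events_daily")
    )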
Confidential
Assistant System Engineer
Responsibilities:
- Analyzed business requirements and system specifications to understand the application.
- Involved in preparing High-Level Design and Low-Level Design documents.
- Involved in coding and unit testing new code.
- Prepared test plans and test data.
- Tested code changes at the functional and system levels.
- Conducted quality reviews of design documents, code, and test plans.
- Ensured availability of documents/code for review and conducted quality reviews of testing.
- Fixed problems discovered within the existing system functionality.
- Modified code to prevent problems from recurring (preventive maintenance).
- Presented project inductions to new joiners.
Environment: Java, Maven, UNIX, Eclipse, SoapUI, WinSCP, Tomcat, JSP, Quality Center
