Sr. Data Engineer Resume
Englewood, CO
SUMMARY
- 8+ years of experience in Data Engineering using Big Data technologies, including the Spark and Hadoop ecosystems.
- Databricks Certified Associate Developer for Apache Spark 3.0.
- Experience in building data pipelines using the Azure stack, including Azure Data Factory, Azure Databricks, Azure Data Lake, and Azure Synapse.
- Experience in developing Spark applications in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse.
- Worked with the Microsoft Azure cloud environment and its associated components and resources to create enterprise-grade applications.
- Architected and implemented ETL and data movement solutions using Azure Data Factory.
- Hands-on experience with Continuous Integration and Deployment (CI/CD) using Jenkins and Docker.
- Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
- Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion, relational data ingestion.
- Worked on troubleshooting failures in Spark applications and fine-tuning them for better performance.
- Good experience working with various Hadoop distributions, mainly Cloudera (CDH).
- Extensively used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data, and used DataFrame operations to perform the required validations on the data.
- Practical knowledge of PL/SQL for creating stored procedures, clusters, packages, database triggers, exception handlers, cursors, and cursor variables.
- Good knowledge of Amazon Web Services (AWS) offerings such as EC2, S3, EMR, DynamoDB, Redshift, and Aurora.
- Collaborated extensively with different teams as part of an agile process.
- Confident interacting with individuals at all levels and in client handling to accomplish project goals.
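To illustrate the kind of DataFrame-based validation described above, here is a minimal sketch in plain Python (in practice this ran through the Spark DataFrames API; the record layout and field names are hypothetical):

```python
# Sketch of row-level validation rules of the kind applied with Spark
# DataFrame operations; the fields ("customer_id", "usage_mb") are
# hypothetical examples, not from a real schema.
def validate_row(row):
    """Return a list of rule violations for one record."""
    errors = []
    if not row.get("customer_id"):
        errors.append("missing customer_id")
    usage = row.get("usage_mb")
    if usage is None or usage < 0:
        errors.append("invalid usage_mb")
    return errors

def split_valid_invalid(rows):
    # Equivalent in spirit to two DataFrame filters: one keeping clean
    # rows, one routing rejects to a quarantine table.
    valid = [r for r in rows if not validate_row(r)]
    invalid = [r for r in rows if validate_row(r)]
    return valid, invalid
```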
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Englewood, CO
Responsibilities:
- Implemented software upgrades to migrate old software systems to the Azure Cloud's Spark and Hadoop ecosystems.
- Extracted data (from Azure Blob storage and web services) and ingested raw data to clean and process it and to conduct trend and sentiment analysis.
- Worked with Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience with public cloud-based managed services for data warehousing and analytics in Microsoft Azure (Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure Table Storage, U-SQL, Stream Analytics, HDInsight, etc.).
- Created data pipelines in Azure Data Factory to feed data into the CDM layer of Azure Data Lake.
- Implemented Azure Storage, Azure SQL, and Azure services, and developed an Azure Web Role.
- Involved in relational and dimensional data modelling, producing ER diagrams with all linked entities and the relationships of each entity based on the rules supplied by the business manager, using ER Studio.
- Designed and implemented indexing strategies for MongoDB servers.
- Used Hive to examine the partitioned and bucketed data and compute different metrics for dashboard reporting.
- Managed and implemented ETL data pipelines using Python.
- Built a data warehouse on the Azure platform using Azure Databricks and Data Factory.
- Built Entity Relationship Diagrams (ERDs), functional diagrams, data flow diagrams, and referential integrity constraints, as well as logical and physical models.
- Used Azure Databricks' PySpark API to conduct fast data transformations on huge volumes of data.
- Changed and established new automated processes to run Hive queries, Hadoop commands, and shell scripts on a daily basis.
- Evaluated current application scripts and adjusted SQL queries using the execution plan, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to improve performance.
- Implemented Azure Data Lake's RDS and SDM layers (Raw Data Structure and Structured Data Model), in which data acquired from Ahold Delhaize legacy systems is verified and structured.
- Used Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN to improve performance and optimize existing Hadoop methods.
- Validated and modified current data flow jobs to resolve data-related problems.
- Planned, defined, and designed a database using ER Studio based on business requirements and documentation. Responsible for migrating an on-premises application to the Azure cloud.
- Experienced in working with version control systems like Git, and used source code management tools like GitHub, GitLab, and command-line applications.
- Experienced in Agile methodologies, Scrum stories, and sprint planning in a Python-based environment, along with data analytics and Excel data extracts.
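The daily metric computation over partitioned data mentioned above can be sketched as follows (a plain-Python illustration of the aggregation that ran as a Hive/Spark job; the partition column and metric names are hypothetical):

```python
# Sketch of a daily metric rollup of the kind computed for dashboard
# reporting on partitioned data; "partition_date" stands in for the
# Hive partition column, and "amount" is an illustrative measure.
from collections import defaultdict

def daily_metrics(events):
    """Aggregate event counts and totals per partition date."""
    metrics = defaultdict(lambda: {"events": 0, "total_amount": 0.0})
    for e in events:
        day = e["partition_date"]   # one bucket per Hive partition
        metrics[day]["events"] += 1
        metrics[day]["total_amount"] += e["amount"]
    return dict(metrics)
```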
Environment: Hadoop, Azure Data Lake, Azure Data Factory, Azure Databricks, Azure Blob, Spark, Scala, Teradata, Hive, Aorta, Sqoop, SQL, DB2, UDP, GitHub.
Sr. Data Engineer
Confidential, Indianapolis, IN
Responsibilities:
- Worked on requirements gathering, analysis, and design of the systems.
- Used PySpark for data wrangling and Data Factory pipelines to transfer data from ADLS into SQL DB in order to build the business logic per the requirements.
- Responsible for Configuring and monitoring job clusters on Databricks.
- Implemented Spark performance optimizations on most of the data pipelines, following best optimization principles.
- Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse.
- Worked on SQL queries in dimensional data warehouses and relational data warehouses. Performed Data Analysis and Data Profiling using Complex SQL queries on various systems.
- Experience managing Azure Data Lake Storage (ADLS) and an understanding of how to integrate it with other Azure services like Azure Databricks.
- Developed workload migration plans in conjunction with other technical teams.
- Migrated data from traditional database systems to Azure databases.
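A minimal sketch of the ADLS-to-SQL DB load described above: assembling the Spark JDBC options for Azure SQL Database. The server, database, and table names are placeholders; in Databricks the resulting dict would be passed to `df.write.format("jdbc").options(**opts).save()`.

```python
# Sketch of building JDBC options for writing a wrangled DataFrame
# into Azure SQL Database; all identifiers here are placeholders.
def sql_db_jdbc_options(server, database, table, user, password):
    return {
        "url": f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
```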
Environment: Azure Databricks, Azure Data Lake Storage, MS SQL Server, Azure SQL Database, PySpark, Blob storage, Python, SQL.
Data Engineer
Confidential, Dania Beach, FL
Responsibilities:
- Understood all the technical requirements of the project.
- Worked with project management, business teams, and departments to assess and refine requirements to design BI solutions using MS Azure.
- Played an important role in data wrangling using Azure Databricks.
- Used Azure Databricks dbutils widgets to read in dynamic parameters while executing Databricks notebooks.
- Responsible for wrangling data and sending the transformed data to ADLS and Power BI.
- Implemented Spark performance optimizations on most of the data pipelines, following best optimization principles.
- Boosted the performance of a stalled Spark job using a salting mechanism.
- Collaborated with different teams as part of an agile process and participated in agile ceremonies (user story grooming, sprint planning, sprint retrospectives).
- Prepared technical documentation.
- Configured mailing code to check data validation without manual intervention, using an Azure Databricks notebook.
- Assessed the SQL scripts and devised a PySpark-based solution.
- Responsible for Configuring and monitoring cluster jobs on Databricks.
- Extensively used the Spark DataFrames API for data cleansing, transformation, enrichment, and aggregation according to the requirements.
- Used GIT for version control and JIRA for project management, tracking issues and bugs.
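The salting mechanism used to unblock the skewed Spark job can be sketched in plain Python: a hot join key is spread across N salt buckets on the large side, and the small side is replicated across all buckets so the join still matches. (In PySpark the same effect typically comes from concatenating the key with a random bucket via `rand()`; the key values here are illustrative.)

```python
# Sketch of join-key salting to break up data skew; N_SALTS and the
# example keys are illustrative choices, not from the original job.
import random

N_SALTS = 8

def salt_key(key, n=N_SALTS, rng=random):
    """Large (skewed) side: append a random salt bucket to the join key."""
    return f"{key}_{rng.randrange(n)}"

def replicate_key(key, n=N_SALTS):
    """Small side: emit one salted copy of the key per bucket."""
    return [f"{key}_{i}" for i in range(n)]
```

The trade-off is an N-fold blow-up of the small side in exchange for spreading the hot key's rows across N tasks instead of one.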
Environment: Azure Databricks, ADF 2.0, Blob storage, ADLS, Databricks Notebook.
Big-Data / Hadoop Developer
Confidential
Responsibilities:
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end-user/analyst requirements for ad hoc analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Performed near real-time analysis on clickstream data using Kafka and Spark for a POC project for Bloomingdale's e-commerce division.
- Created Interactive reporting dashboards by combining multiple views in Tableau Dashboard.
- Designed and created various analytical reports and dashboards to help the business unit identify critical KPIs and to facilitate decision making and strategic planning in the unit.
- Developed UDFs in Java as and when necessary for use in Pig and Hive queries.
- Experience using Sequence files and the Avro and HAR file formats.
- Extracted the data from Teradata into HDFS using Sqoop.
- Excellent hands-on experience with Teradata utilities like MLOAD, FASTLOAD, TPUMP, FASTEXPORT, BTEQ, and ARCMAIN.
- Created Sqoop jobs with incremental load to populate Hive external tables.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Good working knowledge of HBase.
- Involved in gathering business requirements and prepared detailed specifications that follow the project guidelines required to develop written programs.
- Actively participated in code reviews and meetings and resolved technical issues.
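The bookkeeping behind the Sqoop incremental-load jobs above can be sketched as a watermark check: only rows whose check-column value exceeds the last recorded value are imported, and the watermark then advances. (Sqoop itself does this with `--incremental append --check-column --last-value`; the column name below is illustrative.)

```python
# Sketch of incremental-append watermark logic of the kind Sqoop
# applies; "check_column" is whatever monotonically increasing column
# the job keys on (e.g. a surrogate id or load timestamp).
def incremental_batch(rows, check_column, last_value):
    """Select rows newer than the watermark and compute the next watermark."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    next_value = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, next_value
```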
Environment: Java 7, Eclipse, Oracle 10g, Tableau 9.X, Hadoop, MapReduce, Hive, HBase, Oozie, Linux, HDFS, CDH, SQL, Toad 9.6, Kafka, Spark, and Scala.
Business Intelligence Developer
Confidential
Responsibilities:
- Created Reports, Dashboards and Storyboards in Tableau 9.0 and validated the loads from OLTP systems.
- Designed and developed dashboards for various business units like finance, marketing, operations, and risk management using Tableau to analyze about five terabytes of data each day.
- Created Data Quality Dashboards and did Application performance analysis and monitoring using Tableau.
- Created data extracts in Tableau by connecting to the view using Tableau MSSQL connector.
- Extensively used data joining and blending and other advanced features in Tableau on various data sources like Hive Tables, MySQL Tables and Flat files.
- Good experience with configuration, adding users, managing licenses and data connections, scheduling tasks, and embedding views on Tableau Server.
- Involved in troubleshooting, performance tuning of reports, and resolving issues within Tableau Server and reports.
- Defined best practices for Tableau report development.
- Monitored system objects like huge files and unused indexes and took the necessary steps to improve the performance of applications as well as batch jobs.
- Installed and configured a multi-node Apache Hadoop cluster using Cloudera Manager.
- Set up and optimized standalone, pseudo-distributed, and distributed clusters.
- Built, tuned, and maintained HiveQL and Pig scripts for user reporting.
- Experienced in defining Oozie job flows.
- Experienced in managing and reviewing Hadoop log files.
- Developed and supported MapReduce Programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive.
- Involved in creating Hive tables, loading data, and writing Hive queries.
- Developed shell scripts to automate routine DBA tasks (i.e., database refresh, backups, monitoring).
- Tuned/Modified SQL for batch and online processes.
Environment: CDH Hadoop (HDFS) multi-node installation, Tableau 8.X/9.X, Map Reduce, AWS, Hive, flume, Java, JDK, Flat Files, PL SQL, UNIX Shell Scripting.