
Sr. Data Engineer Resume


Charlotte, NC

SUMMARY

  • 8+ years of professional experience in IT, working with various legacy database systems as well as Big Data technologies.
  • Hands-on experience with the complete Software Development Life Cycle (SDLC), using Agile and hybrid methodologies.
  • Experience in analyzing data using the Big Data ecosystem, including HDFS, Hive, HBase, ZooKeeper, Pig, Sqoop, and Flume.
  • Knowledge and working experience with big data tools such as Hadoop, GCP BigQuery, Azure Data Lake, and AWS Redshift.
  • Good understanding of Apache Airflow.
  • Experience in workflow scheduling with Airflow, AWS Data Pipelines, Azure, SSIS, etc.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premise databases to Azure Data Lake Store using Azure Data Factory.
  • Good understanding of Big Data Hadoop and YARN architecture, along with the Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, and Resource/Cluster Manager, and of Kafka (distributed stream processing).
  • Experience in text analytics and data mining solutions for various business problems, and in generating data visualizations using SAS and Python.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Experience in development and support of Oracle SQL, PL/SQL, and T-SQL queries.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Excellent experience in creating cloud-based solutions and architectures using Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS).
  • Experienced in technical consulting and end-to-end delivery, covering architecture, data modeling, data governance, and the design, development, and implementation of solutions.
  • Experience in Big Data Hadoop Ecosystem in ingestion, storage, querying, processing and analysis of Big data.
  • Extensive working experience in agile environment using a CI/CD model.
  • Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries and incorporating complex UDFs into business logic (a minimal PySpark sketch follows this list).
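
A minimal PySpark sketch of the Spark SQL and UDF work referenced above; the table name, column names, thresholds, and output path are assumptions for illustration only:

    # Minimal sketch: read a Hive table into a DataFrame, apply a business-logic
    # UDF, and aggregate with Spark SQL. All names and thresholds are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("usage-patterns").enableHiveSupport().getOrCreate()

    def usage_band(minutes):
        # Hypothetical business rule bucketing customers by monthly usage
        if minutes is None:
            return "unknown"
        return "heavy" if minutes > 500 else "light"

    usage_band_udf = udf(usage_band, StringType())

    df = spark.table("usage_events")  # hypothetical Hive table
    banded = df.withColumn("band", usage_band_udf(col("monthly_minutes")))
    banded.createOrReplaceTempView("banded_usage")

    summary = spark.sql("SELECT band, COUNT(*) AS customers FROM banded_usage GROUP BY band")
    summary.write.mode("overwrite").parquet("/data/curated/usage_summary")  # illustrative path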

TECHNICAL SKILLS

Big Data & Hadoop Ecosystem: Hadoop 3.0, Hive 2.3, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Cloudera Manager, StreamSets

Cloud Technologies: GCP, BigQuery, AWS (EC2, S3, Redshift), MS Azure, Snowflake

Data Modeling Tools: Erwin R9.7, ER Studio v16

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server

Other Tools: VSS, SVN, CVS, Docker, CI/CD, Kubernetes

RDBMS / NoSQL Databases: Oracle 12c, Teradata R15, MS SQL Server 2019, Cassandra 3.11, HBase 1.2

Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio 2016 & Visual SourceSafe

Operating System: Windows 10/8, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 9.6, SAP Business Objects XIR3.1/XIR2, Talend, Tableau

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential - Charlotte, NC

Sr. Data Engineer

Responsibilities:

  • As a Senior Data Engineer, responsible for building scalable distributed data solutions using Hadoop components.
  • Worked on Software Development Life Cycle (SDLC), testing methodologies, resource management and scheduling of tasks.
  • Imported data from RDBMS to HDFS and Hive using Sqoop on a regular basis.
  • Designed complex, data-intensive reports in Power BI utilizing graph features such as gauges and funnels.
  • Migrated on-premise Oracle ETL process to Azure Synapse Analytics.
  • Used Azure Data Lake as Source and pulled data using Azure PolyBase.
  • Used stored procedure, lookup, execute pipeline, data flow, copy data, and Azure Function activities in ADF.
  • Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
  • Wrote POCs in Python to analyze the data quickly before applying big data solutions to process it at scale.
  • Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Used SQL Azure extensively for database needs in various applications.
  • Managed resources and scheduling across the cluster using Azure Kubernetes Service.
  • Designed ETL using internal/external tables, stored in Parquet format for efficiency.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Worked on writing APIs to load the processed data into Azure SQL tables.
  • Worked with Azure Blob and Data Lake storage, loading data into Azure Synapse Analytics (SQL DW).
  • Integrated and automated data workloads to Snowflake Warehouse.
  • Developed custom MapReduce programs for data analysis and data cleaning using Pig Latin scripts.
  • Designed and Configured Azure Cloud relational servers and databases analyzing current and future business requirements.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Worked with production team to resolve data issues in Production database of OLAP and OLTP systems.
  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines (a minimal DAG sketch follows this list).
  • Worked on streaming data, consuming data from Kafka topics and loading it into the landing area for near-real-time reporting.
  • Developed and designed data integration and migration solutions in Azure.
  • Used GIT for version control and JIRA for project management, tracking issues and bugs.
  • Involved in writing python scripts to extract data from different API’s.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts.
  • Designed end to end scalable architecture to solve business problems using various Azure Components like HDInsight, Data Factory, Data Lake and Storage.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Performed performance tuning by creating indexes, temporary tables, partition functions, and table variables.
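
A minimal Airflow sketch (assuming Airflow 2.x) of an ingest-then-load pipeline like the one described above; the DAG name, connection details, table, paths, and schedule are illustrative assumptions:

    # Minimal Airflow 2.x DAG: a Sqoop import followed by a Hive load.
    # All commands, hosts, tables, and paths are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="cargo_ingest_etl",            # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        sqoop_import = BashOperator(
            task_id="sqoop_import",
            bash_command=(
                "sqoop import --connect jdbc:oracle:thin:@//db-host:1521/ORCL "
                "--table CARGO_EVENTS --target-dir /data/raw/cargo_events"
            ),
        )
        hive_load = BashOperator(
            task_id="hive_load",
            bash_command="hive -f /etl/scripts/load_cargo_events.hql",
        )
        sqoop_import >> hive_load  # run the Hive load only after the import succeeds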

Environment: Hadoop 3.0, Azure, Python, Sqoop 1.4, HDFS, ETL, APIs, MapReduce, Git, Hive 2.3, Oozie 4.3, Apache Airflow, Kafka 1.1, JIRA, SQL, PySpark, Spark SQL, NoSQL.

Confidential - Atlanta, GA

Data Engineer

Responsibilities:

  • Involved in requirement gathering, business analysis, design, development, testing, and implementation of business rules.
  • Ensured the availability and integrity of data through data governance.
  • Migrated the existing Spark jobs on to Azure Databricks.
  • Designed and developed ETL pipeline in Azure cloud which gets customer data from API and process it to Azure SQL DB.
  • Worked on on-prem Data warehouse migration to Azure Synapse using ADF.
  • Created pipelines to migrate the data from on-prem resources through the Data Lake and load it into the Azure SQL Data Warehouse.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Created Hive External tables to stage data and then move the data from Staging to main tables.
  • Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data Factory V2.
  • Involved in creating Azure Data Factory pipelines.
  • Created pipelines, data flows and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Actively involved in architecture of DevOps platform and cloud solutions.
  • Created, scheduled, and monitored Azure Data Factory pipelines and Spark jobs on Azure Databricks.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Developed an automated process in Azure cloud which can ingest data daily from web service and load in to Azure SQL DB.
  • Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
  • Created custom alerts queries in Log Analytics and used Web hook actions to automate custom alerts.
  • Created Databricks job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python.
  • Implemented Kafka producers, created custom partitions, configured brokers, and implemented high-level consumers to build out the data platform.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed Python scripts to perform file validations in Databricks and automated the process using ADF (a minimal validation sketch follows this list).
  • Worked with enterprise Data Modeling team on creation of Logical models.
  • Used Azure Key Vault as the central repository for maintaining secrets and referenced them in Azure Data Factory and in Databricks notebooks.
  • Developed complex SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support Power BI reports.
  • Implemented complex business logic through T-SQL stored procedures, functions, views, and advanced query concepts.
  • Developed JSON scripts for deploying the pipelines that process the data in Azure Data Factory (ADF).
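
A minimal sketch of the kind of file validation run in a Databricks notebook and triggered from ADF; the storage path, expected columns, and failure rules are assumptions for illustration:

    # Minimal Databricks-style validation: check expected columns and non-empty data
    # before downstream processing. The path and schema are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # already provided inside a Databricks notebook

    SOURCE_PATH = "abfss://raw@examplelake.dfs.core.windows.net/daily/customers/"  # hypothetical
    EXPECTED_COLUMNS = {"customer_id", "event_date", "amount"}                      # hypothetical

    df = spark.read.option("header", "true").csv(SOURCE_PATH)

    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Validation failed; missing columns: {sorted(missing)}")

    row_count = df.count()
    if row_count == 0:
        raise ValueError("Validation failed; source file is empty")

    print(f"Validation passed: {row_count} rows")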

Environment: Hadoop 3.3, Spark 3.1, Scala 3.0, Kafka 3.0, JSON, Python 3.9, ADF V2, Azure Data Lake Gen2, Spark SQL, Power BI, T-SQL, Azure SQL DB, SQL Server, SDLC/Agile.

Confidential - Boston, MA

Sr. Data Analyst

Responsibilities:

  • As a Data Analyst, reviewed business requirements and composed source-to-target data mapping documents.
  • Interacted with Business Analysts, SMEs, and other Data Engineers to understand business needs.
  • Participated in design discussions and ensured functional specifications were delivered in all phases of the SDLC in an Agile environment.
  • Defined appropriate security roles related to the data, including roles associated with securing and provisioning it.
  • Audited security of the data within the domain of stewardship and defined named individuals for each required role.
  • Worked closely with the business analyst and Data warehouse architect to understand the source data and need of the Warehouse.
  • Interacted with stakeholders to clarify their questions regarding the Power BI reports.
  • Actively involved in SQL and Azure SQL DW code development using T-SQL.
  • Involved in designing a star-schema-based data model with dimensions and facts.
  • Worked on a migration project which required gap analysis between legacy systems and new systems.
  • Involved in requirement gathering and database design and implementation of star-schema, snowflake schema/dimensional data warehouse using Erwin.
  • Performed and utilized necessary PL/SQL queries to analyze and validate the data.
  • Reviewed the Joint Requirement Documents (JRD) with the cross functional team to analyze the High Level Requirements.
  • Developed and maintained new data ingestion processes with Azure Data Factory.
  • Implemented data aggregation and business logic in Azure Data Lake.
  • Designed and developed automation test scripts using Python (a minimal reconciliation-test sketch follows this list).
  • Created and published reports for stakeholders using Power BI.
  • Analyzed escalated incidents within the Azure SQL database.
  • Worked on enhancing the data quality in the database.
  • Worked on performance tuning of the database, including creating indexes and optimizing SQL statements.
  • Involved in capturing data lineage, table and column data definitions, valid values, and other necessary information in the data model.
  • Created or modified the T-SQL queries as per the business requirements.
  • Involved in user training sessions and assisting in UAT (User Acceptance Testing).
  • Participated in design and daily stand-up meetings.
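
A minimal sketch of an automation test script of the kind referenced above, reconciling row counts between a staging table and its target; the connection string and table names are hypothetical assumptions:

    # Minimal row-count reconciliation between a staging table and its fact table.
    # Connection details and table names are hypothetical; the password is a placeholder.
    import pyodbc

    CONN_STR = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-server.database.windows.net;DATABASE=exampledw;"
        "UID=test_user;PWD=<placeholder>"
    )

    def row_count(cursor, table):
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        return cursor.fetchone()[0]

    def test_fact_load_matches_staging():
        conn = pyodbc.connect(CONN_STR)
        try:
            cur = conn.cursor()
            staged = row_count(cur, "stg.Sales")     # hypothetical staging table
            loaded = row_count(cur, "dw.FactSales")  # hypothetical fact table
            assert staged == loaded, f"Row counts differ: staging={staged}, fact={loaded}"
        finally:
            conn.close()

    if __name__ == "__main__":
        test_fact_load_matches_staging()
        print("Row-count reconciliation passed")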

Environment: Erwin, Azure SQL DB, Azure Data Lake, T-SQL, UAT, PL/SQL, Power BI, Python, and Agile/Scrum.

Confidential - New York, NY

Data Analyst

Responsibilities:

  • Created and analyzed business requirements to compose functional and implementable technical data solutions.
  • Identified integration impact, data flows and data stewardship.
  • Involved in data analysis and in reducing data discrepancies between the source and target schemas.
  • Conducted detailed analysis of data issues, mapping data from source to target, and design and data cleansing in the Data Warehouse.
  • Created new data constraints and/or leveraged existing constraints for reuse.
  • Created data dictionaries, data mappings for ETL and application support, DFDs, ERDs, mapping documents, metadata, and DDL and DML as required.
  • Participated in JAD sessions as the primary modeler in expanding existing databases and developing new ones.
  • Identified and analyzed source data coming from SQL server and flat files.
  • Evaluated and enhanced current data models to reflect business requirements.
  • Generated, wrote, and ran SQL scripts to implement the DB changes, including table updates, addition or update of indexes, and creation of views and stored procedures (a minimal deployment sketch follows this list).
  • Consolidated and updated various data models through reverse and forward engineering.
  • Restructured logical and physical data models to respond to changing business needs and to assure data integrity using PowerDesigner.
  • Created naming convention files and coordinated with DBAs to apply the data model changes.
  • Designed ETL specification documents to load the data in target using various transformations according to the business requirements.
  • Used Informatica PowerCenter for extracting, transforming, and loading data.
  • Performed Data profiling, Validation and Integration.
  • Created materialized views to improve performance and tuned the database design.
  • Involved in Data migration and Data distribution testing.
  • Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
  • Performed testing, knowledge transfer and mentored other team members.
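
A minimal sketch of running a reviewed DB-change script (an index addition) against SQL Server from Python; the server, database, and object names are hypothetical assumptions:

    # Minimal sketch: apply a reviewed index-creation script to SQL Server.
    # Connection details and object names are hypothetical.
    import pyodbc

    CONN_STR = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-sql;DATABASE=SalesDW;Trusted_Connection=yes"
    )

    DDL = """
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalAmount);
    """

    conn = pyodbc.connect(CONN_STR, autocommit=True)  # run DDL outside an explicit transaction
    try:
        conn.cursor().execute(DDL)
        print("Index created: IX_Orders_CustomerId")
    finally:
        conn.close()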

Environment: PowerDesigner, ETL, Informatica, JAD, SSRS, SQL Server, SQL & SDLC.
