
Data Engineer Resume


Bentonville

SUMMARY

  • Dedicated Data Engineer with 6+ years of client/server experience and distinctive competency in Business Analytics, Business Intelligence, Data Warehouses, Data Marts, Big Data, Data Science, Customer Relationship Management (CRM), Marketing Relationship Management (MRM), Sales & Marketing, and Supply Chain Management.
  • Around 5 years of professional IT experience as a Data Engineer/Data Analyst building data pipelines with the Hadoop ecosystem, Spark, Hive, Sqoop, Google Cloud Storage, Python, SQL, Tableau, GitHub, and ETL tools.
  • Experienced in implementing end-to-end enterprise Business Intelligence solutions covering BI architecture, software development, deployment, and infrastructure maintenance.
  • Expert in ETL transformations and in visualizing data from diverse business areas in novel and insightful ways; able to lead full-lifecycle solutions and interface with clients.
  • Experienced in database design, testing, and implementation.
  • Technical expertise in BI, data warehousing, data profiling, data cleansing, data integrity, and data security.
  • Experienced in data modelling using star/snowflake schemas.
  • Experienced in performance tuning and query optimization of databases.
  • Experienced in multiple domains including finance, retail, e-commerce, and healthcare.
  • Experience writing SQL queries, creating databases, and developing stored procedures and DDL/DML statements.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Efficient with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing strategies, and writing and optimizing HiveQL queries (a minimal sketch follows this list).
  • Experience in ingestion, storage, querying, processing, and analysis of big data, with hands-on experience in Apache Spark, Spark SQL, and Hive.
  • Used Spark and Google Cloud Storage to build scalable, fault-tolerant infrastructure processing terabytes of data per day, contributing to a 15% increase in the total number of users.
  • Experience with GCP services (BigQuery, Bigtable, Dataproc, GCS, Dataflow, App Engine, and Looker).
  • Experience designing, developing, and deploying projects on the GCP suite, including BigQuery, Dataflow, Dataproc, Google Cloud Storage, Composer, and Looker.
  • Designed, tested, and maintained data management and processing systems using Spark, GCP, Hadoop, and shell scripting.
  • Expertise in collecting, exploring, analyzing, and visualizing data by building Tableau/Looker reports and dashboards.
  • Worked with business users, product owners, and engineers to design feature-based solutions and implement them in an agile fashion.
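
The following is a minimal, illustrative PySpark sketch of the partitioning and bucketing strategy referenced above; database, table, column names, and paths are hypothetical, and the same layout can be expressed in HiveQL with PARTITIONED BY / CLUSTERED BY ... INTO n BUCKETS.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("partition-bucket-sketch")
    .enableHiveSupport()   # lets saveAsTable register the table in the Hive metastore
    .getOrCreate()
)

# Raw transactions landed in the data lake (hypothetical path).
raw = spark.read.parquet("/data/raw/transactions")

# Partition by date so queries prune whole directories; bucket by customer_id
# so joins and point lookups on that key avoid full shuffles.
(raw.write
    .partitionBy("txn_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .mode("overwrite")
    .format("parquet")
    .saveAsTable("sales.transactions"))

# Queries filtered on a single partition then scan only that slice of the table.
daily = spark.sql("SELECT * FROM sales.transactions WHERE txn_date = '2023-01-31'")
```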

TECHNICAL SKILLS

Dashboarding/Reporting/Analytical Tools: Power BI, Qlik Sense, Tableau, MicroStrategy

Languages: SQL, Python, R, Java

ETL Tools: Informatica, IBM DataStage

Database Technologies: Oracle 9i/10g, MySQL, SQL Server 2008/2012, Hive, Greenplum, AWS, Salesforce, Sybase, Teradata, and DB2

ERP: Oracle Applications eBusiness Suite (AP/AR/GL Modules)

Source Control/Project Management Tools: TFS, JIRA, Microsoft Project

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Responsible for building ETL (Extract, Transform, Load) pipelines from the data lake to different databases based on the requirements.
  • Designed and developed data applications using HDFS, Hive, Spark, Scala, Sqoop, Automic scheduler, DB2, SQL Server, and Teradata.
  • Developed base and consumption tables in the data lake and moved data from the data lake to Teradata.
  • Built catalog tables using batch processing and multiple complex joins to combine dimension tables of the store transactions and the e-commerce transactions, which have millions of records every day.
  • Developed proof-of-concept prototypes with fast iterations; developed and maintained design documentation, test cases, monitoring, and performance evaluations using Git, PuTTY, Maven, Confluence, ETL, Automic, Zookeeper, and Cluster Manager.
  • Used shell scripting to automate validations between the different databases of each module and report data quality to users, using the Aorta and Unified Data Pipeline (UDP) frameworks.
  • Experience developing, testing, and deploying YAML-based configuration.
  • Responsible for troubleshooting failures or slowness in data pipelines built with MapReduce, Tez, Hive, or Spark to ensure SLA adherence.
  • Optimized Hive scripts by reengineering the DAG logic to use minimal resources and provide high throughput.
  • Worked with business users to resolve discrepancies such as error records and duplicate records across tables, writing complex SQL/HQL queries to validate the reports.
  • Improved performance and optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on a migration project to move data from different sources (Teradata, Hadoop, DB2) to Google Cloud Platform (GCP) using the UDP framework, transforming the data with Spark Scala scripts.
  • Created data ingestion processes to maintain the global data lake on GCP and BigQuery (a simplified sketch follows this role's environment listing).
  • Built Tableau dashboards to report store-level and region-level sales for Confidential US and global data.
  • Collaborated with senior management, subject matter experts, and IT teams to define business requirements.
  • Transformed business requirements into functional and nonfunctional requirements.
  • Created complex stored procedures to perform various tasks including, but not limited to, index maintenance, data profiling, metadata searches, and loading of the data mart.
  • Designed, developed, and tested the ETL strategy to populate data from multiple source systems.
  • Experienced in developing data mappings, performance tuning, and identifying bottlenecks in sources, mappings, targets, and sessions.
  • Strong understanding of data modeling in data warehouse environments, such as star and snowflake schemas.

Environment: Hadoop, Spark, Scala, Teradata, Hive, Aorta, Sqoop, GCP, Google Cloud Storage, BigQuery, Dataproc, Dataflow, SQL, DB2, UDP, GitHub, Azure (Azure Data Factory, Azure Databases), Tableau, Looker, etc.
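
A simplified PySpark sketch of the data-lake-to-BigQuery load described in this role; the actual work used Spark Scala and the internal UDP framework, and the table, dataset, and bucket names below are hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes the spark-bigquery connector is on the classpath
# (e.g. a Dataproc cluster, or supplied via --jars on spark-submit).
spark = (
    SparkSession.builder
    .appName("hive-to-bigquery-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a consumption table from the data lake (hypothetical name).
df = spark.table("datalake.store_transactions")

# Light transformation before load: keep only completed transactions.
clean = df.filter("status = 'COMPLETED'")

# Write to BigQuery; the connector stages the data in a temporary GCS bucket.
(clean.write
    .format("bigquery")
    .option("table", "analytics.store_transactions")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("overwrite")
    .save())
```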

Confidential, Bentonville

Data Engineer

Responsibilities:

  • Developed complex transformations in Spark and Hive to calculate insurance claims and policy returns for forecasting customer demand.
  • Collaborated with project managers and stakeholders to gather business requirements via JRD sessions.
  • Developed T-SQL scripts for automation tasks; fine-tuned stored procedures, SQL queries, and user-defined functions.
  • Improved data ingestion techniques to provide insights for clients.
  • Led a team of analysts in their Power BI reporting and supervised the development.
  • Built architecture diagrams for new clients and created proposals and POCs for approval.
  • Currently working as an architect on a new cloud implementation, responsible for building solutions with Azure DevOps to migrate disparate RDBMS data sources into Snowflake using AWS DMS.
  • Responsible for leading the team of developers in using the framework and providing guidance/supervision on their implementations.
  • Primary contact for communicating with executive teams in providing team progress and updates.
  • Designed and automated the pipeline to transfer data that provides centralized KPIs, reports on cross-channel platform performance, and insights to the marketing team.
  • Designed PySpark scripts and Airflow DAGs to transform and load data from Hive tables stored in AWS S3 (a simplified DAG sketch follows this role's environment listing).
  • Designed and developed Spark scripts to parse JSON files and store them in Parquet format on EMR.
  • Designed and developed automation for Spark, UNIX, and Python scripts using Airflow DAGs.
  • Optimized and fine-tuned the scripts to achieve efficient processing on the AWS platform.
  • Automated and scheduled jobs through Oozie.

Environment: MapReduce, HDFS, Hive, Pig, Impala, Hue, Sqoop, Kafka, Oozie, YARN, Spark, Spark SQL (Data Frames and Dataset), Spark Streaming.
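
A hedged Airflow (2.x) sketch of the scheduling pattern described in this role: a daily DAG that submits a PySpark job reading JSON from S3 and writing Parquet. The DAG id, paths, and job script are hypothetical placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="claims_json_to_parquet",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:

    # Submit the PySpark job that parses the day's JSON files and writes Parquet.
    parse_json = BashOperator(
        task_id="parse_claims_json",
        bash_command=(
            "spark-submit s3://example-bucket/jobs/parse_claims.py "
            "--input s3://example-bucket/raw/claims/{{ ds }}/ "
            "--output s3://example-bucket/curated/claims/{{ ds }}/"
        ),
    )
```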

Confidential

SQL Developer

Responsibilities:

  • Involved in requirement gathering with different business teams to understand their analytical/reporting needs.
  • Redesigned and improved existing procedures, triggers, UDFs, and views with execution plans, SQL Profiler, and DTA.
  • Worked on application systems analysis and programming activities.
  • Wrote and debugged SQL scripts, stored procedures, and functions, and scheduled SQL workflows to automate the update processes.
  • Defined and developed logical/physical dimensional models of the data mart with ERwin.
  • Wrote T-SQL stored procedures and complex SQL queries to capture the business logic (a minimal invocation sketch follows this role's environment listing).
  • Created SSIS packages to extract data from OLTP databases; consolidated and populated the data into the OLAP database.
  • Utilized various transformations such as lookup, derived column, conditional split, fuzzy lookup for data standardization.
  • Implemented auditing and error handling techniques in SSIS packages with loggings and event handlers.
  • Debugged SSIS packages with error log details, checkpoints, breakpoints, and data viewers.
  • Implemented package configurations with package and project level parameters.
  • Scheduled and maintained packages daily, weekly, and monthly using SQL Server Agent in SSMS.
  • Created SSRS reports to produce in-depth business analyses.
  • Developed ad-hoc, parameterized, drill-down, and drill-through reports in SSRS.
  • Created analytical dashboards in Tableau to provide business users with critical KPIs and facilitate strategic planning.
  • Created slicers in Power BI for interactive reports and designs, provided filter options, and implemented business logic at the chart level using advanced set analysis and aggregation functions.

Environment: Microsoft SQL Server 2012, SSDT, SSMS, T-SQL, SSIS, SSRS, SSAS, ERwin, SharePoint, Power BI, TFS, DTA, Tableau, C#.Net, Visual Studio, CSS, HTML.
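
A minimal Python (pyodbc) sketch of the kind of SQL automation described in this role: invoking a T-SQL stored procedure that loads the data mart, then running a simple validation query. The server, database, procedure, and table names are hypothetical.

```python
import pyodbc

# Connection string is a placeholder; adjust driver/server/authentication as needed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=DataMart;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Parameterized call keeps the load date out of the SQL string itself.
cursor.execute("EXEC dbo.usp_LoadSalesMart @LoadDate = ?", "2023-01-31")
conn.commit()

# Simple post-load validation: fail loudly if nothing was loaded.
cursor.execute("SELECT COUNT(*) FROM dbo.FactSales WHERE LoadDate = ?", "2023-01-31")
row_count = cursor.fetchone()[0]
if row_count == 0:
    raise RuntimeError("Data mart load produced zero rows for 2023-01-31")

conn.close()
```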
