
Data Engineer Resume


SUMMARY

  • Over 5 years’ experience as a Data Engineer with HDFS, HiveQL, Spark, SQL, Python modules, and other tools in the Hadoop ecosystem, building end-to-end ETL pipelines, with experience in Master Data Management (MDM) across business domains including Banking and Information Technology.
  • Experience in application development using big data technologies such as Apache Spark, Python, PySpark, Scala, Hive, Kafka streaming, UNIX, Autosys, ADO, and GitHub.
  • Expertise in creating data models (views and tables) using Wherescape RED and 3D, including Hub and Satellite tables.
  • Extensive use of Wherescape 3D for designing data models and managing data warehouses following industry standards.
  • Created relationships between Hub, Link, and Satellite tables in Wherescape 3D following Data Vault design.
  • Good knowledge of and working experience with AWS technologies such as S3, EMR, and Elasticsearch.
  • Worked on developing real-time Spark SQL scripts in Scala and Python (PySpark).
  • Worked on consuming Kafka messages through Spark Streaming.
  • Extensive working experience with different data formats, including CSV, JSON, ORC, Excel, and Parquet, as well as unstructured files, read through Spark with Scala and PySpark.
  • Developed a standard ETL framework to enable the reusability of similar logic across the board. Involved in System Documentation of Dataflow and methodology.
  • Extensively used high-level backend PL/SQL programming across different databases.
  • Extensively used Explain Plan, SQL optimizers, table partitioning, and statistics gathering to analyze and improve query performance.
  • Proficient in understanding client needs and generating or modifying existing reports in Oracle Applications according to requirements and specifications.
  • Experience in writing T-SQL (DDL, DML, and DCL) and developing new database objects such as tables, views, indexes, complex stored procedures, user-defined functions (UDFs), cursors, and triggers, as well as handling locking issues, BCP, and common table expressions (CTEs).
  • Involved in installation of Power BI Report Server.
  • Developed various solution-driven views and dashboards with different chart types, including pie charts, bar charts, tree maps, circle views, line charts, area charts, and scatter plots in Power BI.
  • Designed and developed Power BI graphical and visualization solutions from business requirement documents and plans for creating interactive dashboards.
  • Created complex stored procedures, SSIS packages, triggers, cursors, tables, views, and other SQL joins and statements for applications.
  • Good knowledge of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
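The reusable ETL framework mentioned above can be sketched in miniature with plain Python: each stage (extract, transform, load) is a function, and a pipeline is an ordered composition of them. All names and the sample data here are illustrative, not taken from the actual framework.

```python
import csv
import io
import json

def read_csv(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_amounts(rows):
    """Transform: cast the 'amount' field to float, dropping bad rows."""
    out = []
    for row in rows:
        try:
            row["amount"] = float(row["amount"])
            out.append(row)
        except (KeyError, ValueError):
            continue  # scrub rows that fail validation
    return out

def to_json(rows):
    """Load: serialize rows to a JSON string (stand-in for a real sink)."""
    return json.dumps(rows)

def run_pipeline(source, steps):
    """Apply each step to the output of the previous one."""
    data = source
    for step in steps:
        data = step(data)
    return data

raw = "id,amount\n1,10.5\n2,oops\n3,7\n"
result = run_pipeline(raw, [read_csv, clean_amounts, to_json])
print(result)
```

The point of the design is that new pipelines reuse the same step functions in different orders, so cleansing logic is written once rather than per job.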

TECHNICAL SKILLS

Operating Systems: Unix, Linux, Windows

Programming Languages: Python, Scala, SQL, Java

Visualization Tools: Tableau, Power BI, GraphQL

Python modules: PySpark, Pandas, NumPy, scikit-learn, Caffe, Pylearn, Pyevolve, PyTorch, Pattern, Keras, OpenCV, PyCUDA, Hadoopy, Pydoop, TensorFlow, boto3, Django, Matplotlib, ggplot, Highcharter, Bokeh, GraphX

SQL tools: psql, PostgreSQL, Docker, SQL Workbench, SQL Developer

Scripting: JavaScript, Angular JS, jQuery, NodeJS

Data Formats: XML, CSS, JSON, Avro, Parquet, ORC

Databases: Oracle 11g, DB2, MS SQL Server, MongoDB, SQL, Cassandra, Couchbase, graph databases, Splunk

Cloud services: AWS (EMR, EC2, S3, RDS, Redshift), Snowflake, Microsoft Azure (Data Lake, Data Storage, Databricks, Azure Data Factory), Google Cloud

Build & Design Tools: Anaconda, Spyder, WEKA, Jupyter, GitHub, Jira, RStudio, Concourse, Wherescape 3D, RED

ETL Tools: Talend, Apache Airflow, Kubeflow, Wherescape

IDE: Eclipse, IntelliJ

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential

Responsibilities:

  • Responsible for designing and developing enterprise-level ETL architecture solutions utilizing Data Vault 2.0 concepts following Agile and Scrum framework best practices.
  • Responsible for implementing and managing the release process for code through development, test, and production environments, using Team Foundation Server as the release management tool.
  • Involved in creating pipelines reading from Data Vault 2.0, applying business transformations, and loading the results into the Wherescape 3D path and eventually into Wherescape RED.
  • Created data models (views and tables) using Wherescape 3D and RED, including Hub and Satellite tables.
  • Used Wherescape 3D for designing the data model and managed the data flow following industry standards.
  • Created relationships between Hub, Link, and Satellite tables in Wherescape 3D following Data Vault design.
  • Designed and architected new solutions for products the Product Owners decided to bring in, and re-architected or enhanced existing solutions.
  • Designed and developed an API to transfer files to sustainability teams to capture carbon content: developed a complex SQL query to pull the data, saved it as CSV in the 3D location via the Wherescape RED application, and made a REST call to the API to transfer the files.
  • Created complex SQL and SnowSQL (Snowflake) scripts using explode, lateral views, and aggregate operators.
  • Created complex stored procedures, triggers, cursors, tables, views, and other T-SQL queries.
  • Developed scripts to move data (~100M records) via API calls to external sources for carbon emissions reporting.
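The CSV hand-off described above can be sketched with the standard library: serialize query results to CSV, then prepare a REST request carrying the file. The endpoint URL and field names below are assumptions for illustration, not the actual system; the request is built but deliberately not sent.

```python
import csv
import io
import urllib.request

def rows_to_csv(rows, fields):
    """Serialize a list of dict rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def build_transfer_request(url, csv_payload):
    """Build (but do not send) the POST request carrying the CSV body."""
    return urllib.request.Request(
        url,
        data=csv_payload.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )

# Hypothetical carbon-emissions rows and endpoint.
rows = [{"site": "plant-a", "co2_tonnes": 12.4},
        {"site": "plant-b", "co2_tonnes": 9.1}]
payload = rows_to_csv(rows, ["site", "co2_tonnes"])
req = build_transfer_request("https://example.invalid/carbon/upload", payload)
print(req.get_method(), len(payload))
```

At ~100M records the real transfer would stream and chunk rather than hold the file in memory, but the request shape is the same.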

Data Analyst

Confidential

Responsibilities:

  • Developed MapReduce programs running on Yarn using Java to perform various ETL, cleaning and scrubbing tasks.
  • Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
  • Used Informatica Designer to create complex mappings using different transformations like Filter, Router, Connected and Unconnected Lookups, Stored Procedure, Joiner, Update Strategy, Expression, and Aggregator to pipeline data to a data mart.
  • Developed a standard ETL framework to enable the reusability of similar logic across the board. Involved in System Documentation of Dataflow and methodology.
  • Proficient in understanding client needs and generating or modifying existing reports in Oracle Applications according to requirements and specifications.
  • Experience in writing T-SQL (DDL, DML, and DCL) and developing new database objects such as tables, views, indexes, complex stored procedures, user-defined functions (UDFs), cursors, and triggers, as well as handling locking issues, BCP, and common table expressions (CTEs).
  • Involved in installation of Power BI Report Server.
  • Developed various solution-driven views and dashboards with different chart types, including pie charts, bar charts, tree maps, circle views, line charts, area charts, and scatter plots in Power BI.
  • Created reports utilizing SSRS, Excel Services, and Power BI, and deployed them on SharePoint Server as per business requirements.
  • Generated ad-hoc reports in Excel Power Pivot and shared them via Power BI with decision makers for strategic planning.
  • Designed and developed Power BI graphical and visualization solutions from business requirement documents and plans for creating interactive dashboards.
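The common table expressions (CTEs) mentioned in the T-SQL work above follow the same `WITH` clause idea in most SQL dialects; here is a minimal stand-in using SQLite so it runs anywhere. Table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'east', 100), (2, 'west', 250), (3, 'east', 50);
""")

# The CTE computes per-region totals; the outer query filters on them.
query = """
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region
    )
    SELECT region, total FROM region_totals
    WHERE total > 100
    ORDER BY region;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('east', 150.0), ('west', 250.0)]
```

The same pattern in T-SQL would simply swap the SQLite setup for SQL Server tables; the `WITH ... AS (...)` syntax is unchanged.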

Software Engineer

Confidential

Responsibilities:

  • Worked closely with Business Analysts and project leads to understand business data flow, gather requirements, and prepare technical design documents for implementing solutions using ETL and data visualizations in Tableau.
  • Collaborated with cross-functional IT teams to transform various data sources into meaningful visual storyboards.
  • Mapped business requirements to functional and technical specifications and delivered reports.
  • Performed data profiling, data migration, data validation, batch data integration, pre-landing processing, and custom data publish.
  • Identified data-related inconsistencies, gaps and communicated with Data Engineering and Business Teams.
  • Created bar charts, area charts, maps, scatter plots, pie charts, tree maps, dual-axis charts, table calculations, and calculated fields.
  • Used Tableau features such as aggregation, granularity, level of detail, data blending, filters, actions, and hierarchies.
  • Worked on creating Tableau data extracts.
  • Developed Tableau dashboards and reports about technology talent and workforce strategy.
  • Extensively worked with ETL processes for data cleansing.
  • Worked on semi-structured data and data preparation using SQL and Informatica ETL to feed data into Tableau dashboards.
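The data profiling and validation steps above can be sketched with plain Python: for each column, count missing and distinct values before the data feeds a dashboard. The column names and sample rows are illustrative, not from the original project.

```python
def profile(rows, columns):
    """Return per-column missing/distinct counts for a list of dict rows."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        missing = sum(1 for v in values if v in (None, ""))
        distinct = len({v for v in values if v not in (None, "")})
        report[col] = {"missing": missing, "distinct": distinct}
    return report

rows = [
    {"id": 1, "region": "east"},
    {"id": 2, "region": ""},      # a gap the profile should flag
    {"id": 3, "region": "west"},
]
print(profile(rows, ["id", "region"]))
```

In practice a profile like this flags inconsistencies early, before they surface as blanks or wrong aggregates in a published dashboard.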
