
Data Engineer Resume


Rhode Island

SUMMARY:

  • Results-driven Data Engineer with 3.5 years of experience in big data engineering, designing, developing, and maintaining applications using Scala, Python, R, MapReduce, Spark, and other Hadoop ecosystem components.
  • Experience in data visualization, data analytics, data integration, and data quality using Python.
  • Strategic thinker who develops long-term solutions, adapts quickly to new issues, and works with multiple teams to drive a cohesive solution.
  • In-depth knowledge of the big data stack, including YARN, Sqoop, Flume, Kafka, Spark, Spark DataFrames, Spark SQL, and Spark Streaming.
  • Experience using Spark with Scala to improve the performance and optimization of existing Hadoop algorithms.
  • Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, statistical modeling, and dimensionality reduction using Principal Component Analysis (a minimal PCA sketch follows the skills list below).
  • Good knowledge of importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
  • Worked with data scientists to derive the right set of features for predicting the final outcome of a machine learning model.
TECHNICAL SKILLS:

Machine Learning, AWS, Data Mining, Azure, Data Visualization, Hadoop, Python, Spark, Sqoop, SQL
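As an illustration of the dimensionality-reduction step mentioned in the summary, the following is a minimal PCA sketch in Python using pandas and scikit-learn; the feature names and component count are hypothetical, not taken from an actual project.

    # Illustrative sketch: standardize engineered features and reduce them with PCA.
    # Feature names and the number of components are hypothetical placeholders.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    features = pd.DataFrame({
        "transaction_amount": [12.5, 80.0, 33.2, 5.1],
        "items_per_basket": [1, 6, 3, 1],
        "visits_per_month": [2, 10, 4, 1],
    })

    scaled = StandardScaler().fit_transform(features)  # zero mean, unit variance
    pca = PCA(n_components=2)                          # keep the top two components
    reduced = pca.fit_transform(scaled)

    print(pca.explained_variance_ratio_)               # variance captured per component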

PROFESSIONAL EXPERIENCE:

Confidential, Rhode Island

Data Engineer

Responsibilities:

  • Built an application from scratch in PySpark to identify various attributes of point-of-sale transactions in pharmaceutical stores. The insights derived from this application saved the business approximately $10M in FY18 and are projected to save $40M in FY19.
  • Designed and implemented a machine learning framework that sends the right offer to the customer at the right time through the right channel.
  • Responsible for migrating applications running on premises to the Azure cloud.
  • Implemented an application for cleansing and processing terabytes of data using Python and Spark.
  • Created data mappings based on the business requirements and translated them to technical solutions.
  • Developed complex HQL scripts to transform and process data per business requirements using multiple staging tables, joins, and complex Hive analytic and windowing functions.
  • Identified potential bottlenecks in queries and tuned the queries for better performance and to produce desired results.
  • Analyzed existing systems and proposed process and system improvements, including adopting modern scheduling tools such as Airflow and migrating legacy systems into an enterprise data lake built on the Azure cloud.
  • Responsible for debugging application and cluster issues and performing root cause analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a simplified PySpark sketch follows this list).
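The HQL windowing and Hive-to-Spark conversion work described in the bullets above can be illustrated with a short PySpark sketch. It is a simplified example, not the production code: the table and column names are hypothetical, and it uses the DataFrame API in Python rather than the Scala RDD code referenced above.

    # Illustrative PySpark sketch: a Hive-style analytic (window) query expressed
    # both as Spark SQL and as DataFrame transformations. All names are hypothetical.
    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.appName("pos-attributes-sketch").getOrCreate()

    sales = spark.read.table("staging.pos_transactions")  # hypothetical staging table
    sales.createOrReplaceTempView("pos_transactions")

    # Hive/Spark SQL form: rank transactions by amount within each store.
    ranked_sql = spark.sql("""
        SELECT store_id, txn_id, amount,
               ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY amount DESC) AS rnk
        FROM pos_transactions
    """)

    # Equivalent DataFrame form, as used when porting Hive/SQL logic to Spark.
    w = Window.partitionBy("store_id").orderBy(F.col("amount").desc())
    ranked_df = sales.withColumn("rnk", F.row_number().over(w))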

Environment: Microsoft Azure, Spark, Python, SQL, Hortonworks, HDFS, YARN, MapReduce, shell scripting, Teradata, Airflow, GitLab, PyCharm, pandas, and NumPy.

Confidential, Dallas, Texas

Data Engineer

Responsibilities:

  • Collaborated with cross-functional departments to collect customer intelligence and designed A/B experiments to formulate product innovation strategies. Evaluated customer behavioral patterns to enhance client value by 25%.
  • Created a data pipeline that computes how many visitors have visited the site each day; each pipeline component feeds its output to the next, and the results drive several downstream analyses (a simplified sketch follows this list).
  • Implemented software enhancements to port legacy systems to the Spark and Hadoop ecosystems on the Azure cloud.
  • Forecasted customer equity for the firm by building a probabilistic model in Python based on customer-level metrics such as customer lifetime, transaction rate, and monetary value to project the best customers for franchises.
  • Loaded data using Sqoop from RDBMS servers such as Oracle and MySQL into the Hadoop HDFS cluster.
  • Designed and developed Spark jobs per requirements for data processing needs.
  • Used Hadoop FS actions to move data from upstream locations to local data locations, and designed and developed automated data movement processes using shell scripting.
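A simplified sketch of the daily-visitor pipeline described above, assuming clickstream events land as JSON files with a timestamp and a visitor identifier; the input path, field names, and output table are illustrative only.

    # Illustrative sketch: count distinct visitors per day from raw clickstream events.
    # The input path and the event_time/visitor_id fields are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-visitors-sketch").getOrCreate()

    events = spark.read.json("hdfs:///data/raw/clickstream/")  # upstream landing area

    daily_visitors = (
        events
        .withColumn("visit_date", F.to_date("event_time"))
        .groupBy("visit_date")
        .agg(F.countDistinct("visitor_id").alias("visitors"))
    )

    # The output feeds the next pipeline component (e.g., a reporting table).
    daily_visitors.write.mode("overwrite").saveAsTable("analytics.daily_visitors")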

Environment: Hadoop, MapReduce, Scala, Spark, Tableau, AWS, MLlib, Airflow, Python, GitHub, Power BI, ETL, Linux

Confidential

SQL Developer

Responsibilities:

  • Designed, created, and implemented database applications based on business requirements.
  • Created complex queries to calculate the metrics that determined the overall performance of the company's business.
  • Performed archiving of legacy data into a remote database, increasing available storage by nearly 20% and improving performance.
  • Wrote complex functions to support application development.
  • Created stored procedures to extract data from the source table, transform it per the business logic, and load it into the target table (a hypothetical sketch follows this list).
  • Created complex views and tuned the overall performance of the application.
  • Managed data sizing keeping future growth in mind.
  • Created DDLs for tables and executed them to build warehouse tables for ETL data loads.
  • Improved query performance by following performance enhancement tips and database best practices.
  • Debugged the code and created exception handling to route bad data to error log files.
  • Documented and performed unit testing for all components prior to migrating them to production.
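A minimal illustration of the extract-transform-load stored procedure pattern described above, sketched as T-SQL issued from Python via pyodbc rather than as standalone T-SQL; the connection string, procedure, table, and column names are all hypothetical.

    # Illustrative sketch: create and run a stored procedure that extracts from a
    # source table, applies business logic, and loads a target table.
    # All object names and the connection string are hypothetical.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=warehouse;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    cursor.execute("""
        CREATE OR ALTER PROCEDURE dbo.load_daily_sales AS
        BEGIN
            INSERT INTO dbo.sales_target (sale_date, store_id, total_amount)
            SELECT CAST(sale_ts AS DATE), store_id, SUM(amount)
            FROM dbo.sales_source
            WHERE amount > 0  -- exclude bad rows instead of loading them
            GROUP BY CAST(sale_ts AS DATE), store_id;
        END
    """)
    conn.commit()

    cursor.execute("EXEC dbo.load_daily_sales;")  # run the load
    conn.commit()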

Technical Stack: SQL Server, NoSQL, PostgreSQL, Shell Scripting, SSIS, T-SQL, Data Analysis, Data Modeling, Data Warehousing.
