Data Engineer Resume
Needham, MA
SUMMARY
- Around 7 years of IT experience with multinational clients, including Big Data architecture experience developing Spark/Hadoop applications.
- Developed end-to-end pipelines using Airflow and Databricks-mounted notebooks to perform ETL operations.
- Used AWS S3, Redshift, Redshift Spectrum, and Athena for business-user reporting.
- Developed shell scripts to schedule jobs on Airflow
- Developed multiple notification applications and automatic alert mechanisms using Python modules.
- Applied aggregations on various data sources using pandas and delivered outputs in CSV format (a minimal sketch follows this summary).
- Used NumPy for array processing.
- Implemented a POC to migrate existing Spark code to Spark DataFrames using Python.
- Experienced with the Spark ecosystem, using PySpark, Spark SQL, and Scala queries on data file formats such as .txt and .csv.
- Also working toward deeper knowledge of NoSQL databases such as MongoDB.
- Hands-on scripting experience in Python and Linux/UNIX shell.
- Experience in developing web-based applications.
- Comfortable working with different methodologies such as Agile, Waterfall, and Scrum.
- Excellent communication and analytical skills; flexible in adapting to evolving technology.
- Experience in building visualizations and infographics to deliver meaningful insights from data using Excel, Tableau, and R Shiny.
- Experience in building data pipelines, data engineering, data mining, and programming machine learning algorithms (supervised and unsupervised) to derive insights from data.
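For illustration only, a minimal sketch of the pandas aggregation-to-CSV work mentioned above; the file names, column names, and grouping keys are hypothetical placeholders, not actual client data.

```python
# Minimal sketch: aggregate a source extract with pandas and write a CSV output.
# "sales_extract.csv" and the column/grouping names are hypothetical placeholders.
import pandas as pd

def aggregate_to_csv(in_path: str, out_path: str) -> None:
    df = pd.read_csv(in_path, parse_dates=["order_date"])
    summary = (
        df.groupby(["region", df["order_date"].dt.to_period("M")])
          .agg(total_sales=("amount", "sum"), orders=("order_id", "count"))
          .reset_index()
    )
    summary.to_csv(out_path, index=False)

if __name__ == "__main__":
    aggregate_to_csv("sales_extract.csv", "sales_summary.csv")
```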
TECHNICAL SKILLS
Programming skills: Python, R, C/C++, Java/Scala, Unix, Bash scripting, PySpark, React, LaTeX
Apache Technologies: Apache Spark
Big Data / Cloud technologies: Databricks, Spark, Kafka, Redshift, Airflow, Docker, Kubernetes, Google Cloud Platform, AWS, Azure DevOps, Hadoop, JIRA, CI/CD
Databases: PL/SQL, Postgres, MS Azure, MS SQL Server 2017, SSIS, ERwin modeler, T-SQL, MySQL, Cassandra, HBase, DynamoDB
Analytical skills: ETL, data warehousing, Informatica, data management, data collection, predictive models, data modeling, TensorFlow, Spark ML, A/B testing, data analysis, Redshift, Parquet
Business Intelligence: Tableau, SAS, Looker, Power BI, Cognos, Matplotlib, Seaborn, A/B testing, BigQuery, Alteryx, SSIS
Machine Learning: logistic regression, random forests/decision trees, statistical models, neural networks, SVM, predictive analytics, ensembles, NLP, Caffe, MXNet, PyTorch, Keras, RNN, attribution/forecasting, scikit-learn, SciPy, Matplotlib, pandas
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential
Responsibilities:
- Delivered the Sales CDL team's effort to upgrade the pipeline from CDL 1.0 to CDL 3.0 in DEV, UAT, and PROD environments; worked with GitLab and CI/CD
- Developed the Metric Engine, a core component of the iDNA platform; created a comprehensive Airflow UI to drive the platform, removing operational pain points and increasing business-client satisfaction
- Orchestrated automation tools to speed up the iDNA CDL process for the patient domain
- Led efforts to standardize cluster parameters for the iDNA platform and implemented new features to modify parameters flexibly
- Implemented CDL ingestion pipelines for sales, multi-channel marketing, and Confidential MDH data; worked with data formats such as zip, txt, gz, bzip2, and csv from S3 to Redshift for business users
- Accelerated the data-validation process by reducing manual work and making migration code easier to debug via a PySpark-based automation script
- Drove Sales Data Services weekly/monthly execution, resolving Airflow and Databricks issues such as partitioning and concurrency (an illustrative Airflow DAG sketch follows this role's tools list).
- Maintained quality reference data in RDS through cleaning and transformation, ensuring integrity in a relational environment.
- Analyzed customer requirements and the sales database, and used Spark SQL to build ETL on Databricks for downstream systems.
- Developed multiple pipeline rules in Graylog to process different types of data.
- Developed big data applications using Python modules such as PySpark
- Designed and developed Spark code with Python, PySpark, and Spark SQL for high-speed data processing to meet critical business requirements.
- Measured the performance of the existing classification process and reimplemented it using Spark 2.2 and Oozie
- Worked on all four stages: data ingest, data transform, data tabulate, and data export.
- Maintained fully automated CI/CD pipelines for code deployment (GitLab/Jenkins).
- Built code using Java, Spring Boot, Maven, and Jenkins to build and automate our data workflow
- Applied object-oriented programming concepts to build UI components reusable across web applications; worked with frameworks such as Spring, version-control tools such as Git and GitHub, and iterative development tools such as Atlassian Bitbucket and JIRA.
Tools used: HDFS, Spark, Spark SQL, Oozie, PySpark, Kafka, Hive, HBase, MapReduce, Databricks, AWS (Redshift, S3, EC2, EMR).
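For illustration only, a minimal Airflow DAG sketch of the ingest-transform-export orchestration pattern described in this role; the DAG id, schedule, and task callables are hypothetical placeholders, not the actual CDL pipeline.

```python
# Minimal sketch of an Airflow DAG wiring ingest -> transform -> export tasks.
# DAG id, schedule, and the task functions are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest(**context):
    print("pull source files from S3")

def transform(**context):
    print("run PySpark/Databricks transformation")

def export(**context):
    print("load curated data into Redshift")

with DAG(
    dag_id="cdl_pipeline_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_export = PythonOperator(task_id="export", python_callable=export)

    t_ingest >> t_transform >> t_export
```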
Data Engineer
Confidential | Needham, MA
Responsibilities:
- Developed a data pipeline using Flume, Spark, and Hive to ingest, transform, and analyze data
- Implemented various Pig UDFs to convert unstructured data into structured data.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Implemented Spark jobs in Python, using DataFrames and the Spark SQL API for faster data processing (see the PySpark sketch after this role's tools list).
- Developed custom web pages using the Airflow boilerplate, HTML, JavaScript, and Flask to generate line plots for all DAGs and identify optimization requirements for various tasks.
- Built and maintained a PL/SQL code base across its lifecycle to generate ad hoc and weekly financial reports for external and internal clients.
- Created Python tools using pandas to automate content creation in presentations and docs, reducing data-validation cost to zero.
- Achieved a ~95% speed improvement in Signal Spotting Trend Prediction, cutting runtime from 500+ minutes to 20 minutes with a NumPy back end
- Created an ETL pipeline in Python, integrated the data with Tableau, and generated custom reports and dashboards.
- Built regression models such as Market Mix Modelling to estimate the impact of marketing channels on sales and automated data preparation.
- Analyzed data to identify purchase KPIs and use cases, built storytelling dashboards, communicated insights to leadership, and improved strategy
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Developed an Apache Spark, Flume, and HDFS integration project for real-time data analysis
- Used SQL to extract data from various client sources such as AWS S3 and Redshift and aggregated it into a database.
- Established system support for SQL Server and tuned SQL query performance for incoming data, reducing data-ingestion time by 25%
- Designed, developed, and implemented data models, with quality and integrity top of mind, to support our products.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Performed analysis using high-level languages such as Python.
- Launched Amazon EC2 cloud instances from Amazon machine images and configured them for specific applications.
Tools used: Databricks, PySpark, Airflow, AWS, Python, JavaScript, jQuery, R, pandas, NumPy, SQL, D3.js, GitLab, Hadoop, Pig, Sqoop, Oozie, MapReduce, HDFS, Hive, Java Eclipse, UNIX shell scripting
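For illustration only, a minimal PySpark sketch of the DataFrame/Spark SQL processing pattern mentioned in this role; the input path, view name, and columns are hypothetical placeholders.

```python
# Minimal sketch: read a CSV extract, register it as a view, aggregate with Spark SQL,
# and write the result as Parquet. Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

raw = spark.read.csv("s3a://example-bucket/raw/events.csv", header=True, inferSchema=True)
raw.createOrReplaceTempView("events")

daily = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")

daily.write.mode("overwrite").parquet("s3a://example-bucket/curated/daily_events/")
```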
Data Engineer
Confidential
Responsibilities:
- Executed web scraping using Python and built databases of audit and compliance data for capital-market companies
- Wrote Python code to web-scrape and load the data into a Postgres relational database; used the Facebook Graph API to pull posts, fan counts, comments, and likes from more than 2,000 artist fan pages and identify the top 100 artists (a minimal scrape-and-load sketch follows this role's bullets)
- Used the Toad Data Modeler tool to design relationships between artist entities in the relational database
- Applied SQL queries to identify artist KPIs across the audience using Facebook as a data source
- Collaborated on sentiment-analysis research using the Natural Language Toolkit (NLTK) to gain insight into opinion of artist influence; wrote further SQL queries for time-series analysis to identify trends and develop metrics
- Implemented a frontend website using HTML and CSS and hosted it on a Linux server
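For illustration only, a minimal sketch of the scrape-and-load pattern described above; the URL, table name, and connection string are hypothetical placeholders, and the Facebook Graph API specifics are omitted.

```python
# Minimal sketch: fetch a page with requests, parse it with BeautifulSoup,
# and insert rows into Postgres with psycopg2. URL, table, and DSN are hypothetical.
import requests
from bs4 import BeautifulSoup
import psycopg2

def scrape_and_load(url: str, dsn: str) -> None:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Collect (link text, href) pairs as a stand-in for the real scraped fields.
    rows = [(a.get_text(strip=True), a["href"]) for a in soup.select("a[href]")]

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS scraped_links (title TEXT, url TEXT)")
        cur.executemany("INSERT INTO scraped_links (title, url) VALUES (%s, %s)", rows)

if __name__ == "__main__":
    scrape_and_load("https://example.com", "dbname=scrapes user=postgres")
```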
Data Scientist
Confidential | Needham, MA
Responsibilities:
- Established data pipelines into SQL Server for incoming-data analysis using SQL, reducing data-ingestion time by 25%
- Proposed solutions, debugged, and analyzed A/B tests to increase the efficiency of marketing campaigns, improving product sales by 25% (an illustrative analysis sketch follows this role's tools list).
- Recommended design policies by developing an ETL pipeline using Talend and SSMS with Tableau for problem solving and reporting
- Developed and executed unit-test plans using JUnit, ensuring that results were documented and reviewed with the Quality Assurance teams responsible for integration testing.
- Developed the user interface using React, HTML5, Spring Web Flow, XHTML, DHTML, and CSS3.
- Involved in all phases of the Software Development Life Cycle (SDLC): analysis, design, development, testing, and finalization.
- Used Agile software development with Scrum methodology.
- Worked on user validations using Angular 2.0.
- Implemented web services to integrate different applications (internal and third-party components) using SOAP and RESTful services.
- Performed branching, tagging, and release activities with version-control tools: SVN, GitHub
- Derived high-quality industry trends and engagement prediction models using random forests, helping lift retail sales retention by 12%
- Built data-driven R drake pipelines for marketing models such as Market Mix Modelling, automating data preparation and reporting.
- Led the marketing analytics team through dashboards, reporting, and statistical analyses, capturing insights from marketing campaigns
- Created an ETL pipeline using Talend and executed scraping with Python from PCAOB, Nasdaq, and company finance and auditor websites.
- Performed requirements gathering and visualized data in Tableau to establish metrics, check integrity, and surface insights for healthcare prediction
Tools used: Python, Tableau, SQL, Docker, AML H2O, AWS S3, TPOT, AWS EC2, R, MySQL
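For illustration only, a minimal sketch of the kind of A/B-test analysis referenced in this role, using a chi-squared test from SciPy; the conversion counts and sample sizes are made-up numbers, not actual campaign data.

```python
# Minimal sketch: analyze a two-variant A/B test with a chi-squared test (SciPy).
# Conversion counts and sample sizes are made-up illustrative numbers.
from scipy.stats import chi2_contingency

# Rows: variants A and B; columns: converted, not converted.
table = [
    [320, 5000 - 320],
    [370, 5000 - 370],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference between variants is statistically significant at the 5% level.")
else:
    print("No statistically significant difference detected.")
```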