Senior Data Engineer / Data Scientist Resume

Northborough, MA

SUMMARY

  • Results-oriented Cloud Data Engineer / Data Scientist with a stakeholder focus, delivering niche and robust solutions for Cloud Migrations, Data Lakes, Machine Learning Models, and BI applications for Airline, Healthcare, Mining, and Financial clients.
  • In-depth technical and business knowledge from 5 years of progressive professional experience in IT and Data Engineering (Big Data and Cloud) delivery consulting.
  • Built ground-up Data Lake / Delta Lake, Data Warehousing, and Machine Learning solutions leveraging various frameworks (Hadoop, Spark, Docker, PyTorch, TensorFlow), cloud services (Azure Data Factory, Databricks, Cosmos DB, and others), and programming languages (SQL, Python, Shell Scripting, Java).
  • Designed and developed cloud migration solutions for migrating On-Premises databases and ETL using Azure migration tools.
  • Designed and developed cloud data ingestion and processing ETL solutions with various data sources and data sinks (Database, IoT Stream, HTTP, Blob Storage, etc.) and various data formats (CSV, JSON, Parquet, Avro, etc.).
  • Developed and optimized CI/CD DevOps pipelines using Azure DevOps, Jenkins, and related tools such as GitHub, GitLab, JIRA, and Shell Scripts.
  • Adept with Agile/Scrum and SDLC methodologies.

TECHNICAL SKILLS

  • Python, SQL, Shell Scripting, Hive/Spark SQL, PySpark, Scala, Java
  • Big Data - Hadoop, Spark, Hive, NoSQL MongoDB, Presto, Snowflake etc.
  • Azure Cloud - Data Factory, Databricks, Cosmos DB, Synapse Analytics, API Service, Blob Storage, ADLS etc.
  • GCP - BigQuery, GCS, Vertex AI
  • Oracle, SQL Server, PostgreSQL
  • ETL - Azure Data Factory, Informatica, BODS
  • Power BI, Tableau
  • Git (GitHub, GitLab), JIRA, Confluence
  • CI/CD (Jenkins, Azure DevOps)

PROFESSIONAL EXPERIENCE

Confidential, Northborough, MA

Senior Data Engineer / Data Scientist

Responsibilities:

  • Developed a data ingestion and processing framework for ingesting CRM application sales data into Azure Data Lake / Delta Lake and feeding these datasets into the downstream data warehouse for data analytics and reporting, and into machine learning models that predict future sales.
  • Designed and implemented ETL / Azure Databricks notebooks using Spark Core, Spark SQL, Scala, and Python modules to ingest and process disparate datasets in different formats (Text, CSV, JSON, and Parquet) into a Data Lake (ADLS) hosted in Azure Cloud.
  • Orchestrated the ETL data pipelines using Azure Data Factory and used Azure Cosmos DB and API Service to automate data loading into tables/views in the downstream layer (Synapse Analytics).
  • Supported the Data Cleansing and Feature Engineering stages to provide the right training, test, and evaluation datasets for the data science team to develop ML models using the TensorFlow and PyTorch frameworks.
  • Conducted a POC on DataRobot and Vertex AI to compare the tools and identify which is best applicable in a given scenario. Work included dataset dumping, model dumping, training, evaluation, testing, and deploying to an endpoint.
  • Leveraged JIRA for Scrum, GitHub for source control, Azure DevOps for CI/CD, and Confluence for documentation.

Confidential

Senior Data Engineer

Responsibilities:

  • Developed ETL pipelines for ingestion and migration of insurance data into Azure SQL DW, a cloud data warehouse used for analytics and reporting.
  • Built Data Flows in Azure Data Factory for sourcing data, applying complex transformations, and writing to Azure SQL DW, using activities such as Source, Lookup, filters, aggregations, updates, pivots, and Sinks, and linked services such as ADLS, Azure SQL Database, Azure Blob Storage, Key Vault, Azure SQL Data Warehouse, API Service, and more.
  • Connected to different sources such as on-premises databases, SharePoint, and Blob storage. Independently managed development of ETL processes from development to delivery. Expertise in JSON scripts for deploying pipelines in ADF.
  • Developed PySpark applications using Spark (Core, SQL, MLlib) and Python libraries (NumPy, Pandas, Multiprocessing, and others) in Azure Databricks.
  • Used Google Cloud for transforming data with Google BigQuery and Google Cloud Storage.
  • Used Logic Apps to take decision-based actions within the workflow, and developed custom alerts in Azure Data Factory to send email notifications using the Graph API and to monitor the pipelines.
  • Automated the pipelines using schedule, event-based, and tumbling window triggers in Azure Data Factory.
  • Involved in the full project lifecycle, including requirements gathering, system design, development, enhancements, deployment, maintenance, and support.

Confidential

Data Engineer

Responsibilities:

  • Engineered data migration from Teradata (On-Premises) to cloud Azure Data Lake (ADLS) for better scalability, performance, and cost. Migrated Teradata ETL scripts (BTEQ, FastLoad, MultiLoad) to PySpark for data ingestion using Azure Databricks notebooks connecting to multiple sources such as SFTP, databases, SharePoint, and more.
  • Developed Incremental and Snapshot ETL pipelines using Azure Data Factory for ingesting data into ADLS and then loading it into the Snowflake Data Warehouse. Also ingested data into Palantir using manually developed PySpark scripts orchestrated with Apache Airflow.
  • Performance-tuned Spark jobs by optimizing transformations, applying partitioning, choosing the right joins, and using broadcast variables, and leveraged the Azure Monitor service for measuring DTUs, load on the SQL databases, etc.
  • Developed On-Premises ETL pipelines in SAP BODS for ingesting data sourced from text files, PDFs, Excel files, and images into the Cloudera Hadoop Hive Data Warehouse.
  • Built the data extraction code using Shell, SQL, and Python scripting and automated it using Oracle control tables. Created Hive tables (HiveQL) and views using SerDe properties and partitioned them for reporting and analytics use.
  • Developed stream and batch data ingestion pipelines for IoT data, used for interacting with devices (actions), sending alerts/notifications (life of the device), and others.
  • Built a Spark Streaming application in Azure Databricks connecting to Azure IoT Hub and Event Hub to fetch the data and insert it into Azure SQL Server tables, which triggers Azure Function Apps to update the data on the user side.
  • Responsible for maintenance of the jobs created. In case of failure, checked the Spark UI logs to identify the cause and rectify the error (good exposure to Spark architecture).
  • Created a Power BI dashboard to report resource usage and the forecasted cost of maintaining the application to the business.
  • Followed the DevOps flow for code versioning, approvals, and deployment into higher environments.

Confidential

Java Full Stack Developer

Responsibilities:

  • Worked on a web-based ERP application hosted entirely on AWS Cloud.
  • Developed and deployed a microservices REST API application on the backend, written with the Java Spring Boot framework and hosted on AWS Elastic Beanstalk.
  • Designed and developed services, repositories, models, and controllers, and built a custom single-page application by creating AngularJS modules to get data from the front end.
  • Migrated data storage to AWS Aurora PostgreSQL, storing data according to the flow of client requests.
  • Sent AWS SNS notifications to the UI and triggered automated Lambda events based on the state of laptop records.
  • Hands-on experience implementing TDD as a development practice.
  • Tested and fixed issues related to REST API calls using Postman and Swagger API URLs.
