
Data Engineer Resume

SUMMARY

  • Over 15 years of professional experience in Data Engineering, Data Analytics, Business Intelligence and Software Development using Open Source, Microsoft (Azure) and Amazon Web Services (AWS) technologies.
  • Industry-standard expertise in both MPP and OLAP cube-based large Data Warehousing Methodologies, ETL Processing, and Fact and Dimensional Data Modeling
  • Successfully delivered Big Data & Business Intelligence solutions for Fortune 500 companies in Healthcare, Telecom, Fintech, Supply Chain, and other industry verticals.
  • Experience with both on-premise and Cloud Data Integrations
  • Strong understanding of MuleSoft API-led platform architecture
  • Hands-on working capability with MuleSoft components, Mule Expression Language (MEL) workflow, Anypoint Studio, Enterprise Service Bus (ESB), API Manager and RAML, REST, SOAP
  • Exposure to MuleSoft Cloud-Hub and on-premise deployment models
  • Extensive knowledge of Amazon Redshift, with real-life implementation experience with Sort Key, Distribution Key, Diststyle and WLM (Workload Management)
  • Experience in Database Design, Complex T-SQL Scripts, TDD, In-Memory OLTP, Resource Governor, Indexing, Partitioning, Dynamic Data Masking of MS SQL Server / Azure SQL Database
  • In-depth understanding of the SQL Server query life cycle, Transaction Manager, Buffer Manager & Lazy Writer
  • Worked with SQL Server Profiler/Extended Events for query optimization and deadlock detection
  • Practical experience with PostgreSQL (MVCC, WAL, OOP, Foreign Data Wrapper), MySQL and MariaDB
  • Demonstrable expertise in building Big Data Processing pipelines using Python, Spark (PySpark), Shell Script and handling various data formats (Parquet, CSV, XML, JSON) on both Linux and Windows platforms.
  • Knowledge of real-time data processing using Kafka and Spark Streaming
  • Practical experience designing complex SSIS packages and building analysis models in SSAS, in both Tabular and Multi-Dimensional models
  • Implemented CUBE, Analysis Services Model partitioning and dynamic cube refreshing using Analysis Management Objects (AMO) / Tabular Model Scripting Language (TMSL)
  • Professional expertise in complex ETL development using Informatica
  • Leverage DevOps techniques, with experience in DevOps tools (GitHub, Jira, DocuSign, JAMA, Box) and Agile methodology
  • Develop comprehensive dashboards with Tableau, SSRS, Microsoft Power BI, Business Objects, TIBCO JasperSoft and QlikSense, QlikView
  • Integrate Reports and Dashboards in web portal using HTML, CSS, JavaScript (AngularJS)
  • Implemented Predictive Analytics/ Machine Learning using Azure Machine Learning (Azure ML) Studio, AWS ML & Python
  • Hands-on working experience with the Hadoop Ecosystem (Spark, HDFS, YARN, Zookeeper)
  • Excellent communication and interpersonal skills.

TECHNICAL SKILLS

DATABASE: SQL Server, Azure SQL Database, PostgreSQL, MySQL

DATA WAREHOUSE: Redshift, Azure Synapse, SSAS (Tabular, Multi-Dimensional), SSIS

BIG DATA: Spark, HDFS, Python

ETL/API: MuleSoft, Anypoint Studio, Mule (ESB), Mule API Manager, Mule Runtime Manager

DATA VISUALIZATION: SSRS, Power BI, JasperSoft, QlikSense, QlikView

MACHINE LEARNING: Azure ML, AWS ML, Jupyter Notebook

DEVOPS: JAMA, JIRA, DocuSign, Box, Bitbucket

OTHERS: Windows, Linux, Shell Script, SQL Server Management Studio (SSMS)

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Design and develop a custom application for a Big Data ingestion platform for the Clinical Trial industry using MuleSoft 4, defining cloud data integration, API scope and specifications
  • Built a software application that collects data from SQL Server and flat files through a complex ETL process using MuleSoft and stores it in Azure SQL Data Warehouse, Amazon Redshift and Azure SQL Database
  • Professional expertise in building complex Mule Flows, Scopes, Error Handling, Message Filters, Validation, DataWeave, Payload Transformation, Scatter Gather, Flow Reference, Message Enricher and Flow Controls using Anypoint Studio.
  • Used Anypoint Runtime Manager for deployment and scheduling. Developed reusable flows for complex data processing
  • Create REST APIs to expose data for sponsors using MuleSoft, C#, ASP.NET Web API, HTML, CSS and JavaScript
  • Building data pipelines and complex ETL to process external client data using Python, Spark
  • Build Dashboards using QlikView and QlikSense and integrate them into the portal (with Mashup and Dev-Hub)
  • Built interactive and dynamic action trigger platform using Python and Azure ML to notify trial sites of potential abnormal behavior
  • Developed Machine Learning Model through Azure Machine Learning Service to identify possible incorrect subject assessments in clinical trials
  • Develop Measures and KPIs and deploy them to Azure Analysis Services. Refresh Azure SSAS Models through REST API calls using MuleSoft
  • Design unit test plans and test scripts to ensure data correctness in Data Warehouse and ETL Application
  • Manage, maintain and optimize SQL Server and PostgreSQL in Windows and Linux environments

Environment: SQL Server, Azure SQL Data Warehouse, Azure Analysis Service, Azure BLOB, MuleSoft, QlikView, QlikSense, Azure Machine Learning Studio, Python, C#, Spark, Linux, Jupyter Notebook, SQL Server Management Studio (SSMS), Anypoint Studio, Anypoint Runtime Manager, Anypoint API Manager, DataWeave

Confidential

Sr. Database Engineer

Responsibilities:

  • Create and maintain optimal distributed data pipeline architecture
  • Develop robust Change Data Capture (CDC) solution using Kafka, Debezium, Shell Script
  • Consume CDC data using Spark and save it as Parquet in HDFS for later analysis
  • Process & Transform Parquet data to build Data Warehouse
  • Implement visualization tools such as Metabase and Tableau that use the data pipeline to provide actionable insights and key business performance metrics.
  • Administer, optimize, and ensure availability of PostgreSQL and MariaDB

Environment: Kafka, PostgreSQL, MariaDB, Debezium, Spark, PySpark, Python, Metabase, HDFS, Tableau, Shell Script, Linux

Confidential

Principal Data Engineer

Responsibilities:

  • Built an AWS cloud-based data ingestion platform that collects, transforms and loads data from 5 different data sources into Amazon Redshift using SSIS and MuleSoft
  • Troubleshooting Mule ESB, including working with debuggers, flow analyzers and configuration tools
  • Routing and orchestration of JSON and XML message payloads in MuleSoft
  • Develop Mule services to connect to both internal and Cloud SOAP and REST web services.
  • Designed & developed robust data warehouse architecture and Visualization in Power BI and TIBCO JasperSoft
  • Managed & Maintained production Databases, mostly in SQL Server. Implemented TDE, High Availability, Backup Recovery and Partitioning of large tables in existing production databases
  • Design interactive dashboards for Network Signal Monitoring, from high-level Google Map overview down to street view
  • Designed and built ML Models using AWS Machine Learning Service

Environment: SQL Server, SSIS, Microsoft Power BI, TIBCO JasperSoft, Amazon Redshift, AWS Machine Learning, MuleSoft, Mule ESB, SOAP, REST
