
Senior Data Engineer Resume


Rochester, Minnesota

SUMMARY

  • 9 years of IT experience in analysis, design, development, implementation, maintenance, and support, with experience in Big Data, Hadoop development and ecosystem analytics, data warehousing, ETL, business intelligence, and the design and development of Java-based enterprise applications.
  • Experience with Hadoop ecosystem components such as HDFS, MapReduce (MRv1, YARN), Hive, HBase, and Kafka; programming in Spark using Scala; and exposure to Cassandra.
  • Good knowledge of and hands-on experience with Amazon Web Services (AWS) offerings such as EMR, Redshift, Athena, SNS, SQS, Step Functions, Lambda, Kinesis, Glue, and EC2, which provide fast and efficient processing for Teradata big data analytics.
  • Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2 and VMs with Hortonworks Data Platform 2.1 & 2.2 and CDH3/CDH4 via Cloudera Manager on Linux and Ubuntu.
  • Extensive experience in Extraction, Transformation, and Loading (ETL) of data from various sources into data warehouses and data marts using Informatica PowerCenter tools (Repository Manager, Designer, Workflow Manager, and Workflow Monitor).
  • Experience using Databricks to handle analytical processes from ETL through data modeling, leveraging familiar tools, languages, and skills via interactive notebooks and APIs.
  • Proficient in Hive Query Language and experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing, and parallel execution (a short partitioning/bucketing sketch follows this list).
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib; expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and performed data transformations using Spark Core.
  • Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Bill Inmon approaches.
  • Strong data modeling experience with ER diagrams, dimensional data modeling, and conceptual/logical/physical modeling using Third Normal Form (3NF), star schema, and snowflake schema, with tools such as Erwin and ER/Studio.
  • Experience with Apache Airflow for authoring workflows as directed acyclic graphs (DAGs) to visualize batch and real-time data pipelines running in production, monitor progress, and troubleshoot issues when needed (see the DAG sketch after this list).
  • Good knowledge of Hadoop architecture and its components, such as YARN, HDFS, NodeManager, ResourceManager, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Solid SQL skills: able to write complex SQL queries, functions, triggers, and stored procedures for backend testing, database testing, and end-to-end testing.
  • Strong experience with CI/CD (Continuous Integration / Continuous Deployment) pipeline stages such as commit, build, automated tests, and deploy, using pipelines in Jenkins along with Ansible, Docker, and Kubernetes.
  • Experienced with Hadoop clusters on the Azure HDInsight platform and deployed data analytics solutions using Spark and BI reporting tools.
  • Very good understanding of SQL, ETL, and data warehousing technologies, with sound knowledge of designing data warehousing applications using Teradata, Oracle, and SQL Server.
  • Experience writing build scripts using Maven and working with continuous integration systems such as Jenkins.
  • Expertise in using Kafka as a messaging system to implement real-time streaming solutions; experienced in moving data from different sources using Kafka producers and consumers and preprocessing the data.
  • Excellent working experience with Scrum/Agile and Waterfall project execution methodologies.
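
As referenced above, a minimal PySpark sketch of the Hive partitioning and bucketing pattern; the table and column names (staging.sales_raw, region, customer_id) are hypothetical placeholders, not drawn from the original projects.

    from pyspark.sql import SparkSession

    # Assumes a Hive metastore is available to the Spark session.
    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Allow dynamic partitions to be derived from the data itself.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    raw = spark.table("staging.sales_raw")  # hypothetical staging table

    # Partition by region and bucket by customer_id so partition pruning
    # and bucketed joins can reduce query cost.
    (raw.write
        .mode("overwrite")
        .partitionBy("region")
        .bucketBy(8, "customer_id")
        .sortBy("customer_id")
        .format("parquet")
        .saveAsTable("analytics.sales_optimized"))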
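
And a small Airflow DAG sketch of the batch-pipeline authoring mentioned above; the DAG id, schedule, and task callables are hypothetical.

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # placeholder extract step (hypothetical)

    def load():
        pass  # placeholder load step (hypothetical)

    with DAG(
        dag_id="daily_batch_pipeline",           # hypothetical name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task                # run extract before load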

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Confidential

Responsibilities:

  • Implemented CARS (Customer Anti-Money Laundering Risk Scoring) and Transaction Monitoring (TM); installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow.
  • Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow.
  • Provided ML data engineering expertise for Negative News model enhancement with diverse data provided by LexisNexis and other international vendors. For data ingestion: AWS (EMR, Kinesis Streams & Firehose, RDS, DynamoDB) and Spark Streaming; for data prep: Python web scraping, PyPDF2, Spark natural language processing (Tokenizer, StopWordsRemover, CountVectorizer, Inverse Document Frequency, StringIndexer), AWS Glue (ETL), and IBM DataStage.
  • Designed and developed data cleansing, data validation, and load ETL processes using Oracle SQL, PL/SQL, and UNIX.
  • Built ETL jobs with Hadoop technologies and tools such as Hive, Sqoop, and Oozie to extract records from different databases into HDFS.
  • Worked on a data lake in AWS S3, copied data to Redshift, and wrote custom SQL to implement business logic, using UNIX and Python script orchestration for analytics solutions.
  • Installed NoSQL MongoDB on physical machines, virtual machines, and AWS.
  • Supported and managed NoSQL databases: installed, configured, administered, and supported multiple NoSQL instances and performed database maintenance and troubleshooting.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Ingested a wide variety of structured, unstructured, and semi-structured data into RDBMS (where feasible per the architecture) as well as into AWS data ecosystems, with batch processing and real-time streaming.
  • Worked with Airflow 1.x operators and orchestration of workflows with dependencies spanning multiple clouds.
  • Designed Stacks using Amazon Cloud Formation templates to launch AWS Infrastructure and resources. Developed AWS CloudFormation templates to create custom sized VPC, subnets, EC2 instances, ELB and security groups.
  • Worked on creating serverless microservices by integrating AWS Lambda, S3, CloudWatch, and API Gateway (a minimal handler sketch follows this list).
  • Implemented a continuous delivery pipeline with Docker, Jenkins, GitHub, and AWS AMIs: whenever a new GitHub branch is started, Jenkins, the continuous integration server, automatically attempts to build a new Docker container from it.
  • Architected analytical data pipeline including but not limited to stakeholders’ interviews, data profiling, and extraction process designing from diverse sources, and data load optimization strategies.
  • Created the DWH, databases, schemas, and tables and wrote SQL queries against Snowflake.
  • Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts (see the Snowflake loading sketch after this list).
  • Defined virtual warehouse sizing in Snowflake for different types of workloads.
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse.
  • Monitored resources and applications using AWS CloudWatch, including creating alarms on metrics such as EBS, EC2, ELB, RDS, and S3, and configured notifications for alarms generated based on defined events.
  • Worked with an in-depth level of understanding in the strategy and practical implementation of AWS Cloud-Specific technologies including EC2 and S3.
  • Extensively used Kubernetes, which makes it possible to handle the online and batch workloads required to feed analytics and machine learning applications.
  • Consulted on Snowflake data platform solution architecture, design, development, and deployment, focused on bringing a data-driven culture across the enterprise.
  • Built real-time streaming solutions, moving data from different sources using Kafka producers and consumers, preprocessing the data, and using Kibana with Elasticsearch.
  • Worked with Google Cloud services such as Cloud Storage, BigQuery, GKE, and Data Studio.
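
As referenced in the Snowflake bullets above, a minimal sketch of loading staged S3 data into Snowflake with the Python connector; the connection parameters, the @cars_stage stage, and the transactions table are hypothetical placeholders.

    import snowflake.connector

    # Hypothetical connection parameters and object names.
    conn = snowflake.connector.connect(
        account="xy12345",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        # COPY from an external S3 stage (assumed to point at the S3 bucket)
        # into a target table, skipping the header row and tolerating bad rows.
        cur.execute("""
            COPY INTO transactions
            FROM @cars_stage/transactions/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'CONTINUE'
        """)
    finally:
        conn.close()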
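
And a small sketch of the serverless microservice pattern noted above (API Gateway to Lambda to S3, with logs in CloudWatch); the bucket name and payload handling are hypothetical.

    import json
    import logging
    import boto3

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)        # Lambda logs flow to CloudWatch Logs

    s3 = boto3.client("s3")
    BUCKET = "example-ingest-bucket"     # hypothetical bucket name

    def handler(event, context):
        """API Gateway proxy integration: persist the request body to S3."""
        body = event.get("body") or "{}"
        key = f"requests/{context.aws_request_id}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
        logger.info("stored payload at s3://%s/%s", BUCKET, key)
        return {"statusCode": 200, "body": json.dumps({"stored": key})}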

SR. Data Engineer

Confidential, Rochester, Minnesota

Responsibilities:

  • Interacted directly with business users and the data architect on changes to the data warehouse design on an ongoing basis.
  • Involved in Data modeling and design of data warehouse in star schema methodology with conformed and granular dimensions and FACT tables.
  • Worked with Protected Health Information (PHI) and HIPAA regulations to better secure patients' private data.
  • Used Spark DataFrames to create various datasets and applied business transformations and data cleansing operations using Databricks notebooks.
  • Efficient in writing Python scripts to build ETL pipelines and Directed Acyclic Graph (DAG) workflows using Apache Airflow and Apache NiFi.
  • Optimized the PySpark jobs to run on Kubernetes Cluster for faster data processing.
  • Optimized Hive queries using best practices and appropriate parameters, working with technologies such as Hadoop, YARN, Python, and PySpark.
  • Worked on reading and writing multiple data formats, such as JSON, ORC, and Parquet, on HDFS using PySpark (see the format I/O sketch after this list).
  • Ingested data in mini-batches and performed RDD transformations on mini-batches of data using Spark Streaming to run streaming analytics in Databricks (a streaming sketch also follows this list).
  • Worked on Azure Data Factory to integrate data from both on-prem (MySQL, PostgreSQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources and applied transformations to load the results back into Azure Synapse.
  • Created pipelines in ADF using linked services to extract, transform and load data from multiple sources like Azure SQL, Blob storage and Azure SQL Data warehouse.
  • Primarily involved in Data Migration process using SQL, Azure SQL, SQL Azure DW, Azure storage and Azure Data Factory (ADF) for Azure Subscribers and Customers.
  • Implemented Custom Azure Data Factory (ADF) pipeline Activities and SCOPE scripts.
  • Created Spark clusters and configured high concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
  • Primarily responsible for creating new Azure Subscriptions, data factories, Virtual Machines, SQL Azure Instances, SQL Azure DW instances, HD Insight clusters and installing DMGs on VMs to connect to on premise servers.
  • Responsible for ingesting data from various source systems (RDBMS, Flat files, Big Data) into Azure (Blob Storage) using framework model.
  • Involved in application design and data architecture using cloud and big data solutions on AWS and Microsoft Azure.
  • Leading the effort for migration of Legacy-system to Microsoft Azure cloud-based solution.
  • Worked on building data pipelines using Azure services such as Data Factory to load data from the legacy SQL Server to the Azure database using data factories, API Gateway services, SSIS packages, Talend jobs, and Python code.
  • Built Azure Web Job for Product Management teams to connect to different APIs and sources to extract the data and load into Azure Data Warehouse using Azure Web Job and Functions.
  • Worked on migration of data from the on-prem SQL Server to cloud databases (Azure Synapse Analytics / Azure SQL DB).
  • Extracted Tables and exported data from Teradata through Sqoop and placed them in Cassandra.
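
As referenced above, a minimal PySpark sketch of reading JSON and writing ORC and Parquet on HDFS; the paths and the filter column are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-io-sketch").getOrCreate()

    # Hypothetical HDFS paths.
    events = spark.read.json("hdfs:///data/raw/events/")                     # JSON in
    events_clean = events.dropDuplicates().filter("event_type IS NOT NULL")  # light cleansing

    # Persist the same data in columnar formats for downstream jobs.
    events_clean.write.mode("overwrite").orc("hdfs:///data/curated/events_orc/")
    events_clean.write.mode("overwrite").parquet("hdfs:///data/curated/events_parquet/")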
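
And a mini-batch streaming sketch: the original work used Spark Streaming with RDD transformations, while this illustration uses PySpark Structured Streaming with a file source; the schema, mount paths, and window sizes are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    # Hypothetical schema for files dropped into a landing path by an upstream feed.
    schema = (StructType()
              .add("event_time", TimestampType())
              .add("device_id", StringType())
              .add("reading", DoubleType()))

    stream = spark.readStream.schema(schema).json("/mnt/landing/telemetry/")

    # Windowed aggregation evaluated micro-batch by micro-batch.
    agg = (stream
           .withWatermark("event_time", "10 minutes")
           .groupBy(F.window("event_time", "5 minutes"), "device_id")
           .agg(F.avg("reading").alias("avg_reading")))

    (agg.writeStream
        .outputMode("append")
        .format("parquet")
        .option("path", "/mnt/curated/telemetry_agg/")
        .option("checkpointLocation", "/mnt/checkpoints/telemetry_agg/")
        .start())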

Environment: Microsoft SQL Server products (SSIS, SSRS), Python Anaconda, Azure, DynamoDB, PostgreSQL, Azure Data Factory, Azure SQL, Azure Databricks

SR. Data Modeler/ Data Analyst/Engineer

Confidential, St Louis, Missouri

Responsibilities:

  • Gathered and analyzed business data requirements and modeled those needs, working closely with the users of the information, the application developers, and architects to ensure the information models were capable of meeting their needs.
  • Designed ETL specifications with transformation rules using ETL best practices for good performance, maintainability of the code and efficient restart ability.
  • Loaded data into Hive tables from the Hadoop Distributed File System (HDFS) to provide SQL-like access to Hadoop data (a minimal sketch follows this list).
  • Designed the ER diagrams, logical model (relationship, cardinality, attributes, and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata as per business requirements using Erwin.
  • Designed MOLAP/ROLAP cubes on the Teradata database using SSAS, used SQL to query the database in a UNIX environment, and created BTEQ, FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems.
  • Involved in scheduling Oozie workflow engine to run multiple Hive jobs and developed Oozie workflow jobs to execute HIVE and MapReduce actions.
  • Developed mapping spreadsheets for (ETL) team with source to target data mapping with physical naming standards, data types, volumetric, domain definitions, and corporate meta-data definitions.
  • Used CA Erwin Data Modeler (Erwin) for Data Modeling (data requirements analysis, database design etc.) of custom developed information systems, including databases of transactional systems and data marts.
  • Developed containment scripts for data reconciliation using SQL, Python, and Hive, and optimized Python code for a variety of data mining and machine learning purposes.
  • Worked on data integration and workflow applications on the SSIS platform; responsible for testing all new and existing ETL data warehouse components and working with the ETL team to document transformation rules for data migration from OLTP to the warehouse for reporting purposes.
  • Generated various reports using SQL Server Reporting Services (SSRS) for business analysts and the management team; wrote and ran SQL, BI, and other reports, analyzed data, and created metrics, dashboards, and pivots.
  • Involved in writing T-SQL working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
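
As referenced above, a minimal sketch of exposing HDFS files as a Hive table for SQL-like access, run here through PySpark with Hive support; the database, table, columns, and HDFS location are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-load-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Register raw delimited files already sitting in HDFS as an external Hive table.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.claims_raw (
            claim_id STRING,
            member_id STRING,
            claim_amount DOUBLE,
            service_date DATE
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION 'hdfs:///data/landing/claims/'
    """)

    # Sanity check: SQL-like access on the Hadoop data.
    spark.sql("SELECT COUNT(*) AS row_count FROM staging.claims_raw").show()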

Environment: SQL Server, Oracle, AWS EC2, Python, Hadoop, Hive, HDFS, AWS RDS, XML, NoSQL, Spark, Scala, MySQL, PostgreSQL, SSRS, SSIS, SQL, DB2, Scripting, Tableau

Data Warehouse Analyst

Confidential

Responsibilities:

  • Understood the business process, gathered business requirements, and performed impact analysis based on the ERP; created logical and physical data models and metadata to support the requirements; analyzed requirements to develop design concepts and technical approaches, verifying the business requirements against manual reports.
  • Involved in fixing invalid mappings, testing of Stored Procedures and Functions, Unit and Integrating testing of Informatica Sessions, Batches, and the Target Data.
  • Worked on data integration and workflow application on SSIS platform and responsible for testing all new and existing ETL data warehouse components.
  • End-to-end process involvement, from gathering client business requirements to developing dashboards in Tableau and publishing them to the server.
  • Provided Support for Fixed Income application and batch processes running on UNIX servers.
  • Installation, configuration, and maintenance of VERITAS Cluster Server (VCS) for UNIX boxes.
  • ETL (Extract/Transform/Load) design and implementation in areas related to Teradata utilities such as FastExport and MultiLoad (MLOAD) for handling numerous tasks.
  • Implemented functional requirements using Base SAS, SAS Macros, SAS/SQL, UNIX, Oracle, and DB2; coded SAS programs with Base SAS and SAS Macros for ad hoc jobs requested by users; upgraded SQL Server databases and performed monitoring and performance tuning; developed reports using Crystal Reports with T-SQL, MS Excel, and Access.
  • Involved in several facets of MDM implementations including Data Profiling, Metadata acquisition and data migration
  • Migrated SQL Server 2005 databases to SQL Server 2008/2008 R2 databases and also migrated databases to IBM DB2.
  • Worked on multiple data marts in an Enterprise Data Warehouse (EDW) project, was involved in designing OLAP data models, and extensively used slowly changing dimensions (SCDs).
  • Developed automated procedures to produce data files using Microsoft Integration Services (SSIS), and performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Netezza (see the profiling sketch after this list).
  • Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
  • Implemented Agile Methodology for building an internal application.
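
As referenced above, a minimal data profiling sketch run from Python over ODBC; the DSN, credentials, table, and column are hypothetical, and the same query shape applies to Oracle or Netezza sources.

    import pyodbc

    # Hypothetical ODBC DSN pointing at one of the source systems.
    conn = pyodbc.connect("DSN=EDW_SOURCE;UID=profiler;PWD=***")
    cur = conn.cursor()

    # Basic column profile: row count, null count, and distinct count.
    cur.execute("""
        SELECT COUNT(*)                                           AS total_rows,
               SUM(CASE WHEN member_id IS NULL THEN 1 ELSE 0 END) AS null_member_id,
               COUNT(DISTINCT member_id)                          AS distinct_member_id
        FROM   claims_staging
    """)
    total_rows, null_members, distinct_members = cur.fetchone()
    print(f"rows={total_rows} nulls={null_members} distinct={distinct_members}")
    conn.close()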

Environment: ER Studio, SQL Server 2012, SQL Server Analysis Services 2008, SSIS, SSRS 2008, Oracle 10g, Business Objects XI, Rational Rose, Tableau, ERP, Netezza, Teradata, Excel, Informatica MDM, Pivot tables, DB2, DataStage, MS Office, MS Visio, SQL, T-SQL, UNIX, Agile, SAS, MDM, Shell Scripting, Crystal Reports 9.

Data Analyst

Confidential

Responsibilities:

  • Attended and participated in information and requirements gathering sessions and translated business requirements into working logical and physical data models for Data Warehouse, Data marts and OLAP applications.
  • Integrated data from various data sources such as MS SQL Server, DB2, Oracle, Netezza, and Teradata using Informatica to perform Extraction, Transformation, and Loading (ETL); worked on ETL development and data migration using SSIS, SQL*Loader, and PL/SQL.
  • Involved in designing and developing logical and physical data models and metadata to support the requirements using ERWIN.
  • Used the ETL tool Informatica to populate the database, transforming data from the old database to the new Oracle database.
  • Involved in modeling (Star Schema methodologies) in building and designing the logical data model into Dimensional Models and Performance query tuning to improve the performance along with index maintenance.
  • Involved in the creation and maintenance of the data warehouse and repositories containing metadata; wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
  • Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements (a small reconciliation sketch follows this list).
  • Responsible for Creating and Modifying T-SQL stored procedures/triggers for validating the integrity of the data.
  • Created a number of standard and complex reports to analyze data using slice-and-dice, drill-down, and drill-through in SSRS.
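
As referenced above, a small source-to-target reconciliation sketch run from Python over ODBC; the DSNs and table names are hypothetical.

    import pyodbc

    # Hypothetical DSNs for the transactional source and the warehouse target.
    src = pyodbc.connect("DSN=OLTP_SRC;UID=qa_user;PWD=***")
    tgt = pyodbc.connect("DSN=EDW_TGT;UID=qa_user;PWD=***")

    def row_count(conn, table):
        """Return COUNT(*) for a table on the given connection."""
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

    # Compare row counts for a migrated table (names hypothetical).
    src_rows = row_count(src, "dbo.orders")
    tgt_rows = row_count(tgt, "dw.fact_orders")
    print("MATCH" if src_rows == tgt_rows else f"MISMATCH: {src_rows} vs {tgt_rows}")

    src.close()
    tgt.close()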

Environment: Oracle 9i/10g, MS Visio, PL/SQL, Microsoft SQL Server, SSRS, T-SQL, Rational Rose, Data Warehouse, OLTP, OLAP, ERWIN, Informatica 9.x, Windows, SQL, Talend Data Quality, Flat Files.
