Sr. Data Engineer Resume

Chicago, IL

SUMMARY

  • 9+ years of experience in the data field with excellent knowledge of BI, data warehousing, ETL, Azure and AWS cloud, and Big Data technologies.
  • Experience migrating on-premises and real-time data to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Synapse, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a brief PySpark sketch follows this list).
  • Experienced with Big Data technologies such as Hive, Sqoop, HDFS, Spark Streaming, and Kafka, and NoSQL databases such as MongoDB and Cassandra; good exposure to cloud technologies including AWS S3, EC2, and Redshift, as well as Azure SQL, Azure Databricks, Azure Data Factory, Azure ADLS, and Azure DW.
  • Highly proficient in data modeling, covering RDBMS concepts, logical and physical data modeling up to Third Normal Form (3NF), and multidimensional data modeling (star schema, snowflake schema, facts, and dimensions).
  • Expertise in data analysis, design, development, implementation, and testing using data conversion, extraction, transformation, and loading (ETL) with SQL Server, Oracle, and other relational and non-relational databases, plus experience developing ETL applications on large volumes of data using MapReduce, Spark-Scala, PySpark, Spark SQL, and Pig.
  • Experience with commercial BI tools such as Tableau, QlikView, Cognos, and MicroStrategy, as well as open-source tools, and experience working with various data sources such as Oracle, SQL Server, DB2, Teradata, and Netezza.
  • Demonstrable architecture experience, specifically with standard CRM packages and within the financial services industry.
  • Experience with various ETL engines such as Informatica, Oracle OWB, and Microsoft SQL Server Integration Services (SSIS), as well as strong experience building complete ETL processes with PL/SQL and scripting languages (Bash, Python, Perl).
  • Experienced in consolidating and auditing metadata from disparate tools and sources, including BI tools, ETL tools, relational databases, modeling tools, and third-party metadata, into a single repository.
  • Experienced with distributed computing architectures such as AWS products (e.g., S3, EC2, Redshift, and EMR) and Hadoop, with effective use of MapReduce, SQL, and Cassandra to solve Big Data problems.
  • Acquired extensive experience in business analysis, entity-relationship and dimensional data modeling, data warehousing (Kimball methodology), GUI design, and OLTP system development based on relational and hierarchical databases, along with OOD and structured systems analysis, design, programming, and testing; applied full knowledge of data warehouse methodologies (Ralph Kimball, Inmon), ODS, EDW, and metadata repositories.
  • Expertise in writing SQL queries, dynamic queries, subqueries, and complex joins for complex stored procedures, triggers, user-defined functions, views, and cursors, with extensive experience in advanced SQL and PL/SQL stored procedures.
  • Experienced in deploying and scheduling SSRS reports to generate daily, weekly, monthly, and quarterly reports, including current-status reports, and in designing and deploying reports with drill-down, drill-through, and drop-down menu options as well as parameterized and linked reports.
  • Extensive working experience with normalization and de-normalization techniques for both OLTP and OLAP systems, creating database objects such as tables, constraints (primary key, foreign key, unique, default), and indexes.
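
A minimal PySpark sketch of the kind of multi-format extraction and Spark SQL aggregation described above; the paths, column names, and table name are illustrative assumptions, not details from an actual engagement:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks a session already exists as `spark`; this builder call is for standalone runs.
    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    def normalize(df):
        # Keep only the shared columns and cast them so the three sources union cleanly.
        return df.select(F.col("customer_id").cast("string"),
                         F.col("event_type").cast("string"))

    # The same feed arriving in three formats (hypothetical mount points).
    csv_df = spark.read.option("header", "true").csv("/mnt/raw/usage_csv/")
    json_df = spark.read.json("/mnt/raw/usage_json/")
    parquet_df = spark.read.parquet("/mnt/raw/usage_parquet/")

    events = normalize(csv_df).unionByName(normalize(json_df)).unionByName(normalize(parquet_df))
    events.createOrReplaceTempView("usage_events")

    # Aggregate with Spark SQL to surface usage patterns per customer.
    usage_by_customer = spark.sql("""
        SELECT customer_id, event_type, COUNT(*) AS event_count
        FROM usage_events
        GROUP BY customer_id, event_type
    """)
    usage_by_customer.write.mode("overwrite").parquet("/mnt/curated/usage_by_customer/")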

TECHNICAL SKILLS

Programming Languages: Python, SQL, PL/SQL, UNIX shell scripting and Perl

Data Modeling: Erwin r9.6/r9.5/7.x, ER/Studio 9.7/9.0/8.x and Power Designer

Databases: Oracle 12c/11g/10g, Teradata, DB2 UDB, SQL Server 2016/2010/2008, MySQL, MS Access, flat files, XML files, JSON, CSV and Parquet files.

Cloud Technologies: AWS S3, Glue, Redshift, Azure SQL, Azure DW, ADLS, Blob Storage, HDInsight, Databricks, Azure DevOps, Synapse, Snowflake and Data Factory.

Big Data Technologies: Hive, MongoDB, Cassandra, Oozie, Spark, Kafka, Sqoop and HDFS.

Operating Systems: Windows, Linux and UNIX

ETL Tools: SSIS and DataStage.

Scheduling Tools: Autosys, Airflow, Maestro (Tivoli) and Oozie.

PROFESSIONAL EXPERIENCE

Confidential, Chicago IL

Sr. Data Engineer

Responsibilities:

  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization; assessed the current production state of the application and determined the impact of the new implementation on existing business processes.
  • Developed an optimized reporting model by caching recent data in a SQL Server database and creating external tables in the Azure SQL DW database.
  • Followed an ELT approach (as recommended for Azure SQL DW) when migrating from on-premises to the Azure cloud environment: copied data files from the customer site to Azure Blob Storage; connected ADF to Blob Storage to fetch data from text files; hosted separate staging and production schemas in SQL DW; used an ADF connection to Azure DW with SQL authentication to process data from staging into the production schema via SSIS and stored procedures; and refreshed reporting data from the production area to the reporting server daily through a nightly refresh.
  • Converted existing Hive queries to Spark SQL queries to reduce execution time (a brief sketch follows this list), and developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
  • Worked on predictive and what-if analysis using Python on data from HDFS; successfully loaded files from Teradata into HDFS and from HDFS into Hive.
  • Loaded data from HDFS into Hive tables to provide SQL access to Hadoop data, and worked with MongoDB concepts such as locking, transactions, indexes, sharding, replication, and schema design.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Synapse, Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Created and configured the input, query, and output for an Azure Stream Analytics job that makes a function call to an Azure ML experiment.
  • Involved in creating and modifying the enterprise logical data model (ELDM), physical data models, and subject areas, designing star schemas and dimensional models using Erwin, and applying normalization/de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Conducted strategy and architecture sessions and delivered artifacts such as the MDM strategy (current, interim, and target state) and detailed MDM architecture (conceptual, logical, and physical).
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns, and developed JSON scripts for deploying Azure Data Factory (ADF) pipelines that process the data using the SQL activity.
  • Created Tableau dashboards of the top key performance indicators for top management by connecting various data sources such as Excel, flat files, and SQL databases.
  • Performed data analysis, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata, and used external loaders such as MultiLoad, TPump, and FastLoad to load data into Teradata across analysis, development, testing, implementation, and deployment.
  • Implemented a CI/CD pipeline using Azure DevOps in both cloud and on-premises environments with Git, Docker, and Jenkins plugins.
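
A hedged sketch of the Hive-to-Spark SQL conversion referenced in the list above; the database, table, and column names are assumptions for illustration:

    from pyspark.sql import SparkSession

    # On Databricks the Hive-aware session already exists as `spark`;
    # enableHiveSupport() matters only when running outside that environment.
    spark = SparkSession.builder.appName("hive-to-sparksql").enableHiveSupport().getOrCreate()

    # The former HiveQL is reused almost verbatim; the runtime gain comes from
    # Spark's in-memory execution and Catalyst optimizer, not from rewriting the SQL.
    daily_claims = spark.sql("""
        SELECT claim_date, region, SUM(claim_amount) AS total_claims
        FROM warehouse.claims_cleansed
        WHERE claim_date >= '2020-01-01'
        GROUP BY claim_date, region
    """)

    # Persist the result back to a managed table for downstream reporting.
    daily_claims.write.mode("overwrite").saveAsTable("warehouse.daily_claim_summary")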

Environment: Erwin r9.6, SQL, Oracle 12c, MDM, Azure-SQL, Azure DW, Azure DataBricks, Synapse, Azure ADLS, Data Factory, Teradata SQL Assistant, Python, Tableau, Netezza Aginity, Informatica, PL/SQL, SQL Server, Windows, Hive, Sqoop, Cassandra, Hadoop and UNIX.

Confidential, Chicago IL

Sr. Data Engineer

Responsibilities:

  • Participated in the design, development, and support of the corporate operational data store and enterprise data warehouse environment, and managed the technical delivery of custom development, integrations, and data migration elements of a Microsoft Dynamics CRM implementation.
  • Provided end-to-end solutions for enterprise data management, including data architecture, data quality and governance design, metadata management, data strategy design, master data management, and conceptual, logical, and physical data modeling.
  • Worked with an Agile data modeling methodology, creating data models in sprints within an SOA architecture, and was involved in delivering complex enterprise data solutions with comprehensive attention to architecture, security, performance, scalability, and reliability.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources and targets such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
  • Performed exploratory data analysis (EDA) using Python (a brief sketch follows this list), integrated Python with Hadoop MapReduce and Spark, worked with NoSQL databases including MongoDB and Cassandra, and implemented a multi-datacenter, multi-rack Cassandra cluster.
  • Analyzed existing systems and proposed process and system improvements, including adopting modern scheduling tools such as Airflow and migrating legacy systems into an enterprise data lake built on the Azure cloud.
  • Spun up HDInsight clusters and used Hadoop ecosystem tools such as Kafka, Spark, and Databricks for real-time streaming analytics (see the streaming sketch after this list), and Sqoop, Pig, Hive, and Cosmos DB for batch jobs.
  • Generated DDL from models for RDBMS targets (Oracle, Teradata) and created, managed, and modified logical and physical data models using a variety of data modeling philosophies and techniques, including Inmon and Kimball.
  • Involved in a complete remodeling of the data processing pipeline by redesigning the data flow, and created pipelines to migrate data from on-premises resources through the data lake and load it into the Azure SQL Data Warehouse.
  • Responsible for building the Confidential data cube on the Spark framework by writing Spark SQL queries in Scala to improve data processing efficiency and reporting query response time.
  • Involved in writing T-SQL queries and optimizing queries in Oracle 12c, SQL Server 2014, DB2, Netezza, and Teradata, and in normalizing and de-normalizing existing tables for faster query retrieval.
  • Designed different types of star schemas, such as detailed data marts, plan data marts, and monthly summary data marts, using ER/Studio, with various dimensions (time, services, customers) and fact tables.
  • Wrote Azure PowerShell scripts to copy or move data from the local file system to Azure Blob Storage (the HDFS-compatible store).
  • Developed and published reports and dashboards using Power BI and wrote effective DAX formulas and expressions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data, and developed stored procedures to query dimension and fact tables in the data warehouse.
  • Developed Tableau visualizations and dashboards using Tableau Desktop, building Tableau workbooks from multiple data sources using data blending.
  • Used Git to keep track of history, merge code between different versions of the software, and manage check-in/check-out.
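
A minimal sketch of the Python-based exploratory data analysis mentioned in the list above; the extract file and column names are illustrative assumptions:

    import pandas as pd

    # Load an extract pulled from the lake for profiling.
    df = pd.read_csv("claims_extract.csv", parse_dates=["claim_date"])

    # Basic profiling: shape, types, missing values, and summary statistics.
    print(df.shape)
    print(df.dtypes)
    print(df.isnull().sum())
    print(df.describe(include="all"))

    # Simple distribution check used to spot outliers before deeper analysis.
    print(df.groupby("region")["claim_amount"].agg(["count", "mean", "max"]))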
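
And a hedged Spark Structured Streaming sketch for the Kafka real-time path; the broker address, topic, and storage paths are assumptions (the spark-sql-kafka connector must be available on the cluster):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Read the raw event stream; Kafka delivers key/value as binary columns.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "usage-events")
           .load())

    events = raw.select(F.col("value").cast("string").alias("payload"),
                        F.col("timestamp"))

    # Land micro-batches as Parquet for downstream batch jobs in the lake.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/mnt/streaming/usage_events/")
             .option("checkpointLocation", "/mnt/checkpoints/usage_events/")
             .start())

    query.awaitTermination()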

Environment: ER/Studio, Python, Informatica PowerCenter, IBM Information Analyzer, Tableau, Teradata, Oracle 11g, DB2, NoSQL, Hadoop, HBase, SQL, PL/SQL, XML, Azure SQL, Azure Data Factory (ADF), Azure SQL Data Warehouse, DAX, UNIX shell scripting.

Confidential, Denver CO

Sr. Data Modeler/Data Analyst

Responsibilities:

  • Collaboratively worked with the Data modeling architects and other data modelers in the team to design the Enterprise Level Standard Data model.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Extensively involved in developing logical and physical data models that precisely fit the current-state and future-state data elements and data flows, using ER/Studio.
  • Utilized existing Informatica, Teradata, and SQL Server assets to deliver work and fix production issues on time in a fast-paced environment.
  • Responsible for full data loads from production to the AWS Redshift staging environment, and worked on migrating the EDW to AWS using EMR and various other technologies.
  • Defined the key columns for the dimension and fact tables of both the warehouse and the data mart, and conducted design discussions and meetings to arrive at the appropriate data mart at the lowest level of grain for each of the dimensions involved.
  • Created data quality scripts using SQL and Hive to validate successful data loads and data quality (a brief sketch follows this list), and created various types of data visualizations using Python and Tableau.
  • Involved in ETL processing using Pig and Hive on AWS EMR and S3, and in data profiling, mapping, and integration from multiple sources into AWS S3 (see the boto3 sketch after this list).
  • Extensively used star schema methodologies in designing and building out the logical data model into dimensional models.
  • Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
  • Created and maintained the logical data model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, and glossary terms.
  • Wrote ad-hoc queries based on schema knowledge for various reporting requirements, and wrote and tuned data ingestion procedures for data from external suppliers and partners using PL/SQL, Teradata SQL Assistant, SQL*Loader, and third-party tools.
  • Designed normalized data up to Third Normal Form and participated in brainstorming sessions with application developers and DBAs to discuss various de-normalization, partitioning, and indexing schemes for the physical model.
  • Created the Hive architecture used for real-time monitoring and the HBase setup used for reporting, and worked on MapReduce and query optimization for the Hadoop Hive and HBase architecture.
  • Collaborated with the reporting team to design monthly summary-level cubes to support further aggregated levels of the detailed reports.
  • Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.
  • Worked on designing Conceptual, Logical and Physical data models and performed data design reviews with the Project team members.
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
  • Worked with Teradata utilities (BTEQ, FastLoad, FastExport, MultiLoad, and TPump) on both Windows and mainframe platforms.
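
A minimal sketch of the SQL/Hive data-quality validation referenced in the list above, comparing the staged extract with the loaded table; the bucket, database, table, and column names are assumptions:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()

    # Row-count reconciliation between the staged extract and the loaded Hive table.
    source_count = spark.read.parquet("s3://dw-staging/orders/").count()
    target_count = spark.sql("SELECT COUNT(*) AS c FROM edw.orders").collect()[0]["c"]
    assert source_count == target_count, f"Row counts differ: {source_count} vs {target_count}"

    # Spot-check a critical key column for unexpected nulls after the load.
    null_keys = spark.sql(
        "SELECT COUNT(*) AS c FROM edw.orders WHERE order_id IS NULL"
    ).collect()[0]["c"]
    assert null_keys == 0, f"{null_keys} rows loaded with a NULL order_id"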
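
And a brief boto3 sketch of landing profiled extracts in S3 ahead of the EMR/Hive processing; the bucket and key names are illustrative:

    import boto3

    s3 = boto3.client("s3")

    # Push a locally profiled extract into the raw zone of the lake.
    s3.upload_file("orders_profiled.csv", "dw-staging", "raw/orders/orders_profiled.csv")

    # Confirm the object landed before kicking off the EMR step.
    response = s3.list_objects_v2(Bucket="dw-staging", Prefix="raw/orders/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])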

Environment: ER/Studio, OLTP, SQL, SSIS, SSAS, SSRS, PL/SQL, 3NF, Hadoop, Hive, Pig, MapReduce, MongoDB, HBase, AWS S3, DAX, AWS Redshift, Python, Big Data, Spark, XML, Tableau, Teradata, Netezza and Teradata SQL Assistant.

Confidential, Lake Forest, IL

Sr. Data Modeler/ Data Analyst

Responsibilities:

  • Developed normalized logical and physical database models to design an OLTP system for finance applications, and created a dimensional model for the reporting system by identifying the required dimensions and facts using Erwin.
  • Developed a conceptual model using Erwin based on business requirements, produced functional decomposition diagrams, and defined the logical data model.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
  • Extensive use of DAX (Data Analysis Expressions) functions for the reports and for the tabular models.
  • Used Erwin reverse engineering to connect to the existing database and ODS, create a graphical representation in the form of entity relationships, and elicit more information.
  • Managed the historical data in the data warehouse from various data sources and generated various drill down, drill through, matrix and chart reports using SSRS.
  • Worked on designing the OLAP Model/Dimension model for BI Reporting sourcing from SAP Transactions.
  • Worked with ETL teams and used Informatica Designer, Workflow Manager and Repository Manager to create source and target definition, design mappings, create repositories and establish users, groups and their privileges.
  • Used forward engineering to create a physical data model with DDL based on the requirements from the logical data model, and implemented referential integrity using primary key and foreign key relationships (a brief DDL sketch follows this list).
  • Performed data validation, data cleansing, data integrity, and data quality checks before delivering data to operations, business, and financial analysts, using Oracle and Teradata.
  • Used Erwin Model Mart for effective model management, sharing, dividing, and reusing model information and designs to improve productivity, and identified and tracked slowly changing dimensions and determined the hierarchies within dimensions.
  • Designed the high-level ETL architecture for overall data transfer using SSIS from the source server to the warehouse, defined various facts and dimensions in the data mart (including factless fact tables), and designed the data mart by defining entities, attributes, and the relationships between them.
  • Consulted with client management and staff to identify and document business needs, objectives, and current operational procedures for creating the logical data model.
  • Designed source-to-target mappings per client requirements and business rules to load data from source to stage and from stage to the data warehouse.
  • Provided technical guidance for re-engineering functions of Teradata warehouse operations into Netezza.
  • Facilitated the development of testing procedures, test cases, and User Acceptance Testing (UAT); applied data naming standards; created the data dictionary; documented data model translation decisions; and maintained DW metadata.
  • Extracted data from the Hadoop environment and NoSQL data from MongoDB into staging databases.
  • Developed complex jobs using various stages such as Lookup, Join, Transformer, Dataset, Row Generator, Column Generator, Sequential File, Aggregator, and Modify.
  • Worked with different tasks in Informatica PowerCenter Workflow Manager, such as Session, Event-Raise, Event-Wait, Decision, E-mail, Command, Worklet, Assignment, and Timer tasks, along with workflow scheduling.
  • Conducted complex ad-hoc programming and analysis, including statistical programming, and worked on ad-hoc reporting using Crystal Reports and T-SQL.
  • Worked on an enterprise logical data modeling project (in 3NF) to gather data requirements for OLTP enhancements.
  • Involved in extensive data validation by writing several complex SQL queries, in back-end testing, and in resolving data quality issues.
  • Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files, and SQL Server.
  • Reviewed and analyzed database performance, SSAS cubes, and reports generated using SSRS.
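
A hedged sketch of the forward-engineered DDL with primary/foreign-key referential integrity referenced in the list above, issued here through Python's cx_Oracle driver; the connection details and table definitions are placeholders, not the actual model:

    import cx_Oracle

    # Placeholder connection details.
    conn = cx_Oracle.connect(user="dw_owner", password="***", dsn="dbhost/ORCLPDB1")
    cur = conn.cursor()

    # DDL of the kind forward-engineered from the physical model:
    # a dimension table and a fact table tied together by a foreign key.
    cur.execute("""
        CREATE TABLE dim_customer (
            customer_key   NUMBER        PRIMARY KEY,
            customer_name  VARCHAR2(200) NOT NULL,
            region         VARCHAR2(50)
        )
    """)
    cur.execute("""
        CREATE TABLE fact_sales (
            sales_key      NUMBER PRIMARY KEY,
            customer_key   NUMBER NOT NULL,
            sale_amount    NUMBER(12,2),
            sale_date      DATE,
            CONSTRAINT fk_sales_customer
                FOREIGN KEY (customer_key) REFERENCES dim_customer (customer_key)
        )
    """)

    cur.close()
    conn.close()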

Environment: Erwin 9.x, SQL, SQL Server 2008, Rational Rose, Windows XP, Oracle 10g, TOAD, PL/SQL, Flat Files, Teradata, T-SQL, Netezza Aginity, MDM, Informatica PowerCenter, DB2, SSRS, SAS, SSIS, SSAS, Visio, SharePoint, Tableau.

Confidential, Dewitt - NY

Sr. Data Analyst

Responsibilities:

  • Created conceptual, logical, and physical relational models for the integration and base layers, and created logical and physical dimensional models for the presentation and dimension layers of a dimensional data warehouse in Power Designer.
  • Involved in reviewing business requirements and analyzing data sources from Excel, Oracle, and SQL Server for the design, development, testing, and production rollout of reporting and analysis projects.
  • Analyzed, designed, developed, implemented, and maintained ETL jobs using IBM InfoSphere DataStage and Netezza.
  • Extensively worked on client-server application development using Oracle 10g, Teradata 14, SQL, PL/SQL, and Oracle import and export utilities.
  • Coordinated with DB2 on database build and table normalizations and de-normalizations.
  • Conducted brainstorming sessions with application developers and DBAs to discuss various de-normalization, partitioning, and indexing schemes for the physical model.
  • Involved in several facets of MDM implementations, including data profiling, metadata acquisition, and data migration.
  • Extensively used SQL*Loader to load data from legacy systems into Oracle databases using control files, and used the Oracle external tables feature to read data from flat files into Oracle staging tables.
  • Involved in extensive data validation by writing several complex SQL queries (a brief sketch follows this list), in back-end testing, and in resolving data quality issues.
  • Used SSIS to create ETL packages to validate, extract, transform, and load data into data warehouse and data mart databases, and processed SSAS cubes to store data in OLAP databases.
  • Strong understanding of data modeling (relational, dimensional, star and snowflake schemas), data analysis, and data warehousing implementations on Windows and UNIX.
  • Extensively worked with the Netezza database to implement data cleanup and performance tuning techniques.
  • Created ETL packages using OLTP data sources (SQL Server, Flat files, Excel source files, Oracle) and loaded the data into target tables by performing different kinds of transformations using SSIS.
  • Migrated SQL Server 2008 to SQL Server 2008 R2 on Microsoft Windows Server 2008 R2 Enterprise Edition.
  • Developed reusable objects such as PL/SQL program units and libraries, database procedures, functions, and triggers to be used by the team and to satisfy business rules.
  • Performed data validation on the flat files generated in the UNIX environment, using UNIX commands as necessary.
  • Worked with NZLoad to load flat-file data into Netezza and DB2, and worked with the architect to identify proper distribution keys for Netezza tables.
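
A minimal sketch of the SQL-based back-end validation referenced in the list above, run here through pyodbc; the DSN, schema, table, and column names are assumptions:

    import pyodbc

    # Placeholder DSN; real credentials would come from a secured configuration.
    conn = pyodbc.connect("DSN=staging_dw;UID=etl_user;PWD=***")
    cur = conn.cursor()

    # Reconcile counts between the landed flat-file table and the target table.
    cur.execute("SELECT COUNT(*) FROM stg.policy_feed")
    staged = cur.fetchone()[0]
    cur.execute("SELECT COUNT(*) FROM dw.policy")
    loaded = cur.fetchone()[0]
    print(f"staged={staged}, loaded={loaded}, difference={staged - loaded}")

    # Flag duplicate business keys before sign-off.
    cur.execute("""
        SELECT policy_number, COUNT(*) AS dup_count
        FROM dw.policy
        GROUP BY policy_number
        HAVING COUNT(*) > 1
    """)
    for policy_number, dup_count in cur.fetchall():
        print(f"duplicate key {policy_number}: {dup_count} rows")

    conn.close()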

Environment: Power Designer, Teradata, Oracle, PL/SQL, MDM, SQL Server 2008, ETL, Netezza, DB2, SSIS, SSRS, SAS, SPSS, DataStage, Informatica, SQL, T-SQL, UNIX, Netezza Aginity, SQL Assistant, etc.
