Data Engineer/Data Modeler Resume
Monroe, Louisiana
SUMMARY
- Around 9 years of hands-on experience in the IT industry in Data Engineering, Data Modeling, and Data Analysis.
- Expert knowledge of the SDLC (Software Development Life Cycle) and Agile development methods; involved in all phases of projects.
- Experience in automation and building CI/CD pipelines by using Jenkins.
- Experience in Microsoft Azure data storage, Azure Data Factory, Azure Data Lake Store (ADLS), AWS, and Redshift.
- Good knowledge of streaming applications using Apache Kafka.
- Good working experience with Hadoop data warehousing tools such as Hive and Pig, and in ingesting data onto the cluster using Sqoop.
- Experience in designing Star and Snowflake schemas for Data Warehouses using tools such as Erwin, PowerDesigner, and ER/Studio.
- Experience in integrating various relational and non-relational sources such as SQL Server, Teradata, Oracle, and NoSQL databases.
- Experience in developing MapReduce programs using Apache Hadoop to analyze Big Data as per requirements.
- Experience in modeling for both OLTP and OLAP systems and in Kimball and Inmon data warehousing environments.
- Experience in extracting, transforming and loading (ETL) data from spreadsheets, database tables and other sources using Microsoft SSIS.
- Excellent skills in Data Analysis, Data Profiling, Data Validation, Data Cleansing, Data Verification, and Data Mismatch Identification.
- Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
- Experience designing security at both the schema level and the access level in conjunction with the DBAs.
- Used MS Excel and MS Access to extract and analyze data based on business needs.
- Ability to work on multiple projects at once while prioritizing the tasks based on team priorities.
- Proven ability to convey rigorous technical concepts and considerations to non-experts.
TECHNICAL SKILLS
Big Data Tools: Hadoop 3.3, HDFS, Hive 3.2.1, Pig 0.17, HBase 1.2, Sqoop 1.4, MapReduce, Spark, Kafka 2.8.
Cloud Services: AWS and Azure.
Data Modeling Tools: Erwin 9.8/9.7, ER/Studio V17
Database Tools: Oracle 12c/11g, Teradata 15/14.
Reporting Tools: SQL Server Reporting Services (SSRS), Tableau, Crystal Reports, MicroStrategy, Business Objects
ETL Tools: SSIS, Informatica v10, Matillion.
Programming Languages: SQL, T-SQL, UNIX shell scripting, PL/SQL.
Operating Systems: Microsoft Windows 10/8/7, UNIX
Project Execution Methodologies: RUP, JAD, Agile, Waterfall, and RAD
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer/Data Modeler
Responsibilities:
- As a Data Engineer/Data Modeler, designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
- Worked in SCRUM (Agile) development environment with tight schedules.
- Defined business objectives comprehensively through discussions with business stakeholders and functional analysts and by participating in requirement-gathering sessions.
- Designed and developed a continuous integration pipeline and integrated it using Jenkins.
- Developed ETL pipelines into and out of the data warehouse using a combination of Python and SnowSQL (see the sketch after this section).
- Utilized the Matillion ETL solution to develop pipelines that extract and transform data from multiple sources and load it into Snowflake.
- Designed and implemented database solutions in Azure Data Lake, Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.
- Used the Matillion Blob Storage component to load tables into the Snowflake stage layer.
- Developed Python scripts to clean the raw data.
- Worked at the conceptual, logical, and physical data model levels using Erwin according to requirements.
- Involved in the creation and maintenance of the Data Warehouse and metadata repositories.
- Designed and implemented ETL pipelines from Snowflake to the Data Warehouse using Apache Airflow.
- Used Python packages for processing HDFS file formats.
- Worked on Microsoft Azure toolsets including Azure Data Factory pipelines, Azure Databricks, and Azure Data Lake Storage.
- Worked with Data Governance and Data Quality to design various models and processes.
- Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark (see the PySpark sketch after this section).
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Configured Input & Output bindings of Azure Function with Azure Cosmos DB collection to read and write data from the container whenever the function executes.
- Developed the MDM metadata dictionary and naming conventions across the enterprise.
- Configured ADF and SnowSQL job triggering in Matillion using Python.
- Implemented Azure Databricks clusters, notebooks, jobs, and autoscaling.
- Designed and implemented effective Analytics solutions and models with Snowflake.
- Troubleshot and maintained ETL jobs running in Matillion.
- Created Oozie workflows to manage the execution of the Crunch jobs and Vertica pipelines.
- Worked on Azure services such as Azure Data Factory and Azure Synapse.
- Supported solutions and constructed prototypes that incorporated Azure resources such as Azure Data Factory, Azure Cosmos DB, and Azure Databricks.
- Worked on complex SQL queries and PL/SQL procedures and converted them into ETL tasks.
- Tuned and troubleshot Snowflake to improve performance and optimize utilization.
- Designed and built a Data Discovery Platform for a large system integrator using Azure HDInsight components.
- Scheduled all staging, intermediate, and final core table loads to Snowflake in Matillion.
- Created and Configured Azure Cosmos DB.
- Provided ad-hoc queries and data metrics to the Business Users using Hive and Pig.
- Developed visualizations and dashboards using PowerBI.
Environment: Hadoop 3.3, Agile, CI/CD, Azure, Matillion, Python 3.5, Erwin R2 SP2, Snowflake, MDM, Cosmos DB, Spark, SQL, PL/SQL, JSON.
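A minimal sketch of the kind of Python-plus-SnowSQL load step referenced above, assuming the snowflake-connector-python package; the connection parameters, file path, and ORDERS_STAGE table are placeholders rather than objects from the project:

```python
import os

import snowflake.connector  # assumes the snowflake-connector-python package

# Connection parameters and object names below are placeholders, not project values.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGE",
)

cur = conn.cursor()
try:
    # Upload a cleaned local extract to the table's internal stage, then load it
    # into the staging table with a SnowSQL COPY INTO statement.
    cur.execute("PUT file:///tmp/orders_clean.csv @%ORDERS_STAGE OVERWRITE = TRUE")
    cur.execute("""
        COPY INTO ORDERS_STAGE
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    """)
finally:
    cur.close()
    conn.close()
```

In the role itself, orchestration around steps of this kind was handled by Matillion and Airflow, as described in the bullets above.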
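Similarly, a small PySpark illustration of the JSON encode/decode work on Spark DataFrames noted above; the payload schema and column names are invented for the example:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("json-roundtrip-example").getOrCreate()

# Hypothetical raw records carrying a JSON payload column.
raw = spark.createDataFrame(
    [("1", '{"customer": "A100", "amount": 42.5}')],
    ["id", "payload"],
)

payload_schema = StructType([
    StructField("customer", StringType()),
    StructField("amount", DoubleType()),
])

# Decode: parse the JSON string into typed columns.
decoded = (
    raw.withColumn("data", F.from_json("payload", payload_schema))
       .select("id", "data.customer", "data.amount")
)

# Encode: serialize selected columns back into a JSON string column.
encoded = decoded.withColumn("payload", F.to_json(F.struct("customer", "amount")))
encoded.show(truncate=False)
```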
Confidential - Monroe, Louisiana
Data Modeler/Data Engineer
Responsibilities:
- Worked as a Data Modeler/Data Engineer to import and export data from different databases.
- Involved in the Agile development methodology as an active member in Scrum meetings.
- Used AWS S3 buckets to store files, ingested them into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
- Implemented a phasing and checkpoint approach in the ETL process to prevent data loss and maintain uninterrupted data flow despite process failures.
- Generated database scripts using Forward Engineering in the data modeling tool.
- Designed fact and dimension tables and created conceptual, logical, and physical data models using Erwin.
- Responsible for data lineage, maintaining data dictionary, naming standards and data quality.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Developed ETL data mapping and loading logic for MDM loading from internal and external sources.
- Implemented Referential Integrity using primary key and foreign key relationships.
- Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Widely used normalization methods and performed normalization to 3NF.
- Designed and created Data Quality baseline flow diagrams, which include error handling and test plan flow data.
- Performed Verification, Validation and Transformations on the Input data (Text files) before loading into target database.
- Used Erwin for reverse engineering to connect to existing database and ODS.
- Used AWS Cloud for infrastructure provisioning and configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Worked closely with the Data Stewards to ensure correct and related data was captured in the data warehouse as part of Data Quality checks.
- Extracted data using SQL queries and transferred it to Microsoft Excel and Python for further analysis.
- Validated the data feed from the source systems to Snowflake DW cloud platform.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (see the streaming sketch after this section).
- Created Queries and Tables using MySQL.
- Developed dashboards using Tableau Desktop.
- Created numerous reports using ReportLab and other Python packages, and installed numerous Python modules.
- Handled performance requirements for databases in OLTP and OLAP models.
- Designed and developed an entire change data capture (CDC) module in Python and deployed it in AWS Glue using the PySpark library (see the CDC sketch after this section).
- Created mapping tables to find out the missing attributes for the ETL process.
- Developed various T-SQL stored procedures, triggers, and views, and added/changed tables for data load, transformation, and extraction.
Environment: Erwin 9.8, AWS, MySQL, Python, Hadoop 3.0, NoSQL, ETL, Sqoop 1.4, MDM, OLAP, OLTP, ODS, Tableau, Agile.
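A minimal sketch of the Kafka-to-HDFS streaming flow referenced above, written here with Spark Structured Streaming and assuming the spark-sql-kafka connector is available; the broker address, topic name, and HDFS paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs-example").getOrCreate()

# Subscribe to a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
)

# Persist the stream to HDFS as Parquet; the checkpoint lets the job recover after failures.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/events")
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```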
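And a simplified illustration of the change-data-capture comparison behind the CDC module referenced above; the snapshots, key column, and hashing choice are hypothetical, while the production module ran inside AWS Glue with PySpark:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdc-delta-example").getOrCreate()

# Hypothetical previous and current snapshots keyed by customer_id.
previous = spark.createDataFrame(
    [(1, "Alice", "NY"), (2, "Bob", "LA")],
    ["customer_id", "name", "city"],
)
current = spark.createDataFrame(
    [(1, "Alice", "Chicago"), (3, "Cara", "Dallas")],
    ["customer_id", "name", "city"],
)

# Hash the non-key columns so changed rows can be found with one comparison.
def with_hash(df):
    return df.withColumn("row_hash", F.sha2(F.concat_ws("||", "name", "city"), 256))

prev_h, curr_h = with_hash(previous), with_hash(current)

joined = curr_h.alias("c").join(
    prev_h.alias("p"),
    F.col("c.customer_id") == F.col("p.customer_id"),
    "full_outer",
)

# Rows only in the current snapshot are inserts, rows only in the previous one are
# deletes, and matched rows with differing hashes are updates.
inserts = joined.filter(F.col("p.customer_id").isNull()).select("c.*")
deletes = joined.filter(F.col("c.customer_id").isNull()).select("p.*")
updates = joined.filter(
    F.col("c.customer_id").isNotNull()
    & F.col("p.customer_id").isNotNull()
    & (F.col("c.row_hash") != F.col("p.row_hash"))
).select("c.*")

inserts.show()
updates.show()
deletes.show()
```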
Confidential - Chicago, IL
Sr. Data Analyst/Data Modeler
Responsibilities:
- Performed data analysis, data modeling and data profiling using complex SQL queries, Facets as the source, and Oracle as the database.
- Created physical and logical data models using Erwin.
- Translated business concepts into XML vocabularies by designing XML schemas with UML.
- Designed 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
- Extensively used star schema methodologies in building and designing the logical data model into dimensional models.
- Developed batch processing solutions by using Data Factory, Azure SQL and Azure Databricks.
- Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
- Worked with developers on data Normalization and De-normalization, performance tuning issues, and provided assistance in stored procedures as needed.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Extensively created SSIS packages to clean and load data to data warehouse.
- Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
- Performed data mining on claims data using very complex SQL queries and discovered claims patterns.
- Created PL/SQL procedures and triggers, generated application data, created users and privileges, and used Oracle import/export utilities.
- Created and maintained data model standards, including master data management (MDM).
- Facilitated in developing testing procedures, test cases and User Acceptance Testing (UAT).
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Created customized reports using OLAP tools such as Crystal Reports for business use.
- Generated periodic reports based on statistical analysis of data from various time frames and divisions using SQL Server Reporting Services (SSRS).
- Involved in extensive data analysis on Teradata and Oracle systems, querying and writing in SQL.
- Developed database triggers and stored procedures using T-SQL cursors and tables.
- Loaded multi-format data from various sources such as flat files, Excel, and MS Access, and performed file system operations.
- Extensively created tables and queries to produce additional ad-hoc reports.
Environment: Erwin, Azure, SQL, MDM, PL/SQL, SSIS, SSRS, OLAP, OLTP, T-SQL, UNIX, MS Excel.
Confidential - Boston, MA
Data Analyst/Data Modeler
Responsibilities:
- Worked as a Data Analyst/Data Modeler on requirements gathering, business analysis, and project coordination.
- Conducted JAD sessions to review the data models, involving SMEs, developers, testers, and analysts.
- Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
- Responsible for the development and maintenance of Logical and Physical data models, along with corresponding metadata, to support Applications.
- Involved in requirements gathering, database design, and implementation of a star schema/snowflake schema dimensional data warehouse using ER/Studio.
- Worked extensively with MicroStrategy report developers to create data marts and develop reports.
- Worked with the Data Analysis team to gather Data Profiling information.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Created DB2 objects such as databases, tables, indexes, triggers, stored procedures etc.
- Performed detailed data analysis and identified the key facts and dimensions necessary to support the business requirements.
- Generated Data dictionary reports for publishing on the internal site and giving access to different users.
- Wrote complex SQL, PL/SQL procedures, functions, and packages to validate data and support the testing process.
- Created SSIS package to load data from Flat files, Excel and Access to SQL server using connection manager.
- Developed all required stored procedures, user-defined functions, and triggers using T-SQL and SQL.
- Produced various types of reports using SQL Server Reporting Services (SSRS).
- Wrote UNIX shell scripts to invoke all the stored procedures, parse the data and load into flat files.
- Used MS Visio to represent the system under development graphically by defining use case, activity, and workflow diagrams.
Environment: ER/Studio, DB2, PL/SQL, SSIS, MicroStrategy, MS Excel, T-SQL, UNIX, OLAP, OLTP.