Sr. Data Engineer Resume
Chicago, IL
SUMMARY
- Highly dedicated and expert Sr. Data Engineer with over 9 years of IT industry experience across technologies, tools, and databases including Big Data, AWS (S3, Redshift ODS, Lambda, Glue, Athena, etc.), Snowflake, Hadoop, Hive, Spark, Kafka, Python, PySpark, Sqoop, CDL (Cassandra), Teradata, PostgreSQL, Tableau, QlikView, SQL, PL/SQL, and Informatica ETL.
- Experienced in data analysis, modeling, and analytics, with an excellent understanding of data warehouses, databases, data governance, and data mart design; experienced in database creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, and SQL Server databases.
- Experienced in Teradata SQL queries, indexes, and utilities such as MLoad, TPump, FastLoad, and FastExport.
- Extensively worked with Spark using Python on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
- Well versed in migrating to the AWS cloud platform (S3, Redshift, Athena, Lambda, Glue, SNS, SQS, EMR).
- Experienced in data modeling with RDBMS concepts, logical and physical data modeling up to Third Normal Form (3NF), and multidimensional data modeling (star schema, snowflake modeling, facts, and dimensions), using ER diagrams, dimensional data modeling, logical/physical design, and star/snowflake schema modeling with tools such as Erwin and ER/Studio.
- Experience in providing solutions within the Hadoop environment using technologies such as HDFS, MapReduce, Pig, Hive, HBase, Zookeeper, Storm, and other Big Data technologies.
- Experienced with multiple data sources and targets including Oracle, Netezza, DB2, SQL Server, XML, and flat files; involved in Netezza administration activities such as backup/restore, performance tuning, and security configuration.
- Good experience in creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka, and in improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
- Experience in BI/DW solutions (ETL, OLAP, data marts), Informatica, and BI reporting tools such as Tableau and QlikView; experienced leading teams of application, ETL, and BI developers as well as testing teams.
- Data Warehousing: full life-cycle project leadership, business-driven requirements gathering, capacity planning, feasibility analysis, enterprise and solution architecture, design, construction, data quality, profiling and cleansing, source-to-target mapping, gap analysis, data integration/ETL, SOA, ODS, data marts, Inmon/Kimball methodology, data modeling for OLTP, canonical modeling, and dimensional modeling for data warehouse star/snowflake designs.
- Experienced in Extraction, Transformation, and Loading (ETL) of data from various sources into data warehouses and data marts using Informatica Power Center (Repository Manager, Designer, Workflow Manager, Workflow Monitor, and Metadata Manager), Power Exchange, and Power Connect as the ETL tool on Oracle, DB2, and SQL Server databases.
- Experienced in client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader; well versed in advanced Excel concepts, including VLOOKUP, INDEX, MATCH, IF statements, pivot tables, and complex formulas.
- Experienced in using Excel and MS Access to load and analyze data based on business needs; expertise in SQL Server Analysis Services (SSAS) to deliver Online Analytical Processing (OLAP) and data mining functionality for business intelligence applications.
- Excellent understanding of and working experience with industry-standard methodologies such as the System Development Life Cycle (SDLC), Rational Unified Process (RUP), Agile, and Waterfall.
TECHNICAL SKILLS
Data Modeling Tools: Erwin r9.6/r9.5/r9.1, ER/Studio, Oracle Designer, Power Designer.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9.
ETL Tools: SSIS, Datastage, Informatica Power Center 9.7/9.6/9.5/9.1.
Programming Languages: SQL, PL/SQL, Python, PySpark and Scala
Database Tools: Microsoft SQL Server 2014/2012/2008, Teradata, MS Access, Netezza, Oracle, PostgreSQL, MongoDB, and Cassandra.
Reporting and BI Tools: Tableau, QlikView, Business Objects, Crystal Reports
Operating Systems: Microsoft Windows, Linux and UNIX
Tools & Software: TOAD 7.1/6.2, MS Office, BTEQ, Teradata SQL Assistant
Big Data: Hadoop, HDFS, Hive, HBase, Sqoop, Flume, Spark, Kafka and Airflow.
Cloud Technologies: AWS (Redshift, AWS S3, AWS Glue, Aurora, Athena, SNS, SQS and EMR)
Other Tools: TOAD, SQL*Plus, SQL*Loader, MS Project, MS Visio, and MS Office; have also worked with C++, UNIX, PL/SQL, etc.
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential
Responsibilities:
- Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies, building relationships and trust with key stakeholders to support program delivery and adoption of the enterprise architecture.
- Gathered business requirements working closely with business users, project leaders, and developers; analyzed the requirements and designed data models; analyzed existing systems and proposed improvements to processes and systems, including usage of modern scheduling tools like Airflow and migration of legacy systems into an enterprise data lake.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift, and handled data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark (see the sketch following this list). Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using Python, Hadoop, MongoDB, and Cassandra.
- Developed automated data pipelines from various external data sources (web pages, APIs, etc.) to an internal data warehouse (SQL Server), then exported to reporting tools using Python.
- Designed the schema, configured, and deployed AWS Redshift for optimal storage and fast retrieval of data; designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
- Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
- Created Hive fact tables on top of raw data from different retailers, partitioned by time dimension key, retailer name, and data supplier name, which were further processed and pulled by the analytics service engine.
- Worked on importing and cleansing high-volume data from various sources such as Teradata 15, Oracle, flat files, and SQL Server.
- Developed complex, maintainable, easy-to-use Python and Scala code that satisfies application requirements for data processing and analytics using built-in libraries.
- Created Informatica mappings to populate staging tables and data warehouse tables from various sources such as flat files, DB2, Netezza, and Oracle.
- Designed and optimized Spark SQL queries and DataFrames; imported data from data sources, performed transformations and read/write operations, and saved the results to output directories in HDFS/AWS S3.
- Created data pipelines for ingestion, aggregation, and loading of consumer response data from AWS S3 buckets into Hive external tables in HDFS to serve as feeds for Tableau dashboards.
- Developed data mapping, data profiling, data governance, transformation, and cleansing rules for the Master Data Management architecture involving OLTP and ODS; performed data analysis and data profiling using complex SQL queries on various source systems including Oracle, Teradata, and Netezza.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS; responsible for building scalable distributed data solutions using Amazon EMR clusters.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, including incremental imports, and created Sqoop jobs based on the last saved value.
- Developed Tableau visualizations, dashboards, and workbooks in Tableau Desktop from multiple data sources using data blending; scheduled Airflow DAGs to run multiple Hive and Pig jobs that execute independently based on time and data availability.
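Below is a minimal, hypothetical sketch of the kind of AWS Glue PySpark job described above (campaign files read from S3, aggregated, and written to Redshift). The bucket, Glue connection name, and table names are illustrative placeholders, not details from the actual engagement.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read raw campaign files from S3 (bucket/prefix are placeholders)
raw = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-campaign-bucket/raw/"]},
    format="parquet",
)

# Aggregate and consolidate with Spark SQL
df = raw.toDF()
df.createOrReplaceTempView("campaign_raw")
agg = spark.sql("""
    SELECT campaign_id, event_date, COUNT(*) AS impressions
    FROM campaign_raw
    GROUP BY campaign_id, event_date
""")

# Write the consolidated result to Redshift via a Glue connection (assumed names)
out = DynamicFrame.fromDF(agg, glueContext, "out")
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=out,
    catalog_connection="redshift-conn",  # assumed Glue connection name
    connection_options={"dbtable": "analytics.campaign_daily", "database": "dw"},
    redshift_tmp_dir="s3://example-campaign-bucket/tmp/",
)
job.commit()
```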
Environment: Erwin r9, Oracle 12c, Teradata 15, Netezza, PL/SQL, T-SQL, MDM, Python, PySpark, S3, SNS, SQS, Athena, EMR, AWS Glue, Informatica, DB2, Spark, Kafka, SQL Server 2014, SQL, Hadoop, Hive, MongoDB, PostgreSQL, SAS, SSIS, Tableau, AWS Redshift, Sqoop, HBase.
Sr. Data Engineer
Confidential - Chicago IL
Responsibilities:
- Participated in the design, development, and support of the corporate operation data store and enterprise data warehouse database environment.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift; used AWS components to download and upload data files (with ETL) to AWS via S3, and used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Worked on predictive and what-if analysis using Python against HDFS; successfully loaded files into HDFS from Teradata and from HDFS into Hive, and worked with NoSQL databases including HBase, MongoDB, and Cassandra.
- Developed Spark SQL logic that mimics the Teradata ETL logic and pointed the output deltas back to newly created Hive tables as well as the existing Teradata dimension, fact, and aggregate tables.
- Developed jobs to send and read data from AWS S3 buckets using components such as tS3Connection, tS3BucketExist, tS3Get, and tS3Put, and created SFDC, flat file, and Oracle connections for AWS cloud services.
- Cleansed the data by eliminating duplicate and inaccurate records in Python, and used Python scripts to update content in the database and manipulate files.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, consuming data from Kafka in near real time and persisting it into Hive (see the sketch following this list).
- Used external loaders such as MultiLoad, TPump, and FastLoad to load data into the Teradata database across analysis, development, testing, implementation, and deployment.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and analysis.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
- Developed requirements and performed data collection, cleansing, transformation, and loading to populate facts and dimensions for the data warehouse; built database models, views, and APIs using Python for interactive web-based solutions.
- Maintained data mapping documents, the business matrix, and other data design artifacts that define technical data specifications and transformation rules.
- Developed ETL (Informatica) mappings, testing, corrections, and enhancements; resolved data integrity issues and coordinated multiple OLAP and ETL projects for data lineage and reconciliation.
- Managed the Master Data Governance queue, including assessment of downstream impacts to avoid failures, and worked with high-volume datasets from various sources such as SQL Server 2012, Oracle, DB2, and text files.
- Wrote SQL scripts to test the mappings and developed a traceability matrix of business requirements mapped to test scripts to ensure that any change control in requirements leads to test case updates; performed extensive data validation with complex SQL queries, participated in back-end testing, and worked on data quality issues.
- Developed, managed, and validated existing data models, including logical and physical models of the data warehouse and source systems, utilizing a 3NF model; implemented the dimensional model (logical and physical data modeling) in the existing architecture using ER/Studio.
- Involved in creating/modifying worksheets and data visualization dashboards in Tableau.
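Below is a minimal sketch, under assumed broker, topic, schema, and path names, of the Kafka-to-Hive near-real-time pattern described above, using Spark Structured Streaming; it is illustrative rather than the project's actual code.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("learner-events-stream")
         .enableHiveSupport()
         .getOrCreate())

# Assumed event schema for the learner data model
schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the raw feed from Kafka (broker and topic are placeholders)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "learner-events")
       .option("startingOffsets", "latest")
       .load())

# Parse the JSON payload into typed columns
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Persist micro-batches as Parquet under the HDFS path backing a Hive external table
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///warehouse/learner_events")
         .option("checkpointLocation", "hdfs:///checkpoints/learner_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```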
Environment: ER/Studio, SSRS, SAS, Netezza, Excel, MDM, PL/SQL, ETL, Python, Tableau, Hadoop, AWS Glue, S3, Lambda, Redshift, Hive, Spark, Kafka, MongoDB, Aginity, Teradata SQL Assistant, Cassandra, T-SQL, Cognos, DB2, Oracle 11g, SQL, Teradata 14.1, Informatica Power Center 9.6 and Airflow.
Sr. Data Modeler/Data Analyst
Confidential - Boston MA
Responsibilities:
- Involved in reviewing business requirements and analyzing data sources from Excel/Oracle/SQL Server for design, development, testing, and production rollout of reporting and analysis projects.
- Created conceptual, logical, and physical relational models for the integration and base layers, and logical and physical dimensional models for the presentation and dimension layers of a dimensional data warehouse in PowerDesigner; designed ER diagrams, the logical model (relationships, cardinality, attributes, and candidate keys), and the physical database (capacity planning, object creation, and aggregation strategies) per business requirements.
- Developed and configured the Informatica MDM hub supporting the Master Data Management (MDM), Business Intelligence (BI), and data warehousing platforms to meet business needs.
- Implemented the full lifecycle of data warehouses and business data marts with star schemas, snowflake schemas, SCDs, and dimensional modeling; involved in analyzing, designing, developing, implementing, and maintaining ETL jobs using IBM InfoSphere DataStage and Netezza.
- Handled importing of data from various data sources, performed transformations using Hive, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop (see the sketch following this list).
- Extensively worked on client-server application development using Oracle 10g, Teradata 14, SQL, PL/SQL, and Oracle import and export utilities; coordinated with DB2 teams on database builds and table normalization and de-normalization.
- Conducted brainstorming sessions with application developers and DBAs to discuss de-normalization, partitioning, and indexing schemes for the physical model; involved in several facets of MDM implementations, including data profiling, metadata acquisition, and data migration.
- Extensively used SQL*Loader to load data from legacy systems into Oracle databases using control files, and used the Oracle external tables feature to read data from flat files into Oracle staging tables.
- Involved in extensive data validation by writing complex SQL queries, participating in back-end testing, and working on data quality issues.
- Used SSIS to create ETL packages to validate, extract, transform, and load data into data warehouse and data mart databases, and processed SSAS cubes to store data in OLAP databases.
- Strong understanding of data modeling (relational and dimensional, star and snowflake schemas) and data analysis, with data warehousing implementations on Windows and UNIX.
- Extensively worked with the Netezza database to implement data cleanup and performance tuning techniques.
- Created ETL packages using OLTP data sources (SQL Server 2008, Flat files, Excel source files, Oracle) and loaded the data into target tables by performing different kinds of transformations using SSIS.
- Migrated SQL Server 2008 to SQL Server 2008 R2 on Microsoft Windows Server 2008 R2 Enterprise Edition; developed reusable objects such as PL/SQL program units and libraries, database procedures, functions, and database triggers to be used by the team while satisfying the business rules.
- Performed data validation on flat files generated in the UNIX environment using UNIX commands as necessary, and was involved in designing, developing, and implementing Power BI dashboards, scorecards, and KPI reports.
- Automated reporting functionality using Power BI tools (reports, dashboards, and scorecards/KPIs) against MySQL, Hadoop, and data warehouse data sources.
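Below is a minimal, hypothetical sketch of the Oracle-to-HDFS Sqoop import noted above, wrapped in a small Python driver; the JDBC URL, credentials file, source table, and target directory are placeholders, not values from the engagement.

```python
import subprocess

# Placeholder connection details; in practice these would come from a secured config
ORACLE_JDBC = "jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL"

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", ORACLE_JDBC,
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.oracle_pwd",  # avoid plain-text passwords
    "--table", "SALES.ORDERS",          # hypothetical source table
    "--target-dir", "/data/raw/orders",  # HDFS landing directory
    "--num-mappers", "4",
    "--fields-terminated-by", "\t",
]

def run_import() -> None:
    """Run the Sqoop import and fail loudly on a non-zero exit code."""
    subprocess.run(SQOOP_CMD, check=True)

if __name__ == "__main__":
    run_import()
```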
Environment: PowerDesigner, Teradata 14, Oracle 10g, PL/SQL, MDM, SQL Server 2008, ETL, Netezza, DB2, SSIS, SSRS, SAS, SPSS, DataStage, Informatica, SQL, Power BI, Hadoop, Hive, Sqoop, HDFS, T-SQL, UNIX, Aginity, Teradata SQL Assistant, etc.
Sr. Data Modeler/Data Analyst
Confidential
Responsibilities:
- Worked with business users during requirements gathering and business analysis to prepare high-level logical and physical data models; performed reverse engineering of the current application using ER/Studio and developed logical and physical data models for central model consolidation.
- Analyzed functional and non-functional categorized data elements for data profiling and mapping from the source to the target data environment; wrote SQL scripts to test the mappings (see the sketch following this list) and developed a traceability matrix of business requirements mapped to test scripts to ensure that any change control in requirements leads to test case updates.
- Involved in integration of various relational and non-relational sources such as DB2, Teradata 13.1, Oracle 9i, SFDC, Netezza, SQL Server, COBOL, XML and Flat Files.
- Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables and columns in the Teradata database as part of data analysis responsibilities.
- Performed data mining on claims data using complex SQL queries, discovered claims patterns, and created DML code and statements for underlying and impacted databases.
- Involved in normalization/de-normalization, normal forms, and database design methodology; expertise in using data modeling tools such as MS Visio and ER/Studio for logical and physical database design.
- Performed data modeling, database design, and data analysis with extensive use of ER/Studio; documented ER diagrams, logical and physical models, business process diagrams, and process flow diagrams.
- Created reports in Oracle Discoverer by importing PL/SQL functions on the admin layer in order to meet sophisticated client requests.
- Extensively used SQL, Transact-SQL, and PL/SQL to write stored procedures, functions, packages, and triggers; created tables, views, sequences, indexes, and constraints, and generated SQL scripts to implement the physical data model.
- Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software and tools such as Perl, TOAD, MS Access, Excel, and SQL.
- Created the physical data model from the logical data model using the Compare and Merge utility in ER/Studio and worked with the naming standards utility.
- Involved in implementing the land process for loading the customer data set into Informatica Power Center and MDM from various source systems.
- Migrated databases from DB2 to SQL Server 2005 and 2008; performed tuning and code optimization using techniques such as dynamic SQL, dynamic cursors, SQL query tuning, and writing generic procedures, functions, and packages.
- Extensively worked on shell scripts for running SSIS programs in batch mode on UNIX, and created mappings using pushdown optimization to achieve good performance when loading data into Netezza.
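Below is one way the mapping-test SQL noted above might be driven from Python: a simple source-to-target row-count reconciliation using pyodbc. The DSNs and table pairs are assumptions for illustration only, not artifacts from the project.

```python
import pyodbc

# Placeholder ODBC DSNs for the source (DB2) and target (SQL Server) systems
SOURCE_DSN = "DSN=db2_src;UID=tester;PWD=***"
TARGET_DSN = "DSN=sqlserver_dw;UID=tester;PWD=***"

# Hypothetical source-to-target table pairs taken from a mapping document
MAPPINGS = [
    ("SRC.CUSTOMER", "DW.DIM_CUSTOMER"),
    ("SRC.ORDERS", "DW.FACT_ORDERS"),
]

def row_count(conn, table):
    """Return the row count of the given table."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def reconcile():
    """Compare source and target row counts for every mapped table pair."""
    src = pyodbc.connect(SOURCE_DSN)
    tgt = pyodbc.connect(TARGET_DSN)
    try:
        for src_table, tgt_table in MAPPINGS:
            s = row_count(src, src_table)
            t = row_count(tgt, tgt_table)
            status = "OK" if s == t else "MISMATCH"
            print(f"{src_table} -> {tgt_table}: source={s} target={t} [{status}]")
    finally:
        src.close()
        tgt.close()

if __name__ == "__main__":
    reconcile()
```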
Environment: ER/Studio, Teradata 13.1, SSIS, SAS, Excel, T-SQL, SSRS, Tableau, SQL Server, Cognos, pivot tables, graphs, MDM, PL/SQL, ETL, DB2, Oracle 9i, SQL, Teradata 14.1, Informatica Power Center, etc.
Sr. Data Analyst
Confidential, Chicago IL
Responsibilities:
- Involved in creating logical and physical data models with star and snowflake schema techniques using Erwin 9.x for the data warehouse as well as data marts.
- Performed data analysis and data profiling using complex SQL on various source systems including Oracle 8.x, Netezza, and Teradata (see the sketch following this list).
- Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
- Wrote and executed customized SQL code for ad-hoc reporting duties and used other tools for routine tasks.
- Extensively used SQL to develop reports from the existing relational data warehouse tables (Queries, Joins, Filters, etc).
- Migrated SSIS packages from SQL Server 2005 to SSIS 2008 and created mappings for the initial load from MS SQL Server 2005 to Netezza while performing data cleansing.
- Converted SQL Server packages to Informatica Power Center mappings to be used with Netezza, and was involved in Informatica MDM processes, including batch-based and real-time processing.
- Performed ad-hoc queries using SQL, PL/SQL, MS Access, MS Excel, and UNIX to meet business analysts' needs.
- Responsible for reviewing the data model, database physical design, ETL design, and presentation layer design.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Developed logging for ETL load at the package level and task level to log number of records processed by each package and each task in a package using SSIS.
- Involved in Oracle, SQL, PL/SQL, T-SQL queries programming and creating objects such as stored procedures, packages, functions, triggers, tables, and views.
- Performed match/merge and ran match rules to check the effectiveness of MDM process on data.
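Below is a small, hypothetical illustration of the kind of profiling SQL described above, run from Python with cx_Oracle; the connection details, table, and column names are placeholders rather than details from the engagement.

```python
import cx_Oracle  # assumed Oracle client library; connection details are placeholders

conn = cx_Oracle.connect("analyst", "secret", "oracledb.example.com:1521/ORCL")
cur = conn.cursor()

# Hypothetical table and columns to profile
TABLE = "SALES.CUSTOMER"
COLUMNS = ["CUSTOMER_ID", "REGION_CODE", "STATUS"]

for column in COLUMNS:
    # Basic profiling: total rows, null count, and distinct values per column
    cur.execute(
        f"SELECT COUNT(*), COUNT(*) - COUNT({column}), COUNT(DISTINCT {column}) "
        f"FROM {TABLE}"
    )
    total, nulls, distinct = cur.fetchone()
    print(f"{column}: rows={total} nulls={nulls} distinct={distinct}")

cur.close()
conn.close()
```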
Environment: Erwin, MDM, ETL, Teradata, MS SQL Server 2005, PL/SQL, Netezza, DB2, Oracle, SSIS, IBM, etc.
Education: Bachelor's degree from JNTU, 2010