Sr. Data Engineer Resume
SUMMARY
- 8+ years of professional experience in IT, working with various legacy database systems, including hands-on work with Big Data technologies.
- Hands-on experience across the complete Software Development Life Cycle (SDLC), using Agile and hybrid methodologies.
- Expert in using Python for data engineering and modeling.
- Extensive experience in data management, including data analysis, gap analysis, and data mapping.
- Excellent knowledge of Confidential Azure services and Amazon Web Services, including their management.
- Transformed and retrieved data using Pig, Hive, SSIS, and MapReduce.
- Extensive experience interacting with system users to gather business requirements, and involved in project development.
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Excellent knowledge of, and extensive experience with, NoSQL databases (HBase).
- Excellent knowledge of Ralph Kimball's and Bill Inmon's approaches to data warehousing.
- Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
- Strong background in data modeling tools, including Erwin, ER/Studio, and Power Designer.
- Solid in-depth understanding of information security, data modeling, and RDBMS concepts.
- Hands-on experience writing queries, stored procedures, functions, and triggers.
- Good knowledge of converting complex RDBMS (Oracle, MySQL, and Teradata) queries into Hive Query Language.
- Good knowledge of OLAP, OLTP, Business Intelligence, and data warehousing concepts, with emphasis on ETL and business reporting needs.
- Developed generic SQL procedures and complex T-SQL statements for report generation.
- Experience in performance tuning and query optimization in transactional and data warehouse environments.
- Hands-on experience in data modeling with star and snowflake schemas.
- Performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
- Experience developing custom UDFs in Python to extend Hive and Pig Latin functionality (a minimal sketch follows this summary).
- Experience extracting source data from sequential, XML, and CSV files, then transforming and loading it into the target data warehouse.
- Excellent knowledge of Microsoft Office, with an emphasis on Excel.
- Experience in data mining, querying and mining large datasets to discover transition patterns and examine financial reports.
- Excellent experience generating ad-hoc reports using Crystal Reports.
- Hands-on experience with data integrity processes and data modeling concepts.
- Effectively planned and managed project deliverables in an onsite/offshore model, improving client satisfaction.
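Illustrative example of the custom Python UDF work for Hive referenced above: a minimal streaming-style script that Hive can call through its TRANSFORM clause. The two-column input layout, the 1000 threshold, and the script name are assumptions made for illustration, not details from the projects below.

```python
#!/usr/bin/env python
"""Streaming-style Python UDF for Hive (illustrative sketch).

Hive pipes rows to this script as tab-separated text via TRANSFORM;
the (name, amount) layout and HIGH/NORMAL threshold are made-up examples.
"""
import sys

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue  # skip empty rows defensively
    name, amount = line.split("\t")
    # Emit the original columns plus a derived flag column.
    flag = "HIGH" if float(amount) > 1000 else "NORMAL"
    print("\t".join((name, amount, flag)))
```

Hive would invoke a script like this with something along the lines of SELECT TRANSFORM(name, amount) USING 'python flag_udf.py' AS (name, amount, flag) FROM sales, where the table and script names are hypothetical.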
TECHNICAL SKILLS
Big Data Tools: HBase 1.2, Hive 2.3, Pig 0.17, HDFS, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0
Data Modeling Tools: Erwin Data Modeler 9.8, ER Studio v17, and Power Designer 16.6.
Databases: Oracle 12c, Teradata R15, MS SQL Server 2016, DB2.
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, Python, Spark.
Methodologies: JAD, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Operating Systems: Windows, Unix
ETL Tools: Informatica 9.6/9.1 and Tableau.
PROFESSIONAL EXPERIENCE
Confidential
Sr. Data Engineer
Responsibilities:
- As a Sr. Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Implemented data pipelines in Python.
- Followed SDLC methodologies throughout the course of the project.
- Participated in JAD sessions for design optimizations related to data structures and ETL processes.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
- Developed a data model (star schema) for the sales data mart using Erwin tool.
- Extracted data from multiple databases using Sqoop import queries and ingested it into Hive tables.
- Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views.
- Performed several ad-hoc data analyses on the Azure Databricks analytics platform, tracked on a Kanban board.
- Used Azure reporting services to upload and download reports.
- Worked on debugging and identifying unexpected real-time issues in production-server SSIS packages.
- Loaded real-time data from various data sources into HDFS using Kafka.
- Developed MapReduce jobs for data cleanup in Python (a minimal sketch follows this section).
- Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake.
- Participated in integration of MDM (Master Data Management) Hub and data warehouses.
- Wrote Pig scripts to load processed data from HDFS into MongoDB.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Worked on reading multiple data formats from HDFS using Python.
- Involved in database development by creating Oracle PL/SQL Functions, Procedures and Collections.
- Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
- Prepared Tableau reports and dashboards with calculated fields, parameters, sets, groups, and bins, and published them to the server.
- Translated business requirements into SAS code for use within internal systems and models.
- Migrated ETL processes from RDBMS to Hive to enable easier data manipulation.
Environment: Erwin 9.8, Big Data 3.0, Hadoop 3.0, Azure, SQL, Sqoop 1.4, ETL, HDFS, Kafka 1.1, Python, MapReduce, SSIS, MDM, Pig 0.17, MongoDB, Hive 2.3, HBase 1.2, Oracle 12c, PL/SQL, Tableau, Oozie 4.3, SAS.
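A minimal sketch of the Python MapReduce data-cleanup work noted above, written as a Hadoop Streaming mapper. The three-field, tab-delimited record layout is a hypothetical example, not the project's actual schema.

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper for data cleanup (illustrative sketch).

Assumes tab-delimited input with three fields (id, timestamp, amount);
the field layout is an assumption made for illustration.
"""
import sys

EXPECTED_FIELDS = 3

def clean(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        return None  # drop malformed records
    record_id, ts, amount = (f.strip() for f in fields)
    try:
        float(amount)  # keep only rows with a numeric amount
    except ValueError:
        return None
    return "\t".join((record_id, ts, amount))

if __name__ == "__main__":
    for raw in sys.stdin:
        cleaned = clean(raw)
        if cleaned is not None:
            print(cleaned)
```

A mapper like this would be submitted through the hadoop-streaming jar; a reducer is only needed when the cleanup step also aggregates.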
Confidential - Nashville, TN
Data Engineer / Data Modeler
Responsibilities:
- As a Sr. Data Engineer, I was responsible for all data management-related aspects of the project.
- Used the Agile Scrum methodology through the different phases of the software development life cycle.
- Key player in maintaining data pipelines and flows.
- Conducted JAD sessions, wrote meeting minutes, and documented the requirements.
- Worked on data modeling and advanced SQL with columnar databases on AWS.
- Imported data from RDBMS to HDFS and Hive using Sqoop on a regular basis (a wrapper sketch follows this section).
- Performed Data Analysis using procedures and functions in PL/SQL.
- Developed stored procedures in SQL Server to standardize DML transactions such as insert, update and delete from the database.
- Wrote T-SQL procedures to generate DML scripts that modified database objects dynamically based on user inputs.
- Worked on normalization and denormalization techniques for OLTP systems.
- Analyzed data using Hadoop components Hive and Pig.
- Extensively designed and developed the HBase target schema.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Generated DDL statements for the creation of new ER/Studio objects such as tables, views, indexes, packages, and stored procedures.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from Teradata database.
- Used reverse engineering to connect to the existing database and create a graphical representation (ER diagram).
- Managed database design and implemented a comprehensive snowflake schema with shared dimensions.
- Created Hive tables to store data and wrote Hive queries against them.
- Designed mapping documents and mapping templates for ETL developers.
- Analyzed the SQL scripts and designed the solution for implementation in Python.
- Facilitated in developing testing procedures, test cases and User Acceptance Testing (UAT).
- Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Presented data scenarios via ER/Studio logical models and Excel mockups to visualize the data better.
- Created graphical entity-relationship representations to elicit further information.
Environment: ER/Studio, Agile, Hadoop 3.0, AWS, HDFS, SQL, PL/SQL, Hive 2.3, Sqoop 1.4, T-SQL, OLTP, Pig 0.17, HBase 1.2, Teradata 15, ETL, Oracle 12c.
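A sketch of how the recurring Sqoop imports noted above could be wrapped in Python for scheduling. The JDBC URL, credentials path, and table names are placeholders, not the actual project configuration.

```python
"""Illustrative wrapper around a recurring Sqoop import into HDFS/Hive."""
import subprocess

def sqoop_import(table: str, hive_table: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",  # placeholder JDBC URL
        "--username", "etl_user",                               # placeholder account
        "--password-file", "/user/etl/.sqoop.password",         # keeps secrets off the command line
        "--table", table,
        "--hive-import",                                        # land the data directly in Hive
        "--hive-table", hive_table,
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)  # raise if the import fails

if __name__ == "__main__":
    sqoop_import("SALES_ORDERS", "staging.sales_orders")  # hypothetical tables
```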
Confidential
Data Analyst/Engineer
Responsibilities:
- Worked on AWS Redshift and RDS, implementing models and loading data on both services.
- Used Agile methodology with daily scrums to discuss project-related information.
- Performed data analysis and data profiling using complex SQL against various source systems, including Oracle.
- Worked with the MDM systems team on technical aspects and report generation.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems
- Implemented Dimensional model for the Data Mart and responsible for generating DDL scripts using Erwin.
- Involved in the complete SSIS life cycle: creating SSIS packages, then building, deploying, and executing the packages in all environments.
- Created SQL scripts for database modification and performed multiple data modeling tasks at the same time under tight schedules.
- Loaded data into Hive tables from the Hadoop Distributed File System (HDFS) to provide SQL-like access to Hadoop data (see the sketch after this section).
- Created mappings, mapplets, sessions, and workflows using Informatica to replace the existing stored procedures.
- Developed data mapping, data governance, transformation, and cleansing rules for data management.
- Designed and created Data Marts as part of a data warehouse.
- Developed Hive queries to process data for visualization.
- Repopulated the static tables in the data warehouse using PL/SQL procedures and SQL*Loader.
- Translated business concepts into XML vocabularies by designing XML schemas with UML.
- Involved in the design and development of user interfaces and customization of Reports using Tableau.
- Created the data model for the Subject Area in the Enterprise Data Warehouse (EDW).
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Developed reports for users in different departments in the organization using SQL Server Reporting Services (SSRS).
- Performed slicing and dicing on data using pivot tables to identify churn-rate patterns and prepared reports as required.
Environment: Erwin 9.7, Redshift, Agile, MDM, Oracle 12c, SQL, HBase 1.1, UNIX, NoSQL, OLAP, OLTP, SSIS, Informatica, HDFS, Hive, XML, PL/SQL, Tableau, SSRS.
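A minimal sketch of giving SQL-like access to HDFS data through Hive, as noted above, using the PyHive client. The host, database, HDFS location, and column layout are assumptions for illustration.

```python
"""Expose files already in HDFS as a Hive table and query them (sketch)."""
from pyhive import hive

conn = hive.connect(host="hive-server", port=10000, database="analytics")  # placeholder host/db
cur = conn.cursor()

# External table over files already sitting in HDFS, giving SQL-like access.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
        event_ts STRING,
        user_id  STRING,
        action   STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/data/raw/web_events'
""")

# Simple aggregate of the kind fed to downstream visualization.
cur.execute("SELECT action, COUNT(*) AS cnt FROM web_events GROUP BY action")
for action, cnt in cur.fetchall():
    print(action, cnt)
```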
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Worked in a Data Analyst/Data Modeler role, reviewing business requirements and composing source-to-target data mapping documents.
- Used Sybase Power Designer tool for relational database and dimensional data warehouse designs.
- Extensively used star schema methodologies in building and designing logical and physical dimensional data models.
- Designed and developed databases for OLTP and OLAP Applications.
- Used SQL Profiler and Query Analyzer to optimize DTS package queries and stored procedures.
- Created SSIS packages to populate data from various data sources.
- Extensively performed data analysis using Python pandas (a minimal sketch follows this section).
- Extensively involved in modeling and developing the reporting data warehousing system.
- Developed reports and visualizations using Tableau Desktop as per the requirements.
- Extensively used Informatica tools: Source Analyzer, Warehouse Designer, and Mapping Designer.
- Involved in migrating and converting reports from SSRS.
- Migrated data from SQL Server to Oracle and uploaded as XML-Type in Oracle Tables.
- Performed Data Dictionary mapping, extensive database modeling (Relational and Star schema) utilizing Sybase Power Designer.
- Involved in performance tuning and monitoring of both T-SQL and PL/SQL blocks.
- Worked on creating DDL, DML scripts for the data models.
- Created volatile and global temporary tables to load large volumes of data into the Teradata database.
- Involved in designing Business Objects universes and creating reports.
- Provided production support to resolve user issues for applications using Excel VBA.
- Worked with Tableau in analysis and creation of dashboard and user stories.
- Involved in extensive data validation using SQL queries and back-end testing.
- Performed gap analysis of the current state versus the desired state and documented requirements to control the identified gaps.
Environment: Sybase Power Designer, SQL, PL/SQL, Teradata 14, Oracle 11g, XML, Tableau, OLAP, OLTP, SSIS, Informatica, SSRS, Python, T-SQL, Excel.
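A minimal sketch of the pandas-based analysis noted above; the file name and columns are hypothetical stand-ins for the actual sources.

```python
"""Ad-hoc data analysis with pandas (illustrative sketch)."""
import pandas as pd

# Hypothetical extract; the real sources were relational tables and files.
df = pd.read_csv("accounts.csv", parse_dates=["open_date"])

# Quick profiling: row/column counts and nulls per column.
print(df.shape)
print(df.isna().sum())

# Example aggregation of the kind used in the analysis work.
summary = (
    df.groupby("region")["balance"]
      .agg(["count", "mean", "sum"])
      .sort_values("sum", ascending=False)
)
print(summary)
```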