Data Engineer Resume
Dallas, TX
SUMMARY
- 7+ years of experience as a Data Engineer and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
- Full life cycle implementation experience of Big Data Pipelines.
- Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources and scheduling.
- Extensive knowledge of Big Data, Hadoop, MapReduce, Hive, NoSQL databases and other emerging technologies.
- Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Experience in creating tables, constraints, views, and materialized views using ERwin, ER Studio, and SQL Modeler.
- Extensive experience in Text Analytics, generating data visualizations using Python and creating dashboards using tools like Tableau.
- Experience streaming data from various sources, both cloud (AWS, Azure) and on-premises, using Spark.
- Hands-on experience in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Good experience in the Agile software delivery process using Scrum.
- Excellent SQL programming skills; developed stored procedures, triggers, functions and packages using SQL and PL/SQL.
- Excellent knowledge of Ralph Kimball's and Bill Inmon's approaches to Data Warehousing.
- Experience in working on Distributed storage for analysis and processing of large data sets using Apache Hadoop.
- Expert in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatches.
- Experience in working with Teradata and in batch processing of data using distributed computing.
- Excellent knowledge of, and extensive experience using, NoSQL databases (HBase).
- Experience in designing and implementing data structures and using common business intelligence tools for data analysis.
- Hands-on experience in data modeling with Star and Snowflake schemas.
- Experience using MS Excel and MS Access to extract data and analyze it based on business needs.
- Extensive experience using ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
- Good communication skills, a strong work ethic and the ability to work efficiently in a team, with good leadership skills.
TECHNICAL SKILLS
Big Data Tools: HBase 1.2, Hive 2.3, Pig 0.17, HDFS, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Spark
Methodologies: JAD, System Development Life Cycle (SDLC), Agile, Waterfall Model.
ETL Tools: Informatica 9.6/9.1 and Tableau.
Data Modeling Tools: Erwin Data Modeler 9.8, ER Studio v17, and Power Designer 16.6.
Databases: Oracle 12c, Teradata R15, MS SQL Server 2016, DB2.
Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack
Programming Languages: SQL, PL/SQL, Python, UNIX shell Scripting
Operating System: Windows, Unix
PROFESSIONAL EXPERIENCE
Confidential - Dallas, TX
Data Engineer
Responsibilities:
- Worked on Big Data requirements analysis, and designed and developed solutions for ETL and Business Intelligence platforms.
- Involved in all the phases of SDLC including Requirements Collection, Design & Analysis of the Customer Specifications from Business Analysts.
- Developed data pipelines using Kafka and Storm to store data in HDFS and performed real-time analytics on the incoming data.
- Worked on reading multiple data formats on HDFS using Python.
- Performed Reverse Engineering of the current application using Erwin, and developed Logical and Physical data models for Central Model consolidation.
- Implemented Custom Azure Data Factory pipeline Activities and SCOPE scripts.
- Created SQL tables with referential integrity, constraints and developed queries using SQL, SQL*PLUS and PL/SQL.
- Created Data Dictionary and Data Mapping from Sources to the Target in MDM Data Model.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Imported the complete data from RDBMS to HDFS cluster using Sqoop.
- Created customized reports using OLAP tools such as Crystal Reports for business use.
- Developed data marts for the base data in Star and Snowflake schemas.
- Worked on claims data and extracted data from various sources such as flat files, Oracle and Mainframes.
- Created complex SQL queries using views, indexes, triggers, roles, stored procedures and user-defined functions, and worked with different methods of logging in SSIS.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Designed and built a data lake using Hadoop and its ecosystem components.
- Created a new data model that embeds NoSQL submodels within a relational data model by applying hybrid data modelling concepts.
- Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access.
- Actively involved in SQL and Azure SQL DW code development using T-SQL
- Implemented Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
- Handled structured and unstructured data and applied ETL processes.
- Developed reports for users in different departments in the organization using SQL Server Reporting Services (SSRS).
- Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the Cosmos Activity.
- Successfully generated consumer group lag from Kafka using its API.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
- Defined job workflows as per their dependencies in Oozie.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed the batch program in PL/SQL for OLTP processing and used Unix shell scripts to run it from crontab.
Environment: Erwin 9.8, Big Data 3.0, ETL, Hadoop 3.0, NoSQL, SQL, PL/SQL, Azure, HDFS, Python, Kafka 1.1, OLAP, Oracle 12c, Sqoop 1.4, SSIS, PySpark, T-SQL, Hive 2.3, HBase 1.2, SSRS, API, Oozie 4.3, Cosmos, Tableau, MapReduce, OLTP.
Confidential - Houston, TX
Data Analyst/Data Engineer
Responsibilities:
- Accomplished implementation of Data Pipelines as per the data models.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Configured AWS EC2 instances, S3 buckets and cloud services, and architected the flow of data to and from AWS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Applied Python skills, with proven expertise in adopting new tools and technical developments.
- Wrote complex SQL and PL/SQL procedures, functions and packages to validate data and support the testing process.
- Designed and Implemented Error-Free Data Warehouse-ETL and Hadoop Integration.
- Loaded real-time data from various data sources into HDFS using Kafka.
- Performed Data analysis and Data profiling using complex SQL on various source systems including Oracle and Teradata.
- Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Worked on generating and documenting metadata while designing OLTP and OLAP system environments.
- Assisted in overseeing compliance with the Enterprise Data Standards, data governance and data quality.
- Carried out effective data profiling to eradicate anomalies between source and target data.
- Developed stored procedures in SQL Server to standardize DML transactions such as insert, update and delete from the database.
- Developed dimensional models based on Star and Snowflake schemas for the data warehouse.
- Developed prototype solutions to verify capabilities for new systems development, enhancement, and maintenance of MDM
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Performed Tableau administration using Tableau admin commands.
- Developed all the required stored procedures, user defined functions and triggers using T-SQL and SQL.
- Worked with Looker, ESB (Enterprise Service Bus), APIs, AWS EMR, Ranger and Hadoop technologies.
- Involved in the complete SSIS life cycle: creating, building, deploying and executing SSIS packages in all environments.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Produced various types of reports using SQL Server Reporting Services (SSRS).
- Created/Modified shell scripts for scheduling various data cleansing scripts and ETL loading process.
Environment: Erwin 9.8, Hadoop 3.0, Agile, Oracle 12c, SQL, PL/SQL, Teradata 15, AWS, Oozie 4.3, Hive 2.3, Pig 0.17, OLAP, OLTP, Kafka 1.1, HDFS, ETL, Tableau, T-SQL, SSIS, SSRS.
Confidential - Nashville, TN
Data Analyst/Data Modeler
Responsibilities:
- Performed Data Analysis on the source data in order to understand the relationship between the entities
- Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
- Created a dimensional model for the reporting system by identifying required dimensions and facts using ER/Studio.
- Designed logical and physical data models using data provisioning and consumption techniques.
- Used existing Deal Model in Python to inherit and create object data structure for regulatory reporting.
- Handled performance requirements for databases in OLTP and OLAP models.
- Used the Teradata FastExport utility to export large volumes of data from Teradata tables and views for processing and reporting needs.
- Extensively created SSIS packages to clean and load data to data warehouse.
- Used SQL for extract, transform and load (ETL) methodology and processes.
- Created PL/SQL procedures and triggers, generated application data, created users and privileges, and used the Oracle import/export utilities.
- Translated business concepts into XML vocabularies by designing XML Schemas with UML
- Used normalization and de-normalization techniques to achieve optimum performance of the database.
- Designed and developed the data warehouse using T-SQL and SQL.
- Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on scheduling of the reports.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Developed scripts that automated DDL and DML statements used in the creation of databases, tables, constraints, and updates.
- Created Data Dictionaries, Source to Target Mapping Documents and documented Transformation rules for all the fields.
- Developed ad-hoc reports using Tableau Desktop and Excel.
Environment: ER/Studio, SQL, PL/SQL, OLAP, OLTP, Python, Teradata, ETL, SSIS, XML, T-SQL, SSRS, Tableau, Excel, Oracle.
Confidential
Data Analyst
Responsibilities:
- Worked in data management performing data analysis, gap analysis, and data mapping.
- Wrote SQL extracts against two sources to bring in the data required for project reporting.
- Created stored procedures using PL/SQL and tuned the databases and backend processes.
- Worked on debugging and identifying the unexpected real-time issues in the production server SSIS packages.
- Conducted gap analysis and data mapping to derive requirements for enhancements to existing systems for a project.
- Involved in extensive data validation using SQL queries and back-end testing.
- Evaluated data profiling, cleansing, integration and extraction tools (e.g. Informatica).
- Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data and developed SQL Stored procedures to query dimension and fact tables in data warehouse.
- Worked with SQL Server Reporting Services (SSRS) to author, manage, and deliver both paper-based
- Prototyped data visualizations in Tableau using charts, drill-downs and parameterized controls to highlight the value of analytics in executive decision support.
- Performed data mining on claims data using very complex SQL queries and discovered claims patterns.
- Developed Enterprise Data Dictionary for reusable objects like domains, attachments, defaults, reference values, User data types and reusable procedural logic.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
- Extensively created tables and queries to produce additional ad-hoc reports.
- Performed data validation on the flat files that were generated in UNIX environment using UNIX commands as necessary.
Environment: SQL, PL/SQL, SSIS, SSRS, T-SQL, Informatica, Tableau, XML, UNIX.
