Data Engineer Resume

Dallas, TX

SUMMARY

  • Overall 7+ years of experience as a Data Engineer and Data Analyst, including design, development, and implementation of data models for enterprise-level applications and systems.
  • Full life cycle implementation experience of Big Data Pipelines.
  • Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling.
  • Extensive knowledge of Big Data, Hadoop, MapReduce, Hive, NoSQL databases, and other emerging technologies.
  • Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Experience in creating tables, constraints, views, and materialized views using ERwin, ER Studio, and SQL Modeler.
  • Extensive experience in Text Analytics, generating data visualizations using Python and creating dashboards using tools like Tableau.
  • Experience streaming data from various sources, both cloud (AWS, Azure) and on-premises, using Spark.
  • Hands-on experience in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Good experience in the Agile software delivery process using Scrum.
  • Excellent SQL programming skills; developed stored procedures, triggers, functions, and packages using SQL and PL/SQL.
  • Excellent knowledge of Ralph Kimball's and Bill Inmon's approaches to data warehousing.
  • Experience in working on Distributed storage for analysis and processing of large data sets using Apache Hadoop.
  • Expert in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Experience working with Teradata and batch processing data using distributed computing.
  • Excellent knowledge of, and extensive experience using, NoSQL databases (HBase).
  • Experience in designing and implementing data structures and using common business intelligence tools for data analysis.
  • Hands-on experience in data modeling with star and snowflake schemas.
  • Experience using MS Excel and MS Access to extract and analyze data based on business needs.
  • Extensive experience using ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
  • Good communication skills, strong work ethic, and the ability to work efficiently in a team, with good leadership skills.

TECHNICAL SKILLS

Big Data Tools: HBase 1.2, Hive 2.3, Pig 0.17, HDFS, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Spark

Methodologies: JAD, System Development Life Cycle (SDLC), Agile, Waterfall Model.

ETL Tools: Informatica 9.6/9.1 and Tableau.

Data Modeling Tools: Erwin Data Modeler 9.8, ER Studio v17, and Power Designer 16.6.

Databases: Oracle 12c, Teradata R15, MS SQL Server 2016, DB2.

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, Python, UNIX shell Scripting

Operating System: Windows, Unix

PROFESSIONAL EXPERIENCE

Confidential - Dallas, TX

Data Engineer

Responsibilities:

  • Worked on Big Data requirements analysis and developed and designed solutions for ETL and Business Intelligence platforms.
  • Involved in all phases of the SDLC, including requirements collection and design and analysis of customer specifications from Business Analysts.
  • Developed data pipelines using Kafka and Storm to store data in HDFS and performed real-time analytics on the incoming data.
  • Worked on reading multiple data formats on HDFS using Python.
  • Performed Reverse Engineering of the current application using Erwin, and developed Logical and Physical data models for Central Model consolidation.
  • Implemented Custom Azure Data Factory pipeline Activities and SCOPE scripts.
  • Created SQL tables with referential integrity, constraints and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Created Data Dictionary and Data Mapping from Sources to the Target in MDM Data Model.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Imported the complete data from RDBMS to HDFS cluster using Sqoop.
  • Created customized reports using OLAP tools such as Crystal Reports for business use.
  • Developed data marts for the base data using star and snowflake schemas.
  • Worked on claims data and extracted data from various sources such as flat files, Oracle and Mainframes.
  • Created complex SQL queries using views, indexes, triggers, roles, stored procedures, and user-defined functions; worked with different methods of logging in SSIS.
  • Used the Spark framework; enhanced and optimized product Spark code to aggregate, group, and run data mining tasks.
  • Designed and built a data lake using Hadoop and its ecosystem components.
  • Created a new data model that embeds NoSQL submodels within a relational data model by applying hybrid data modeling concepts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Actively involved in SQL and Azure SQL DW code development using T-SQL.
  • Implemented Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
  • Handled structured and unstructured data and applied ETL processes.
  • Developed reports for users in different departments in the organization using SQL Server Reporting Services (SSRS).
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the Cosmos Activity.
  • Generated consumer group lag metrics from Kafka using its API.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
  • Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
  • Defined job workflows according to their dependencies in Oozie.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Developed the batch program in PL/SQL for OLTP processing and used Unix shell scripts scheduled in crontab.

Environment: Erwin 9.8, Big Data 3.0, ETL, Hadoop 3.0, NoSQL, SQL, PL/SQL, Azure, HDFS, Python, Kafka 1.1, OLAP, Oracle 12c, Sqoop 1.4, SSIS, PySpark, T-SQL, Hive 2.3, HBase 1.2, SSRS, API, Oozie 4.3, Cosmos, Tableau, MapReduce, OLTP.

Confidential - Houston, TX

Data Analyst/Data Engineer

Responsibilities:

  • Accomplished implementation of Data Pipelines as per the data models.
  • Worked in an Agile environment and used Rally to maintain user stories and tasks.
  • Configured AWS EC2 instances, S3 buckets, and cloud services, and architected the flow of data to and from AWS.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Demonstrated skill in Python, with proven expertise in using new tools and technical developments.
  • Wrote complex SQL and PL/SQL procedures, functions, and packages to validate data and support the testing process.
  • Designed and Implemented Error-Free Data Warehouse-ETL and Hadoop Integration.
  • Loaded real-time data from various data sources into HDFS using Kafka.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
  • Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Generated and documented metadata while designing OLTP and OLAP system environments.
  • Assisted in overseeing compliance with the Enterprise Data Standards, data governance, and data quality.
  • Carried out effective data profiling to eradicate anomalies between source and target data.
  • Developed stored procedures in SQL Server to standardize DML transactions such as insert, update and delete from the database.
  • Developed a dimensional model based on star and snowflake schemas for the data warehouse.
  • Developed prototype solutions to verify capabilities for new systems development, enhancement, and maintenance of MDM.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Performed Tableau administration using Tableau admin commands.
  • Developed all the required stored procedures, user defined functions and triggers using T-SQL and SQL.
  • Worked with Looker, ESB (Enterprise Service Bus), API, AWS EMR, Ranger, and Hadoop technologies.
  • Involved in the complete SSIS life cycle: creating, building, deploying, and executing SSIS packages in all environments.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Produced various types of reports using SQL Server Reporting Services (SSRS).
  • Created and modified shell scripts for scheduling various data cleansing scripts and ETL loading processes.

Environment: Erwin 9.8, Hadoop 3.0, Agile, Oracle 12c, SQL, PL/SQL, Teradata 15, AWS, Oozie 4.3, Hive 2.3, Pig 0.17, OLAP, OLTP, Kafka 1.1, HDFS, ETL, Tableau, T-SQL, SSIS, SSRS.

Confidential - Nashville, TN

Data Analyst/Data Modeler

Responsibilities:

  • Performed data analysis on the source data to understand the relationships between the entities.
  • Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
  • Created a dimensional model for the reporting system by identifying required dimensions and facts using ER/Studio.
  • Designed logical and physical data models using data provisioning and consumption techniques.
  • Used existing Deal Model in Python to inherit and create object data structure for regulatory reporting.
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Used Teradata Fast Export utility to export large volumes of data from Teradata tables and views for processing and reporting needs.
  • Extensively created SSIS packages to clean and load data to data warehouse.
  • Used SQL for extract, transform, and load (ETL) methodology and processes.
  • Created PL/SQL procedures and triggers, generated application data, created users and privileges, and used Oracle import/export utilities.
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML.
  • Used normalization and de-normalization techniques to achieve optimum performance of the database.
  • Designed and developed the data warehouse using T-SQL and SQL.
  • Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on scheduling of the reports.
  • Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Developed scripts that automated DDL and DML statements used in creations of databases, tables, constraints, and updates.
  • Created Data Dictionaries, Source to Target Mapping Documents and documented Transformation rules for all the fields.
  • Developed ad-hoc reports using Tableau Desktop and Excel.

Environment: ER/Studio, SQL, PL/SQL, OLAP, OLTP, Python, Teradata, ETL, SSIS, XML, T-SQL, SSRS, Tableau, Excel, Oracle.

Confidential

Data Analyst

Responsibilities:

  • Worked in data management performing data analysis, gap analysis, and data mapping.
  • Wrote SQL extracts against two sources to bring in the data required for a project's reporting.
  • Created stored procedures using PL/SQL and tuned the databases and backend process.
  • Worked on debugging and identifying the unexpected real-time issues in the production server SSIS packages.
  • Conducted GAP analysis and data mapping to derive requirements for existing systems enhancements for a project.
  • Involved in extensive data validation using SQL queries and back-end testing.
  • Evaluated data profiling, cleansing, integration, and extraction tools (e.g. Informatica).
  • Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data and developed SQL Stored procedures to query dimension and fact tables in data warehouse.
  • Worked with SQL Server Reporting Services (SSRS) to author, manage, and deliver both paper-based
  • Prototyped data visualizations with charts, drill-downs, and parameterized controls in Tableau to highlight the value of analytics in executive decision support.
  • Performed data mining on claims data using very complex SQL queries and discovered claims patterns.
  • Developed Enterprise Data Dictionary for reusable objects like domains, attachments, defaults, reference values, User data types and reusable procedural logic.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Extensively created tables and queries to produce additional ad-hoc reports.
  • Performed data validation on the flat files that were generated in UNIX environment using UNIX commands as necessary.

Environment: SQL, PL/SQL, SSIS, SSRS, T-SQL, Informatica, Tableau, XML, UNIX.
