Data Engineer Resume

Dallas, TX

SUMMARY

  • Overall 7+ years of professional IT work experience in Analysis, Design, Development, Deployment and Maintenance of critical software and big data applications.
  • Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Experience in cloud development and architecture on Amazon AWS (EC2, Redshift), with basic experience on Azure.
  • Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
  • Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
  • Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, along with Teradata, BTEQ, and MDM.
  • Excellent Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Experience in dimensional data modeling, star schema/snowflake schema design, and fact and dimension tables.
  • Strong knowledge of Spark with Scala for large-scale and streaming data processing.
  • Thorough knowledge of creating DDL, DML, and transaction queries in SQL for Oracle and Teradata databases (illustrated in the sketch after this list).
  • Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
  • Experience in building reports using SQL Server Reporting Services and Crystal Reports.
  • Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
  • Involved in writing SQL queries and PL/SQL programs; created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Experience in text analytics and in developing statistical machine learning and data mining solutions for various business problems.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data and to track data quality.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Proficient in handling complex processes using SAS/Base, SAS/SQL, SAS/STAT, SAS/Graph, and SAS/ODS.
  • Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
  • Worked on ad-hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
  • Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.
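
The DDL, DML, and transaction work noted above might look like the following minimal Oracle-style SQL sketch; the table and column names are illustrative assumptions, not taken from any actual engagement.

    -- Hypothetical fact table (DDL)
    CREATE TABLE sales_fact (
        sale_id     NUMBER(12)   PRIMARY KEY,
        customer_id NUMBER(10)   NOT NULL,
        sale_date   DATE         NOT NULL,
        amount      NUMBER(12,2)
    );

    -- DML: insert a row, correct it, and commit as one transaction
    INSERT INTO sales_fact (sale_id, customer_id, sale_date, amount)
    VALUES (1001, 42, DATE '2017-06-30', 199.99);

    UPDATE sales_fact
       SET amount = 189.99
     WHERE sale_id = 1001;

    COMMIT;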

TECHNICAL SKILLS

Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.

Cloud Management: Amazon Web Services (AWS), Amazon Redshift

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and defect tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe

Operating System: Windows, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential - Dallas, TX

Data Engineer

Responsibilities:

  • As a Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
  • Worked on Software Development Life Cycle (SDLC) with good working knowledge of testing, agile methodology, disciplines, tasks, resources, and scheduling.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
  • Experienced in building a data warehouse on the Azure platform using Azure Databricks and Data Factory.
  • Used QlikView to create custom reports, charts, and bookmarks.
  • Developed data models and data migration strategies utilizing sound concepts of data modeling including star schema, snowflake schema.
  • Involved in database development by creating PL/SQL functions, procedures, packages, cursors, error handling, and views (illustrated in the sketch after this list).
  • Used SQL Server Reporting Services (SSRS) for database reporting in Oracle.
  • Gathered and documented the Audit trail and traceability of extracted information for data quality.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Created and Configured Azure Cosmos DB Trigger in Azure Functions, which invokes the Azure Function when any changes are made to the Azure Cosmos DB container.
  • Worked on MongoDB and HBase databases, which differ from classic relational databases.
  • Executed change management processes surrounding new releases of SAS functionality.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Designed reports in Access and Excel using advanced features, including pivot tables and formulas.
  • Performed data analysis, data migration, and data profiling using complex SQL on various source systems, including SQL Server.
  • Used SQL*Loader, external tables, and import/export utilities to load data into Oracle.
  • Designed and built scalable, maintainable ETL for Netezza and SQL Server.
  • Configured Input & Output bindings of Azure Function with Azure Cosmos DB collection to read and write data from the container whenever the function executes.
  • Analyzed the source data coming from various data sources like Mainframe & MySQL.
  • Worked on analyzing source systems and their connectivity, discovery, data profiling and data mapping.
  • Collected large amounts of log data using Apache Flume and aggregated it using Pig/Hive in HDFS for further analysis.
  • Worked on cloud computing using Microsoft Azure with various BI technologies and explored NoSQL options for the current backend using Azure Cosmos DB (SQL API).
  • Established a business analysis methodology around RUP (Rational Unified Process); developed use cases and project plans and managed scope.
  • Implemented an Azure cloud solution using HDInsight, Event Hubs, Cosmos DB, Cognitive Services, and Key Vault.
  • Developed the long-term data warehouse roadmap and architecture, and designed and built the data warehouse framework per the roadmap.
  • Developed complex mapping to extract data from diverse sources including flat files, RDBMS tables, legacy system files, XML files, and Applications.
  • Worked on the reporting requirements and involved in generating the reports for the Data Model using crystal reports.
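
The PL/SQL development described above might resemble the following minimal sketch of a procedure with a cursor and error handling; the table and column names are hypothetical, not from the actual project.

    -- Minimal PL/SQL sketch: upsert-style summary refresh with error handling
    CREATE OR REPLACE PROCEDURE refresh_customer_summary AS
        CURSOR c_cust IS
            SELECT customer_id, SUM(amount) AS total_amount
              FROM sales_fact
             GROUP BY customer_id;
    BEGIN
        FOR rec IN c_cust LOOP
            UPDATE customer_summary
               SET total_amount = rec.total_amount
             WHERE customer_id  = rec.customer_id;

            -- Insert the row if no summary record exists yet
            IF SQL%ROWCOUNT = 0 THEN
                INSERT INTO customer_summary (customer_id, total_amount)
                VALUES (rec.customer_id, rec.total_amount);
            END IF;
        END LOOP;
        COMMIT;
    EXCEPTION
        WHEN OTHERS THEN
            ROLLBACK;
            RAISE;
    END refresh_customer_summary;
    /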

Environment: Hadoop 3.0, Agile, MS Azure, Erwin 9.7, PL/SQL, SSRS, SSIS, Hive 2.3, HDFS, NoSQL, Cosmos DB, MongoDB, XML

Confidential - Nashville, TN

Data Engineer

Responsibilities:

  • Worked as a Data Engineer responsible for building scalable distributed data solutions using Hadoop.
  • Participated in design discussions and assured functional specifications are delivered in all phases of SDLC in an Agile Environment.
  • Created data models for AWS Redshift and Hive from dimensional data models.
  • Wrote DDL and DML statements for creating, altering tables, and converting characters into numeric values.
  • Worked on Master data Management (MDM) Hub and interacted with multiple stakeholders.
  • Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
  • Extensively involved in development and implementation of SSIS and SSAS applications.
  • Collaborated with ETL, and DBA teams to analyze and provide solutions to data issues and other challenges while implementing the OLAP model.
  • Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
  • Worked on performance tuning of the database, including indexing and optimizing SQL statements.
  • Worked with OLTP systems to find the daily transactions, the types of transactions that occurred, and the amount of resources used.
  • Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.
  • Executed change management processes surrounding new releases of SAS functionality
  • Prepared complex T-SQL queries, views and stored procedures to load data into staging area.
  • Performed normalization of the existing model to 3NF to speed up DML statement execution time.
  • Participated in data collection, data cleaning, data mining, developing models and visualizations.
  • Worked with Sqoop to transfer data between HDFS and relational databases like MySQL, and have experience using Talend for the same purpose.
  • Developed automated job flows that run through Oozie daily and on demand, executing MapReduce jobs internally.
  • Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
  • Extracted Tables and exported data from Teradata through Sqoop and placed in Cassandra.
  • Worked on analyzing and examining customer behavioral data.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.
  • Created Hive external tables to stage data and then moved the data from staging to main tables (illustrated in the sketch after this list).
  • Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.
  • Created reports analyzing large-scale database utilizing Microsoft Excel Analytics within legacy system.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
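
The Hive staging pattern mentioned above might look like this minimal HiveQL sketch; the paths, table names, and columns are illustrative assumptions, and an existing main table partitioned by txn_date is assumed.

    -- External staging table over raw files landed in HDFS
    CREATE EXTERNAL TABLE IF NOT EXISTS stg_transactions (
        txn_id     STRING,
        account_id STRING,
        txn_amount DOUBLE,
        txn_date   STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/staging/transactions';

    -- Allow dynamic partitions, then move cleansed rows into the main table
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE transactions PARTITION (txn_date)
    SELECT txn_id, account_id, txn_amount, txn_date
      FROM stg_transactions
     WHERE txn_id IS NOT NULL;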

Environment: Hadoop 3.0, SQL, PL/SQL, MDM, HDFS, HBase, SSIS, SSAS, OLAP, OLTP, AWS, T-SQL, SAS, Sqoop, Cassandra, Hive

Confidential - Edison, NJ

Data Analyst/Data Engineer

Responsibilities:

  • Worked as a Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Used Agile Central (Rally) to enter tasks with visibility to the whole team and the Scrum Master.
  • Connected to AWS Redshift through Tableau to extract live data for real time analysis.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Performed the Data Mapping, Data design (Data Modeling) to integrate the data across the multiple databases into EDW.
  • Implemented forward engineering to create tables, views, SQL scripts, and mapping documents.
  • Captured the data logs from web server into HDFS using Flume for analysis.
  • Involved in PL/SQL code review and modification for the development of new requirements.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Worked closely with the ETL SSIS developers to explain the complex data transformation logic.
  • Worked with MDM systems team with respect to technical aspects and generating reports.
  • Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
  • Used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
  • Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS).
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle.
  • Worked in importing and cleansing of data from various sources like Teradata, flat files, SQL Server with high volume data.
  • Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Integrated various sources into the staging area of the data warehouse to integrate and cleanse the data.
  • Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL (illustrated in the sketch after this list).
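
An ad-hoc T-SQL report of the kind described above might look like the following minimal sketch; the tables, columns, and seven-day window are assumptions for illustration.

    -- Weekly sales by region, hypothetical schema
    SELECT   c.region,
             COUNT(DISTINCT o.order_id) AS order_count,
             SUM(o.order_amount)        AS total_sales
    FROM     dbo.Orders o
    JOIN     dbo.Customers c
             ON c.customer_id = o.customer_id
    WHERE    o.order_date >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
    GROUP BY c.region
    ORDER BY total_sales DESC;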

Environment: Erwin, Teradata, Oracle, SQL, T-SQL, PL/SQL, AWS, Agile, OLAP, OLTP, SSIS, HDFS, SAS, Flume, SSRS, Sqoop, MapReduce, MySQL.

Confidential - Houston, TX

Data Analyst/Data Modeler

Responsibilities:

  • Worked as a Data Analyst/Data Modeler responsible for all data-related aspects of a project.
  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
  • Participated in requirement gathering sessions and JAD sessions with users, subject matter experts, and BAs.
  • Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Improved performance of SQL queries, used indexes for tuning, and created DDL scripts for the database. Created PL/SQL procedures and triggers.
  • Extensively used data transformation tools such as SSIS, Informatica, and DataStage.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Worked on Normalization and De-Normalization techniques for OLAP systems.
  • Created Tableau scorecards, dashboards using stack bars, bar graphs, geographical maps and Gantt charts.
  • Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
  • Performed data mining using extraordinarily complex SQL queries and discovered patterns.
  • Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas (illustrated in the sketch after this list).
  • Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business user.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
  • Created a list of domains in E/R Studio and worked on building up the data dictionary for the company.
  • Created a data mapping document after each assignment and wrote the transformation rules for each field as applicable.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Used E/R Studio for reverse engineering to connect to existing database and ODS to create graphical representation in the form of Entity Relationships and elicit more information
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software programs.
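
A dimensional (star schema) model of the kind described above might be sketched as follows; the fact and dimension names are illustrative, not the actual EDW design.

    -- Two dimensions and one fact table in a minimal star schema
    CREATE TABLE dim_customer (
        customer_key     INTEGER      PRIMARY KEY,
        customer_name    VARCHAR(100),
        customer_segment VARCHAR(50)
    );

    CREATE TABLE dim_date (
        date_key       INTEGER PRIMARY KEY,
        calendar_date  DATE,
        fiscal_quarter VARCHAR(6)
    );

    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        date_key     INTEGER REFERENCES dim_date (date_key),
        sales_amount DECIMAL(12,2),
        units_sold   INTEGER
    );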

Environment: E/R Studio, Teradata, SQL, PL/SQL, T-SQL, OLTP, SSIS, SSRS, OLAP, Tableau, Netezza.

Confidential

Data Analyst

Responsibilities:

  • Worked with business analysts to design weekly reports using Crystal Reports.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Extensively worked in Data Analysis by querying in SQL and generating various PL/SQL objects.
  • Analyzed the Business information requirements and examined the OLAP source systems to identify the measures, dimensions and facts required for the reports.
  • Extensively used SQL for Data Analysis to understand and document the data behavioral trend.
  • Monitored data quality and ensured the integrity of data was maintained for the effective functioning of the department.
  • Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
  • Actively participated in data cleansing and anomaly resolution of the legacy application.
  • Developed and tested PL/SQL scripts and stored procedures designed and written to find specific data.
  • Worked extensively on SQL querying using Joins, Alias, Functions, Triggers, and Indexes.
  • Performed the Data Accuracy, Data Analysis, Data Quality checks before and after loading the data.
  • Involved in extensive data validation by writing several complex SQL queries; also involved in back-end testing and worked on data quality issues (illustrated in the sketch after this list).
  • Built dashboards using SSRS and Tableau for the business teams to take cost effective decisions.
  • Conducted design walk through sessions with Business Intelligence team to ensure that reporting requirements are met for the business.
  • Worked with data warehouse concepts for data integration, data transformation, and periodic data refreshes.
  • Used Excel with VBA scripting to maintain existing and develop new reports as required by the business.
  • Performed ad-hoc analyses as needed and explained the resulting analysis when required.
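
The SQL-based data validation described above might resemble this minimal sketch comparing row counts between a source staging table and the warehouse; the schema and table names are assumptions.

    -- Flag load dates where warehouse counts do not match the staging source
    SELECT s.load_date,
           s.src_row_count,
           t.tgt_row_count
    FROM  (SELECT load_date, COUNT(*) AS src_row_count
             FROM staging.customer_orders
            GROUP BY load_date) s
    LEFT JOIN
          (SELECT load_date, COUNT(*) AS tgt_row_count
             FROM warehouse.customer_orders
            GROUP BY load_date) t
          ON t.load_date = s.load_date
    WHERE t.tgt_row_count IS NULL
       OR s.src_row_count <> t.tgt_row_count;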

Environment: SQL, PL/SQL, OLAP, OLTP, SSIS, T-SQL, SSRS, Tableau, MS Excel, Business Intelligence.
