Data Engineer Resume
Dallas, TX
SUMMARY
- 7+ years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Experience in cloud development and architecture on AWS (EC2, Redshift), with basic experience on Azure.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Good experience working with ETL tools such as SSIS and Informatica and with reporting tools such as SQL Server Reporting Services (SSRS).
- Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
- Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
- Excellent experience with job workflow scheduling and monitoring tools such as Oozie and with cluster coordination services such as ZooKeeper.
- Experience in dimensional data modeling, including star/snowflake schemas and fact and dimension tables.
- Strong knowledge of Spark with Scala for large-scale streaming data processing.
- Thorough knowledge of writing DDL, DML, and transaction queries in SQL for Oracle and Teradata databases.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Experience in building reports using SQL Server Reporting Services and Crystal Reports.
- Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Experience in text analytics and in developing statistical machine learning and data mining solutions for various business problems.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Proficient in handling complex processes using SAS/Base, SAS/SQL, SAS/STAT, SAS/Graph, and SAS/ODS.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
- Worked on ad-hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
- Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.
TECHNICAL SKILLS
Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
OLAP Tools: Tableau, SAP Business Objects, SSAS, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and defect tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential - Dallas, TX
Data Engineer
Responsibilities:
- As a Data Engineer, provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
- Worked on Software Development Life Cycle (SDLC) with good working knowledge of testing, agile methodology, disciplines, tasks, resources, and scheduling.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Designed the data marts in Erwin using Ralph Kimball's dimensional data mart modeling methodology.
- Experienced in building a data warehouse on the Azure platform using Azure Databricks and Data Factory.
- Used QlikView data analysis to create custom reports, charts, and bookmarks.
- Developed data models and data migration strategies utilizing sound concepts of data modeling including star schema, snowflake schema.
- Involved in database development, creating PL/SQL functions, procedures, packages, cursors, error handling, and views (see the sketch after this list).
- Used SQL Server Reporting Services (SSRS) for database reporting in Oracle.
- Gathered and documented the Audit trail and traceability of extracted information for data quality.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Created and Configured Azure Cosmos DB Trigger in Azure Functions, which invokes the Azure Function when any changes are made to the Azure Cosmos DB container.
- Worked on MongoDB and HBase databases, which differ from classic relational databases.
- Executed change management processes surrounding new releases of SAS functionality
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Designed reports in Access and Excel using advanced features, including but not limited to pivot tables and formulas.
- Performed data analysis, data migration, and data profiling using complex SQL on various source systems, including SQL Server.
- Used SQL*Loader, external tables, and import/export utilities to load data into Oracle.
- Designed and built scalable, maintainable ETL processes for Netezza and SQL Server.
- Configured Input & Output bindings of Azure Function with Azure Cosmos DB collection to read and write data from the container whenever the function executes.
- Analyzed the source data coming from various data sources like Mainframe & MySQL.
- Worked on analyzing source systems and their connectivity, discovery, data profiling and data mapping.
- Collected large amounts of log data using Apache Flume and aggregated it in HDFS using Pig/Hive for further analysis.
- Worked on cloud computing using Microsoft Azure with various BI technologies and explored NoSQL options for the current backend using Azure Cosmos DB (SQL API).
- Established a business analysis methodology around the RUP (Rational Unified Process); developed use cases and project plans and managed scope.
- Implemented an Azure cloud solution using HDInsight, Event Hubs, Cosmos DB, Cognitive Services, and Key Vault.
- Developed the long-term data warehouse roadmap and architecture, and designed and built the data warehouse framework per the roadmap.
- Developed complex mappings to extract data from diverse sources including flat files, RDBMS tables, legacy system files, XML files, and applications.
- Worked on the reporting requirements and was involved in generating reports for the data model using Crystal Reports.
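A minimal sketch of the kind of PL/SQL routine referenced above (a set-based summary load with basic error handling); the table and column names are hypothetical and shown only for illustration.

```sql
-- Hypothetical PL/SQL procedure: rebuilds one day's rows in a summary table,
-- rolling back on failure. All object names are illustrative.
CREATE OR REPLACE PROCEDURE load_daily_order_summary (p_load_date IN DATE) AS
BEGIN
  DELETE FROM order_summary
  WHERE  load_date = p_load_date;

  INSERT INTO order_summary (load_date, customer_id, order_count, order_total)
  SELECT p_load_date, o.customer_id, COUNT(*), SUM(o.order_amount)
  FROM   orders o
  WHERE  TRUNC(o.order_date) = p_load_date
  GROUP  BY o.customer_id;

  COMMIT;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
    RAISE;
END load_daily_order_summary;
/
```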
Environment: Hadoop 3.0, Agile, MS Azure, Erwin 9.7, PL/SQL, SSRS, SSIS, Hive 2.3, HDFS, NoSQL, Cosmos DB, MongoDB, XML
Confidential - Nashville, TN
Data Engineer
Responsibilities:
- Worked as a Data Engineer responsible for building scalable distributed data solutions using Hadoop.
- Participated in design discussions and assured functional specifications are delivered in all phases of SDLC in an Agile Environment.
- Created data models for AWS Redshift and Hive from dimensional data models.
- Wrote DDL and DML statements for creating, altering tables, and converting characters into numeric values.
- Worked on Master data Management (MDM) Hub and interacted with multiple stakeholders.
- Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
- Extensively involved in development and implementation of SSIS and SSAS applications.
- Collaborated with ETL, and DBA teams to analyze and provide solutions to data issues and other challenges while implementing the OLAP model.
- Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Worked on database performance tuning, including indexing and optimizing SQL statements.
- Worked with OLTP systems to identify daily transactions, the types of transactions that occurred, and the amount of resources used.
- Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.
- Executed change management processes surrounding new releases of SAS functionality
- Prepared complex T-SQL queries, views and stored procedures to load data into staging area.
- Performed normalization of the existing model to third normal form (3NF) to speed up DML statement execution.
- Participated in data collection, data cleaning, data mining, developing models and visualizations.
- Worked with Sqoop to transfer data between HDFS and relational databases such as MySQL in both directions, and used Talend for the same purpose.
- Developed automated job flows that run MapReduce jobs internally and scheduled them through Oozie daily and on demand.
- Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
- Extracted tables and exported data from Teradata through Sqoop and loaded it into Cassandra.
- Worked on analyzing and examining customer behavioral data.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.
- Created Hive external tables to stage data and then moved the data from staging into the main tables (see the sketch after this list).
- Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.
- Created reports analyzing large-scale databases using Microsoft Excel analytics within the legacy system.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
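A minimal HiveQL sketch of the external staging-table pattern mentioned above; the file location, table, and column names are hypothetical and shown only for illustration.

```sql
-- Hypothetical HiveQL: expose raw delimited files through an external staging table,
-- then load the managed main table from it. All names and paths are illustrative.
CREATE EXTERNAL TABLE IF NOT EXISTS stg_transactions (
  txn_id      STRING,
  customer_id STRING,
  txn_amount  DOUBLE,
  txn_ts      STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/staging/transactions';

INSERT INTO TABLE transactions
SELECT txn_id, customer_id, txn_amount, txn_ts
FROM   stg_transactions
WHERE  txn_id IS NOT NULL;
```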
Environment: Hadoop 3.0, SQL, PL/SQL, MDM, HDFS, HBase, SSIS, SSAS, OLAP, OLTP, AWS, T-SQL, SAS, Sqoop, Cassandra, Hive
Confidential - Edison, NJ
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
- Used Agile Central (Rally) to enter tasks with visibility to the whole team and the Scrum Master.
- Connected to AWS Redshift through Tableau to extract live data for real time analysis.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Performed the Data Mapping, Data design (Data Modeling) to integrate the data across the multiple databases into EDW.
- Implemented forward engineering to create tables, views, SQL scripts, and mapping documents.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Involved in PL/SQL code review and modification for the development of new requirements.
- Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
- Worked closely with the ETL SSIS developers to explain the complex data transformation logic.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
- Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS).
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle (see the profiling sketch after this list).
- Worked in importing and cleansing of data from various sources like Teradata, flat files, SQL Server with high volume data.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Integrated various sources into the staging area of the data warehouse to integrate and cleanse the data.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
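A minimal sketch of the kind of profiling query referred to above, checking row counts, null rates, and distinct values; the table and column names are hypothetical.

```sql
-- Hypothetical data-profiling query against a source customer table.
-- All object names are illustrative.
SELECT COUNT(*)                                       AS total_rows,
       COUNT(customer_id)                             AS non_null_customer_ids,
       COUNT(DISTINCT customer_id)                    AS distinct_customer_ids,
       SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails
FROM   src_customers;
```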
Environment: Erwin, Teradata, Oracle, SQL, T-SQL, PL/SQL, AWS, Agile, OLAP, OLTP, SSIS, HDFS, SAS, Flume, SSRS, Sqoop, MapReduce, MySQL
Confidential - Houston, TX
Data Analyst/Data Modeler
Responsibilities:
- Worked as a Data Analyst/Data Modeler responsible for all data-related aspects of a project.
- Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
- Participated in requirement gathering session, JAD sessions with users, Subject Matter experts, and BAs.
- Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Improved SQL query performance by using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
- Extensively used data transformation tools such as SSIS, Informatica, and DataStage.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Worked on Normalization and De-Normalization techniques for OLAP systems.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, geographical maps, and Gantt charts.
- Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects (see the validation sketch after this list).
- Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
- Performed data mining using extraordinarily complex SQL queries and discovered patterns.
- Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake Schemas
- Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business users.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Created a list of domains in E/R Studio and worked on building up the data dictionary for the company
- Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Used E/R Studio for reverse engineering to connect to existing database and ODS to create graphical representation in the form of Entity Relationships and elicit more information
- Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software.
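A minimal sketch of the kind of source-to-report validation query referred to above, comparing row counts and totals by date; the table and column names are hypothetical.

```sql
-- Hypothetical reconciliation query: flag dates where the warehouse fact table
-- disagrees with the source on row count or total amount. Names are illustrative.
SELECT s.load_date,
       s.src_rows,
       t.dw_rows,
       s.src_amount - t.dw_amount AS amount_variance
FROM  (SELECT order_date AS load_date, COUNT(*) AS src_rows, SUM(order_amount) AS src_amount
       FROM   src_orders
       GROUP  BY order_date) s
JOIN  (SELECT order_date AS load_date, COUNT(*) AS dw_rows, SUM(order_amount) AS dw_amount
       FROM   fact_orders
       GROUP  BY order_date) t
  ON   s.load_date = t.load_date
WHERE  s.src_rows <> t.dw_rows
   OR  s.src_amount <> t.dw_amount;
```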
Environment: E/R Studio, Teradata, SQL, PL/SQL, T-SQL, OLTP, SSIS, SSRS, OLAP, Tableau, Netezza
Confidential
Data Analyst
Responsibilities:
- Worked with business analysts to design weekly reports using Crystal Reports.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Extensively worked in Data Analysis by querying in SQL and generating various PL/SQL objects.
- Analyzed the Business information requirements and examined the OLAP source systems to identify the measures, dimensions and facts required for the reports.
- Extensively used SQL for Data Analysis to understand and document the data behavioral trend.
- Monitored data quality and maintained data integrity to ensure the effective functioning of the department.
- Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into the data mart (see the sketch after this list).
- Actively participated in data cleansing and anomaly resolution of the legacy application.
- Developed and tested PL/SQL scripts and stored procedures written to find specific data.
- Worked extensively on SQL querying using Joins, Alias, Functions, Triggers, and Indexes.
- Performed the Data Accuracy, Data Analysis, Data Quality checks before and after loading the data.
- Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
- Built dashboards using SSRS and Tableau for the business teams to take cost effective decisions.
- Conducted design walk through sessions with Business Intelligence team to ensure that reporting requirements are met for the business.
- Worked with data warehouse concepts for data integration, data transformation, and periodic data refreshes.
- Used Excel with VBA scripting to maintain existing and develop new reports as required by the business.
- Performed ad-hoc analyses and interpreted the results as needed.
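A minimal T-SQL sketch of the staging-to-data-mart load pattern mentioned above; the database, table, and column names are hypothetical and shown only for illustration.

```sql
-- Hypothetical T-SQL stored procedure: rebuild one day's rows in a data-mart table
-- from a staging table. All object names are illustrative.
CREATE PROCEDURE dbo.usp_LoadSalesMart
    @LoadDate DATE
AS
BEGIN
    SET NOCOUNT ON;

    DELETE FROM dbo.SalesMart
    WHERE  LoadDate = @LoadDate;

    INSERT INTO dbo.SalesMart (LoadDate, ProductId, Region, TotalSales)
    SELECT @LoadDate, s.ProductId, s.Region, SUM(s.SaleAmount)
    FROM   dbo.StagingSales AS s
    WHERE  s.SaleDate = @LoadDate
    GROUP  BY s.ProductId, s.Region;
END;
```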
Environment: SQL, PL/SQL, OLAP, OLTP, SSIS, T-SQL, SSRS, Tableau, MS Excel, Business Intelligence.