Sr. Big Data Engineer/ Hadoop Engineer Resume

Plano, TX


  • 9+ years of professional IT experience, with expertise in requirements gathering, design, development, implementation and testing of multi-tiered, distributed and web-based applications as a Big Data Engineer/Data Engineer and Data Modeler/Analyst.
  • Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and its ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Ambari, ZooKeeper, Oozie, Storm, Spark and Kafka.
  • Experience in building highly reliable, scalable Big Data solutions on the Hadoop distributions Cloudera, Hortonworks and AWS EMR.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Experienced in Data Modeling and Data Analysis; proficient in gathering business requirements and handling requirements management.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Experience in transferring the data using Informatica tool from AWS S3 to AWS Redshift
  • Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
  • Managed ELDM Logical and Physical Data Models in ER Studio Repository based on the different subject area requests for integrated model.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Creating data models (ERD, logical) including robust data definitions, which may be entity-relationship-attribute models, star, and snowflake models
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
  • Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
  • Good understanding and exposure to Python programming.
  • Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.
  • Experience on developing MapReduce jobs for data cleaning and data manipulation as required for the business.
  • Good Experience on importing and exporting the data from HDFS and Hive into Relational Database Systems like MySQL and vice versa using Sqoop.
  • Good knowledge on NoSQL Databases including HBase, MongoDB, MapR-DB.
  • Installation, configuration and administration experience on Big Data platforms, using Cloudera Manager (Cloudera) and MCS (MapR).
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Familiar with Amazon Web Services along with provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
  • Strong knowledge of Data Warehouse Architecture, Star Schema, Snowflake Schema, and FACT and Dimension Tables.
  • Experience in SQL and good knowledge of PL/SQL programming, including developing Stored Procedures and Triggers; exposure to DataStage, DB2, Unix, Cognos, MDM, Hadoop and Pig.


Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11

Data Modeling Tools: Erwin Data Modeler 9.7/9.6, Erwin Model Manager, ER Studio v17, and Power Designer.

Programming Languages: SQL, PL/SQL, HTML5, XML and VBA.

Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.

Big Data technologies: HBase 1.2, HDFS, Sqoop 1.4, Spark, Hadoop 3.0, Hive 2.3, EC2, S3 Bucket, AMI, RDS

Cloud Platforms: AWS, EC2, S3, Redshift & MS Azure

OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Operating System: Windows, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, and Pentaho.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model


Confidential - Plano, TX

Sr. Big Data Engineer/ Hadoop Engineer


  • Provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
  • Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
  • Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Solved and supported real business issues using knowledge of Hadoop distributed file systems and open-source frameworks.
  • Responsible for the entire Development Architecture within the Data Lake.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Implemented MapReduce programs to retrieve results from unstructured data set.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Designed and worked on a Big Data analytics platform for processing customer interface preferences and comments using Hadoop, Hive and Pig on Cloudera.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
  • Imported data from Oracle into HDFS and Hive, and exported it back, using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Generated ad-hoc Tableau reports based on user requirements.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL.
  • Worked with clients on-site and provided ERP solutions based on SQL on Oracle and Microsoft SQL Server.
  • Drove POC initiatives to assess the feasibility of different traditional and Big Data reporting tools with the data lake (Spotfire, BO, Tableau, etc.).
  • Designed and Developed Tableau reports and dashboards for data visualization using Python.
  • Created stored procedures and functions in SQL Server to import data into Elastic Search and convert relational data into documents.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm and search technologies such as Elastic Search.
  • Determined what Elastic Search queries produce the best search experience
  • Created training manuals and conducted enterprise-wide formal training on the ERP system.
  • Experienced in implementing POC's to migrate iterative MapReduce programs into Spark transformations using Scala.
  • Developed Spark scripts by using Python and Scala shell commands as per the requirement.
  • Created Data Map, registration, real time mapping, workflows, restart token and recovery process using Informatica Power Exchange 9.1.
  • Experienced with batch processing of data sources using Apache Spark, Elastic search.
  • Experienced in AWS cloud environment and on S3 storage and EC2 instances
  • Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
  • Worked with different types of Servers integrated with Tableau such as Amazon, Cloudera Hadoop, Oracle, and MySQL.
  • Built and produced a REST service for a custom search service on Elastic Search.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Developed data warehouses, data lakes and analytics solutions using Big Data.
  • Designed and implemented SOLR indexes for the metadata that enabled internal applications to reference Scopus content.
  • Designed and Implemented End to End Search service Solution using Elastic Search.
  • Used Spark with Scala for parallel data processing and better performance.
  • Extensively used Pig for data cleansing and for extracting data from web server output files to load into HDFS.
  • Experienced in working with different scripting technologies like Python, Unix shell scripts.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Implemented Kafka producers with custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it using MapReduce programs.
  • Developed simple to complex MapReduce streaming jobs using Python.

Environment: Pig 0.17, Hive 2.3, HBase 1.2, Airflow, Sqoop 1.4, Flume 1.8, Cassandra 3.11, ZooKeeper, AWS, MapReduce, HDFS, Oracle, Cloudera, Scala, Spark 2.3, SQL, Apache Kafka 1.0.1, Apache Storm, Python, Unix and SOLR 7.2

Confidential - Tampa, FL

Sr. Data Engineer


  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
  • Developed a reconciliation process to make sure the Elastic Search index document count matches the source record count.
  • Developed data pipelines to consume data from the Enterprise Data Lake (MapR Hadoop distribution - Hive tables/HDFS) for the analytics solution.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Worked in exporting data from Hive tables into Netezza database.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into HDFS.
  • Developed incremental and complete load Python processes to ingest data into Elastic Search from an Oracle database.
  • Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Created Airflow Scheduling scripts in Python.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache NiFi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Expert in understanding the data and designing/Implementing the enterprise platforms like Hadoop Data lake and Huge Data warehouses.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed REST services to write data into an Elastic Search index using Python Flask.
  • Developed complete end to end Big-data processing in Hadoop eco system.
  • Used AWS Cloud with Infrastructure Provisioning / Configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Utilized Oozie workflows to run Pig and Hive jobs.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.

Environment: Apache Spark, Hive 2.3, Informatica, HDFS, Airflow, MapReduce, Scala, Apache NiFi 1.6, YARN, HBase, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Flume 1.8

Confidential - Atlanta, GA

Sr. Data Analyst/Engineer


  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Part of team conducting logical data analysis and data modeling JAD sessions, communicated data-related standards.
  • Worked on NoSQL databases including Cassandra. Implemented multi-data center and multi-rack Cassandra cluster.
  • Coordinated with Data Architects on AWS provisioning EC2 Infrastructure and deploying applications in Elastic load balancing.
  • Performed Reverse Engineering of the current application using Erwin, and developed Logical and Physical data models for Central Model consolidation.
  • Translated logical data models into physical database models, generated DDLs for DBAs
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Performed extensive data validation by writing several complex SQL queries, took part in back-end testing, and worked on data quality issues.
  • Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
  • Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
  • Extensively used ETL methodology for supporting data extraction, transformations and loading processing, in a complex DW using Informatica.
  • Experience with various technology platforms, application architecture, design, and delivery including experience architecting large big data enterprise data lake projects.
  • Developed and maintained sales reporting using MS Excel queries, SQL in Teradata, and MS Access.
  • Involved in writing T-SQL working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
  • Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of Data Analysis responsibilities.
  • Designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XIR2
  • Worked in importing and cleansing of data from various sources like Teradata, Oracle, flat files, with high volume data
  • Wrote SQL scripts to test the mappings and Developed Traceability Matrix of Business
  • Created SQL tables with referential integrity, constraints and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Performed GAP analysis of current state to desired state and documented requirements to control the gaps identified.
  • Developed the batch program in PL/SQL for the OLTP processing and used Unix shell scripts to run it from crontab.
  • Identified and recorded defects with the information required for the development team to reproduce each issue.
  • Worked on the reporting requirements and involved in generating the reports for the Data Model using crystal reports

Environment: Erwin 9.0, PL/SQL, Business Objects XIR2, Informatica 8.6, Oracle 11g, Teradata R13, Teradata SQL Assistant 12.0, Flat Files

Confidential - Washington, DC

Sr. Data Modeler/ Data Analyst


  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
  • Gathered all the analysis reports prototypes from the business analysts belonging to different Business units; Participated in JAD sessions involving the discussion of various reporting needs.
  • Reverse engineered the existing data marts and identified the Data Elements (in the source systems), Dimensions, Facts and Measures required for reports.
  • Conducted design discussions and meetings to arrive at the appropriate Data Warehouse design at the lowest level of grain for each of the Dimensions involved.
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints.
  • Designed a STAR schema for sales data involving shared dimensions (Conformed) for other subject areas using Erwin Data Modeler.
  • Created and maintained Logical Data Model (LDM) for the project. Includes documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Validated and updated the appropriate LDM's to process mappings, screen designs, use cases, business object model, and system object model as they evolve and change.
  • Used Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Ensured the feasibility of the logical and physical design models.
  • Worked on snowflaking the Dimensions to remove redundancy.
  • Wrote PL/SQL statement, stored procedures and Triggers in DB2 for extracting as well as writing data.
  • Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
  • Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
  • Created data masking mappings to mask the sensitive data between production and test environment.
  • Normalized the database based on the new model developed to put them into the 3NF of the data warehouse.
  • Used SQL tools like Teradata SQL Assistant and TOAD to run SQL queries and validate the data in warehouse.
  • Created SSIS package for daily email subscriptions to alert Tableau subscription failure using the ODBC driver and PostgreSQL database.
  • Designed logical and physical data models, Reverse engineering, Complete compare for Oracle and SQL server objects using Erwin.
  • Constructed complex SQL queries with sub-queries, inline views as per the functional needs in the Business Requirements Document (BRD).
  • Worked with supporting business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
  • Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
  • Developed scripts that automated DDL and DML statements used in creations of databases, tables, constraints, and updates.

Environment: PL/SQL, Erwin 8.5, MS SQL 2008, OLTP, ODS, OLAP, SSIS, Tableau, ODBC, Transact-SQL, TOAD, Teradata SQL Assistant


Data Analyst


  • Gathered and translated business requirements into detailed technical specifications.
  • Developed conceptual and logical data model for the warehouse and data marts.
  • Worked with the Business Analyst, QA team in their testing and DBA for database changes, business analysis, testing and project coordination.
  • Used Model Mart of Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Integrated with developers in utilizing necessary PL/SQL scripting to perform various tasks such as validating and analyzing health care informatics data, to ensure no data anomalies pertaining to validity, content, format, presentation, etc.
  • Supported report developers by sharing the knowledge of the data models and creating materialized views as needed for reporting.
  • Ensured production data being replicated into data warehouse without any data anomalies from the processing databases.
  • Created documentation and test cases, worked with users for new module enhancements and testing.
  • Wrote and executed SQL queries to verify that data has been moved from transactional system to DSS, Data warehouse, data mart reporting system in accordance with requirements.
  • Performed ad hoc analyses as needed.
  • Experience in creating UNIX scripts for file transfer and file manipulation.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from source systems.

Environment: Erwin, Informatica PowerCenter 8.6.1, Windows XP, Oracle 9i, SQL Server, DB2, Toad for Oracle, Toad for Data Analyst, OBIEE, Rational Rose
