Sr. Big Data Engineer Resume

Merrimack, NH


  • Over 8 years of experience as a Sr. Big Data Engineer with skills in the analysis, design, development, testing and deployment of various software applications.
  • In-depth knowledge of the software development lifecycle (SDLC) and of the Waterfall, Iterative and Incremental, and Agile/Scrum methodologies.
  • Excellent knowledge of and working experience with big data tools such as Hadoop, Azure, Data Lake and AWS Redshift.
  • Good working experience with Big Data ecosystem components such as HBase, Sqoop, Oozie, Hive and Pig on the Cloudera Hadoop distribution.
  • Good experience writing Hive join queries to fetch information from multiple tables and writing multiple jobs to collect output from Hive.
  • Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio, and in working with Teradata.
  • Excellent experience in system analysis, ER and dimensional modeling, data design and implementing RDBMS-specific features.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
  • Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT and Dimension tables.
  • Extensive experience in technical consulting and end-to-end delivery covering data modeling and data governance.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Good working experience with Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Extensive experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent experience performing data validation and transformation using Python and Hadoop streaming.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Responsible for providing primary leadership in designing and directing all aspects of UAT for the Oracle data warehouse.
  • Involved in data acquisition, data pre-processing and data exploration for a telecommunications project in Scala.


Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j.

Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16 and Oracle 12c.

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

BI Tools: Tableau 10, Tableau Server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports.

Project Execution Methodologies: Agile, Ralph Kimball’s and Bill Inmon’s data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server

IDEs: Eclipse, RAD, WASD, NetBeans.

Operating Systems: Microsoft Windows 7/8/10, UNIX, and Linux.

Version Control Tools: VSS, SVN, CVS.


Confidential, Merrimack, NH

Sr. Big Data Engineer


  • Worked with Big Data Hadoop Ecosystem in ingestion, storage, querying, processing and analysis of big data and conventional RDBMS.
  • Used the Agile Scrum methodology across the different phases of the software development lifecycle.
  • Built a data warehouse on the Azure platform using Azure Databricks and Azure Data Factory.
  • Installed, configured and maintained Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and a variety of portfolios.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Developed conceptual and logical data models and transformed them into schemas using Erwin.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
  • Wrote DDL and DML statements for creating and altering tables and for converting characters into numeric values.
  • Created several types of data visualizations using Python and Tableau.
  • Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
  • Involved in Design and Development of technical specifications using Hadoop technologies.
  • Developed PL/SQL programs and stored procedures for data loading and data validation.
  • Worked on Cube structure optimization for MDM query performance in Analysis Services (SSAS).
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
  • Designed Star and Snowflake Data Models for Enterprise Data Warehouse using Erwin.
  • Performed data profiling and transformation on the raw data using Pig and Python.
  • Developed standardized Python/Scala code and practices for ingesting data from numerous database systems.
  • Managed real-time data processing and real-time data ingestion in MongoDB and Hive using Storm.
  • Designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
  • Used Microsoft Excel tools like pivot tables, graphs, charts and Solver to perform quantitative analysis.
  • Used Windows Azure SQL Reporting services to create reports with tables, charts and maps.
  • Initiated Use Case Analysis using UML, which provided the framework for potential use case deliverables and their inter-relationships.
  • Created external tables pointing to HBase to access tables with a huge number of columns.
  • Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
  • Created various types of reports such as drill down & drill through reports, Matrix reports, Sub reports and Charts using SQL Server Reporting Services (SSRS).
  • Developed various QlikView data models by extracting and using data from various sources, including Excel files, flat files and big data.

Environment: Agile, Azure, Hadoop 3.0, Pig 0.17, Zookeeper 3.4, Sqoop 1.4, MapReduce, Scala 2.12, Spark 2.4, HBase 1.4, Erwin 9.7, Hive 2.3, Python 3.7, NoSQL, PL/SQL, MDM

Confidential, Bentonville, AR

Data Engineer


  • As a Sr. Data Engineer, responsible for building scalable distributed data solutions using Hadoop.
  • Involved in all phases of SDLC and participated in daily scrum meetings with cross teams.
  • Extensively used Agile methodology as the organization standard to implement the data models.
  • Worked on AWS Redshift and RDS to implement data models and data.
  • Worked on migrating tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
  • Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.
  • Created SSIS packages for data importing, cleansing, parsing, etc. Extracted, cleaned and validated
  • Automated recurring reports using SQL and Python and visualized them on a BI platform such as Tableau.
  • Developed an automation system to handle data-related tasks using SQL, Python, Salesforce, and AWS.
  • Extensively used ETL load scripts to manipulate, concatenate and clean source data.
  • Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
  • Collected large amounts of log data using Apache Flume and aggregated it using Pig/Hive in HDFS for further analysis.
  • Used Python to manipulate data for data loading and extraction and worked with Python libraries.
  • Gathered and documented requirements of a QlikView application from users.
  • Wrote Hive queries for analyzing data in the Hive warehouse using Hive Query Language.
  • Worked on MongoDB and HBase databases, which differ from classic relational databases.
  • Involved in several facets of MDM implementations including Data Profiling, Metadata acquisition and data migration.
  • Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Wrote complex SQL queries for implementing business rules and transformations.
  • Created the logical data model from the conceptual model and converted it into the physical database design using Erwin.
  • Installed and configured Pig for ETL jobs and ensured the Pig scripts used regular expressions for data cleaning.
  • Interacted with business stakeholders, gathering requirements and managing the delivery, covering the entire Tableau development life cycle.
  • Involved in database development by creating PL/SQL functions, procedures, packages, cursors, error handling and views.
  • Managed real-time data processing and real-time data ingestion in MongoDB and Hive using Storm.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Performed data analysis in Hive by creating tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.

Environment: Agile, AWS, Sqoop 1.4, Tableau, SSIS, SQL, Python 3.7, Apache Flume 1.8, Pig 0.17, Hive 2.3, MongoDB, MDM, HDFS, Oozie 4.3, HBase 1.4, MapReduce, Teradata

Confidential, Plano, TX

Data Analyst/Data Engineer


  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Worked on Amazon Redshift and AWS on a solution to load data and create data models.
  • Implemented forward engineering to create tables, views, SQL scripts and mapping documents.
  • Involved in logical and physical designs and transformed logical models into physical implementations for Oracle and Teradata.
  • Gathered requirements and performed data mapping to understand the key information by creating tables.
  • Worked on data integration and workflow application on SSIS platform and responsible for testing all new.
  • Extensively used cursors, ref cursors, user-defined object types and records in PL/SQL programming.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in the validation of the OLAP, Unit testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
  • Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
  • Integrated NoSQL databases such as HBase with MapReduce to move bulk amounts of data into HBase.
  • Analyzed and gathered user requirements and created the necessary documentation for their data migration.
  • Performed reverse engineering of the dashboard requirements to model the required data marts.
  • Evaluated data mining request requirements and helped develop the queries for the requests.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in Transact-SQL (T-SQL) coding, writing queries, cursors, functions, views and triggers.
  • Worked on all data management activities for the project, including data sources and data migration.
  • Monitored data quality and ensured data integrity was maintained for the effective functioning of the department.

Environment: AWS, Amazon Redshift, SQL, Oracle 11g, Teradata r12, SSIS, PL/SQL, Pig 0.15, Sqoop, OLAP, RDBMS, NoSQL, HBase 1.8, MapReduce, OLTP, Hive 2.1, T-SQL


Data Analyst/Data Modeler


  • Worked extensively in a Data Modeler/Analyst role to review business requirements and compose source-to-target data mapping documents.
  • Conducted JAD sessions, wrote meeting minutes and documented the requirements.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Normalized the database to 3NF for the data warehouse, based on the newly developed model.
  • Collected requirements from business users and analyzed them.
  • Used SQL Server Integration Services (SSIS) for extracting, transforming and loading data into the target system from multiple sources.
  • Created Use Case Diagrams using UML to define the functional requirements of the application.
  • Involved in data transfer, creating tables from various source tables and coding PL/SQL stored procedures and packages.
  • Designed and developed cubes with SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
  • Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links and bulk collects.
  • Performed Data Analysis for building the reports and building Enterprise Data warehouse (EDW).
  • Designed and developed OLTP and OLAP models for the reporting requirements using ER/Studio.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Used SQL Server and T-SQL to construct tables and apply Normalization and De-Normalization techniques to database tables.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
  • Used ER/Studio for effective model management, including sharing, dividing and reusing model information and designs to improve productivity.
  • Worked on the reporting requirements and was involved in generating reports for the data model using Crystal Reports.
  • Involved in the creation and maintenance of the data warehouse and of repositories containing metadata.

Environment: 3NF, SQL, SSIS, PL/SQL, SSAS, Microsoft Visual Studio 2012, OLTP, OLAP, ER/Studio v15, T-SQL
