
Sr. Big Data Engineer Resume

Mt Laurel, NJ


  • Over 9 years of professional experience in software systems development and business systems, including design and development work as a Big Data Engineer, Data Modeler, and Data Analyst.
  • Good experience in all phases of the SDLC, including participation in daily scrum meetings with cross-functional teams.
  • Excellent experience in developing and designing data integration and migration solutions in MS Azure.
  • Excellent understanding of and hands-on experience with AWS, including S3 and EC2.
  • Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
  • Experience in importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Good experience in using SSIS and SSRS in creating and managing reports for an organization.
  • Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
  • Extensive experience in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
  • Excellent working knowledge of data modeling tools such as Erwin, PowerDesigner, and ER Studio.
  • Proficient working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
  • Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
  • Excellent technical and analytical skills, with a clear understanding of design goals and development for OLTP and of dimensional modeling for OLAP.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Experience designing and developing Oracle PL/SQL and shell scripts for data conversion and data cleansing.
  • Experience participating in requirements sessions with business analysts and product owners to gather requirements.
  • Experience in designing components from requirements using UML: Use Case, Class, Sequence, Deployment, and Component diagrams.
  • Experience implementing a log producer in Scala that watches application logs and transforms incremental log entries.
  • Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
  • Strong experience in using Excel and MS Access to dump data and analyze it based on business needs.
  • Experience in designing the Data Mart and creation of Cubes.
  • Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
  • Experience performing extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
  • Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
  • Experience in using SSIS in solving complex business problems.
  • Proficient in writing DDL and DML commands using SQL Developer and Toad.
  • Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.


Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11

Data Modeling Tools: Erwin r9.7, Rational System Architect, IBM InfoSphere Data Architect, ER Studio v16

BI Tools: Tableau 10, SAP Business Objects, Crystal Reports

Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server

Version Control Tools: VSS, SVN, CVS.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.


Confidential - Mt Laurel, NJ

Sr. Big Data Engineer


  • As a Big Data Engineer, developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
  • Worked with Microsoft Azure Cloud services, Storage Accounts, Azure data storage, and Azure Data Factory.
  • Followed Agile methodology for data warehouse development, using Kanbanize.
  • Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
  • Integrated Oozie with MapReduce, Pig, Hive, and Sqoop and developed Oozie workflows for scheduling and orchestrating the ETL process within the Cloudera Hadoop distribution.
  • Collaborated with other data modeling team members to ensure design consistency and integrity.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Worked with Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
  • Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Worked closely with the SSIS and SSRS developers to explain the complex data transformation logic.
  • Worked closely with business analysts to gather requirements and translate them into technical documentation.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Worked in MongoDB and UNIX environments to clean up and group NoSQL data and create analysis reports.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
  • Migrated MapReduce jobs to the Spark framework, rewriting most of the existing jobs in Scala on Spark for better performance.
  • Created external tables pointing to HBase to access tables with a huge number of columns.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Worked with Cassandra, retrieving data from Cassandra clusters to run queries.
  • Extensively used Erwin for developing data models using star schema methodology.
  • Created and maintained MySQL databases, set up users, and maintained backups of the cluster metadata databases.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Imported millions of structured records from relational databases using Sqoop import, processed them with Spark, and stored the data in HDFS in CSV format.
  • Developed, planned, and migrated servers, relational databases (SQL), and websites to Microsoft Azure.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
  • Documented the requirements, including the available code that should be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
  • Developed Spark code using Scala for faster testing and processing of data.
  • Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for our use case.
  • Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.

Environment: Big Data, Hadoop 3.0, Agile, Apache NiFi, Hive 2.3, MapReduce, HDFS, Oracle 12c, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL.

Confidential - Lowell, AR

Data Engineer


  • Worked as a Data Engineer on several Hadoop Ecosystem components like HBase, Sqoop, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Managed and reviewed Hadoop log files. Tested and reported defects within an Agile methodology.
  • Worked on migrating Pig scripts and MapReduce programs to Spark and Spark SQL to improve performance.
  • Extensively involved in writing Oracle PL/SQL stored procedures, functions, and packages.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Used all major ETL transformations to load the tables through Informatica mappings.
  • Interviewed business users to gather and document requirements.
  • Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Developed and maintained stored procedures, implemented changes to database design including tables.
  • Ingested data from various sources and processed the data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Provided thought leadership on the architecture and design of Big Data analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
  • Created integrated relational 3NF models that functionally relate to other subject areas, and determined the corresponding transformation rules in the Functional Specification Document.
  • Involved in reports development using reporting tools like Tableau.
  • Loaded and transformed huge sets of structured, semi-structured, and unstructured data.
  • Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production.
  • Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Designed and developed cubes with SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
  • Tuned the performance of OLTP and data warehouse environments using SQL.
  • Created data structures that store the dimensions so that data can be retrieved, deleted, and inserted efficiently.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and to compute various metrics for reporting on the dashboard.
  • Implemented referential integrity using primary key and foreign key relationships.
  • Developed staging jobs using data from different sources such as flat files, Excel files, and Oracle databases.

Environment: HBase, Sqoop 1.4, Oozie 4.3, Hive 2.3, SDLC, MapReduce, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS

Confidential - Hartford, CT

Data Analyst/Data Engineer


  • Worked as a Sr. Data Analyst/Data Engineer, reviewing business requirements and composing source-to-target data mapping documents.
  • Developed a workflow in Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Designed HBase schemas based on the requirements and performed HBase data migration and validation.
  • Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Connected to Amazon Redshift through Tableau to extract live data for real-time analysis.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Moved all log files generated from various sources to HDFS for further processing.
  • Wrote Hive and Scala scripts to analyze data according to business requirements.
  • Generated metadata and created Talend jobs and mappings to load the data warehouse and data lake.
  • Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
  • Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
  • Created HBase tables to store data in various formats coming from different sources.
  • Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
  • Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Developed SAS macros for data cleaning, reporting, and support of routine processing.
  • Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
  • Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/EXCEL, pivot tables, and graphs.

Environment: Erwin 9.5, SAS, SQL, HBase, MapReduce, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.
