
Sr. Big Data Engineer Resume

Merrimack, NH


  • Over 9 years of experience as a Big Data Engineer/Data Modeler/Data Analyst, with skills in the analysis, design, development, testing, and deployment of various software applications.
  • In-depth knowledge of software development life cycle (SDLC) methodologies, including Waterfall, Iterative and Incremental, RUP, evolutionary prototyping, and Agile/Scrum.
  • Good knowledge of implementing data processing techniques using Apache HBase to handle and format data as required.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Extensive experience in technical consulting and end-to-end delivery, covering data modeling, data governance, and the design, development, and implementation of solutions.
  • Strong expertise in Amazon AWS services including EC2, DynamoDB, S3, and Kinesis.
  • Experience with data modeling and design of both OLTP and OLAP systems.
  • Hands-on experience with Big Data ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
  • Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
  • Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
  • Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
  • Experience working with Microsoft Server tools like SSAS, SSIS and in generating on-demand scheduled reports using SQL Server Reporting Services (SSRS).
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Proficient in creating dashboards/reports using reporting tools such as Tableau and QlikView.
  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunication project in Scala.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Implemented a distributed messaging queue integrated with Cassandra using Apache Kafka and ZooKeeper.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experienced in writing Storm topologies that accept events from a Kafka producer and emit them into Cassandra.
  • Skilled in performing data validation and transformation using Python and Hadoop Streaming.
  • Well experienced in normalization and de-normalization techniques for optimal performance in relational and dimensional database environments.
  • Provided primary leadership in designing, leading, and directing all aspects of UAT testing for the Oracle data warehouse.
  • Excellent understanding of Microsoft BI toolset including Excel, Power BI, SQL Server Analysis Services, Visio, Access.
  • Experienced with Apache NiFi as an ETL tool for batch and real-time processing.
  • Extensive experience with ER modeling tools such as Erwin and ER/Studio, and with Teradata and BTEQ; proficient in developing Entity-Relationship diagrams and Star/Snowflake schema designs.
  • Proficient in writing Hive join queries to fetch information from multiple tables, and in writing multiple jobs to collect output from Hive.
  • Excellent experience in systems analysis, ER/dimensional modeling, data design, and implementing RDBMS-specific features.
  • Hands on experience in importing, cleaning, transforming, and validating data and making conclusions from the data for decision-making purposes.
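The Python/Hadoop Streaming validation work mentioned above can be sketched as a small mapper-style function; the delimited field layout (id, name, amount) is hypothetical and chosen only for illustration:

```python
def clean_record(line, delimiter=",", expected_fields=3):
    """Validate and normalize one delimited record; return None if invalid.

    In a Hadoop Streaming job, a mapper script would apply this to each
    line read from sys.stdin and print the survivors to stdout.
    """
    parts = [p.strip() for p in line.rstrip("\n").split(delimiter)]
    if len(parts) != expected_fields:
        return None  # drop malformed rows
    rec_id, name, amount = parts
    if not rec_id.isdigit():
        return None  # id must be numeric
    try:
        amount = f"{float(amount):.2f}"  # normalize amount to 2 decimals
    except ValueError:
        return None
    return delimiter.join([rec_id, name.upper(), amount])
```

Invalid rows return None rather than raising, so a streaming job can simply skip them without failing the task.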


Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Cassandra 3.11

Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16, and Oracle 12c

Databases: Oracle 12c, DB2, SQL Server.

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports

Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server

Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.

Version Tool: VSS, SVN, CVS.


Confidential, Merrimack, NH

Sr. Big Data Engineer


  • As a Sr. Big Data Engineer, provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Participated in all aspects of Software Development Life Cycle (SDLC) and Production troubleshooting, Software testing using Standard Test Tool.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
  • Primarily involved in the Data Migration process using Azure, integrating with a Bitbucket repository.
  • Conducted JAD sessions with management, vendors, users and other stakeholders for open and pending issues to develop specifications.
  • Worked with MDM systems team with respect to technical aspects and generating reports.
  • Utilized Integration Services (SSIS) to produce a Data Mapping and Data Mart for reporting.
  • Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
  • Developed the code for Importing and exporting data into HDFS and Hive using Sqoop
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS
  • Designed high level ETL architecture for overall data transfer from the OLTP to OLAP with the help of SSIS.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Developed logical/ physical data models in Erwin across the subject areas based on the specifications and established referential integrity of the system.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Created and implemented a highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Managed database design and implemented a comprehensive Star-Schema with shared dimensions.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in the creation of Microsoft Azure Cloud SQL Servers and Replication Servers.
  • Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
  • Created views and extracted data from Teradata base tables, and uploaded data from Teradata tables to an Oracle staging server using the FastExport utility.
  • Involved in Manipulating, cleansing & processing data using Excel, Access and SQL and responsible for loading, extracting and validation of client data.
  • Created ad-hoc reports with sensitive data pulled from Microsoft Excel while mining more than 40,000 lines of data per report.

Environment: Hadoop 3.0, HDFS, Agile, Apache Hive 2.3, MapReduce, Oracle 12c, Spark 2.3, HBase 1.2, Flume 1.8, Apache Pig 0.17, Sqoop 1.4, Oozie 4.3, PL/SQL, SSIS, SSRS, Teradata r15, SQL, OLTP, OLAP, ETL, Tableau
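The Hive work above (computing dashboard metrics over partitioned and bucketed data) amounts to per-partition aggregation; a minimal plain-Python sketch of that pattern, with an illustrative (partition_key, value) schema that is not from the original project:

```python
from collections import defaultdict

def metrics_by_partition(rows):
    """Compute count and total per partition key, mimicking a Hive
    GROUP BY over a partitioned table.

    `rows` is an iterable of (partition_key, value) pairs; the field
    layout is a hypothetical stand-in for the warehouse schema.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for part, value in rows:
        counts[part] += 1      # row count per partition
        totals[part] += value  # running sum per partition
    return {p: {"count": counts[p], "total": totals[p]} for p in counts}
```

In Hive itself the same aggregation would be a `GROUP BY` over the partition column, with partition pruning keeping the scan to only the partitions needed.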

Confidential, Harrisburg, PA

Data Engineer


  • As a Sr. Data Engineer, provided aptitude in Big Data technologies as they relate to the development of analytics.
  • Used SDLC (System Development Life Cycle) methodologies like RUP and Agile methodology.
  • Designed and deployed full SDLC of AWS Hadoop cluster based on client's business need.
  • Developed MapReduce modules for machine learning & predictive analytics in Hadoop on AWS.
  • Created and executed SQL scripts to validate, verify and compare the source data to target table data.
  • Used Sqoop to import and export data into Hadoop distributed file system for further processing.
  • Provided technical support during delivery of MDM (Master Data Management) components.
  • Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for increasing performance benefit and helping in organizing data in a logical fashion.
  • Implemented python scripts to parse XML documents and load the data in the databases.
  • Installed and configured Big Data ecosystem like HBase, Flume, Pig and Sqoop.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS
  • Reviewed requirements together with the QA Manager and ETL leads to enhance the data warehouse for the origination and servicing systems.
  • Worked closely with the SSIS, SSRS Developers to explain the complex data transformation using Logic.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on Amazon Redshift and AWS and architecting a solution to load data, create data models.
  • Loaded multiple NOSQL databases including MongoDB, HBase and Cassandra.
  • Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability
  • Developed dashboards in Tableau Desktop and published them to Tableau Server for end-user access.
  • Performed Data Analysis on both source data and target data after transfer to Data Warehouse.
  • Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
  • Created HBase tables to store various data formats of PII data coming from different portfolios
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
  • Designed and Developed Spark workflows using Scala for data pull from cloud-based systems and applying transformations on it.
  • Worked on partitioning Hive tables and running scripts parallel to reduce run time of the scripts.
  • Worked with the analysis teams and management teams and supported them based on their requirements.

Environment: Agile, AWS, Hadoop 3.0, MapReduce, SQL, MDM, Hive 2.3, XML, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, HDFS, Cassandra 3.11, Kafka, NOSQL, MongoDB, Oozie 4.3, Tableau, UNIX
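The Python scripts described above for parsing XML documents and loading the data into databases can be sketched as follows; the XML layout and table schema are hypothetical, and sqlite3 stands in for the production database:

```python
import sqlite3
import xml.etree.ElementTree as ET

def load_xml_to_db(xml_text, conn):
    """Parse <record> elements from an XML document and insert them
    into a table. Returns the number of rows loaded.

    Element and column names are illustrative only.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (id INTEGER, name TEXT)")
    root = ET.fromstring(xml_text)
    rows = [(int(r.get("id")), r.findtext("name"))
            for r in root.iter("record")]
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

Parameterized `executemany` keeps the load batched and avoids building SQL strings from the parsed values.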

Confidential, Greensboro, NC

Data Analyst/Data Engineer


  • Worked as a Sr. Data Analyst/Data Engineer responsible for all data-related aspects of a project.
  • Participated in different phases of projects, from business walkthroughs for requirements gathering and business analysis through design, coding, and testing.
  • Used reverse engineering to connect to existing database and create graphical representation (E-R diagram).
  • Successfully migrated Legacy application to Big Data application using Hive/Pig in Production level.
  • Implemented MapReduce programs by joining data sets from different sources using joins.
  • Performed GAP analysis to identify the gap between the optimized allocation and integration of the inputs, and the current level of allocation.
  • Developed Data Migration and Cleansing rules for the Integration Architecture using OLTP.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Modeled Hive partitions extensively for data separation and faster data processing.
  • Involved in Reverse engineering on existing Data model to understand the data flow and business flow
  • Worked with Sqoop in Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Designed the Redshift data model and performed Redshift performance analysis and improvements.
  • Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
  • Assisted project with analytical techniques including data modeling, data mining techniques, regression and hypothesis to get output from large data sets.
  • Imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Created and maintain the metadata (data dictionary) for the data models.
  • Developed scripts that automated DDL and DML statements used in creations of databases, tables, constraints, and updates.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Developed data dictionaries and layouts according to client/vendor requirements and worked in parallel with system analysts on data mapping.
  • Analyzed data and recommended new strategies for root cause and finding quickest way to solve big data sets.
  • Worked with MS Access Database and generated several reports in Microsoft Excel
  • Facilitated meetings with the business and technical team to gather necessary analytical data requirements.

Environment: AWS, Oracle 11g, OLTP, OLAP, HDFS, Apache Hive 2.1, PL/SQL, Sqoop, SQL, DDL, DML, MS Excel 2014.
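The MapReduce join work described above (joining data sets from different sources) follows the classic reduce-side join pattern; a minimal single-process sketch, with illustrative (key, value) record shapes:

```python
from collections import defaultdict

def reduce_side_join(left, right):
    """Join two datasets on their first field, mimicking a reduce-side
    MapReduce join: tag each record with its source, group by key,
    then pair left and right values per key in the reducer.
    """
    grouped = defaultdict(lambda: {"L": [], "R": []})
    for key, value in left:
        grouped[key]["L"].append(value)   # map phase tags source "L"
    for key, value in right:
        grouped[key]["R"].append(value)   # map phase tags source "R"
    joined = []
    for key, sides in grouped.items():    # reduce phase per key
        for lv in sides["L"]:
            for rv in sides["R"]:
                joined.append((key, lv, rv))
    return joined
```

Keys present on only one side produce no output, matching an inner join; in a real MapReduce job the shuffle phase performs the grouping-by-key step.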

Confidential, Newport Beach, CA

Data Analyst/Data Modeler


  • Worked as a Data Analyst /Data Modeler to generate Data Models using E/R Studio.
  • Actively participated in JAD sessions involving the discussion of various reporting needs.
  • Gathered business requirements from the users and transformed and implemented into database schemas.
  • Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
  • Designed OLTP system environment and maintained documentation of Metadata.
  • Conducted detailed analysis of the data issue, mapping data from source to target, design and data cleansing on the Data Warehouse
  • Wrote PL/SQL statement, stored procedures and Triggers in DB2 for extracting as well as writing data.
  • Performed Normalization of the existing OLTP systems (3rd NF), to speed up the DML statements execution time.
  • Worked on designing a Star schema for the detailed data marts and plan data marts involving confirmed dimensions.
  • De-normalized the database to put them into the star schema of the data warehouse for specific requirements of the projects.
  • Involved in dimensional modeling, identifying the Facts and Dimensions for reporting purposes.
  • Worked with SQL Server Integration Services (SSIS) and SQL Server Analysis Services (SSAS).
  • Managed scheduled data refreshes on Tableau Server in weekly and monthly increments based on business changes, which updated the dashboards.
  • Extracted the source data from Oracle tables, MS SQL Server, sequential files and Excel sheets.
  • Extensively used SAS procedures like means, frequency and other statistical calculations for Data validation.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Worked with Business Analyst during requirements gathering and business analysis to prepare high level Logical Data Models and Physical Data Models using E/R Studio.
  • Conducted design discussions and meetings to arrive at the appropriate Data Mart design using the Kimball methodology.
  • Developed a solution which will aid in the data capture, data cleansing, data monitoring and reporting of customer data.
  • Created Technical specifications documents for the data warehouse design and data mapping from the source to target applying the business rules.
  • Evaluated data mining request requirements and helped develop the queries for the requests.
  • Analyzed complex data sets, performing ad-hoc analysis and Data Manipulation to support retail Campaigns.
  • Performed different calculations like Quick table calculations, Date Calculations, Aggregate Calculations, String and Number Calculations.

Environment: ER/Studio V16, PL/SQL, SAS, UML, SQL, SSIS, SSRS, Tableau 8.2, AWS, Data Mart, MS SQL Server
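The star-schema design work above (a fact table keyed to shared dimensions) can be illustrated with a minimal example; table and column names are hypothetical, and sqlite3 stands in for the warehouse database:

```python
import sqlite3

def build_star_schema(conn):
    """Create a minimal star schema: one fact table with foreign keys
    to a date dimension and a product dimension."""
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, cal_date TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
    """)

def sales_by_product(conn):
    """Typical star-schema query: join the fact table to a dimension
    and aggregate the measure."""
    return conn.execute("""
        SELECT p.name, SUM(f.amount)
        FROM fact_sales f JOIN dim_product p USING (product_key)
        GROUP BY p.name ORDER BY p.name
    """).fetchall()
```

Keeping the dimensions conformed (shared across fact tables) is what lets several data marts report against the same product and date definitions.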


Data Analyst


  • Performed Data Analysis on both source data and target data after transfer to Data Warehouse.
  • Extracted, discussed, and refined business requirements from business users and team members.
  • Developed complex PL/SQL procedures and packages using views and SQL joins.
  • Performed data profiling and analysis, applied various data cleansing rules, defined data standards, and designed the relational models.
  • Documented data dictionaries and business requirements for key workflows and process points
  • Conducted data extraction, data manipulation over large relational data sets using SQL and built multiple linear regressions.
  • Utilized a diverse array of technologies and tools as needed to deliver insights, such as R, SAS, and Tableau.
  • Enhanced data collection procedures to include information that is relevant for building analytic systems.
  • Involved in identifying the Data requirements and creating Data Dictionary for the functionalities
  • Extensively used the Set, Update and Merge statements for creating, updating and merging various SAS data sets
  • Performed various analysis operations on data such as Univariate, Multivariate, Time Series analysis, Regression analysis and Correlation Analysis.
  • Interpreted and converted contractual documents from data clients into stored procedures, user-defined functions, views, and T-SQL scripts in SQL Server Reporting Services.
  • Defined data needs, evaluated data quality, and extracted/transformed data for analytic projects and research.
  • Created multiple table views and reports generated through Power BI and Tableau for business analysis
  • Performed Data Validation / Data Reconciliation between disparate source and target systems for various projects.
  • Emphasized on Optimization techniques using triggers, indexes and partitions for high volume transactions.
  • Advanced-level Excel skills: VLOOKUPs, macros, conditional formatting, pivot tables, and summarizing data in Excel.
  • Performed various ad-hoc analyses by extracting data from multiple source systems and creating comprehensive reports for end users.

Environment: R, SAS, Tableau 5.2, SQL, PL/SQL, Python, MS Excel 2010, Pivot Tables, Business Intelligence.
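The regression work above can be sketched for the single-predictor case with ordinary least squares; a minimal pure-Python version (the multi-variable case described in the bullets would extend this with matrix algebra), using illustrative data:

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept): slope = cov(x, y) / var(x),
    intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For data generated by y = 2x + 1, this recovers slope 2 and intercept 1 exactly.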
