
Sr. Big Data Engineer Resume

Greensboro, NC


  • Over 6 years of work experience in IT as a Big Data Engineer, AWS Data Engineer and Programmer Analyst.
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Good knowledge of big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
  • Hands-on experience with normalization and denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Experience designing and developing POCs using Scala, Spark SQL and MLlib, deployed on YARN clusters.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
  • Good experience in Data Modeling and Data Analysis; proficient in gathering business requirements and handling requirements management.
  • Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, and XML files.
  • Solid knowledge of Dimensional Data Modeling with the Ralph Kimball methodology (Star Schema and Snowflake modeling for Fact and Dimension tables) using Analysis Services.
  • Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, and Data Import.
  • Experience in writing Pig and Hive scripts and extending their core functionality with custom UDFs.
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Proficient in architecting highly performant databases using MySQL.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata and MDM.
  • Good working experience with databases such as Teradata, and proficient in writing complex SQL and PL/SQL for creating tables, stored procedures and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Good knowledge of Apache NiFi for automating data movement between different Hadoop systems.
  • Performed performance tuning at the source, target and DataStage job levels using indexes, hints and partitioning in DB2, Oracle and DataStage.
  • Good working experience with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.
  • Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
  • Experience in object-oriented analysis and design (OOAD), Unified Modeling Language (UML) and design patterns.
  • Good knowledge of UNIX commands, such as changing file and group permissions.
  • Experience with R and Python for statistical computing, as well as with Spark, Excel, and SAS.


Big Data Tools: Hadoop ecosystem (MapReduce), Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11

Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17

BI Tools: Tableau 10, Tableau Server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, MS Access, MySQL, DB2, Hive, Microsoft Azure SQL Database

Operating Systems: Microsoft Windows Vista/7/8/10, UNIX, and Linux.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

OLAP Tools: Tableau, SAP Business Objects, SSAS, and Crystal Reports 9

Programming Languages: SQL, PL/SQL, UNIX shell scripting, Perl, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.

Other Tools: TOAD, BTEQ, MS-Office suite (Word, Excel, Project and Outlook).


Confidential, Greensboro, NC

Sr. Big Data Engineer


  • Responsible for design and development of Big Data applications using Cloudera Hadoop.
  • Coordinated with business customers to gather business requirements
  • Extracted and loaded data into an Azure Data Lake environment using Sqoop for access by business users.
  • Followed SDLC (System Development Life Cycle) methodologies such as RUP and Agile.
  • Worked on Hadoop ecosystem implementation and administration, including installing software patches, system upgrades and configuration.
  • Participated in JAD sessions and requirements gathering & identification of business subject areas.
  • Imported and exported data between MySQL and HDFS using Sqoop and managed data coming from different sources.
  • Used Reverse Engineering approach to redefine entities, relationships and attributes in the data model.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load data into HDFS.
  • Implemented Kafka producers with custom partitioning, configured brokers, and implemented high-level consumers for the data platform.
  • Designed and produced logical and physical data models for the financial platform and other in-house applications running on Oracle databases.
  • Created dimensional model based on star schemas and designed them using Erwin.
  • Worked with ETL tools to migrate data from various OLTP databases to the data mart.
  • Used the Ralph Kimball Methodology for Data Warehouse and Data Mart designs
  • Solved and supported real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Demonstrated QlikView to data analysts for creating custom reports, charts and bookmarks.
  • Created business requirement documents and integrated the requirements and underlying platform functionality.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Designed and implemented scalable cloud data and analytical solutions for various public and private cloud platforms using Azure.
  • Developed Scala scripts and UDFs using both DataFrames and Spark SQL for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark SQL/Spark Streaming for faster data processing.
  • Developed Oozie workflow jobs to execute Hive, Sqoop and MapReduce actions.
  • Developed numerous MapReduce jobs in Scala for data cleansing, and analyzed data in Impala.
  • Created data pipelines using Process Groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Developed customized classes for serialization and De-serialization in Hadoop.
  • Worked on ad hoc queries, indexing, replication, load balancing and aggregation in MongoDB.

Environment: Hadoop 3.0, Spark, HBase, Hive 2.3, HDFS, Scala 2.12, Sqoop 1.4, MapReduce, Apache NiFi, AWS, SQL Server, Oracle 12c, EC2, Erwin 9.7.

Confidential, Westborough, MA

Data Analyst/Data Engineer


  • Gathered business requirements from users and translated them into database schemas.
  • Performed data validation, filtering, sorting and other transformations for every data change in an HBase table and loaded the transformed data into another data store.
  • Performed data analysis and loaded customer details from the data warehouse to generate comprehensive reports for decision makers and others affected by the results.
  • Worked with Hive data warehouse tool - creating tables, data distribution by implementing
  • Managed internal data, including identifying risks to data integrity, disaster recovery and restoration.
  • Created numerous SSIS packages for business interfaces and applications.
  • Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
  • Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Involved in loading data from local file system to HDFS using HDFS Shell commands.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
  • Preprocessed the collected data, imputed missing values and applied business rules for report building.
  • Created and loaded temporary staging tables for validation and to enhance performance.
  • Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
  • Created Technical specifications documents for the data warehouse design and data mapping from the source to target applying the business rules
  • Wrote PL/SQL procedures and functions to validate the loaded data through ETL
  • Developed Pig scripts for transforming data, making extensive use of joins, filters and pre-aggregations.
  • Defined data needs, evaluated data quality, and extracted/transformed data for analytic projects and research.
  • Worked on Performance Tuning of the database which includes indexes, optimizing SQL Statements.
  • Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Automated data extraction, transformation and loading of the results into final datasets using Oracle, SAS and SQL.
  • Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps and Gantt charts.
  • Developed various QlikView Data Models by extracting and using the data from various sources like QVD files, Teradata, Excel, and Flat Files.

Environment: Erwin 9.5, T-SQL, 3NF, HDFS, HBase, Hadoop 3.0, MS Visio, PL/SQL, OLAP, OLTP, MySQL.

Confidential, Reston, VA

Data Analyst/Data Modeler


  • Discovered data patterns and anomalies, and built an understanding of the business by relating the business requirements to the source data.
  • Conducted JAD sessions, gathered information from Business Analysts, Developers, end users and stakeholders to determine the requirements and various systems.
  • Applied normalization and denormalization concepts and data warehouse design methodologies such as Ralph Kimball's and Bill Inmon's.
  • Used Erwin for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Created and maintained conceptual, logical and physical data models for all projects I was involved in.
  • Managed the metadata for the Subject Area models for both Operational & Data Warehouse/Data Mart applications.
  • Conducted data profiling and quality control/auditing to ensure accurate and appropriate use of data.
  • Modified data sources and wrote complex custom SQL in Tableau to shape data into the required form or layout for visualization.
  • Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models
  • Performed data cleansing by analyzing and eliminating duplicate and inaccurate data.
  • Analyzed functional and non-functional data elements for data profiling and mapping from source to target data environment.
  • Worked closely with the SSIS developers to explain complex data transformation logic.
  • Used Erwin Model Mart for effective model management of sharing, dividing and reusing model information and design for productivity improvement; involved in data mapping.
  • Developed stored procedures and triggers, packages, functions and exceptions using PL/SQL
  • Worked on Performance Tuning of the database which includes indexes, optimizing SQL Statements.
  • Worked with data compliance teams, Data governance team to maintain data models, Metadata, Data Dictionaries.
  • Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
  • Conducted the Data Analysis and identified the Data quality issues using Data profiling methodologies.
  • Participated in performance management and tuning for stored procedures, tables and database servers.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from the source and SQL Server database systems.

Environment: E/R Diagrams, MS Visio 2014, PL/SQL, Oracle, OLAP, XML, OLTP, SQL server, Transact-SQL


Data Analyst


  • Conducted sessions with the Business Analysts and Technical Analysts to gather the requirements.
  • Performed extensive data analysis on Oracle systems, writing queries in SQL and TOAD.
  • Interacted with SSRS reporting team to gather reporting requirements, and review summary tables.
  • Used SQL joins, aggregate functions, analytical functions, group by, order by clauses and interacted with DBA and developers for query optimization and tuning.
  • Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Performed data verification and validation to confirm that the data generated according to the requirements was appropriate and consistent.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle.
  • Used a diverse array of technologies and tools as needed, such as Python, SAS and Tableau, to deliver insights.
  • Developed complex PL/SQL procedures and packages using views and SQL joins.
  • Optimized the data environment to access data marts efficiently and implemented efficient data extraction routines for data delivery.
  • Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
  • Extracted data from different sources performing Data Integrity and quality checks.
  • Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.
  • Used MS Excel, Word, Access, and PowerPoint to process data, create reports, analyze metrics, implement verification procedures, and fulfill client requests for information.
  • Designed and developed Ad-hoc reports as per business analyst, operation analyst, and project manager data requests.

Environment: DB2, PL/SQL, T-SQL, SAS, SQL Server, MS PowerPoint, MS Access, MySQL, Crystal Reports