Sr. Hadoop, Bigdata Engineer Resume

New York, NY


  • Results-driven IT professional with 9+ years of experience in data analysis, data modeling, and Big Data, with applied information technology.
  • Thorough understanding of the end-to-end Software Development Life Cycle process in an Agile environment using SCRUM methodologies.
  • Excellent understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
  • Strong knowledge of developing Big Data solutions covering data ingestion and data storage.
  • Good knowledge of cloud technologies such as Azure and AWS (EMR, S3, Redshift, EC2, DynamoDB).
  • Proficient in technical consulting and end-to-end delivery, covering data analysis, data modeling, data governance, and the design, development, and implementation of solutions.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Experience in NoSQL databases - HBase, Cassandra & MongoDB, database performance tuning & data modeling.
  • Proficient experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
  • Effective in addressing complex POCs according to business requirements from the technical end.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
  • Good experience with ETL tool environments such as SSIS and Informatica, and reporting tool environments such as SQL Server Reporting Services (SSRS), Cognos, and Business Objects.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Good knowledge of normalization and de-normalization concepts and of design methodologies such as Ralph Kimball's and Bill Inmon's data warehouse methodologies.
  • Experience in text analytics and in developing statistical machine learning and data mining solutions to various business problems.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, and XML files.
  • Excellent experience in developing Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open source tools/technologies.
  • Good working knowledge of UNIX commands, such as changing file and group permissions.
  • Strong background in mathematics, with very good analytical and problem-solving skills.


Big Data Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11

Cloud Management: Amazon Web Services (AWS), Amazon Redshift

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Programming Languages: SQL, PL/SQL, UNIX Shell Scripting

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2

Testing & Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio 2016 & Visual SourceSafe

Operating System: Windows 7/8/10, UNIX, Sun Solaris

ETL/Data Warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model

Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6


Confidential - New York, NY

Sr. Hadoop, BigData Engineer

Roles & Responsibilities:

  • Participated in Sprint review/retro meetings and daily SCRUM meetings, and gave daily status reports.
  • Involved in all phases of the Software Development Life Cycle (SDLC) under methodologies such as Waterfall and Agile-SCRUM.
  • Responsible for developing, troubleshooting and implementing programs.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Created external tables pointing to HBase to access tables with a huge number of columns.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
  • Extracted and loaded data into the Data Lake environment (MS Azure) using Sqoop, where it was accessed by business users.
  • Involved in converting MapReduce programs into Spark transformations using the Spark Python API.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Created a data pipeline using Processor Groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC using Amazon EC2.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
  • Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Managed data movement from various file systems to HDFS using UNIX command-line utilities.
  • Implemented monitoring and established best practices around usage of Elasticsearch.
  • Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
  • Migrated the required data from MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Worked with Cassandra, retrieving data from Cassandra clusters to run queries.
  • Used Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Utilized Oozie workflows to run Pig and Hive jobs.
  • Extracted files from MongoDB through Sqoop and placed them in HDFS for processing.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
  • Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.

Environment: Hadoop 3.0, HDFS, Apache Hive 2.3, MapReduce, Apache NiFi 1.6, Apache Pig 0.17, MongoDB, Sqoop 1.4, SQL, Oracle 12c, PL/SQL, Agile, Azure, YARN, Oozie 4.3, ETL

Confidential - Greensboro, NC

Sr. BigData Engineer

Roles & Responsibilities:

  • Provided technical expertise in Big Data technologies as they relate to the development of analytics.
  • Participated in JAD meetings to gather requirements and understand the end users' system.
  • Participated in requirements sessions to gather requirements along with business analysts and product owners.
  • Followed Agile development methodology as an active member in Scrum meetings.
  • Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Installed and configured Hadoop Ecosystem components.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
  • Worked on a platform using Hive, Sqoop, and HBase. This effort showcased the benefits of Hortonworks.
  • Prepared process flow/activity diagrams for the existing system using MS Visio and re-engineered the design based on business requirements.
  • Utilized Hadoop, Hive and SQL technologies and moved data sets to production to be utilized by business teams to make business decisions.
  • Designed and developed a real-time stream processing application using Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
  • Worked closely with business analysts on requirement gathering and on translating requirements into technical documentation.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
  • Wrote SQL scripts to run ad-hoc queries, stored procedures, and triggers, and prepared reports for management.
  • Designed components using UML diagrams (Use Case, Class, Sequence, Deployment, and Component) for the requirements.
  • Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
  • Involved in reports development using reporting tools like Tableau.

Environment: PL/SQL, Agile, AWS, SQL, HDFS, SSAS, SSRS, Sqoop 1.2, Apache Pig 0.16, Kafka, Scala 1.4, MapReduce, OLAP, OLTP, HBase, Amazon Redshift, MS Visio

Confidential - McLean, VA

Data Analyst/Data Engineer

Roles & Responsibilities:

  • Worked with the analysis teams and management teams and supported them based on their requirements.
  • Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros.
  • Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
  • Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
  • Worked extensively with Agile and SCRUM methodologies.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
  • Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
  • Performed data analysis and data profiling on various source systems, including Oracle, SQL Server, and DB2.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
  • Created data models for AWS Redshift, Hive, and HBase from dimensional data models.
  • Created Data Validation rules using SQL to validate the structure and integrity of the extracted data.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
  • Reverse-engineered data models from database instances and scripts.
  • Worked on data mapping and data mediation between the source data table and target data tables using MS Access and MS Excel.
  • Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.

Environment: HDFS, Sqoop, Apache Pig, AWS, Amazon Redshift, Apache Hive, MapReduce, ODS, OLTP, HBase, SQL Server

Confidential - Dallas, TX

Data Analyst/Data Modeler

Roles & Responsibilities:

  • Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
  • Designed the ER diagrams, logical model, and physical database for Oracle and Teradata as per business requirements using Erwin.
  • Designed 3rd normal form target data model and mapped to logical model.
  • Developed Data mapping Transformation and Cleansing rules for the Master Data Management involving OLTP and OLAP.
  • Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
  • Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
  • Used SQL and PL/SQL to validate the data going into the data warehouse.
  • Wrote complex SQL and PL/SQL testing scripts for back-end testing of the data warehouse application. Expert in writing complex SQL/PL/SQL scripts for querying Teradata and Oracle.
  • Designed, developed, and maintained the data dictionary and metadata of the models.
  • Involved in extensive data validation using SQL queries and back-end testing.
  • Tested the database to check field size validation, check constraints, and stored procedures, cross-verifying the field sizes defined within the application against the metadata.
  • Designed and developed cubes using SQL Server Analysis Services (SSAS) using Microsoft Visual
  • Provided PL/SQL queries to developers as source queries to identify the data, along with the logic to assign.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Analyzed and presented the gathered information in graphical format for the ease of business managers.
  • Developed and maintained the data dictionaries, Naming Conventions, Standards, and Class words Standards Document.
  • Performed GAP analysis to analyze the difference between the system capabilities and business requirements.
  • Produced source-to-target data mappings by developing the mapping spreadsheets.
  • Created documentation and test cases, worked with users for new module enhancements and testing.

Environment: Erwin 9.5, Agile, Oracle 11g, ETL, SSIS, Teradata, SSAS, PL/SQL, OLTP, OLAP, DBA, AWS


Data Analyst

Roles & Responsibilities:

  • Worked with Data Analysts to understand Business logic and User Requirements.
  • Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
  • Created reports for the Data Analysis using SQL Server Reporting Services.
  • Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
  • Created SQL queries to simplify migration progress reports and analyses.
  • Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
  • Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
  • Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
  • Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
  • Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
  • Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
  • Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
  • Created reports from OLAP, sub-reports, bar charts, and matrix reports using SSRS.
  • Developed ad-hoc reports using VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
  • Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
  • Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
  • Extracted data from different sources performing Data Integrity and quality checks.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Involved in extensive data validation by writing several complex SQL queries, performed back-end testing, and worked on data quality issues.
  • Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
  • Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.

Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint