
Sr. Big Data Engineer Resume

New York, NY


  • 8+ years of experience as a Big Data Engineer, Data Engineer and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
  • Strong knowledge of the Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Extensive working experience with Scrum/Agile and Waterfall project execution methodologies.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
  • Good experience working with ETL tools such as SSIS and Informatica and with reporting tools such as SQL Server Reporting Services (SSRS).
  • Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake and AWS Redshift.
  • Experience configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
  • Expert-level server-side development using J2EE, EJB, J2SE, Spring, Servlets, Python and C++ on Windows, Unix and Linux platforms.
  • Wrote SQL queries and PL/SQL programs, created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Extensive experience with ER modeling tools such as Erwin and ER/Studio, as well as with Teradata, BTEQ and MDM.
  • Experience in cloud development architecture on Amazon AWS (EC2, Redshift) and basic experience on Azure.
  • Experience working with NoSQL databases (HBase, Cassandra and MongoDB), database performance tuning and data modeling.
  • Excellent experience with job workflow scheduling, monitoring and coordination tools such as Oozie and ZooKeeper.
  • Experience in dimensional data modeling, star and snowflake schemas, and fact and dimension tables.
  • Strong knowledge of Spark with Scala for large-scale data processing, including stream processing.
  • Thorough knowledge of writing DDL, DML and transaction queries in SQL for Oracle and Teradata databases.
  • Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
  • Expertise in integrating various data sources such as RDBMS, spreadsheets, text files, JSON and XML files.
  • Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
  • Experience designing components using UML Use Case, Class, Sequence, Deployment and Component diagrams for the requirements.
  • Good knowledge of UNIX commands, such as changing file permissions for the owner and group.
  • Experience in building reports using SQL Server Reporting Services and Crystal Reports.
  • Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Experience in text analytics and in developing statistical machine learning and data mining solutions to various business problems.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.


Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.

Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11

Cloud Management: Amazon Web Services (AWS), Amazon Redshift

ETL/Data Warehouse Tools: Informatica 9.6/9.1, SAP BusinessObjects XI R3.1/XI R2, Talend, Tableau and Pentaho.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack

Programming Languages: SQL, PL/SQL, UNIX shell scripting, Perl, AWK, ANSI SQL, sed

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio and Visual SourceSafe

Operating System: Windows, Unix, Sun Solaris

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.


Sr. Big Data Engineer

Confidential - New York, NY


  • Designed and developed software applications, testing tools and build automation tools.
  • Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling some jobs).
  • Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Queried both managed and external Hive tables using Impala.
  • Designed and developed tables, views, SQL queries, stored procedures and functions.
  • Involved in PL/SQL code review and modification for the development of new requirements.
  • Extracted data from existing data sources and performed ad-hoc queries.
  • Utilized SAS and SQL extensively for collecting, validating and analyzing the raw data received from the client.
  • Executed data extraction and data profiling programs and analyzed data for accuracy and quality.
  • Analyzed the data using advanced Excel functions such as pivot tables, VLOOKUP and visualizations to produce descriptive analyses.
  • Created schema objects such as indexes, views, sequences, triggers, grants, roles and snapshots.
  • Used advanced Microsoft Excel features, including pivot tables, to prepare reports and dashboards with user data.
  • Maintained numerous monthly scripts, executed them each month, and produced reports submitted on time for business review.
  • Developed ad-hoc reports using Crystal Reports for performance analysis by business users.
  • Implemented the Big Data solution using Hadoop and Hive to pull and load data into HDFS.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Implemented and configured workflows using Oozie to automate jobs.
  • Migrated the required data from MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
  • As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they related to the development of analytics.
  • Wrote Hadoop jobs to analyze data using MapReduce, Hive and Pig.
  • Designed, built and implemented a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Pig, HBase, MongoDB and Spark.
  • Worked within Scrum/Agile and Waterfall project execution methodologies.
  • Moved data from various file systems into HDFS using UNIX command-line utilities.
  • Worked in an Azure environment for development and deployment of custom Hadoop applications.
  • Managed and supported enterprise data warehouse operations and advanced predictive big data application development using Cloudera &.
  • Wrote shell scripts extensively to run SAS programs in batch mode on UNIX.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
  • Installed, configured and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Developed Pig scripts to transform data into a structured format, automated through Oozie coordinators.

Environment: Hadoop 3.0, MapReduce, Hive 2.3, Pig 0.17, HDFS, HBase 1.2, MongoDB, Agile, Azure, MySQL, Oozie, Sqoop 1.4.

Confidential, Merrimack, NH

Sr. Big Data Engineer


  • Worked as a Big Data Engineer to import and export data from different databases.
  • Prepared ETL technical mapping documents, along with test cases for each mapping, to support future development and maintain the Software Development Life Cycle (SDLC).
  • Wrote Python scripts for Location Analytics project deployment on a Linux cluster/farm and on the AWS cloud.
  • Wrote a Scala/Spark application on AWS EMR to process and transform billions of REST and mobile events generated on Realtor.com every hour.
  • Created logical and physical data models for relational (OLTP) systems and dimensional (OLAP) star-schema models with fact and dimension tables using Erwin.
  • Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily runs.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Used Python, Ruby, Pig, Hive and Sqoop to implement various tools and utilities for data import and export.
  • Used U-SQL for data analytics and ingestion of raw data in Azure and Blob storage.
  • Performed thorough data analysis using ANSI SQL in order to overhaul the database.
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML.
  • Worked with medical claim data in the Oracle database for Inpatient/Outpatient data validation, trend and comparative analysis.
  • Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
  • Implemented reporting in PySpark and Zeppelin, and querying through Airpal and AWS Athena.
  • Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Netezza.
  • Worked with the MDM systems team on technical aspects and report generation.
  • Imported and exported data between relational database systems such as DB2 and HDFS/Hive using Sqoop.
  • Supported various reporting teams using the data visualization tool Tableau.
  • Used Pig extensively for data cleansing through Pig scripts and embedded Pig scripts.
  • Designed data aggregations in Hive for ETL processing on Amazon EMR to process data per business requirements.
  • Developed and designed data integration and migration solutions in Azure.
  • Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
  • Prepared complex ANSI-SQL queries, views and stored procedures to load data into the staging area.
  • Involved in Hive-HBase integration by creating Hive external tables that use HBase as the storage layer.
  • Identified the data consuming the most resources and modified the back-end code using PL/SQL stored procedures and triggers.
  • Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on scheduling of the reports.
  • Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
  • Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.

Environment: Hadoop 3.0, Kafka 1.1, SQL, PL/SQL, OLAP, ANSI-SQL, Java, Python, OLTP, SDLC, MDM, Netezza, Oracle 12c, XML, MapReduce, SSRS, MySQL, T-SQL, Teradata 15, Azure, AWS, ETL, HDFS, Hive 2.3, Sqoop 1.4, Tableau, Pig 0.17.

Confidential - West Point, PA

Sr. Data Engineer


  • Worked as a Data Engineer; designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Participated in design discussions and ensured functional specifications were delivered in all phases of the SDLC in an Agile environment.
  • Designed and developed tools similar to Ambari and Chef for MPact software, Hadoop, HBase and Spark deployment, configuration, monitoring, HA, load and data balancing, and scalability on AWS, ESXi, Xen and distributed clusters using Java 8, Spring, Scala, Python and Ruby.
  • Created data models for AWS Redshift and Hive from dimensional data models.
  • Executed change management processes for new releases of SAS functionality.
  • Prepared complex T-SQL queries, views and stored procedures to load data into the staging area.
  • Normalized the existing data model to third normal form (3NF) to speed up DML statement execution.
  • Participated in data collection, data cleaning, data mining, developing models and visualizations.
  • Used Sqoop to transfer data between HDFS and relational databases such as MySQL in both directions, and also used Talend for this purpose.
  • Developed server-side and client-side web applications using Spring 2.5, Struts 2, EJB, Hibernate, iBATIS, JSF, JSTL, Ext JS and Web 2.0 Ajax frameworks. Developed small intranet sites using Python.
  • Developed automated job flows that run through Oozie daily and on demand and execute MapReduce jobs internally.
  • Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into the data mart.
  • Extracted tables and exported data from Teradata through Sqoop into Cassandra.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Worked with Amazon AWS services such as EMR and EC2 for fast and efficient processing of big data.
  • Built parameterized queries for tabular reports in SSRS using global variables, expressions, functions and stored procedures.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Created jobs and transformations in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.
  • Wrote DDL and DML statements for creating and altering tables and converting character values into numeric values.
  • Worked on the Master Data Management (MDM) Hub and interacted with multiple stakeholders.
  • Worked with Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
  • Extensively involved in development and implementation of SSIS and SSAS applications.
  • Collaborated with ETL and DBA teams to analyze and provide solutions to data issues and other challenges while implementing the OLAP model.
  • Implemented star schema methodologies in modeling, transforming the logical data model into dimensional models.
  • Designed and developed PL/SQL procedures, functions and packages to create summary tables.
  • Worked on database performance tuning, including indexing and optimizing SQL statements.
  • Worked with OLTP data to identify daily transactions, transaction types and the amount of resources used.
  • Developed a conceptual model and a logical model using Erwin based on requirements analysis.
  • Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.

Environment: Hadoop 3.0, HDFS, HBase 1.2, SSIS, SSAS, OLAP, OLTP, ETL, Java, ANSI-SQL, AWS, SDLC, T-SQL, SAS, MySQL, Sqoop 1.4, Cassandra 3.0, MongoDB, Hive 2.3, SQL, PL/SQL, Teradata 15, Oracle 12c, MDM.

Confidential - Lowell, AR

Sr. Data Analyst/Data Engineer

Roles & Responsibilities

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Used Agile Central (Rally) to enter tasks visible to the whole team and the Scrum Master.
  • Connected Tableau to AWS Redshift to extract live data for real-time analysis.
  • Designed and developed the architecture for a data services ecosystem spanning relational, NoSQL and big data technologies.
  • Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
  • Used forward engineering to create tables, views, SQL scripts and mapping documents.
  • Captured web server logs into HDFS using Flume for analysis.
  • Involved in PL/SQL code review and modification for the development of new requirements.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Worked closely with the ETL SSIS developers to explain the complex data transformation logic.
  • Worked with the MDM systems team on technical aspects and report generation.
  • Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
  • Used SAS procedures such as PROC MEANS, PROC FREQ and other statistical calculations for data validation.
  • Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS).
  • Performed data analysis and data profiling using complex SQL on various sources systems including Oracle.
  • Imported and cleansed high-volume data from various sources such as Teradata, flat files and SQL Server.
  • Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Integrated various sources into the staging area of the data warehouse to integrate and cleanse data.
  • Cleansed, extracted and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.

Environment: Erwin 9.6, Teradata R14, Oracle 11g, SQL, T-SQL, PL/SQL, AWS, Agile, OLAP, OLTP, SSIS, HDFS, SAS, Flume, SSRS, Sqoop, MapReduce, MySQL.

Confidential - SFO, CA

Sr. Data Analyst/Data Modeler


  • As a Sr. Data Analyst/Data Modeler, was responsible for all data-related aspects of a project.
  • Participated in requirement-gathering sessions and JAD sessions with users, subject matter experts and BAs.
  • Reverse engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Improved SQL query performance using indexes, created DDL scripts for the database, and created PL/SQL procedures and triggers.
  • Made extensive use of data transformation tools such as SSIS, Informatica and DataStage.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Performed data mining using complex SQL queries and discovered patterns.
  • Designed 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
  • Worked on Normalization and De-Normalization techniques for OLAP systems.
  • Created Tableau scorecards and dashboards using stacked bars, bar graphs, geographical maps and Gantt charts.
  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Used E/R Studio for reverse engineering, connecting to the existing database and ODS to create graphical entity-relationship representations and elicit more information.
  • Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and stored procedures.
  • Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business users.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
  • Created a list of domains in E/R Studio and worked on building up the data dictionary for the company.
  • Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software programs.

Environment: E/R Studio V15, Teradata, SQL, PL/SQL, T-SQL, OLTP, SSIS, SSRS, OLAP, Tableau, Netezza.
