
Big Data Engineer Resume

Ohio, IL


  • Over 7 years of IT experience, including 5+ years as a Hadoop/Spark Developer, Data Engineer, and Programmer Analyst designing, developing, and deploying large-scale distributed systems.
  • Hands-on experience installing, configuring, and using Hadoop components such as MapReduce, HDFS, YARN, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, and Avro, including Spark integration with Cassandra, Solr, and Zookeeper.
  • Proficient in data modelling, data warehousing, big data/Hadoop, data integration, master data management, data migration, operational data store, and BI reporting projects, with a deep focus on the design, development, and deployment of BI and data solutions using custom, open-source, and off-the-shelf BI tools.
  • Performed logical and physical database design (tables, constraints, indexes, etc.) using Erwin, ER Studio, TOAD Modeler, and SQL Modeler.
  • Good understanding and hands-on experience with AWS S3 and EC2.
  • Experience building event-driven and scheduled custom AWS Lambda (Python) functions that trigger various AWS resources.
  • Experience creating PySpark scripts and Spark Scala jars in the IntelliJ IDE and executing them.
  • Good experience with the programming languages Python and Scala.
  • Experience troubleshooting Spark and MapReduce jobs.
  • Experience developing Python ETL jobs that run on AWS services and integrate with enterprise systems such as enterprise logging and alerting, enterprise configuration management, and enterprise build and versioning infrastructure.
  • Experience in using Terraform for building AWS infrastructure services like EC2, Lambda and S3.
  • Have experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases like HBase, Cassandra, and MongoDB.
  • Expertise in configuring monitoring and alerting tools, such as AWS CloudWatch, according to requirements.
  • Experience designing both time-driven and data-driven automated workflows using Oozie, and developing high-performance batch processing applications on Apache Hive, Spark, Impala, Sqoop, and HDFS.
  • Experienced with integrated development environments such as Eclipse, NetBeans, Kate, and Gedit. Migrated data from databases such as Oracle, DB2, MySQL, and MongoDB to Hadoop, and built interactive dashboards and creative visualizations using tools such as Tableau and Power BI.
  • Working experience with the NoSQL databases HBase, MongoDB, and Cassandra, as well as Azure and SSIS, covering both functionality and implementation.
  • Proficient in programming with SQL, PL/SQL, and Stored procedures.
  • Experience in database design, database management, data migration, and technical support using Oracle, MS SQL, and SQL.
  • Experience developing ETL workflows using Informatica PowerCenter 9.x/8.x and IDQ. Worked extensively with the Informatica client tools: Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
  • Experience working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos, Amazon Redshift, and Azure Data Warehouse.
  • Expertise in a broad range of technologies, including business process tools such as Microsoft Project, MS Excel, MS Access, and MS Visio.
  • Used Excel pivot tables to manipulate large amounts of data for analysis; the role involved extensive routine operational reporting, ad-hoc reporting, and data manipulation to produce routine metrics and dashboards for management.
  • Developed Talend mappings using various transformations, sessions, and workflows. Teradata was the target database; the sources were a combination of flat files, Oracle tables, Excel files, and the Teradata database.
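The event-driven Lambda work mentioned above can be sketched as a minimal handler. This is an illustrative sketch, not code from the resume: the function name and return shape are assumptions, and the event follows the standard S3 put-notification layout.

```python
import json
import urllib.parse

def handler(event, context=None):
    """Sketch of an S3-triggered Lambda: pull (bucket, key) pairs out
    of the notification event and hand them to downstream ETL."""
    records = []
    for rec in event.get("Records", []):
        s3 = rec.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key:
            records.append({"bucket": bucket, "key": key})
    return {"statusCode": 200, "body": json.dumps(records)}
```

Wired to an S3 `ObjectCreated` trigger, each upload invokes the handler once with one or more records; the same function can be reused unchanged on an EventBridge schedule with an empty event.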


Big Data: HDFS, MapReduce, Hive, Pig, Kafka, Sqoop, Flume, Oozie, Zookeeper, NiFi, YARN, Scala, Impala, Spark SQL

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, Confidential -UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Version Control: GIT

Cloud: AWS, Azure, GCP


Confidential, Ohio, IL

Big Data Engineer


  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environment with both RDBMS and NoSQL data stores for data access and analysis.
  • Used all major ETL transformations to load the tables through Informatica mappings.
  • Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
  • Installed Hadoop, MapReduce, HDFS, and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
  • Implemented Spark GraphX application to analyse guest behaviour for data science segments.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
  • Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Developed Talend mappings using various transformations, sessions, and workflows. Teradata was the target database; the sources were a combination of flat files, Oracle tables, Excel files, and the Teradata database.
  • Created Hive External tables to stage data and then move the data from Staging to main tables.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
  • Created data pipelines per the business requirements and scheduled them using Oozie coordinators.
  • Worked with NoSQL database HBase in getting real time data analytics.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design, and review.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as MapReduce, Hive, Pig, and Sqoop.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Conducted POCs for ingesting data using Flume.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Prepared the complete data mapping for all the migrated jobs using SSIS.
  • Created SSIS packages using SSIS Designer to export heterogeneous data from OLE DB sources (Oracle).
  • Created SSIS packages for File Transfer from one location to the other using FTP task.
  • Worked on Data modelling, Advanced SQL with Columnar Databases using AWS.
  • Worked on Sequence files, RC files, Map side joins, bucketing, Partitioning for Hive performance enhancement and storage improvement.
  • Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
  • Worked on managing and reviewing Hadoop log files. Tested and reported defects in an Agile Methodology perspective.
  • Performed data extraction, transformation, and mapped data to new source by using SSIS and applied techniques like data profiling, staging for data migration and loaded data into data warehouse in SQL Server.
  • Worked with Excel Pivot tables.
  • Involved in Extract, Transform and Load (ETL) data from spreadsheets, flat files, database tables and other sources using SQL Server Integration Services (SSIS) and SQL Server Reporting Service (SSRS) for managers and executives.
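The staging-to-main table promotion described above (Hive external tables feeding curated tables) can be sketched in miniature. This is an illustrative sketch only: sqlite3 stands in for the warehouse, and the table and column names are invented for the example.

```python
import sqlite3

# In-memory database as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table receives the raw load; the main table is the curated target.
cur.execute("CREATE TABLE staging_orders (id INTEGER, amount REAL, status TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Raw load into staging, including rows that will fail validation.
cur.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [(1, 10.5, "ok"), (2, None, "bad"), (3, 7.25, "ok")],
)

# Promote only clean rows from staging to the main table,
# mirroring a Hive INSERT ... SELECT from a staging table.
cur.execute(
    "INSERT INTO orders SELECT id, amount FROM staging_orders "
    "WHERE status = 'ok' AND amount IS NOT NULL"
)
conn.commit()

rows = cur.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
```

The design point is that validation happens between the two tables, so a bad load never touches the curated target and the staging table can simply be truncated and reloaded.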

Environment: Hadoop, Cloudera, SSIS, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Yarn, NDM, Informatica, AWS, Windows & Microsoft Office, MS-Visio, MS-Excel.

Confidential, New York, NY

Big Data Developer


  • Worked on implementing a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and Zookeeper based log collection platform.
  • Used Talend for Big data Integration using Spark and Hadoop.
  • Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Used Polybase for ETL/ELT process with Azure Data Warehouse to keep data in Blob Storage with almost no limitation on data volume.
  • Analysed large and critical datasets using HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Loaded and transformed large sets of structured, semi structured, and unstructured data using Hadoop/Big Data concepts.
  • Performed Data transformations in HIVE and used partitions, buckets for performance improvements.
  • Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL queries for data aggregation and querying, and wrote data back into the RDBMS through Sqoop.
  • Designed and developed a Data Lake using Hadoop for processing raw and processed claims via Hive and Informatica.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Ingested data into HDFS using SQOOP and scheduled an incremental load to HDFS.
  • Used Hive to analyse data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Created ETL/Talend jobs both design and code to process data to target databases.
  • Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Responsible for the Extraction, Transformation and Loading of data from Multiple Sources to Data Warehouse using SSIS.
  • Experienced in loading the real-time data to NoSQL database like Cassandra.
  • Performed Logging and Deployment of various packages in SSIS.
  • Implemented various SSIS packages having different tasks and transformations and scheduled SSIS packages.
  • Developed Pig scripts for transforming data, making extensive use of joins, filters, and pre-aggregations.
  • Performed data scrubbing and processing with Apache NiFi, also used for workflow automation and coordination.
  • Used Sqoop to import data into HDFS and Hive from Oracle database.
  • Involved in various phases of development; analysed and developed the system following the Agile Scrum methodology.
  • Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
  • Analysed the partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Built Azure Data Warehouse Table Data sets for Power BI Reports.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Developed Hive DDLs to create, alter, and drop Hive tables.
  • Worked on BI reporting with AtScale OLAP for big data.
  • Implemented Kafka for streaming data and filtered, processed the data.
  • Designed and Developed Real time Stream processing Application using Spark, Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
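The change-capture and delta processing described above reduces to comparing newly arrived records against what is already stored. A minimal Python sketch (the key name and record shapes are invented for illustration; the Pig scripts themselves are not shown in the resume):

```python
def delta_records(existing, arrived, key="id"):
    """Split newly arrived records into inserts (key never seen before)
    and updates (key seen, but the record changed). Records identical
    to what is already stored are dropped, mimicking a delta load."""
    current = {rec[key]: rec for rec in existing}
    inserts, updates = [], []
    for rec in arrived:
        old = current.get(rec[key])
        if old is None:
            inserts.append(rec)       # brand-new key
        elif old != rec:
            updates.append(rec)       # changed payload for a known key
    return inserts, updates

existing = [{"id": 1, "city": "NY"}, {"id": 2, "city": "LA"}]
arrived = [{"id": 2, "city": "SF"}, {"id": 3, "city": "OH"}, {"id": 1, "city": "NY"}]
ins, upd = delta_records(existing, arrived)
```

In a real pipeline the same split is expressed as a join between the new batch and the current snapshot, with inserts appended and updates overwriting their partition.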

Environment: Spark, YARN, HIVE, Pig, Scala, Mahout, NiFi, Python, Hadoop, Azure, Dynamo DB, Kibana, NOSQL, Sqoop, SSIS, MYSQL.


Hadoop developer


  • Involved in designing data warehouses and data lakes on both regular (Oracle, SQL Server) and high-performance big data (Hadoop Hive and HBase) databases. Performed data modelling and designed, implemented, and deployed high-performance custom applications at scale on Hadoop/Spark.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
  • Participated in supporting a Data Governance and Risk Compliance platform utilizing MarkLogic.
  • Participated in the Data Governance working group sessions to create Data Governance Policies.
  • Loaded data into MDM landing table for MDM base loads and Match and Merge.
  • Designed ETL process using Talend Tool to load from Sources to Targets through data Transformations.
  • Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
  • Served as Data Modeler/Analyst on the Data Architecture team, responsible for the conceptual, logical, and physical models for the Supply Chain project.
  • Created and maintained logical & physical data models for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Owned and managed all changes to the data models. Created data models, solution designs and data architecture documentation for complex information systems.
  • Developed Advance PL/SQL packages, procedures, triggers, functions, Indexes and Collections to implement business logic using SQL Navigator.
  • Worked with Finance, Risk, and Investment Accounting teams to create Data Governance glossary, Data Governance framework and Process flow diagrams.
  • Involved in Extract, Transform and Load (ETL) data from spreadsheets, flat files, database tables and other sources using SQL Server Integration Services (SSIS) and SQL Server Reporting Service (SSRS) for managers and executives.
  • Designed Star Schema Data Models for Enterprise Data Warehouse using Power Designer.
  • Experienced in MarkLogic infrastructure sizing assessment and hardware evaluation.
  • Developed Talend jobs to load data into Hive tables and HDFS files, and developed Talend jobs to integrate the Hive tables with the Teradata system.
  • Created the best fit Physical Data Model based on discussions with DBAs and ETL developers.
  • Created conceptual, logical, and physical data models, data dictionaries, DDL and DML to deploy and load database table structures in support of system requirements.
  • Designed ER diagrams (Physical and Logical using Erwin) and mapping the data into database objects.
  • Validated and updated the appropriate Models to process mappings, screen designs, use cases, business object model, and system object model as they evolved and changed.
  • Performed extensive data cleansing and analysis using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation.
  • Performed extensive Excel work using pivot tables and complex formulas to manipulate large data structures.
  • Created Model reports including Data Dictionary, Business reports.
  • Generated SQL scripts and implemented the relevant databases with related properties from keys, constraints, indexes & sequences.
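Generating SQL scripts from a logical model, as in the bullets above, can be sketched as a small helper that renders DDL from column definitions. The function, table, and columns here are hypothetical stand-ins, not artifacts from the project:

```python
def render_ddl(table, columns, primary_key=None):
    """Render a CREATE TABLE statement from a logical model.
    columns is a list of (name, sql_type, nullable) tuples."""
    lines = []
    for name, sql_type, nullable in columns:
        null_sql = "" if nullable else " NOT NULL"
        lines.append(f"    {name} {sql_type}{null_sql}")
    if primary_key:
        # Key constraints come straight from the model's key structures.
        lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table} (\n{body}\n);"

ddl = render_ddl(
    "customer",
    [("customer_id", "INTEGER", False), ("name", "VARCHAR(100)", True)],
    primary_key=["customer_id"],
)
```

Driving DDL from the model this way keeps the deployed tables and the data dictionary in sync, since both are generated from the same entity and attribute definitions.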

Environment: OLTP, DBAs, DDL, DML, Erwin, UML diagrams, Snowflake schema, SQL, Data Mapping, Metadata, SAS, Informatica 9.5, MS-Office
