Big Data Consultant Resume

Weehawken, NJ


  • Over 6 years of professional IT experience, including 3 years in the Hadoop ecosystem and continuous working experience in Java
  • Cloudera Certified Spark and Hadoop Developer (CCA175)
  • Technical experience in the manufacturing, finance and internet industries
  • Proficient in Java, Python, Scala and R
  • Working knowledge of NoSQL stores such as HBase, Cassandra, Redis and MongoDB, and of SQL-on-Hadoop engines such as Hive and Impala
  • Exposure to setting up and maintaining Hadoop clusters on YARN
  • Expert in importing and exporting data between HDFS and relational database systems (Oracle/MySQL) using Sqoop
  • Moved log files generated by various sources to HDFS through Flume for further processing
  • Experienced in building real-time, high-throughput streaming services to transport high-volume data using Kafka
  • Strong expertise in building traditional ETL pipelines using Informatica best practices
  • Capable of writing custom UDFs in Java for Hive and Pig to extend functionality
  • Experienced in job and workflow scheduling and monitoring tools such as Oozie and Appworx
  • Developed data analyses and visualizations using SQL, R, HiveQL, Spark SQL and Tableau
  • Experienced in working with the Apache Spark Streaming API for near-real-time data processing
  • Skilled in data validation, cleansing, verification and identifying data mismatches
  • Experienced in writing custom MapReduce programs in Java
  • Familiar with machine learning and statistical analysis using R, Python and Spark
  • Knowledge of machine learning frameworks including scikit-learn, NLTK and MLlib
  • Familiar with algorithms including K-Means, KNN, regression, SVM and neural networks
  • Extensive experience in writing complex SQL queries using Oracle analytic functions
  • Strong database experience in PL/SQL programming to create packages, stored procedures, functions, triggers, indexes, materialized views and cursors
  • Clear understanding of the theory behind ER modeling for OLTP and dimensional modeling for OLAP
  • Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD) and Java components such as the Collections Framework, exception handling, the I/O system and multithreading
  • Hands-on experience in MVC architecture and Java frameworks such as Struts, Spring MVC and Hibernate
  • Experienced in interacting with Clients, Business Analysts, IT leads, UAT Users and developers
  • Exposure to Agile development environments, tools and methodologies
  • Authorized to work in the US for any employer


Hadoop Ecosystem: Hadoop 2.0, Spark 2.0, MapReduce, Pig 0.15+, Hive, Sqoop, Flume, Kafka 1.0+, Zookeeper 3.0+, HBase, Oozie, Storm 1.0

Web Development: Hibernate, HTML, CSS, AJAX, Bootstrap, J2EE, Spring MVC, Node.js, Django

Programming Language: Java, Scala, Python, JavaScript, PL/SQL

Data Analysis & Visualization: Python, R, SQL, Tableau, D3.js

Cloud Platform: Amazon Web Services, Heroku

Scripting Language: UNIX Shell, HTML, XML, CSS, JSP

Operating Systems: Mac OS, Ubuntu, CentOS, Windows

Environment: Agile, Scrum, TDD, JIRA, Confluence, Jenkins

Machine Learning Algorithm: Linear Regression, Logistic Regression, Decision Tree, Neural Network, K-Means, KNN, Support Vector Machine

Database: MySQL, Oracle 11g, Exadata, PostgreSQL 9.x, SQL Server 2012/2016, MongoDB 3.2, HBase 0.98, Cassandra 3.0, Redis 3.2

Machine Learning Framework: Spark MLlib, SciPy, Matplotlib, Pandas, NumPy

Others: Docker, Informatica 9.0, SSIS


Confidential, Weehawken, NJ

Big Data Consultant


  • Integrated data from relational databases (Oracle, MySQL, SQL Server) into HDFS using Sqoop
  • Configured Flume agents to collect real-time logging data from application servers
  • Implemented a reliable and scalable Kafka messaging system for high-throughput data ingestion
  • Wrote MapReduce programs in Java for offline batch processing
  • Created multiple Hive tables with partitioning and bucketing for efficient data access
  • Processed stream data using Spark Streaming for risk evaluation and product recommendation
  • Cached key results from the streaming and batch processing systems in Redis for fast access
  • Analyzed time series data using Spark MLlib, SciPy and Matplotlib
  • Evaluated model accuracy and tuned parameters with offline simulation data
  • Automated workflows using Oozie and shell scripts

Environment: Hadoop 2.6, Cloudera CDH 5.x, Kafka, Sqoop, Flume, Zookeeper, Spark 2.0, Scala, Redis, Oozie, Shell script, Oracle, MySQL, SQL Server, Python, Java

Confidential, Erie, PA

Senior Data Warehouse Consultant


  • Integrated structured data from various source portfolios into Hive using Sqoop
  • Ingested large amounts of semi-structured application data in real time using Flume
  • Stored data in a Kafka cluster as a central buffer
  • Developed MapReduce jobs for data cleaning, validation and categorization
  • Built an operational data store in Hive for raw data
  • Created fact, dimension and bridge tables in a star schema following the Kimball approach
  • Developed periodic analytic/aggregation queries and saved results in HBase for fast access
  • Wrote Hive UDFs for data transformation and aggregation
  • Used Informatica and SSIS to integrate Oracle E-Business Suite with Exadata and SQL Server
  • Involved in building business intelligence dashboards using tools such as Tableau and OBIEE
  • Drafted shell scripts for job execution and scheduling using Appworx and Oozie
  • Supported the production data lake in terms of data accuracy, consistency and performance
  • Worked under Agile/Scrum methodologies

Environment: Hadoop 2.4, Hive 0.12, Sqoop, Flume, Kafka, HBase, Oozie, Appworx, Tableau, OBIEE, MySQL, SQL Server 2012/2016, Informatica 9.0, Java, SQL, PL/SQL, Shell script, Exadata, Agile


Hadoop Developer


  • Wrote MapReduce programs in Java for data cleaning and categorization
  • Wrote shell scripts to manipulate files on application servers
  • Used Flume to collect, aggregate and store log data from different sources
  • Built Informatica workflows to capture change in application database
  • Captured streaming data with Kafka and performed real-time analysis using Storm
  • Stored analysis results from streaming data in HBase for responsive ad hoc queries
  • Created thousands of fact/dimension tables in Hive with partitioning and bucketing for efficient access
  • Wrote HiveQL scripts for data analysis and exploration
  • Exported data from HDFS to MySQL using Sqoop for the business intelligence team
  • Worked with the analytics team to prepare and visualize results in Tableau for reporting
  • Used Oozie to orchestrate the MapReduce jobs and set up an automated workflow

Environment: Hadoop, Java, HDFS, Flume, Hive, MapReduce, Sqoop, HQL, Eclipse, MySQL, Tableau, D3, HBase, Kafka, Spark


Hadoop Developer


  • Used Sqoop to import data from Oracle and MySQL into Hive
  • Wrote HiveQL queries to retrieve and analyze data stored in Hive
  • Used Flume to stream log data and JSON-formatted social media data from sources
  • Developed MapReduce programs and Hive SerDes to clean and parse data in HDFS obtained from various data sources
  • Used Oozie to orchestrate the MapReduce jobs and set up an automated workflow
  • Collected high-throughput data using Kafka and analyzed it with Spark
  • Applied logistic regression to build the model
  • Visualized the data with Python Matplotlib, Tableau and D3.js

Environment: Hadoop 2.2, Spark, MapReduce, MySQL, Oracle SQL, Hive, Sqoop, Tableau, Matplotlib, D3.js


Oracle E-Business-Suite Developer


  • Developed various Oracle Applications using PL/SQL, SQL*Plus, Forms, Reports, Workflow Builder and Application Object Library
  • Used Oracle E-Business Suite applications for accounts payable/accounts receivable, general ledger and cash management, supporting data extraction, integration, filtering and validation
  • Integrated data from legacy systems into Oracle E-Business Suite 11i
  • Maintained and built workflow extensions to support new business processes
  • Built customized data quality checking jobs in Informatica to maintain data quality between the data warehouse and source systems in the ETL process
  • Improved production performance by identifying bottlenecks and addressing them through measures such as implementing database partitioning, increasing block size, data cache size, sequence buffer length and target-based commit interval, and applying SQL overrides

Environment: Linux, Java, Shell script, SQL, PL/SQL, Oracle Forms/Reports Builder, Oracle E-Business Suite 11i, Informatica


Backend Developer


  • Developed a data parsing system for XML/JSON
  • Involved in system design based on the Spring, Struts and Hibernate frameworks
  • Used Spring's HibernateTemplate to access the MySQL database
  • Involved in unit testing of components: created unit test cases and reviewed unit tests
  • Developed an online data analysis system using IBM Cognos
  • Developed and fine-tuned the MySQL database
  • Optimized existing data models and design diagrams
  • Ensured data integrity and detected data errors and misuse

Environment: J2SE/J2EE 5.0, JSP, HTML, JavaScript, JDBC, Eclipse, IBM Cognos, IBM DataStage, MySQL, MySQL Workbench, Toad, Linux, shell script
