
Sr. Big Data Developer Resume

Plano, TX


  • 9+ years of professional experience in IT, with experience in the Hadoop ecosystem and Big Data analytics in the finance, healthcare, and retail industries.
  • Technical expertise in major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Oozie, Flume, and Splunk.
  • Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Well versed in installation, configuration, support, and management of Big Data and the underlying infrastructure of a Hadoop cluster.
  • Experience in managing and reviewing Hadoop log files.
  • Hands-on experience with Apache Spark and Kafka.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in performing data processing, data cleaning, and data ingestion on complex (unstructured and semi-structured) data.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera and Hortonworks.
  • Excellent knowledge of building and scheduling Big Data workflows using Oozie and crontab.
  • Good knowledge of Amazon AWS services such as EC2, EMR, and CloudWatch, which provide fast and efficient processing of Big Data.
  • Hands-on experience in designing and coding web applications using Core Java and J2EE technologies. Experienced in integrating various data sources such as Java applications, RDBMS, shell scripts, spreadsheets, and text files.
  • Experience in web services using XML, HTML, SOAP, and REST.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, and JDBC.
  • Familiarity with popular frameworks such as Struts, Hibernate, Spring, MVC, and AJAX.
  • Extensive experience in developing components using JDBC, Java, Oracle, XML, and UNIX shell scripting.
  • Ability to blend technical expertise with strong conceptual, business, and analytical skills to deliver quality solutions, result-oriented problem solving, and leadership.
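
The Oozie workflow scheduling mentioned above typically chains Pig and MapReduce actions. A minimal workflow definition might look like the following sketch; all names, paths, and classes here are hypothetical, not taken from any actual project:

```xml
<workflow-app name="weblog-pipeline" xmlns="uri:oozie:workflow:0.5">
  <start to="clean-logs"/>
  <!-- Pig action: cleanse raw weblogs -->
  <action name="clean-logs">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>clean_logs.pig</script>
    </pig>
    <ok to="aggregate"/>
    <error to="fail"/>
  </action>
  <!-- MapReduce action: aggregate cleansed data -->
  <action name="aggregate">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>com.example.AggregateMapper</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Pipeline failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```
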


Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, Spark, Storm, Kafka, and Avro.

ETL Tools: Confidential, Talend, Jasper ETL Express

Web Technologies: Java, J2EE, Servlets, JSP, JDBC, XML, AJAX, SOAP, WSDL

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x

Programming Languages: Java, XML, Unix Shell scripting, HTML.

Database Systems: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, MS-Access

Application Servers: WebLogic, WebSphere, Apache Tomcat

Monitoring & Reporting tools: Ganglia, Custom Shell scripts

Operating Systems: Windows XP/2000/NT, UNIX, Linux, and DOS

IDE: Eclipse 3.x, NetBeans


Confidential, Plano, TX

Sr. Big Data Developer


  • Responsible for loading customer data and event logs into HBase.
  • Loaded HTML data into an S3 repository using Apache NiFi.
  • Created HBase tables to store variable formats of input data coming from different sources.
  • Loaded high volumes of row and column data into HBase.
  • Created jobs using Apache Spark to pre-process the data.
  • Performance-tuned Apache Spark code by tuning garbage collection, memory configuration parameters, Spark default configuration parameters, and Kryo serialization.
  • Stored filtered intermediate data in HDFS.
  • Developed a data pipeline to store data from Amazon AWS in HDFS.
  • Implemented Kafka consumers to move data from Kafka partitions into Spark code to be analyzed and processed.
  • Tuned Kafka to increase consumer throughput.
  • Analyzed web log data using HiveQL to extract unique visitors per day, page views, visit duration, and the most visited page on the website.
  • Used HiveQL to find correlations in customers' browser log data and analyzed them to build risk profiles for such sites.
  • Involved in designing the HBase row key to store text and JSON as key values, designed so that gets and scans return rows in sorted order.
  • Worked on cluster coordination services through Zookeeper.
  • Built applications using Maven and integrated them with CI servers like Jenkins to build jobs.
  • Scheduled jobs using cron.
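
The sorted-scan row-key design described above is commonly achieved with a composite key that embeds a fixed-width reversed timestamp, so the newest rows sort first under HBase's lexicographic key ordering. A minimal sketch in plain Java (the key layout and names are illustrative, not the actual production schema):

```java
public class RowKeys {
    // Builds a composite HBase row key: <customerId>:<reversedTimestamp>.
    // Reversing the timestamp (Long.MAX_VALUE - ts) and zero-padding it to a
    // fixed width makes the newest events sort first lexicographically,
    // so a prefix scan on customerId returns rows newest-first.
    static String eventKey(String customerId, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        return String.format("%s:%019d", customerId, reversed);
    }

    public static void main(String[] args) {
        String older = eventKey("cust42", 1_000L);
        String newer = eventKey("cust42", 2_000L);
        // The newer event's key sorts before the older one's.
        System.out.println(newer.compareTo(older) < 0);
    }
}
```

The fixed 19-digit padding matters: without it, numeric keys of different lengths would not compare correctly as strings.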

Environment: Hadoop, Spark, HBase, Apache NiFi, Hive, Kafka, cron, AutoSys, GitHub, Maven.

Confidential, Sunnyvale, CA

Big Data Consultant


  • Responsible for building a Hadoop cluster and integrating it with the Confidential Data Integration (PDI) server.
  • Experienced in creating ETL transformations/jobs using Spoon.
  • Experience in developing visual MapReduce applications using Confidential Spoon.
  • Loaded data into Impala tables after data cleansing.
  • Established database connections from Confidential to store data in MySQL.
  • Ran Sqoop jobs through Confidential.
  • Developed various complex Mapper and Reducer transformations for Confidential MapReduce jobs.
  • Experience in using MongoDB.
  • Extensively involved in Hadoop testing with scripts written in Python.
  • Responsible for analyzing logs generated from various test cases and identifying the causes of failures.
  • Experience in building plugins using the Confidential Java API.
  • Experience in developing a load-balancing solution using the Java Spring Framework.
  • Involved in various debugging sessions with the team.
  • Responsible for test case reporting and documenting test results.
  • Deployed a Hadoop cluster on AWS EC2 instances and integrated it with Confidential PDI to run MapReduce jobs on PDI.
  • Experience monitoring metrics with Amazon CloudWatch.
  • Good knowledge of report design using the Confidential BA suite.
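
The Sqoop work above typically moves relational data into HDFS with an import command along these lines; the connection string, credentials, table, and target directory here are placeholders, not actual project values:

```shell
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```

`-P` prompts for the password rather than embedding it in the command, and `--num-mappers` controls how many parallel map tasks split the import.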

Environment: Confidential PDI, Spoon, Java 6, Linux, Hadoop (CDH and Hortonworks), Impala, Hive, Sqoop, HBase, Pig, MySQL

Confidential - Philadelphia, PA

Senior Hadoop Developer - Analytics


  • Categorized driver trips based on the current position of the taxi using a k-clustering approach.
  • Centered the data attributes on a log scale to derive a normal distribution, which helped in using the different attributes for prediction and in handling outliers.
  • Worked on improving sensitivity and accuracy via cost-complexity pruning of decision trees.
  • Worked on analyzing the nature of the data through graphs generated using R (ggplot).
  • Used a confusion matrix to demonstrate the viability of the test data set.
  • Applied ensemble techniques to increase model accuracy.
  • Translated analytical model findings into business insights and presented them to non-technical audiences.
  • Migrated algorithms from SAS code to Hadoop MapReduce, a step taken to reduce product maintenance costs.
  • Worked with statistical domain experts to understand the data, and with the data management team on data quality assurance.
  • Experience in statistical data analysis and predictive analytics using SAS and SQL.
  • Extracted and summarized features from the data for each point of each trip.
  • Proficiency in analytical data preparation, executing initial exploratory data analysis in Excel (sort, VLOOKUP, HLOOKUP, merge, filters, pivot tables, conditional formatting, and charts).
  • Data pre-processing: transformations, variable filtering, imputation of missing data, capping skewed values, binning, and de-duplication.
  • Conducted exploratory data analysis and carried out visualizations with the ggplot2 package.
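
The confusion-matrix evaluation described above reduces to counting the four outcome cells and deriving accuracy and sensitivity from them. A minimal self-contained sketch in Java (binary labels assumed; this is illustrative, not the original SAS/R code):

```java
public class ConfusionMatrix {
    final int tp, fp, fn, tn;

    // Tallies the four cells for binary labels (1 = positive, 0 = negative).
    ConfusionMatrix(int[] actual, int[] predicted) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 1 && predicted[i] == 1) tp++;      // true positive
            else if (actual[i] == 0 && predicted[i] == 1) fp++; // false positive
            else if (actual[i] == 1 && predicted[i] == 0) fn++; // false negative
            else tn++;                                          // true negative
        }
        this.tp = tp; this.fp = fp; this.fn = fn; this.tn = tn;
    }

    // Fraction of all predictions that were correct.
    double accuracy()    { return (double) (tp + tn) / (tp + tn + fp + fn); }

    // True positive rate: fraction of actual positives that were caught.
    double sensitivity() { return (double) tp / (tp + fn); }
}
```
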

Environment: R, SAS Base, MapReduce, Oracle Big Data Appliance


Hadoop Developer


  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
  • Involved in developing Hive UDFs for functionality not available out of the box in Apache Hive.
  • Used HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Computed metrics that define user experience, revenue, etc. using Java MapReduce.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Designed and implemented various metrics that can statistically signify the success of an experiment.
  • Worked on AWS to create EC2 instances and installed Java, Zookeeper, and Kafka on those instances.
  • Involved in using Sqoop for importing and exporting data into HDFS and Hive.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Involved in pivoting HDFS data from rows to columns and columns to rows.
  • Involved in exporting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get, or copyToLocal.
  • Involved in developing shell scripts to orchestrate execution of other scripts (Hive and MapReduce) and move data files within and outside of HDFS.
  • Attended workshops on Spark, RDDs, and Spark Streaming.
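
The partitioned and bucketed Hive tables above are defined with DDL along these lines; the table, columns, and bucket count here are hypothetical examples, not the actual schema:

```sql
-- Illustrative weblog table: partitioned by day, bucketed by user
CREATE TABLE weblogs (
  user_id STRING,
  url     STRING,
  ts      BIGINT
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- A reporting metric such as unique visitors per day then
-- benefits from partition pruning on dt:
SELECT dt, COUNT(DISTINCT user_id) AS unique_visitors
FROM weblogs
GROUP BY dt;
```

Partitioning by date keeps daily queries from scanning the whole table, while bucketing by user supports efficient sampling and bucketed joins.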

Environment: Hadoop, MapReduce, YARN, Hive, HBase, Oozie, Sqoop, Storm, Flume, AWS, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.


JAVA/J2EE Consultant


  • End-to-end design of critical Core Java components using Java Collections and multithreading.
  • Analyzed different database schemas (transactional and data warehouse) to build extensive business reports using SQL and joins.
  • Developed multiple business reports with quick turnaround, which helped the business save considerable operational costs.
  • Created a program to notify the operational team within seconds of downtime at any of the 250 pharmacies on the AP network.
  • Created an interface using JSP, Servlets, and the MVC Struts architecture for the pharmacy team to resolve stuck orders in different pharmacies.
  • Performance-tuned the IMS report by fixing memory leaks and applying Java best practices to boost the performance and reliability of the application.
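
A downtime notifier like the one above can probe many pharmacy endpoints concurrently with Core Java multithreading, which is what keeps detection within seconds. A minimal stdlib-only sketch; the endpoint names are placeholders, and the predicate stands in for a real network health check:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class DowntimeChecker {
    // Probes all endpoints in parallel on a fixed thread pool and returns
    // those that report as down. The Predicate stands in for a real probe
    // (e.g. a socket connect with a short timeout).
    static List<String> findDown(List<String> endpoints, Predicate<String> isUp)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            Map<String, Future<Boolean>> checks = new LinkedHashMap<>();
            for (String e : endpoints) {
                checks.put(e, pool.submit(() -> isUp.test(e)));
            }
            List<String> down = new ArrayList<>();
            for (Map.Entry<String, Future<Boolean>> c : checks.entrySet()) {
                if (!c.getValue().get()) down.add(c.getKey()); // blocks per probe
            }
            return down;
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting all probes before collecting any results means total latency is roughly the slowest single probe, not the sum of all of them.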

Environment: Java 1.4, J2EE (JSP, Servlets, Java Beans, JDBC, Multi-Threading), Linux (Shell & Perl scripting), and SQL.
