
Sr. Hadoop/Spark Developer Resume

Chicago, IL


  • Over 9 years of IT experience with multinational clients, including 4 years of Big Data architecture experience developing Spark/Hadoop applications.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in tuning and troubleshooting performance issues in Hadoop cluster.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different data file formats such as .txt and .csv.
  • Designed and created Hive external tables using a shared metastore instead of the default Derby database, with static partitioning, dynamic partitioning, and bucketing.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in integrating Hive and HBase for effective operations.
  • Developed Pig UDFs to pre-process data for analysis.
  • Experience working with file formats such as Avro, Parquet, ORC, and SequenceFile, and compression techniques such as Gzip, LZO, and Snappy in Hadoop.
  • Strong understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, MongoDB, Redis, and Neo4j.
  • Experience with CQL (Cassandra Query Language) for retrieving data from Cassandra clusters.
  • Proficient in cluster management and configuration of Cassandra databases.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
  • Working knowledge of major Hadoop ecosystem components: Pig, Hive, Sqoop, and Flume.
  • Good experience in Cloudera, Hortonworks & Apache Hadoop distributions.
  • Knowledge on AWS (Amazon EC2) Hadoop distribution.
  • Developed high-throughput streaming apps reading from Kafka queues and writing enriched data back to outbound Kafka queues.
  • Wrote and tuned complex PL/SQL queries, stored procedures, triggers, and indexes on databases such as MySQL and Oracle.
  • Continuously working to deepen knowledge of NoSQL databases such as MongoDB.
  • Experience with NoSQL databases including HBase and Cassandra.
  • Hands-on scripting experience in Python and Linux/UNIX shell.
  • Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
  • Knowledge of creating Solr collection configurations to scale up the infrastructure.
  • Experience in developing web-based applications using Python.
  • Experience in application development using Java, J2EE, EJB, Hibernate, JDBC, Jakarta Struts, JSP and Servlets.
  • Experience using IDEs such as Eclipse and MyEclipse, and repositories SVN and CVS.
  • Experience using build tools Ant and Maven.
  • Work comfortably across methodologies such as Agile/Scrum and Waterfall.
  • Excellent communication and analytical skills; flexible in adapting to evolving technology.
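A minimal sketch of the kind of Hive external table described above, with a partitioned, bucketed layout backed by a shared metastore (table, column, and path names are illustrative):

```sql
-- External table over raw clickstream files; the schema lives in a shared
-- metastore (e.g. MySQL) rather than the default embedded Derby database.
CREATE EXTERNAL TABLE IF NOT EXISTS clickstream_events (
  user_id   STRING,
  url       STRING,
  event_ts  TIMESTAMP
)
PARTITIONED BY (event_date STRING)          -- one HDFS directory per day
CLUSTERED BY (user_id) INTO 32 BUCKETS      -- bucketing for joins/sampling
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/clickstream';

-- Dynamic partitioning on insert: the partition value comes from the data.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE clickstream_events PARTITION (event_date)
SELECT user_id, url, event_ts, to_date(event_ts) FROM staging_clickstream;
```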


Languages: C, Python, Java, SQL, Scala, UML, XML

Hadoop Ecosystem: MapReduce, Spark, Hive, Pig, Sqoop, Flume.

Databases: Oracle 10g/11g, SQL Server, MySQL

NoSQL: HBase, Cassandra, MongoDB

Application / Web Servers: Apache Tomcat, JBoss, Mongrel, WebLogic, WebSphere

Web Services: SOAP, REST

Operating systems: Windows, Unix/Linux

Microsoft Products: MS Office, MS Visio, MS Project

Frameworks: Spring, Hibernate, Struts


Confidential, Chicago, IL

Sr. Hadoop/Spark Developer

Roles & Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for managing and scheduling Jobs on a Hadoop cluster.
  • Loading data from UNIX file system to HDFS and vice versa.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked with Apache Spark for large data processing integrated with functional programming language Scala.
  • Developed POC using Scala, Spark SQL and MLlib libraries along with Kafka and other tools as per requirement then deployed on the Yarn cluster.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Implemented Data Ingestion in real time processing using Kafka.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Configured Spark Streaming to receive real-time data and store the stream data in HDFS.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, and Solr.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Configured Spark Streaming to consume messages from Kafka and store the resulting data in HDFS.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Streamed data in real time using Spark with Kafka.
  • Responsible for creating Hive tables and working on them using HiveQL.
  • Implemented various Hive UDFs as per business requirements.
  • Exported the analyzed data to the databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in Data Visualization using Tableau for Reporting from Hive Tables.
  • Developed Python Mapper and Reducer scripts and implemented them using Hadoop Streaming.
  • Developed multiple Map Reduce jobs in java for data cleaning and preprocessing.
  • Optimized Map Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for writing Hive queries for data analysis to meet the business requirements.
  • Customized Apache Solr to handle fallback searching and provide custom functions.
  • Responsible for setup and benchmarking of Hadoop/HBase clusters.
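The Python mapper/reducer pattern for Hadoop Streaming mentioned above can be sketched roughly as follows; the word-count logic is illustrative only, since the real jobs carried business-specific cleaning rules:

```python
from itertools import groupby

def mapper(lines):
    """Emit tab-separated (word, 1) pairs, the format Hadoop Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum counts per key. Hadoop delivers mapper output already sorted by
    key, so contiguous grouping is enough."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(n) for _, n in group)}"

# In a Hadoop Streaming job the same functions are driven by stdin/stdout
# (e.g. `for out in mapper(sys.stdin): print(out)`) and submitted with:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...

# Local sanity check of the map -> sort -> reduce pipeline:
counts = list(reducer(sorted(mapper(["big data big"]))))
```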

Environment: Hadoop, HDFS, HBase, Sqoop, Hive, MapReduce, Spark Streaming/SQL, Scala, Kafka, Solr, sbt, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins.

Confidential - Philadelphia, PA

Big Data Developer

Roles & Responsibilities:

  • Involved in Automation of click stream data collection and store into HDFS using Flume.
  • Involved in creating Data Lake by extracting customer's data from various data sources into HDFS.
  • Used Sqoop to load data from Oracle Database into Hive.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
  • Implemented various Pig UDFs for converting unstructured data into structured data.
  • Developed Pig Latin scripts for data processing.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed the Apache Spark, Flume, and HDFS integration project to do real-time data analysis.
  • Developed a data pipeline using Flume, Spark, and Hive to ingest, transform, and analyze data.
  • Wrote Flume configuration files for importing streaming log data into MongoDB.
  • Performed masking of sensitive customer data using Flume interceptors.
  • Used Impala to analyze data ingested into Hive tables and computed various metrics for reporting on the dashboard.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Made code changes to a turbine-simulation module for processing across the cluster using spark-submit.
  • Involved in analytics and visualization of log data, estimating the error rate and studying the probability of future errors using regression models.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake.
  • Involved in creating Hive tables as per requirement defined with appropriate static and dynamic partitions.
  • Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
  • Involved in production Hadoop cluster set up, administration, maintenance, monitoring and support.
  • Implemented logical data models and interacted with HBase.
  • Assisted in the creation of large HBase tables using large data sets from various portfolios.
  • Provided cluster coordination services through ZooKeeper.
  • Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
  • Developed MapReduce jobs to automate the transfer of data to/from HBase.
  • Assisted with the addition of Hadoop processing to the IT infrastructure.
  • Used Flume to collect web logs from the online ad servers and push them into HDFS.
  • Implemented custom business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented and executed MapReduce jobs to process the log data from the ad servers.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Performed analysis using high-level languages such as Python.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images and configured the launched instances for specific applications.
  • Worked as a back-end Java developer for the Data Management Platform (DMP), building RESTful APIs to build dashboards and let other groups build their own.
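The WebHDFS calls described above follow a simple URL convention; a rough sketch in Python, where the host, port, and paths are hypothetical and the default port assumes a Hadoop 2.x NameNode:

```python
from urllib.parse import urlencode

def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL; `op` is an operation name such as
    OPEN (HTTP GET), CREATE (PUT), APPEND (POST), or DELETE (DELETE)."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Hypothetical example: list a data-lake directory. Against a live cluster
# this URL would be issued with e.g. urllib.request.urlopen(url).
url = webhdfs_url("namenode.example.com", "/data/lake/events", "LISTSTATUS")
```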

Environment: Hadoop, Pig, Sqoop, Oozie, MapReduce, HDFS, Hive, Java, Python, Eclipse, HBase, Flume, AWS, Oracle 10g, UNIX Shell Scripting, GitHub, Maven.

Confidential- Austin, TX

Hadoop Developer/Administrator

Roles & Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Set up and optimized standalone, pseudo-distributed, and fully distributed clusters.
  • Built, tuned, and maintained HiveQL and Pig scripts for user reporting.
  • Developed MapReduce programs.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop, and automated the Sqoop jobs by scheduling them in Oozie.
  • Created Hive scripts to load data from one stage into another and implemented incremental loads with the changed data architecture.
  • Created Hive tables as per requirements, as internal or external tables, defined with appropriate static and dynamic partitions and bucketing for efficiency.
  • Performed data analysis and queries with Hive and Pig on Ambari (Hortonworks).
  • Enhanced Hive performance by implementing optimization and compression techniques.
  • Implemented Hive partitioning and bucketing to improve query performance in the staging layer, a denormalized form of the analytics model.
  • Implemented techniques for efficient execution of Hive queries, such as map joins, compressing map/reduce output, and parallel query execution.
  • Managing and reviewing Hadoop log files.
  • Supported MapReduce Programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive.
  • Involved in creating Hive tables, loading data, and writing Hive queries
  • Developed shell scripts to automate routine DBA tasks (e.g., database refreshes, backups, monitoring).
  • Tuned/modified SQL for batch and online processes.
  • Wrote MapReduce programs.
  • Defined workflows using the Oozie framework for automation.
  • Implemented Flume (multiplexing) to stream data from upstream pipes into HDFS.
  • Responsible for reviewing Hadoop log files.
  • Loaded and transformed large sets of unstructured and semi-structured data.
  • Performed data completeness, correctness, data transformation, and data quality testing using SQL.
  • Wrote shell scripts to retrieve information from files.
  • Implemented Hive partitioning (static and dynamic) and bucketing.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Assisted in the creation of ETL processes for transforming data from existing RDBMS systems.
  • Wrote Apache Pig scripts to process HDFS data.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Involved in installing Hadoop Ecosystem components.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Installed and configured Pig, Hive and Sqoop.
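The incremental-load idea above reduces to merging a delta of changed records into the previous snapshot by key. A plain-Python sketch of that reconciliation (the key and field names are illustrative; in practice this was expressed in Hive scripts):

```python
def merge_incremental(base, delta, key="id"):
    """Reconcile a delta of changed/new records into the base snapshot.
    Records in `delta` overwrite base records that share the same key."""
    merged = {row[key]: row for row in base}
    for row in delta:
        merged[row[key]] = row           # insert new key or update existing
    return sorted(merged.values(), key=lambda r: r[key])

# Hypothetical daily run: yesterday's snapshot plus today's changed rows.
base = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
delta = [{"id": 2, "amt": 25}, {"id": 3, "amt": 30}]
snapshot = merge_incremental(base, delta)
```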

Environment: Core Java 5, JSP, Struts, HTML, CSS, XML, JavaScript, Oracle 10g, PL/SQL, database objects (stored procedures, packages), Rational Application Developer 7, Windows 7, WebSphere Application Server 7, Oracle SQL Developer, Maven, TOAD, PuTTY.

Confidential, Tampa, FL

Java/Hadoop Developer


  • Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API.
  • Designed and implemented Java engine and API to perform direct calls from front-end JavaScript (ExtJS) to server-side Java methods (ExtDirect).
  • Used Spring AOP to implement Distributed declarative transaction throughout the application.
  • Designed and developed Java batch programs in Spring Batch.
  • Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
  • Developed MapReduce programs to remove irregularities and aggregate the data.
  • Implemented Hive UDFs and did performance tuning for better results.
  • Developed Pig Latin scripts to extract data from log files and store them in HDFS; created User Defined Functions (UDFs) to pre-process data for analysis.
  • Implemented optimized map joins to get data from different sources and perform cleaning operations before applying the algorithms.
  • Used Sqoop to import and export data between Oracle DB and HDFS/Hive.
  • Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports on nightly, weekly and monthly basis.
  • Used various compression codecs to effectively compress the data in HDFS.
  • Used Avro SerDes for serialization and deserialization, and implemented custom Hive UDFs involving date functions.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Worked in an Agile development environment in two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
  • Developed workflows using Oozie for running MapReduce jobs and Hive queries.
  • Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
  • Involved in loading data from the UNIX file system to HDFS.
  • Created Java operators to process data using DAG streams and load data to HDFS.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Involved in developing monitoring and performance metrics for Hadoop clusters.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
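The nightly/weekly report scheduling described above is typically expressed as an Oozie coordinator wrapping a workflow; a minimal illustrative fragment, where the app name, dates, and paths are hypothetical:

```xml
<coordinator-app name="nightly-report" frequency="${coord:days(1)}"
                 start="2016-01-01T02:00Z" end="2017-01-01T02:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- The workflow at this path runs the Hive report query once per day;
           a weekly report would use frequency="${coord:days(7)}". -->
      <app-path>${nameNode}/apps/oozie/report-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```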

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.


Java/J2EE Developer

Roles & Responsibilities:

  • Involved in writing programs for XA transaction management on multiple databases of the application.
  • Developed Java programs, JSP pages, and servlets using the Jakarta Struts framework.
  • Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
  • Worked with AJAX framework to get the asynchronous response for the user request and used JavaScript for the validation.
  • Used EJBs in the application and developed session beans to implement business logic at the middle tier.
  • Actively involved in writing SQL using SQL Query Builder.
  • Involved in coordinating on-shore/off-shore development and mentoring new team members.
  • Extensively used Ant to build and configure J2EE applications, and Log4J for logging in the application.
  • Used JAXB to read and manipulate XML properties.
  • Used JNI to call libraries and other functions implemented in C.
  • Used Prototype, MooTools, and script.aculo.us for a fluid user interface.
  • Involved in fixing defects and unit testing with test cases using JUnit.

Environment: Java, EJB, Servlets, XSLT, CVS, J2EE, AJAX, Struts, Hibernate, ANT, Tomcat, JMS, UML, Log4J, Oracle 10g, Eclipse, Solaris, JUnit and Windows 7/XP, Maven.


Java Developer

Roles & Responsibilities:

  • Played an active role in the team by interacting with business and program specialists and converted business requirements into system requirements.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Implemented Services using Core Java.
  • Involved in development of classes using java.
  • Developed algorithms for serial interfaces.
  • Involved in testing of CAN protocols.
  • Developed the flow of algorithm in UML.
  • Used Servlets to implement Business components.
  • Designed and developed required manager classes for database operations.
  • Developed various servlets for monitoring the application.
  • Designed and developed the front end using HTML and JSP.
  • Developed XML files, DTDs, and schemas, and parsed XML using both SAX and DOM parsers.
  • Wrote deployment descriptors using XML and test Java classes for direct testing of the session and entity beans.
  • Packaged and deployed builds through Ant scripts.
  • Wrote stored procedures and used Java APIs to call them.
  • Designed databases, including defining tables, views, constraints, triggers, sequences, indexes, and stored procedures.
  • Developed verification and validation scripts in java.
  • Followed verification and validation cycle for development of algorithms.
  • Developed Test cases for Unit Test cases and as well as System and User test scenarios.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.

Environment: Java, JSP, Servlets, JDBC, JavaScript, MySQL, JUnit, Eclipse IDE, Windows 7/XP/Vista, UNIX, Linux.
