
Sr. Hadoop/Spark Developer Resume

Chicago, IL


  • Over 9 years of IT experience with multinational clients, including 4 years of Big Data architecture experience developing Spark/Hadoop applications.
  • Excellent understanding and knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in tuning and troubleshooting performance issues in Hadoop clusters.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different data file formats such as .txt and .csv.
  • Designed and created Hive external tables using a shared metastore instead of the default Derby metastore, with partitioning, dynamic partitioning, and bucketing.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Experience in integrating Hive and HBase for effective operations.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Experience working with different file formats such as Avro, Parquet, ORC, and SequenceFile, and compression techniques such as Gzip, LZO, and Snappy in Hadoop.
  • Strong understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, MongoDB, Redis, and Neo4j.
  • Experience using CQL (Cassandra Query Language) to retrieve data from Cassandra clusters.
  • Proficient with cluster management and configuring Cassandra databases.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Good experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
  • Working knowledge of major Hadoop ecosystem components: Pig, Hive, Sqoop, and Flume.
  • Good experience with Cloudera, Hortonworks, and Apache Hadoop distributions.
  • Knowledge of Hadoop on AWS (Amazon EC2).
  • Developed high-throughput streaming apps reading from Kafka queues and writing enriched data back to outbound Kafka queues.
  • Wrote and performance-tuned complex PL/SQL queries, stored procedures, triggers, and indexes with databases such as MySQL and Oracle.
  • Also working to deepen knowledge of NoSQL databases such as MongoDB.
  • Experience with NoSQL databases including HBase and Cassandra.
  • Hands-on scripting experience in Python and Linux/UNIX shell.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Knowledge of creating Solr collection configurations to scale up the infrastructure.
  • Experience in developing web-based applications using Python.
  • Experience in application development using Java, J2EE, EJB, Hibernate, JDBC, Jakarta Struts, JSP and Servlets.
  • Experience using IDEs such as Eclipse and MyEclipse, and repositories such as SVN and CVS.
  • Experience using the build tools Ant and Maven.
  • Comfortable working with different development methodologies such as Agile, Waterfall, and Scrum.
  • Excellent communication and analytical skills; flexible in adapting to evolving technology.


Languages: C, Python, Java, SQL, Scala, UML, XML

Hadoop Ecosystem: MapReduce, Spark, Hive, Pig, Sqoop, Flume.

Databases: Oracle 10g/11g, SQL Server, MySQL

NoSQL: HBase, Cassandra, MongoDB

Application / Web Servers: Apache Tomcat, JBoss, Mongrel, WebLogic, WebSphere

Web Services: SOAP, REST

Operating Systems: Windows, Unix/Linux

Microsoft Products: MS Office, MS Visio, MS Project

Frameworks: Spring, Hibernate, Struts


Confidential, Chicago, IL

Sr. Hadoop/Spark Developer

Roles & Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for managing and scheduling Jobs on a Hadoop cluster.
  • Loading data from UNIX file system to HDFS and vice versa.
  • Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Worked with Apache Spark for large-scale data processing, integrated with the functional programming language Scala.
  • Developed a POC using Scala, Spark SQL, and MLlib along with Kafka and other tools as required, then deployed it on the YARN cluster.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Implemented Data Ingestion in real time processing using Kafka.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Configured Spark Streaming to receive real-time data and store the stream data in HDFS.
  • Developed Spark scripts using Scala shell commands as required.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS, and Solr.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Configured Spark Streaming to consume Kafka streams and then store the data in HDFS.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Streamed data in real time using Spark with Kafka.
  • Responsible for creating Hive tables and working on them using HiveQL.
  • Implemented various Hive UDFs as per business requirements.
  • Exported the analyzed data to the databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in data visualization using Tableau for reporting from Hive tables.
  • Developed Python Mapper and Reducer scripts and implemented them using Hadoop Streaming.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for writing Hive queries for data analysis to meet the business requirements.
  • Customized Apache Solr to handle fallback searching and provide custom functions.
  • Responsible for setup and benchmarking of Hadoop/HBase clusters.

Environment: Hadoop, HDFS, HBase, Sqoop, Hive, MapReduce, Spark Streaming/SQL, Scala, Kafka, Solr, sbt, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins.

Confidential, Philadelphia, PA

Big Data Developer

Roles & Responsibilities:

  • Involved in automating clickstream data collection and storage into HDFS using Flume.
  • Involved in creating a data lake by extracting customer data from various data sources into HDFS.
  • Used Sqoop to load data from Oracle Database into Hive.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
  • Implemented various Pig UDFs for converting unstructured data into structured data.
  • Developed Pig Latin scripts for data processing.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed the Apache Spark, Flume, and HDFS integration project to perform real-time data analysis.
  • Developed a data pipeline using Flume, Spark, and Hive to ingest, transform, and analyze data.
  • Wrote Flume configuration files for importing streaming log data into MongoDB.
  • Masked sensitive customer data using Flume interceptors.
  • Used Impala to analyze data ingested into Hive tables and compute various metrics for reporting on the dashboard.
  • Implemented Spark applications in Scala, using the DataFrame and Spark SQL APIs for faster processing of data.
  • Made code changes to a turbine simulation module for processing across the cluster using spark-submit.
  • Performed analytics and visualization on log data to estimate the error rate and study the probability of future errors using regression models.
  • Used the WebHDFS REST API to make HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake.
  • Involved in creating Hive tables as per requirements, defined with appropriate static and dynamic partitions.
  • Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
  • Involved in production Hadoop cluster set up, administration, maintenance, monitoring and support.
  • Handled the logical implementation of, and interaction with, HBase.
  • Assisted in creating large HBase tables from large data sets from various portfolios.
  • Provided cluster coordination services through ZooKeeper.
  • Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
  • Developed MapReduce jobs to automate transfer of data from/to HBase.
  • Assisted with the addition of Hadoop processing to the IT infrastructure.
  • Used Flume to collect all web logs from the online ad servers and push them into HDFS.
  • Implemented custom business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented and executed MapReduce jobs to process the log data from the ad servers.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Performed analysis using high-level languages such as Python.
  • Launched Amazon EC2 cloud instances using Amazon images and configured the launched instances for specific applications.
  • Worked as back-end Java developer for the Data Management Platform (DMP), building RESTful APIs to build dashboards and to let other groups build them.

Environment: Hadoop, Pig, Sqoop, Oozie, MapReduce, HDFS, Hive, Java, Python, Eclipse, HBase, Flume, AWS, Oracle 10g, UNIX Shell Scripting, GitHub, Maven.

Confidential, Austin, TX

Hadoop Developer/Administrator

Roles & Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Set up and optimized standalone, pseudo-distributed, and fully distributed clusters.
  • Built, tuned, and maintained HiveQL and Pig scripts for user reporting.
  • Developed MapReduce programs.
  • Imported and exported data from Oracle and DB2 into HDFS and Hive using Sqoop, and automated the Sqoop jobs by scheduling them in Oozie.
  • Created Hive scripts to load data from one stage into another and implemented incremental loads with the changed data architecture.
  • Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions and bucketing for efficiency.
  • Performed data analysis and ran queries in Hive and Pig on Ambari (Hortonworks).
  • Enhanced Hive performance by implementing optimization and compression techniques.
  • Implemented Hive partitioning and bucketing to improve query performance in the staging layer, which is a de-normalized form of the analytics model.
  • Implemented techniques for efficient execution of Hive queries, such as map joins, compressed map/reduce output, and parallel query execution.
  • Managing and reviewing Hadoop log files.
  • Supported MapReduce Programs running on teh cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Developed shell scripts to automate routine DBA tasks (e.g. database refresh, backups, monitoring).
  • Tuned and modified SQL for batch and online processes.
  • Wrote MapReduce programs.
  • Defined workflows using the Oozie framework for automation.
  • Implemented Flume (multiplexing) to stream data from upstream pipes into HDFS.
  • Responsible for reviewing Hadoop log files.
  • Loaded and transformed large sets of unstructured and semi-structured data.
  • Performed data completeness, correctness, data transformation, and data quality testing using SQL.
  • Wrote shell scripts to retrieve information from files.
  • Implemented Hive partitioning (static and dynamic) and bucketing.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Assisted in creation of ETL processes for transformation of data sources from existing RDBMS systems.
  • Wrote Apache Pig scripts to process the HDFS data.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Involved in installing Hadoop Ecosystem components.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Installed and configured Pig, Hive and Sqoop.

Environment: Core Java 5, JSP, Struts, HTML, CSS, XML, JavaScript, Oracle 10g, PL/SQL, Database Objects like Stored Procedures and Packages, Rational Application Developer 7, Windows 7, WebSphere Application Server 7, Oracle SQL Developer, Maven, TOAD, PuTTY.

Well Care, Tampa, FL

Java/Hadoop Developer


  • Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API.
  • Designed and implemented Java engine and API to perform direct calls from front-end JavaScript (ExtJS) to server-side Java methods (ExtDirect).
  • Used Spring AOP to implement distributed declarative transactions throughout the application.
  • Designed and developed Java batch programs in Spring Batch.
  • Well-versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
  • Developed MapReduce programs to remove irregularities and aggregate the data.
  • Implemented Hive UDFs and performed performance tuning for better results.
  • Developed Pig Latin scripts to extract data from log files and store it in HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis.
  • Implemented optimized map joins to get data from different sources to perform cleaning operations before applying the algorithms.
  • Used Sqoop to import and export data from Oracle DB into HDFS and Hive.
  • Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
  • Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster for generating reports on a nightly, weekly, and monthly basis.
  • Used various compression codecs to effectively compress the data in HDFS.
  • Used Avro SerDes for serialization and deserialization and implemented custom Hive UDFs involving date functions.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
  • Developed workflows using Oozie for running MapReduce jobs and Hive queries.
  • Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
  • Involved in loading data from UNIX file system to HDFS.
  • Created Java operators to process data using DAG streams and load data to HDFS.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Involved in developing monitoring and performance metrics for Hadoop clusters.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.


Java/J2EE Developer

Roles & Responsibilities:

  • Involved in writing programs for XA transaction management on multiple databases of the application.
  • Developed Java programs, JSP pages, and servlets using the Jakarta Struts framework.
  • Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
  • Worked with an AJAX framework to get asynchronous responses for user requests and used JavaScript for validation.
  • Used EJBs in the application and developed session beans to implement business logic at the middle tier.
  • Actively involved in writing SQL using SQL Query Builder.
  • Involved in coordinating the onshore/offshore development and mentoring new team members.
  • Extensively used Ant to build and configure J2EE applications and used Log4J for logging in the application.
  • Used JAXB to read and manipulate the XML properties.
  • Used JNI to call libraries and other functions implemented in C.
  • Used Prototype, MooTools, and script.aculo.us for a fluid user interface.
  • Involved in fixing defects and unit testing with test cases using JUnit.

Environment: Java, EJB, Servlets, XSLT, CVS, J2EE, AJAX, Struts, Hibernate, ANT, Tomcat, JMS, UML, Log4J, Oracle 10g, Eclipse, Solaris, JUnit and Windows 7/XP, Maven.


Java Developer

Roles & Responsibilities:

  • Played an active role in the team by interacting with business and program specialists and converting business requirements into system requirements.
  • Conducted design reviews and technical reviews with other project stakeholders.
  • Implemented services using Core Java.
  • Involved in development of classes using Java.
  • Good proficiency in developing algorithms for serial interfaces.
  • Involved in testing of CAN protocols.
  • Developed the flow of the algorithm in UML.
  • Used Servlets to implement business components.
  • Designed and developed the required manager classes for database operations.
  • Developed various Servlets for monitoring the application.
  • Designed and developed the front end using HTML and JSP.
  • Developed XML files, DTDs, and schemas, and parsed XML using both SAX and DOM parsers.
  • Wrote deployment descriptors using XML and test Java classes for direct testing of the session and entity beans.
  • Packaged and deployed builds through Ant scripts.
  • Wrote stored procedures and used Java APIs to call these procedures.
  • Designed the database, including defining tables, views, constraints, triggers, sequences, indexes, and stored procedures.
  • Developed verification and validation scripts in Java.
  • Followed the verification and validation cycle for development of algorithms.
  • Developed test cases for unit tests as well as system and user test scenarios.
  • Involved in unit testing, user acceptance testing, and bug fixing.

Environment: Java, JSP, Servlets, JDBC, JavaScript, MySQL, JUnit, Eclipse IDE, Windows 7/XP/Vista, UNIX, Linux.
