Hadoop/Spark Developer Resume

Plano, TX


  • 7 years of professional experience in Requirements Analysis, Design, Development and Implementation of Java, J2EE and Big Data technologies.
  • 4+ years of exclusive experience in Big Data technologies and Hadoop ecosystem components like Spark, MapReduce, Hive, Pig, YARN, HDFS, Sqoop, Flume, Kafka and NoSQL systems like HBase and Cassandra.
  • Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce framework and the Spark execution framework.
  • Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark and Hive.
  • Extensive experience in working with structured data using HiveQL, join operations and custom UDFs; experienced in optimizing Hive queries.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Extensive experience in importing/exporting data between RDBMS and the Hadoop ecosystem using Apache Sqoop.
  • Worked with the Java HBase API to ingest processed data into HBase tables.
  • Strong experience in working with UNIX/Linux environments and writing shell scripts.
  • Good knowledge and experience of the real-time streaming technologies Spark and Kafka.
  • Experience in optimizing MapReduce jobs using Combiners and Partitioners to deliver the best results.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Sound knowledge of J2EE architecture, design patterns and object modeling using various J2EE technologies and frameworks.
  • Adept at creating Unified Modeling Language (UML) diagrams such as Use Case diagrams, Activity diagrams, Class diagrams and Sequence diagrams using Rational Rose and Microsoft Visio.
  • Extensive experience in developing applications using Java, JSP, Servlets, JavaBeans, JSTL, JSP Custom Tag Libraries, JDBC, JNDI, SQL, AJAX, JavaScript and XML.
  • Experienced in using Agile methodologies including Extreme Programming, Scrum and Test-Driven Development (TDD).
  • Proficient in integrating and configuring the Object-Relational Mapping tool Hibernate in J2EE applications, along with other open source frameworks like Struts and Spring.
  • Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
  • Experience in writing test cases in Java Environment using JUnit.
  • Hands-on experience in developing logging standards and mechanisms based on Log4j.
  • Experience in building, deploying and integrating applications with Ant and Maven.
  • Good knowledge of Web Services, SOAP programming, WSDL, XML parsers like SAX and DOM, AngularJS and responsive design with Bootstrap.
  • Demonstrated technical expertise, organization and client service skills in various projects undertaken.


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and JavaBeans

Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)

Web Technologies: JavaScript, AJAX, HTML, XML and CSS.

Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting

IDE: Eclipse, NetBeans, PyCharm

Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI

Build Management tools: Maven, Apache Ant, SOAP, REST

Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos.

Scheduling Tools: crontab, AutoSys, Control-M

Visualization Tools: Tableau, Arcadia Data.


Confidential, Plano, TX

Hadoop/Spark Developer


  • Designed and deployed a Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
  • Installed Hadoop, MapReduce, HDFS, AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Assisted in upgrading, configuring and maintaining various Hadoop infrastructures like Pig, Hive and HBase.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Built a POC on Single Member Debug on Hive/HBase and Spark.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Loaded data into HBase using both bulk load and non-bulk load.
  • Used the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Expertise in data modeling and in data warehouse design and development.

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Solr, HBase, Oozie, Flume, Spark Streaming/SQL, Java, SQL Scripting, Linux Shell Scripting.

Confidential, Phoenix, AZ



  • Installed and configured the Hadoop environment.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Used Pig and MapReduce to analyze XML files and log files.
  • Imported data from IBM DB2 to HDFS using Sqoop on a regular basis.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Imported and exported data into HDFS and Hive using Sqoop from IBM DB2 and Netezza databases.
  • Used Oozie workflows to coordinate Pig and Hive scripts.
  • Used Impala for querying HDFS data to achieve better performance.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed UDFs to pre-process the data and compute various metrics for reporting in both Pig and Hive.
  • Developed a MapReduce program to convert mainframe fixed-length data to delimited data.
  • Ingested data from various IBM DB2 tables into HDFS using Sqoop.
  • Automated pulling and synchronizing code in the GitHub environment using Python scripts.

Environment: Hadoop, CDH, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Impala, HBase, Oracle, MapR, AutoSys, Mainframes, JCL, IBM DB2, NDM.

Confidential, Columbus, OH



  • Responsible for business logic using Java and JavaScript, with JDBC for querying the database.
  • Involved in requirement analysis, design, coding and implementation.
  • Worked in Agile methodology and used JIRA to maintain the project's stories.
  • Analyzed large data sets by running Hive queries.
  • Involved in designing and developing the Hive data model, loading it with data and writing Java UDFs for Hive.
  • Handled importing and exporting data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
  • Used Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and RDBMS.
  • Created Hive tables and loaded data from HDFS to Hive tables as per the requirement.
  • Developed custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
  • Created components like Hive UDFs for missing functionality in Hive to analyze and process large volumes of data.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in writing complex queries to perform join operations between multiple tables.
  • Actively involved in verifying and testing data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
  • Developed scripts and scheduled AutoSys jobs to filter the data.
  • Monitored AutoSys file watcher jobs, tested data for each transaction and verified whether each job ran properly.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Used Impala to pull data from Hive tables.
  • Used Apache Maven 3.x to build and deploy the application to various environments.
  • Installed the Oozie workflow engine to run multiple Hive jobs which run independently based on time and data availability.

Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys
