Hadoop/Spark Developer Resume

MO, USA


  • Possess 8 years of IT experience in software design and development with a strong background in Big Data analytics, Hadoop, Spark and NoSQL databases. Well-versed in Hadoop ecosystem components including Pig, Hive, Sqoop, Flume, Kafka and HBase. Worked with all kinds of data: structured, semi-structured and unstructured. Experienced in working with Spark RDDs and DataFrames to enable in-memory data processing by executing appropriate actions and transformations. Skilled at cleaning, processing, analysis, visualization and predictive modeling of data.
  • Experienced in building highly scalable Big Data solutions using multiple Hadoop distributions (i.e., Cloudera and Hortonworks) and NoSQL platforms (HBase, MongoDB and Cassandra).
  • Experience in Software development life cycle (SDLC) for various applications including Analysis, Design, Development, Implementation, Maintenance and Support.
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Zookeeper, Sqoop, Flume, Kafka, Spark, Impala and Cassandra in both Cloudera and Hortonworks environments.
  • Experience in development, analysis and design of ETL methodologies in all the phases of the data warehousing life cycle.
  • More than one year of hands-on experience using the Spark framework with Scala.
  • Good exposure to performance tuning of Hive queries and MapReduce jobs in the Spark framework.
  • Experience in developing MapReduce jobs in Java for data cleaning, transformations, pre-processing and analysis.
  • Good understanding of Machine Learning, Data Mining and Algorithms.
  • Analyzed streaming data and identified important trends for further analysis using Spark Streaming.
  • Experienced in ETL methodology for performing data migration, data profiling, extraction, transformation and loading using Talend, and designed data conversions from a large variety of source systems including Oracle, DB2, SQL Server, Hive, and non-relational sources like flat files, XML and mainframe files.
  • End-to-end experience in designing data visualizations using Tableau.
  • Designed, developed and implemented several list, chart, crosstab and master-detail reports as well as Tableau Business Intelligence reports.
  • Developed core modules in large cross-platform applications using Java, J2EE, Hibernate, Spring, JSP, Servlets, EJB, JDBC, JavaScript, XML and HTML.
  • Hands-on experience spinning up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, JDBC and SQL.
  • Familiar with the Java virtual machine (JVM) and multi-threaded processing.
  • Strong knowledge of data modelling, effort estimation, ETL design, development, system testing, implementation and production support. Experience in resolving ongoing maintenance issues and bug fixes.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.

Technical Skills:

Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Impala, Apache Spark, Spark Streaming, Spark SQL, Hue, Ambari, Windows Azure

Programming Languages: Java, C, Scala, SQL, PL/SQL, Pig Latin, HQL

Databases: Oracle, MySQL, PostgreSQL, MongoDB, HBase, Cassandra

IDE Tools: Eclipse, IntelliJ

Framework: Hibernate, Spring, Struts, JUnit

Web Technologies: HTML5, CSS3, JavaScript, JSP, Servlets, JNDI, JDBC, Java Beans

Web Services: SOAP, REST, WSDL, JAXB, and JAXP

Reporting Tools / ETL Tools: Tableau, Microsoft Power BI


Hadoop/Spark Developer

Confidential, MO, USA


  • Involved in various stages of project data flow such as control validation, data quality and change data capture.
  • Performed data mining tasks depending on business scenarios.
  • Experience with the Cloudera distribution of Hadoop (CDH 5.10).
  • Experience with configuration of Hadoop ecosystem components: MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Flume and YARN.
  • Experienced in the entire software development life cycle (SDLC) of the project, including analysis, design, development, implementation, maintenance and support.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Wrote SQL stored procedures in Hue to access the data from Hive.
  • Implemented Spark RDD transformations and actions in Apache Spark to support business analysis.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Created Hive tables and integrated them as per the design using the Parquet file format.
  • Handled Delta processing or incremental updates using Hive.
  • Used dynamic partitioning in Hive to segregate the customer database by age.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data.
  • Used Kafka for logging the status of various jobs.
  • Wrote various joins in MySQL depending on client requirements.
  • Developed Hive scripts to meet analysts' requirements.
  • Stored data in Hive and enabled end users to access it through Impala.
  • Imported data from RDBMS into HDFS and exported it back using Sqoop.
  • Created partitioned and bucketed tables in Hive based on the hierarchy of the dataset.
  • Created several UDFs in Pig and Hive to give additional support for the project.
  • Used the Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
  • Good understanding of Spark SQL, the Spark transformation engine and Spark Streaming.
  • Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in cluster maintenance and monitoring.
  • Used the Scala programming language extensively with Apache Spark for data processing.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.

Environment: MapReduce, Cloudera Manager 5.10, HDFS, Hive, Spark 1.6, Kafka, Scala, MySQL, Java (JDK 1.6), Eclipse.

Hadoop Developer

Confidential, CA, USA


  • Responsible for running Hadoop streaming jobs to process terabytes of XML-format data.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Optimized Hive joins for large tables and developed MapReduce code for a full outer join of two large tables.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Developed MapReduce jobs with the Java API to parse the raw data and store the refined data.
  • Created HBase tables for random reads/writes by MapReduce programs.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Tuned Hive and Pig scripts to improve performance and solved performance issues in both using an understanding of joins, groups and aggregations.
  • Developed Sqoop Scripts to extract data from Oracle source databases onto HDFS.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Involved in implementing and integrating NoSQL databases such as HBase and Cassandra.
  • Moved data from Hadoop to Cassandra using Bulk output format class.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Handled cluster coordination through Zookeeper.
  • Implemented Cloudera Manager on existing cluster.
  • Extensively worked with the Cloudera Distribution of Hadoop, CDH 5.x.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Responsible for development, support and maintenance of the ETL (Extract, Transform and Load) processes using Talend.
  • Developed the Talend jobs and ensured the data was loaded into Hive tables and HDFS files.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Cassandra, Kafka, SQL, Python, Spark, Linux, Java.

Hadoop Developer



  • Wrote transformation and mapping MapReduce pipelines using Java.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Worked with various HDFS file formats like Avro, SequenceFile and JSON, and various compression formats like Snappy and bzip2.
  • Designed and implemented Incremental Imports into Hive tables.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in loading data into HBase using the HBase shell and the HBase client API.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Loaded data from HDFS into Spark RDDs and performed various transformations and actions to process the data.
  • Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext and Scala.
  • Migrated ETL jobs to Pig scripts implementing transformations, even joins and some pre-aggregations before storing the data onto HDFS.
  • Implemented the workflows using the Apache Oozie framework to automate tasks.
  • Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Developed scripts to automate end-to-end data management and synchronization between all the clusters.
  • Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Developed small distributed applications in our projects using Zookeeper and scheduled the workflows using Oozie.
  • Provisioned and managed a multi-tenant Cassandra cluster on public cloud environments - Amazon Web Services (AWS) EC2 and OpenStack.

Environment: Hadoop, Big Data, HDFS, Map Reduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, LINUX, Java, Eclipse, Cassandra, PL/SQL, Windows NT, UNIX Shell Scripting.

Java Developer



  • Implemented the Struts framework with MVC architecture.
  • Created new JSPs for the front end using HTML, JavaScript, jQuery and Ajax.
  • Developed the presentation layer using JSP, HTML and CSS, and client-side validations using JavaScript.
  • Collaborated with the ETL/Informatica team to determine the necessary data models and UI designs to support Cognos reports.
  • Performed several data quality checks, found potential issues and designed Ab Initio graphs to resolve them.
  • Applied J2EE design patterns like Business Delegate, DAO and Singleton.
  • Deployed and tested the application using the Tomcat web server.
  • Involved in coding, code reviews and JUnit testing; prepared and executed unit test cases.
  • Used JBoss for application deployment and MySQL as the database.
  • Used JUnit for unit testing of the integration testing tool.
  • Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
  • Wrote SQL queries to fetch the business data using Oracle as the database.
  • Developed UI for Customer Service Modules and Reports using JSF, JSPs and MyFaces components.
  • Used Log4j to log the running system's application output in order to trace errors and certain automated routine functions.
  • Developed a rich user interface using HTML, JSP, AJAX, JSTL, JavaScript, jQuery and CSS.
  • Created custom JSP tags for maximum reusability of user interface components.
  • Tested and deployed the application on Tomcat.

Environment: Java, JSP, Hibernate, JUnit, JavaScript, Servlets, Struts, EJB, JSF, Ant, Tomcat, CVS, Eclipse, SQL Developer, Oracle.

Java Developer



  • Designed and developed web services based on SOAP, WSDL and JAX-WS using Spring.
  • Involved in designing the XML schema for the web services.
  • Implemented the service layer using Spring IoC and annotations, and controllers using Spring MVC.
  • Designed and developed the data layer for the client, communicating with both Oracle and Sybase at any time.
  • Designed class diagrams and sequence diagrams using Microsoft Visio 2007.
  • Migrated complex queries and stored procedures from Sybase to Oracle.
  • Developed and debugged applications using Eclipse IDE.
  • Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
  • Developed tools to generate automated send views and ret views to serialize the data to the mainframe.
  • Set up custom business validations using the Struts Validator framework.
  • Involved in JUnit testing, integration testing and system testing.
  • Developed and deployed Message Driven Beans to apply the same adjustment to multiple air bills asynchronously.
  • Used multithreading on the client to process huge requests.
  • Implemented all the functionality using Spring IO/Spring Boot and Hibernate ORM.
  • Created different state machines to accomplish the dependent tasks individually one after another.
  • Used ASN encoding to send the data across the network and used the MIIG API to talk to the mainframe server.
  • Migrated functionality developed in C (a procedure-oriented language) to Java without missing business rules.
  • Created a JSP page to modify log levels dynamically without restarting the server.
  • Involved in creating automated builds using Ant for the client and Maven to build and deploy onto the WebLogic server.

Environment: Java, J2EE, SOAP, XML, XSD, WSDL, JAXB, SOAP UI, MDB, ANT/Maven, JUnit, SVN, Eclipse, Oracle, PL/SQL, WebLogic.
