
Sr. Hadoop/Spark Developer Resume

Lexington, KY


  • Results-driven IT professional with 8+ years of experience, including 5 years of expertise in Big Data systems and data analytics, plus design and development of Java-based enterprise applications.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Flume, Kafka, Oozie and HBase.
  • Hands-on programming experience with Java, Scala and Python.
  • Sound knowledge of distributed systems architecture and parallel processing frameworks.
  • Good experience in fine-tuning Spark applications to improve performance and in troubleshooting Spark application failures.
  • Highly skilled in designing and implementing end-to-end data pipelines to process and analyze massive amounts of data.
  • Expertise in various Hadoop distributions, primarily Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
  • Experience in developing production-ready Spark applications using the Spark RDD, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
  • Strong experience using DStreams for Spark Streaming, accumulators, broadcast variables, different caching levels and optimization techniques for Spark jobs.
  • Proficient in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Strong knowledge of and hands-on experience in developing MapReduce jobs.
  • Well versed in writing Hive DDL and developing custom UDFs in Hive.
  • Experience in transferring streaming data from different data sources into HDFS and HBase using Apache Kafka and Flume.
  • Experience in using Oozie schedulers and Unix scripting to automate end-to-end data workflows.
  • Strong knowledge of NoSQL databases; worked with HBase, Cassandra and MongoDB.
  • Good understanding of Hadoop Gen1/Gen2 architecture, YARN architecture and its daemons (NodeManager, ResourceManager and ApplicationMaster), and the MapReduce programming paradigm.
  • Experience in working with cloud services such as EMR, S3, EC2, Redshift, Athena.
  • Expert in SQL; worked extensively with RDBMSs such as Oracle, SQL Server, DB2, MySQL and Teradata.
  • Proficient with Git, Jenkins and Maven.
  • Strong understanding of NoSQL databases such as Cassandra, MongoDB and HBase.
  • Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
  • Good understanding of and experience with the Agile and Waterfall methodologies of the Software Development Life Cycle (SDLC).
  • Highly motivated self-learner with a positive attitude, a willingness to learn new concepts and a readiness to accept challenges.


Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Scala, Flume, Zookeeper, Oozie

Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts

NoSQL Databases: HBase, Cassandra, MongoDB

AWS technologies: Data Pipeline, Redshift, EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting.

Database: Microsoft SQL Server, MySQL, Oracle, DB2

Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat

Operating Systems: UNIX, Windows, Mac, Linux

GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS

Business Intelligence Tools: Tableau, Splunk, QlikView

Development Methodologies: Agile, V-Model, Waterfall Model, Scrum


Confidential - Lexington, KY

Sr. Hadoop/Spark Developer

Roles & Responsibilities:

  • Ingested clickstream data from FTP servers into S3 buckets on a daily basis using customized input adapters.
  • Developed Sqoop jobs to import and export data between RDBMS and the S3 data store.
  • Developed Spark applications in Scala to enrich the clickstream data by merging it with user profile data.
  • Involved in data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting.
  • Troubleshot Spark applications to improve error tolerance.
  • Worked extensively on sizing Spark executors for efficient memory usage across the Spark jobs.
  • Fine-tuned Spark jobs to improve overall job performance.
  • Utilized Spark Scala API to implement batch processing of jobs.
  • Developed Kafka producer API to send live-stream data into various Kafka topics.
  • Developed Spark-Streaming applications to consume the data from Kafka topics and to insert the processed streams to HBase.
  • Utilized Spark's in-memory capabilities to handle large datasets.
  • Used broadcast variables, efficient joins, transformations and other Spark capabilities for data processing.
  • Utilized Spark SQL for event enrichment and to prepare various levels of user behavior summaries.
  • Explored machine learning techniques such as linear regression and clustering using Spark ML.
  • Created Hive tables, and loaded and analyzed data using Hive scripts. Implemented partitioning, dynamic partitions and buckets in Hive.
  • Worked extensively on AWS EMR, Athena, Glue Metastore and Redshift.
  • Involved in continuous integration of the application using Jenkins.
  • Interacted with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce

Confidential - Omaha, NE

Spark Developer

Roles & Responsibilities:

  • Worked on migrating data from traditional RDBMS to HDFS.
  • Ingested data into HDFS from Teradata, MySQL using Sqoop.
  • Developed Spark applications to perform ETL operations on the data.
  • Converted existing MapReduce jobs to Spark transformations and actions using the Spark RDD, DataFrame and Spark SQL APIs.
  • Utilized Hive partitioning and bucketing, and performed various kinds of joins on Hive tables.
  • Involved in creating Hive external tables to perform ETL on data that is produced on a daily basis.
  • Validated the data being ingested into Hive for further filtering and cleansing.
  • Developed Sqoop jobs to perform incremental loads from RDBMS into HDFS and then applied Spark transformations.
  • Loaded the data from Spark into Hive tables using the Parquet columnar format.
  • Created Oozie workflows to automate and productionize the data pipelines.
  • Migrated MapReduce code into Spark transformations using Spark and Scala.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with Tableau to connect to Impala for developing interactive dashboards.
  • Documented operational problems following standards and procedures using JIRA.

Environment: Cloudera Hadoop, Spark, Scala, Sqoop, Oozie, Hive, CentOS, MySQL, Oracle DB, Flume

Confidential - San Mateo, CA

Hadoop Developer

Roles & Responsibilities:

  • Involved in writing Spark applications in Scala to perform data cleansing, validation, transformation and summarization activities according to the requirements.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output as per the requirements.
  • Developed data pipelines using Spark, Hive and Sqoop to ingest data from data warehouse, transform and analyze operational data.
  • Developed Spark jobs, Hive jobs to summarize and transform data.
  • Worked on performance tuning of Spark applications.
  • Tuned the performance of Spark jobs by changing configuration properties and using broadcast variables.
  • Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs.
  • Worked with different file formats, including text, sequence files, Avro, Parquet, JSON, XML and flat files, using MapReduce programs.
  • Developed a daily process for incremental import of data from DB2 and Teradata into Hive tables using Sqoop.
  • Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Solved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate to MapReduce jobs.
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions that derive business insights and solve clients' operational and strategic problems.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used HiveQL to query data in Hive tables and loaded data into HBase tables.
  • Worked extensively with partitions, dynamic partitioning and bucketed tables in Hive; designed both managed and external tables and optimized Hive queries.
  • Involved in collecting and aggregating large amounts of log data using Flume and staging the data in HDFS for further analysis.
  • Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
  • Designed Oozie workflows for job scheduling and batch processing.

Environment: Java, Scala, Apache Spark, MySQL, CDH, IntelliJ IDEA, Hive, HDFS, YARN, MapReduce, Sqoop, Pig, Flume, UNIX Shell Scripting, Python, Apache Kafka

Confidential - Boston, MA

Big Data/Hadoop Developer

Roles & Responsibilities:

  • Coordinated with business customers to gather business requirements, worked with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Involved in validating the aggregate table based on the rollup process documented in the data mapping.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by running Hive queries and Pig scripts on the data.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie Operational Services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in HBase and Cassandra.
  • Aggregated data into Oracle using Sqoop for reporting on the Tableau dashboard.
  • Involved in application development using RDBMS and Linux shell scripting.
  • Developed and updated social media analytics dashboards on a regular basis.
  • Created a complete processing engine based on the Hortonworks distribution.
  • Managed and reviewed Hadoop log files.
  • Developed and generated insights from brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
  • Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.

Environment: Hadoop, MapReduce, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Oracle 11g, HDFS, Eclipse

Confidential - Herndon, VA

Java Developer

Roles & Responsibilities:

  • Gathered requirements from end users and created functional requirements.
  • Contributed to process flow analysis of the functional requirements.
  • Developed the graphical user interface for the user self-service screen.
  • Implemented the four-eyes principle and created a quality check process reusable across all workflows at the overall platform level.
  • Developed UI models using HTML, JSP, JavaScript, Web Link and CSS.
  • Developed Struts Action classes and Validation classes using the Struts controller component and the Struts validation framework.
  • Supported end-user training, testing and documentation.
  • Implemented backing beans to handle UI components and store their state in a scope.
  • Created the server side of a project management application using Node.js and MongoDB.
  • Worked on implementing EJB stateless session beans for communicating with the controller.
  • Implemented database integration using Hibernate and used Spring with Hibernate for mapping to the Oracle database.
  • Wrote Oracle PL/SQL queries to select, update and delete data.
  • Used Maven for build automation and Git for version control.

Environment: Java, J2EE, JSP, Maven, Linux, CSS, Git, Oracle, XML, MongoDB, Node.js, SAX, Rational Rose, UML


Java Developer

Roles & Responsibilities:

  • Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
  • Enhanced the portal UI using HTML, JavaScript, XML, JSP, Java and CSS per the requirements, and provided client-side JavaScript validations and server-side validation with the Bean Validation framework (JSR 303).
  • Used Spring Core annotations for dependency injection.
  • Used Hibernate as the persistence framework, mapping ORM objects to tables using Hibernate annotations.
  • Wrote the service classes and utility APIs used across the framework.
  • Used Axis to implement web services for integration of different systems.
  • Developed web services components using XML, WSDL and SOAP with a DOM parser to transfer and transform data between applications.
  • Exposed various capabilities as web services using SOAP/WSDL.
  • Used SoapUI to test the web services by sending SOAP requests.
  • Used an AJAX framework for server communication and a seamless user experience.
  • Created a test framework on Selenium and executed web testing in Chrome, IE and Mozilla through WebDriver.
  • Used jQuery for client-side scripting to design tabs and dialog boxes.
  • Created UNIX shell scripts to automate the build process and to perform regular jobs such as file transfers between hosts.
  • Used Log4j to log output to files.
  • Used JUnit with Eclipse for unit testing of various modules.
  • Involved in production support: monitored server and error logs, anticipated potential issues and escalated them to higher levels.

Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML
