
Sr. Hadoop/Spark Developer Resume

Denver, CO


  • 7+ years of IT experience in software development, including hands-on experience in Big Data engineering and analytics and Java application development.
  • Expertise with Hadoop ecosystem tools, including Spark, Hive, HDFS, MapReduce, Sqoop, Pig, Kafka, YARN, Oozie and ZooKeeper.
  • Strong programming experience using Java, Scala, Python and SQL.
  • Strong fundamental understanding of distributed systems architecture and parallel processing frameworks.
  • Strong experience designing and implementing end-to-end data pipelines that process terabytes of data.
  • Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
  • Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.
  • Experience using DStreams in Spark Streaming, accumulators, broadcast variables, various levels of caching and other optimization techniques in Spark.
  • Strong experience working with data ingestion tools such as Sqoop and Kafka.
  • Good knowledge of and development experience with the MapReduce framework.
  • Hands-on experience writing ad hoc queries to move data from HDFS to Hive and analyzing data using HiveQL.
  • Proficient in creating Hive DDLs and writing custom Hive UDFs.
  • Knowledge of job workflow management and monitoring tools such as Oozie and Rundeck.
  • Experience designing, implementing and managing secure Kerberos authentication for Hadoop clusters.
  • Experience working with NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experience in ETL processes covering data sourcing, transformation, mapping, conversion and loading.
  • Good knowledge of creating ETL jobs in Talend to load large volumes of data into the Hadoop ecosystem and relational databases.
  • Experience working with Cloudera, Hortonworks and Amazon EMR distributions.
  • Good experience in developing applications using Java, J2EE, JSP, MVC, EJB, JMS, JSF, Hibernate, AJAX and web-based development tools.
  • Strong experience with relational databases and data warehouses such as MySQL, Oracle, Snowflake, Redshift and Teradata.
  • Strong expertise in shell scripting, regular expressions and cron job automation.
  • Good knowledge of web services, SOAP programming, WSDL and XML parsers such as SAX and DOM, as well as AngularJS and responsive design with Bootstrap.
  • Experience working with containerization and orchestration tools such as Docker and Kubernetes.
  • Experience with version control systems such as CVS, TFS and SVN.
  • Worked with geographically distributed and culturally diverse teams, including in roles that involve interaction with clients and team members.


BigData/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark (Scala & Python), Kafka and Oozie

No-SQL Databases: HBase, Cassandra, MongoDB

Languages: Java, Scala, Python, SQL

Application Servers: WebLogic, WebSphere, JBoss, Tomcat

Databases: Microsoft SQL Server, MySQL, Oracle, DB2

Build and Version Tools: Jenkins, Maven, Git

Development Tools: Eclipse, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall


Confidential, Denver, CO

Sr. Hadoop/Spark Developer


  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Developed custom multi-threaded Java-based ingestion jobs and Sqoop jobs to ingest data from FTP servers and data warehouses.
  • Developed Scala-based Spark applications for data cleansing, event enrichment, data aggregation, de-normalization and the data preparation consumed by the machine learning and reporting teams.
  • Troubleshot Spark applications to make them more fault tolerant.
  • Fine-tuned Spark applications to reduce overall pipeline processing time.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other features.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Worked with Amazon EMR clusters on AWS, along with S3, Redshift and Snowflake.
  • Involved in creating Hive tables and loading and analyzing data using Hive scripts.
  • Implemented partitioning, dynamic partitions and bucketing in Hive.
  • Used Bamboo for continuous integration of applications.
  • Used reporting tools such as Tableau, connected to Impala, to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
  • Documented operational problems in JIRA, following standards and procedures.

Environment: Hadoop, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, Amazon EMR, YARN, JIRA, Amazon AWS, Shell Scripting, SBT, GitHub, Maven.

Confidential, Hartford, CT

Hadoop Developer


  • Involved in requirement analysis, design, coding and implementation phases of the project.
  • Loaded data from Teradata into HDFS using Teradata Hadoop connectors.
  • Converted existing MapReduce jobs into Spark transformations and actions using the Spark RDD, DataFrame and Spark SQL APIs.
  • Wrote new Spark jobs in Scala to analyze customer data and sales history.
  • Used Kafka to bring data from many streaming sources into HDFS.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Used Hive partitioning, bucketing and collection types, and performed different types of joins on Hive tables.
  • Created Hive external tables to perform ETL on data generated on a daily basis.
  • Wrote HBase bulk load jobs to load processed data into HBase tables by converting it to HFiles.
  • Validated ingested data to filter and cleanse it in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
  • Loaded data into Hive tables from Spark using the Parquet columnar format.
  • Developed Oozie workflows to automate and productionize the data pipelines.
  • Developed Sqoop import scripts to import data from Netezza.

Environment: Hadoop, HDFS, Hive, Sqoop, Kafka, Spark, Shell Scripting, HBase, Scala, Python, Kerberos, Maven, Ambari, Hortonworks, MySQL.

Confidential, New York City, NY

Big Data/Hadoop Developer


  • Actively participated in all phases of the Software Development Life Cycle (SDLC), from implementation to deployment.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Managing and reviewing data backups & log files.
  • Responsible for managing test data coming from different sources.
  • Analyzed data using the Hadoop components Hive and Pig.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Involved in loading data from UNIX file system to HDFS.
  • Responsible for creating Hive tables, loading data and writing Hive queries.
  • Created Hive external tables, loaded data into them and queried the data using HQL.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Exported the analyzed patterns back to Teradata using Sqoop.
  • Monitored system metrics and logs for problems when adding to, removing from or updating the Hadoop cluster.
  • Scheduled multiple Hive and Pig jobs with the Oozie workflow engine, and used Oozie workflows for batch processing and dynamic workflow scheduling.
  • Involved in requirement analysis, design, coding and implementation phases of the project.

Environment: Hadoop, Spark, Scala 1.5.2, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Zookeeper

Confidential, Camp Hill, PA

Hadoop Developer


  • Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation.
  • Loaded data into HDFS from different data sources such as Oracle and DB2 using Sqoop, then loaded it into Hive tables.
  • Analyzed big data sets by running Hive queries and Pig scripts.
  • Integrated the Hive warehouse with HBase for information sharing among teams.
  • Developed Sqoop scripts for the interaction between Pig and the MySQL database.
  • Worked on static and dynamic partitioning and bucketing in Hive.
  • Wrote complex HiveQL queries on Hive tables using analytical functions.
  • Developed complex Hive UDFs to work with sequence files.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Created Tableau dashboards presenting meaningful metrics for decision making.
  • Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Used the Avro storage format for efficient data access in complex queries.
  • Implemented counters for diagnosing problems in queries, for quality control and for application-level statistics.
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Used Log4j for logging and tracing information.
  • Developed helper classes that abstract Cassandra cluster connections and act as a core toolkit.
  • Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
  • Followed Agile methodology, including daily Scrum meetings and sprint planning.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, Linux.

Java/J2EE Developer



  • Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
  • Involved in implementing the design across the key phases of the Software Development Life Cycle (SDLC), including development, testing, implementation and maintenance support.
  • Applied OOAD principles to the analysis and design of the system.
  • Implemented XML Schema as part of the XQuery query language.
  • Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.
  • Used RAD for the Development, Testing and Debugging of the application.
  • Used WebSphere Application Server to deploy the build.
  • Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.
  • Used J2EE for the development of business layer services.
  • Developed Struts Action Forms, Action classes and performed action mapping using Struts.
  • Performed data validation in Struts Form beans and Action Classes.
  • Developed a POJO-based programming model using the Spring framework.
  • Used the Inversion of Control (IoC) pattern and dependency injection in the Spring framework to wire and manage business objects.
  • Used Web Services to connect to mainframe for the validation of the data.
  • Used SOAP as the protocol to send requests and responses as XML messages.
  • Used JDBC to connect the application to the database.
  • Used Eclipse for the Development, Testing and Debugging of the application.
  • Used the Log4j framework to log debug, info and error data.
  • Used the Hibernate framework for object-relational mapping (ORM).
  • Used the Oracle 10g database for data persistence, with SQL Developer as the database client.
  • Extensively worked on Windows and UNIX operating systems.
  • Used SecureCRT to transfer files from the local system to the UNIX system.
  • Performed Test Driven Development (TDD) using JUnit.
  • Used Ant scripts for build automation.
  • Used the PVCS version control system, integrated with the Eclipse IDE, to check in and check out the developed artifacts.
  • Used Rational ClearQuest for defect logging and issue tracking.

Environment: Windows XP, Unix, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, Servlet, JSP, HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Agile, JIRA, Oracle 10g, WinSCP, Log4j, JUnit.
