Hadoop/Spark Developer Resume



  • Detail-oriented Big Data Engineer/Hadoop Developer with around 4 years of total IT experience.
  • Excellent knowledge of Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, HBase, Kafka, Oozie, ZooKeeper, and YARN, and of the MapReduce programming paradigm.
  • Proficient in importing data from and exporting data to Relational Database Management Systems (RDBMS).
  • Experience using Spark Streaming to ingest data from multiple data sources into HDFS.
  • Experience using PL/SQL to write Stored Procedures, Functions and Triggers in Oracle.
  • Experience with table partitioning, UDFs, performance tuning, and compression-related properties in Hive.
  • Hands-on expertise in designing and developing Spark applications in Scala to compare the performance of Spark with Hive.
  • Strong knowledge of optimizing existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Proficient in using Apache Kafka to track data ingestion into the Hadoop cluster and in implementing custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Solid knowledge of data transformations using MapReduce and Hive across different file formats.
  • Excellent knowledge of AWS S3, EC2, Redshift, and Lambda.
  • Knowledge of programming languages and platforms such as Java and J2EE.
  • Knowledge of job workflow scheduling and monitoring tools such as the Oozie scheduler.
  • Experience with scripting languages such as Linux/Unix shell and Python.
  • Experience using SequenceFile, Avro, and Parquet file formats, and managing Hadoop log files.
  • Experience configuring ZooKeeper to coordinate the servers in a cluster and maintain data consistency.
  • Detailed understanding of Software Development Life Cycle (SDLC) and strong knowledge in project implementation methodologies like Agile and Waterfall.
  • Excellent technical, communication, analytical, problem-solving, and troubleshooting capabilities.


Languages: Java, Scala, Python, R-studio, C++/C, SQL, PL/SQL, HiveQL, J2EE

Big Data Technologies: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Kafka, YARN, Zookeeper, Hue, Flume, CDH 5.14, Oozie workflow

Database: Oracle 11g/10g, MS Access, MySQL, SQL Server, IBM DB2, NoSQL (HBase, Cassandra)

Web Technologies/Tools: JavaScript, HTML5, CSS3, JSP, Servlets, JSON, XML, AWS S3, EC2

IDE: Eclipse, IntelliJ, SBT, Apache Tomcat, NetBeans, WebLogic, Jupyter Notebooks

Methodologies: Agile, Waterfall

Version Control: Git, SVN

OS & Tools: Windows, UNIX, Linux, PuTTY, WinSCP, FileZilla, Power BI, Tableau, Maven


Hadoop/Spark Developer

Confidential, Ohio


  • Implemented various proofs of concept (POCs) to evaluate Big Data technologies.
  • Assisted in setting up the environment for continuous deployments.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure the architecture would serve the right purpose.
  • Loaded data with Sqoop from different sources such as Oracle, MySQL, and mainframes into the Hadoop HDFS cluster.
  • Applied a deep understanding of Java and performed complex data transformations in Spark using Scala.
  • Built data quality services for actively controlling the data ingested into the platform.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data; used Avro and Parquet file formats for data serialization.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Developed real-time processing of Wi-Fi data logs using Spark.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
  • Developed data-unloading microservices for the semantic layer using the Spark DataFrame API in Scala.
  • Created data provisioning services that let the client query large datasets.
  • Developed a logger service using Kafka and Spark Streaming to log data in real time.
  • Built data governance policies, processes, procedures, and controls for the data platform.
  • Collaborated with other project teams and helped them use the data platform services.

Environment: Cloudera, Spark, Python, Hive, Java, Scala, SQL, Sqoop, Maven, Kafka, Spark Streaming, Flume, Hue, Impala, HBase.

Big Data Developer

Confidential, Boston, MA


  • Managed data coming from different sources; handled HDFS maintenance and the loading of structured and unstructured data.
  • Loaded data from the Linux file system into HDFS on CDH using Sqoop import with different append options.
  • Developed a data pipeline using Flume and Sqoop to ingest student behavioral data, preparation time, and materials studied into HDFS for analysis.
  • Applied newer technologies such as Apache Spark and Scala programming.
  • Used Scala to validate the data ingested through Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Set up HBase and stored data in it for further analysis.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for dashboard reporting.
  • Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
  • Exported the aggregated data from the Hadoop environment to DB2 using Sqoop for dashboard reporting.
  • Facilitated knowledge transfer sessions.
  • Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks. Participated in the daily scrum and other design-related meetings.

Environment: Cloudera, Spark, Python, Hive, Java, Scala, SQL, Sqoop, J2EE, Impala, YARN, Lambda, Tableau, Eclipse, HBase, Agile, Waterfall, Git, SVN.

Hadoop Developer



  • Designed Sqoop jobs to import large sets of structured data from DB2 into HDFS and to export data for report analysis.
  • Responsible for the documentation, design, development, and architecture of Hadoop applications.
  • Worked with Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3.
  • Designed, built, installed, configured, and supported Hadoop.
  • Developed simple to complex MapReduce jobs in Java for processing and validating data.
  • Built UDFs (user-defined functions) in Pig and Hive as needed and developed Pig scripts for processing data.
  • Adopted the Oozie workflow engine to run multiple Hive and Sqoop jobs.
  • Wrote multiple Hive queries for data analysis to meet business requirements.
  • Created Hive internal and external tables on top of HDFS.
  • Developed a Flume ETL job with an HTTP source and an HDFS sink, and developed a Kafka consumer in Scala using the consumer API to consume data from Kafka topics.
  • Processed data using Spark SQL in-memory computation and wrote the results to Hive tables.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data; used Avro and Parquet file formats for data serialization.
  • Created graphical reports in Tableau for data visualization.
  • Scheduled and maintained several batch jobs to run automatically depending on business requirements.

Environment: Spark, Scala, Sqoop, Python, Bash script, AWS S3, Redshift, GitHub, Hive, MapReduce, DB2, EC2, Shell scripting, Oozie, Flume, Java, J2EE, Impala, YARN, Lambda, Tableau, Cloudera, Eclipse, HBase.