
Big Data Developer Resume

Bentonville, AR


  • Around 4 years of IT experience as a Spark/Hadoop Developer and Java Developer.
  • One year of experience as a Graduate Research Assistant at Purdue University.
  • Experience working with the Hadoop ecosystem: Hive, Pig, ZooKeeper, Sqoop, Oozie and Apache Spark.
  • Experienced in developing applications in Scala and Java.
  • Hands-on experience importing and exporting data between RDBMS and HDFS using Sqoop.
  • Experienced in Apache Spark, including Spark Core, Spark RDDs and Spark SQL.
  • Hands-on knowledge of Core Java and the Spring Boot and Hibernate frameworks.
  • Familiar with Hadoop distributions such as Cloudera and Hortonworks.
  • Knowledge of spinning up EMR clusters and running Hadoop and Spark jobs.
  • Knowledge of AWS services: RDS, EMR, EC2 and S3.
  • Knowledge of Kafka and Spark Streaming.
  • Worked in teams following Agile methodologies.


Languages: Scala, Python, Java.

Frameworks & Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, Apache Spark, Kafka, Spring Boot, Hibernate, AWS EMR, S3, EC2, RDS.

Big Data Environments: Cloudera, Hortonworks.

Databases: Oracle, MySQL.

IDEs: IntelliJ, Eclipse, NetBeans, Anaconda.

Other Tools: Microsoft Visio, UML, Git, GitHub, Pivotal Tracker, PuTTY.


Confidential, Bentonville, AR.

Big Data Developer


  • Involved in developing end-to-end Spark analytics applications to generate business insights.
  • Involved in requirements gathering and solution design.
  • Ingested structured data from Teradata into Hive and HDFS using TDCH (Teradata Connector for Hadoop).
  • Validated data after import.
  • Created Oozie workflows to trigger Spark and MLP jobs.
  • Developed a Java application to trigger Oozie workflows and monitor job status.
  • Used Hibernate to log job details to MySQL.
  • Developed Python scripts to coordinate with various tools across the organization.
  • Worked on optimizing Hive and Spark scripts.
  • Converted Hive queries to Spark using Python.
  • Worked on Spark optimizations and memory management.
  • Used Git for version control.


Hadoop Developer


  • Determined the viability of business problems for a Big Data solution.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Imported millions of structured records from relational databases using Sqoop import for processing.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Developed mappings using the Data Processor transformation to load data from Word and PDF documents into HDFS.
  • Resolved performance issues in Hive scripts by tuning joins, grouping and aggregations.
  • Exported HQL results to CSV files and handed them over to the reporting team.
  • Collaborated with team engineers to produce high-quality code using Agile software development.
  • Worked on a proof of concept (POC) for streaming data using Kafka and Spark Streaming.
  • Created and maintained technical documentation of the project life cycle for presentation at closure.
  • Used Git for version control.


Software Developer


  • Worked on a live 30-node Hadoop cluster running Cloudera.
  • Gained thorough hands-on experience in Hadoop, Java, SQL and Python.
  • Imported data from an Oracle database into HDFS, and exported it back, using Sqoop.
  • Used Flume to collect, aggregate and store web log data from various web servers and network devices, and pushed it to HDFS.
  • Loaded and transformed large sets of semi-structured and unstructured data, including Sequence files and XML files, and worked with Avro and Parquet file formats using compression codecs such as Snappy, Gzip and Zlib.
  • Processed unstructured data using Pig and Hive.
  • Worked with Sequence files, RC files, map-side joins, bucketing and partitioning to improve Hive performance and storage efficiency.
  • Wrote MapReduce programs to analyze data according to business requirements.
  • Used Hive and Pig as ETL tools for event joins, filters, transformations and pre-aggregations.
  • Evaluated Oozie for workflow orchestration in the automation of MapReduce, Pig and Hive jobs.
  • Developed JUnit test cases to validate the results of MapReduce analyses.
  • Managed and reviewed Hadoop log files.
  • Analyzed system failures, identified root causes and recommended courses of action.


Java Developer


  • Involved in preparing design, development and analysis documents shared with clients.
  • Analyzed and designed object models using Java/J2EE design patterns across the various tiers of the application.
  • Developed the presentation layer using HTML, CSS and JSP, and validated data using JavaScript.
  • Analyzed client requirements and designed the specification document accordingly.
  • Applied J2EE design patterns such as Business Delegate, DAO and Singleton.
  • Used the Maven build tool to build the project.
  • Wrote SQL queries, PL/SQL and stored procedures for database interaction.
  • Used dispatch action to group related actions into a single class.
  • Provided testing and production support for a multithreaded Core Java ETL tool that loads XML data into an Oracle 10g database in a distributed manner using JPA/Hibernate.
  • Used the Hibernate and Spring frameworks for the persistence and application layers.
  • Integrated an SMTP server into the system to handle dynamic email dispatch.
  • Defined and developed Action and Model classes.
  • Used the Spring Framework to configure dependency injection for the Action classes via ApplicationContext.xml.
  • Configured and Deployed application on Tomcat Application Server.
  • Used Log4j to implement logging.
  • Used Git for version control.
