
Spark/Hadoop Developer Resume


SUMMARY

  • 8+ years of experience in all phases of software application development: requirement analysis, design, development, and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies.
  • 3+ years of hands-on experience with the Big Data ecosystem, including Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Oozie, and ZooKeeper, across a range of industries.
  • Good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, and DataNode, as well as the MapReduce programming paradigm.
  • Experience in writing Hive queries for processing and analyzing large volumes of data.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Developed Oozie workflows integrating all tasks relating to a project and scheduled the jobs as per requirements.
  • Automated jobs that pull data from upstream servers into Hive tables using Oozie workflows.
  • Implemented several optimizations, such as combiners, distributed cache, data compression, and custom partitioners, to speed up MapReduce jobs.
  • Used HBase alongside Hive when real-time, low-latency queries were required.
  • Solid knowledge of Spark architecture and real-time streaming with Spark.
  • Extensively used the Spark SQL, PySpark, and Scala APIs for querying and transforming data residing in Hive (see the sketch after this list).
  • Good knowledge of Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
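
As a minimal illustration of the Spark-on-Hive work summarized above, the sketch below queries a Hive table through the Spark Java API with Hive support enabled. The sales.orders table and its columns are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveQueryExample {
    public static void main(String[] args) {
        // Hive support lets Spark resolve tables registered in the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("HiveQueryExample")
                .enableHiveSupport()
                .getOrCreate();

        // "sales.orders" is a placeholder table name used for illustration.
        Dataset<Row> orders = spark.sql(
                "SELECT order_id, customer_id, amount FROM sales.orders WHERE amount > 100");
        orders.show();

        spark.stop();
    }
}
```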

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, MapReduce, HDFS, Kafka, Hive, Pig, Sqoop, Impala, Oozie, Flume, YARN, ZooKeeper, HBase

Spark Components: Spark Core, Spark SQL, Spark Streaming, PySpark

AWS Cloud Services: S3, EBS, EC2, VPC, Redshift, EMR

Programming Languages: Java, Scala, SQL, Shell scripting

Databases: HBase, Oracle, DB2, MySQL, SQLite, MS SQL Server

Development Processes: Agile, Scrum

Big data Platforms: Cloudera

PROFESSIONAL EXPERIENCE

Confidential, Nashville, TN

Spark/Hadoop Developer

Responsibilities:

  • Interacted with multiple teams to understand their business requirements and design flexible, common components.
  • Used Sqoop to import and export data between source systems and HDFS/Hive.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Implemented Spark SQL to access Hive tables from Spark for faster data processing.
  • Worked on Spark Streaming with Apache Kafka for real-time data processing (see the streaming sketch after this list).
  • Used Hive for transformations, joins, filters, and some pre-aggregations after storing the data in HDFS.
  • Developed Spark scripts in Scala and Spark SQL for faster processing of data in Hive tables.
  • Extensively worked with Text, ORC, Avro, and Parquet file formats and compression codecs such as Snappy, Gzip, and Zlib.
  • Applied Hive optimization techniques, including partitioning and bucketing, to query data more efficiently (see the table-layout sketch after this list).
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Created Kafka producers and consumers for Spark Streaming.
  • Conceived and designed custom POCs using Kafka 0.10 and Spark Streaming in standalone mode.
  • Automated the jobs with Oozie and scheduled them with Autosys.
  • Participated in evaluation and selection of new technologies to support system efficiency.
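
A minimal sketch of the Kafka-to-Spark-Streaming wiring described above, using the spark-streaming-kafka-0-10 direct-stream integration. The broker address, consumer group, and the "events" topic are placeholder assumptions, and the per-batch count stands in for the real processing logic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamingPoc {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaStreamingPoc");
        // 10-second micro-batches.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "streaming-poc");           // placeholder group
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream: each micro-batch reads its offset range straight from Kafka.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        // Placeholder transformation: count records per micro-batch.
        stream.map(ConsumerRecord::value).count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```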
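
The partitioning and bucketing optimization above amounts to Hive DDL; here it is issued through a Hive-enabled SparkSession, though the same statement can equally be run from the Hive shell. Table and column names are hypothetical.

```java
import org.apache.spark.sql.SparkSession;

public class HiveTableLayout {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HiveTableLayout")
                .enableHiveSupport()
                .getOrCreate();

        // Partitioning lets queries prune whole directories by order_date;
        // bucketing by customer_id co-locates rows that share a join key.
        spark.sql("CREATE TABLE IF NOT EXISTS sales.orders_opt ("
                + " order_id BIGINT, customer_id BIGINT, amount DOUBLE)"
                + " PARTITIONED BY (order_date STRING)"
                + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
                + " STORED AS ORC");

        spark.stop();
    }
}
```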

Environment: Hadoop, HDFS, Hive, HBase, Spark, Autosys, Kafka, Sqoop, Java, Scala, Eclipse, Teradata, UNIX, Maven.

Confidential, St. Louis, Missouri

Hadoop Developer

Responsibilities:

  • Developed Spark programs for batch processing.
  • Developed Spark scripts using Java and Scala shell commands as per the requirements.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Spark SQL with Scala to create DataFrames and perform transformations on them (see the DataFrame sketch after this list).
  • Implemented Spark SQL to access Hive tables from Spark for faster data processing.
  • Installed and configured Hive, and wrote Hive UDFs (see the UDF sketch after this list).
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Imported and exported data to and from HDFS using Sqoop, including incremental loads.
  • Defined job flows and managed and reviewed Hadoop log files.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Managed jobs using the Fair Scheduler and handled cluster coordination through ZooKeeper.
  • Involved in loading data from the UNIX file system to HDFS.
  • Gained hands-on experience in Oozie job scheduling.
  • Worked closely with AWS to migrate entire data centers to the cloud using VPC, EC2, S3, and EMR.
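
A hedged sketch of the DataFrame work described above: reading a Hive table into a DataFrame and applying filter and aggregation transformations. The clickstream.events table is hypothetical, and while the bullet mentions Scala, the same calls exist in the Java API used throughout these sketches.

```java
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DataFrameTransformExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DataFrameTransformExample")
                .enableHiveSupport()
                .getOrCreate();

        // "clickstream.events" is a placeholder Hive table for illustration.
        Dataset<Row> events = spark.table("clickstream.events");

        // Filter, then aggregate per user: typical DataFrame transformations.
        Dataset<Row> perUser = events
                .filter(col("event_type").equalTo("purchase"))
                .groupBy(col("user_id"))
                .agg(sum("amount").alias("total_spent"),
                     avg("amount").alias("avg_spent"));

        perUser.show();
        spark.stop();
    }
}
```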
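
A minimal Hive UDF sketch in Java, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the masking logic and class name are invented for illustration.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF: masks the local part of an e-mail address,
 * e.g. "alice@example.com" -> "a***@example.com".
 */
public final class MaskEmail extends UDF {
    public Text evaluate(Text email) {
        if (email == null) {
            return null;
        }
        String s = email.toString();
        int at = s.indexOf('@');
        if (at <= 1) {
            return email; // too short to mask meaningfully
        }
        return new Text(s.charAt(0) + "***" + s.substring(at));
    }
}
```

Packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in HiveQL.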

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, Spark, Hortonworks, HBase, Amazon EMR, EC2, S3.

Confidential

Hadoop Developer

Responsibilities:

  • Part of the team developing and writing Pig scripts.
  • Loaded data from an RDBMS server into Hive using Sqoop.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to move data between Hive and a MySQL database.
  • Developed Java Mapper and Reducer programs for complex business requirements.
  • Developed custom Java record readers, partitioners, and serialization techniques (see the partitioner sketch after this list).
  • Created managed and external tables in Hive and loaded data from HDFS.
  • Performed complex HiveQL queries on Hive tables and created custom user-defined functions in Hive.
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve HiveQL query performance.
  • Created partitioned tables and loaded data using both the static and dynamic partitioning methods (see the partition-loading sketch after this list).
  • Performed Sqoop imports from Oracle to load data into HDFS and directly into Hive tables.
  • Performed incremental data movement to Hadoop using Sqoop.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Analyzed Hadoop logs using Pig scripts to track down errors produced by the team's jobs.
  • Gathered requirements from the client, provided estimates for developing projects, and delivered projects on time.
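
A sketch of a custom MapReduce partitioner of the kind mentioned above; the routing rule (first character of the key) is invented purely for illustration.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/**
 * Routes each key to a reducer based on its first character, so related
 * keys land in the same output partition.
 */
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        char first = Character.toLowerCase(key.toString().charAt(0));
        // Mask the sign bit so the result is always a valid partition index.
        return (first & Integer.MAX_VALUE) % numPartitions;
    }
}
```

It would be wired into a job with job.setPartitionerClass(FirstCharPartitioner.class).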
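
The static versus dynamic partition loading mentioned above, sketched as HiveQL issued through a Hive-enabled SparkSession; the staging and target table names are hypothetical, and the same statements can be run directly from the Hive shell.

```java
import org.apache.spark.sql.SparkSession;

public class PartitionLoadExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("PartitionLoadExample")
                .enableHiveSupport()
                .getOrCreate();

        // Static partition: the partition value is fixed in the statement.
        spark.sql("INSERT OVERWRITE TABLE sales.orders_part PARTITION (order_date = '2015-01-01')"
                + " SELECT order_id, customer_id, amount FROM staging.orders"
                + " WHERE order_date = '2015-01-01'");

        // Dynamic partition: Hive derives the partition value from the
        // trailing column of the SELECT.
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict");
        spark.sql("INSERT OVERWRITE TABLE sales.orders_part PARTITION (order_date)"
                + " SELECT order_id, customer_id, amount, order_date FROM staging.orders");

        spark.stop();
    }
}
```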

Environment: Java, Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Scala, Hortonworks, HBase.

Confidential

Java Developer

Responsibilities:

  • Involved in Analysis, Design, Development and Testing of the application.
  • Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
  • Enhanced the Port search functionality by adding a VPN Extension Tab.
  • Created end-to-end functionality for viewing and editing VPN Extension details.
  • Used an Agile process to develop the application, as it allows faster development compared to RUP.
  • Used the Struts MVC framework and WebLogic Application Server in this application.
  • Involved in creating DAOs and used Hibernate for ORM mapping (see the DAO sketch after this list).
  • Implemented the Spring Framework for rapid development and ease of maintenance.
  • Wrote procedures and triggers for validating the consistency of metadata.
  • Used both implicit and explicit cursors to process multiple rows within a PL/SQL block.
  • Designed and developed business logic for creating hourly, daily, weekly, monthly, quarterly, and yearly summaries of balance sheet data using Oracle PL/SQL programs.
  • Implemented various PL/SQL objects, including tables, views, procedures, packages, triggers, functions, materialized views, global temporary tables, cursors, bulk collect, collections, bind variables, ref cursors, sequences, synonyms, and indexes.
  • Wrote SQL code blocks using cursors to move records between tables based on validation checks.
  • Fixed defects and generated input XMLs to run on the SOA client, producing output XML for testing web services.
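
A minimal Hibernate DAO sketch of the kind described above, using the plain SessionFactory style; VpnExtension stands in for a Hibernate-mapped entity, and nothing here is specific to this project.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

/** Illustrative DAO; VpnExtension is an assumed Hibernate-mapped entity. */
public class VpnExtensionDao {
    private final SessionFactory sessionFactory;

    public VpnExtensionDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public VpnExtension findById(long id) {
        Session session = sessionFactory.openSession();
        try {
            return session.get(VpnExtension.class, id);
        } finally {
            session.close();
        }
    }

    public void save(VpnExtension extension) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.saveOrUpdate(extension);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}
```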

Environment: Java, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, Oracle 10g, OLAP/OLTP, Toad, Windows XP, SQL*Loader, SQL Developer, Agile, UNIX, Web Services, CVS, Eclipse.
