
Spark And Hadoop Developer Resume


Malvern, PA

PROFESSIONAL SUMMARY:

  • 6+ years of professional IT experience, including around 4 years of comprehensive experience as a Hadoop and Spark developer and with related technologies.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, DataNode, MapReduce, Spark, and Spark SQL.
  • Experience converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Exposure to Apache Kafka for building log data pipelines as streams of messages using producers and consumers.
  • Knowledge of managing Kafka clusters and creating topologies to support real-time processing requirements.
  • Experience converting queries over various file formats into Spark transformations using DataFrames and Datasets.
  • Experience developing SQL scripts in Spark to handle different data sets and verifying their performance against MapReduce jobs.
  • Experience creating Kafka producers and consumers for Spark Streaming.
  • Exposure to Spark, Spark Streaming, and Scala, including creating and handling DataFrames in Spark with Scala.
  • In-depth understanding of Spark architecture, including Spark SQL, DataFrames, and Spark Streaming.
  • Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
  • Worked with AWS to migrate entire data centers to the cloud using EC2, S3, and EMR.
  • Experience importing and exporting multi-terabyte datasets between HDFS and relational database systems (RDBMS) using Sqoop.
  • Experience writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
  • Good experience working with Amazon AWS to set up Hadoop clusters.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, ZooKeeper, and Flume.
  • Worked on custom Spark transformations and a variety of data formats such as JSON, compressed CSV, ORC, and Avro, reading data from sources such as HBase and Hive.
  • Performed map-side joins on RDDs.
  • Good understanding of Amazon Web Services such as Elastic MapReduce (EMR) and EC2.
  • Experience migrating ETL operations from Hive to Spark.
  • Hands-on experience writing shell scripts in UNIX.
  • Experienced in processing different file formats such as Avro, XML, JSON, and SequenceFile using Spark.
  • Experience implementing Spark using Scala and Spark SQL for faster analysis and processing of data.
  • Good understanding of configuring simple to complex workflows using Oozie.
  • Good understanding of NoSQL databases such as MongoDB and HBase.
  • Worked on different operating systems, including UNIX/Linux and Windows.
  • Worked on managing VMs on Cloudera and Hortonworks distributions.
  • Experience as a Java developer in client/server technologies using JSP, JDBC, and SQL.
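The summary above mentions converting Hive/SQL queries into Spark transformations using DataFrames. A minimal sketch of what that looks like in Scala, assuming a Spark 2.x SparkSession with Hive support and a hypothetical Hive table named `orders` (table, column, and path names are illustrative only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of the Hive query:
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM orders WHERE status = 'SHIPPED'
    //   GROUP BY customer_id
    val totals = spark.table("orders")
      .filter(col("status") === "SHIPPED")
      .groupBy("customer_id")
      .agg(sum("amount").as("total"))

    // Persist the result to HDFS as Parquet.
    totals.write.mode("overwrite").parquet("/data/output/order_totals")
    spark.stop()
  }
}
```

Running this requires a cluster (or local Spark installation) with Hive configured, so it is a sketch of the pattern rather than a drop-in job.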

TECHNICAL SKILLS:

Big Data/Hadoop Ecosystem: HDFS, MapReduce, Hive, HBase, Sqoop, Flume, Oozie, Spark, Hue, Impala, Kafka, Spark DataFrames, Spark SQL.

Hadoop Distribution Platforms: Cloudera (CDH4/CDH5), Hortonworks.

Programming Languages: Core Java, Scala, SQL, Linux shell scripting.

Application Servers: Tomcat, WebLogic

Databases: MySQL, Oracle; NoSQL Database: HBase.

IDEs and Tools: Eclipse, IntelliJ, PuTTY, MobaXterm, Tableau.

Operating Systems: CentOS, Windows (7/8/10), Ubuntu, UNIX

PROFESSIONAL EXPERIENCE:

Confidential, Malvern, PA

Spark and Hadoop Developer

Responsibilities:

  • Worked on Spark Streaming using Apache Kafka for real-time data processing.
  • Used Kafka to ingest data into the Spark engine.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in HDFS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, Spark SQL, and Scala.
  • Performed advanced procedures, such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Developed Scala and Spark SQL code to extract data from various databases.
  • Used Spark SQL to process large amounts of structured data and implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Extensively worked with all kinds of unstructured, semi-structured, and structured data.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Used Kafka extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Used Spark and Spark SQL to read Parquet data and create Hive tables using the Scala API.
  • Developed a robust body of code that is tested, automated, structured, and efficient.
  • Performed map-side joins using RDDs, Spark SQL, and DataFrames.
  • Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop.
  • Implemented data backup strategies for the data in the HDFS cluster.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Imported data from relational databases into HDFS using Sqoop.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including relational databases, tuple stores, Hadoop, Hue, MySQL, and Oracle.
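The Kafka-to-HDFS ingestion described above can be sketched with the Spark Streaming Kafka direct API. This is an illustrative outline only, assuming the `spark-streaming-kafka-0-10` integration; the broker address, topic name, consumer group, and output path are placeholders, not details from this role:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Placeholder connection settings for the Kafka cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "log-ingest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Persist each micro-batch of message values to HDFS as text files.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/logs/stream")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

A job like this is typically packaged with SBT and submitted via `spark-submit`; it needs a running Kafka broker and HDFS to execute.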

Environment: Apache Spark, Cloudera (CDH 5.12), Scala, Spark SQL, DataFrames, Kafka, SBT, Hue, Sqoop, ZooKeeper, HDFS.

Confidential, Littleton, CO

Hadoop Developer

Responsibilities:

  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Developed an analytical component using Scala and Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Streamed data in real time using Spark with Flume and stored the streamed data in HDFS using Scala.
  • Installed, configured, and troubleshot Hadoop within the Cloudera ecosystem, including JobTracker and TaskTracker (data) nodes, MapReduce, Spark, Hive, HBase, Sqoop, and Oozie.
  • Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
  • Extensively used Sqoop to ingest data from RDBMS sources such as Teradata.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Used Cloudera Manager to monitor and manage the Hadoop cluster.
  • Deployed a multi-node Hadoop cluster.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Extensively used Hive/HQL queries to search for strings in Hive tables in HDFS.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Created HBase tables to store data in various formats coming from different sources.
  • Used Impala to query Hadoop data stored in HDFS.
  • Worked on streaming log data into HDFS from web servers using Flume.
  • Responsible for importing data (log files) from various sources into HDFS using Flume.
  • Created Hive tables on JSON data and ran Hive queries in Hue and the CLI.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Migrated ETL jobs to Spark to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Implemented a Flume and Spark Streaming framework for real-time data processing.
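Reading multiple data formats from HDFS, as mentioned above, typically maps onto Spark's DataFrameReader. A hedged sketch in Scala (paths, file names, and the shared schema are hypothetical; Avro support assumes the `spark-avro` package, built in since Spark 2.4 and available earlier as `com.databricks.spark.avro`):

```scala
import org.apache.spark.sql.SparkSession

object MultiFormatRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MultiFormatRead").getOrCreate()

    // Each reader returns a DataFrame regardless of the on-disk format.
    val jsonDf    = spark.read.json("hdfs:///data/raw/events.json")
    val parquetDf = spark.read.parquet("hdfs:///data/raw/events.parquet")
    val csvDf     = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/events.csv")
    val avroDf    = spark.read.format("avro").load("hdfs:///data/raw/events.avro")

    // Combining the sources assumes they share a compatible schema.
    val all = jsonDf
      .unionByName(parquetDf)
      .unionByName(csvDf)
      .unionByName(avroDf)

    // Write the consolidated data back to HDFS in a single format.
    all.write.mode("overwrite").parquet("hdfs:///data/curated/events")
    spark.stop()
  }
}
```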

Environment: UNIX, HDFS, Hive, Spark, Scala, Flume, Sqoop, HBase, ZooKeeper, CDH 5.4.

Confidential, Burns Harbor, IN

Hadoop Developer

Responsibilities:

  • Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
  • Handled the data exchange between HDFS and RDBMS using Sqoop.
  • Responsible for building scalable distributed data solutions using the Hadoop Cloudera distribution.
  • Developed several advanced MapReduce programs to process received data files.
  • Developed Hive scripts and Hive UDFs to load data files into Hadoop.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
  • Imported data from various sources, such as HDFS and HBase, into Hive.
  • Extensively worked on Hive for ETL transformations and optimized Hive queries.
  • Used Flume to collect, aggregate, and store web log data from various sources, such as web servers, mobile, and network devices, and pushed it to HDFS.
  • Created tables in Impala, created partitioned tables, ran Hive queries in Hue, created HBase tables, and created folders in HDFS.
  • Imported and exported data between MySQL/Oracle and Hive/HDFS.
  • Wrote Impala queries.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
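The partitioned-table work described above can be sketched through Spark's Hive support. This is an illustrative outline, not the project's actual DDL: the table, columns, partition key, and staging path are hypothetical, and the `LOAD DATA` step assumes the staged files already match the table's Parquet storage format:

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveTable")
      .enableHiveSupport()
      .getOrCreate()

    // Create a Hive table partitioned by date (hypothetical schema).
    spark.sql("""
      CREATE TABLE IF NOT EXISTS web_logs (
        ip STRING, url STRING, bytes BIGINT)
      PARTITIONED BY (log_date STRING)
      STORED AS PARQUET
    """)

    // Load one day's staged data into its partition.
    spark.sql("""
      LOAD DATA INPATH '/data/staging/2017-06-01'
      INTO TABLE web_logs PARTITION (log_date = '2017-06-01')
    """)

    spark.stop()
  }
}
```

Once created through the Hive metastore, the same table can be queried from Impala or Hue, which is why partitioning by a query predicate such as date tends to matter for interactive query performance.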

Environment: CDH 5.9, HDFS, Hive, Hue, Sqoop, Impala, HBase, Oozie, Flume, Linux.

Confidential

Java developer

Responsibilities:

  • Involved in analysis, design, development, integration, and testing of application modules.
  • Involved in developing a custom framework, similar to the Spring Framework, with additional features to meet business needs.
  • Performed requirement analysis, design, coding and implementation, team coordination, code review, testing, and installation.
  • Developed server-side utilities using Java technologies: Servlets and JSP.
  • Developed presentation layers using JSP custom tags and JavaScript.
  • Implemented design patterns: Business Delegate, Singleton, Flow Controller, DAO, and Value Object.
  • Developed role-based access control to restrict users to specific modules based on their roles.
  • Used Oracle as the back-end database and the Hibernate framework for ORM.
  • Deployed the application on a WebSphere server using Eclipse as the IDE.
  • Used Tomcat server 5.5 and configured it with the Eclipse IDE.
  • Performed extensive unit testing of the application.
  • Responsible for design and development of web pages using HTML and CSS, including Ajax controls and XML.

Environment: WebSphere 5.1, Tomcat 5.0, Oracle 9i, Hibernate 3.0, Eclipse 3.2, JSP, JavaScript, Servlets, XML, JUnit, Spring, Tomcat plug-ins.

Confidential

Jr Java Developer

Responsibilities:

  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML and JavaScript.
  • Developed web components using JSP, Servlets, and JDBC.
  • Implemented the database using SQL Server.
  • Designed tables and indexes.
  • Wrote complex SQL queries and stored procedures.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Developed user and technical documentation.

Environment: Java, JSP, Servlets, JDBC, HTML, JavaScript, MySQL, JUnit, Eclipse IDE.
