
Hadoop / Spark Developer Resume

NJ

SUMMARY:

  • Around 5 years of experience in the software industry, including 3 years in the Hadoop ecosystem; worked in Agile environments.
  • Strong experience in all phases of the software development life cycle (SDLC), including planning, design, development, and testing of software applications.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Experience in deploying and managing multi-node clusters with different Hadoop components (HDFS, YARN, Hive, Sqoop, Oozie, Flume, ZooKeeper, Spark, Impala) using Cloudera Manager and Hortonworks Ambari.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing, as shown in the sketch after this list.
  • Very good at loading data into Spark SchemaRDDs and querying them using Spark SQL.
  • Experience in writing MapReduce programs from scratch according to requirements.
  • Experience in writing joins and sorting algorithms in MapReduce using Java.
  • Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, and Pig.
  • Familiar with importing and exporting data into HDFS and Hive using Sqoop.
  • Experience in using Flume, and knowledge of Kafka, to ingest data from web servers into HDFS.
  • Good knowledge of Apache Storm.
  • Hands-on experience in extending Pig and Hive core functionality by writing custom UDFs.
  • Experience in handling different file formats, such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML, and flat files.
  • Good knowledge of NoSQL databases such as HBase and Cassandra.
  • Good knowledge of BI tools such as Tableau, and ETL tools such as Talend and Informatica.
  • Basic knowledge of machine learning and predictive analytics.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Good experience in using relational databases such as Oracle and SQL Server.
  • Good working knowledge of Amazon Web Services components such as EC2, EMR, S3, and Elasticsearch.
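
The following is a minimal sketch of the Kafka-to-Spark Streaming integration highlighted above, assuming the Spark 1.x direct-stream API (spark-streaming-kafka); the broker address, topic name, and batch interval are hypothetical placeholders.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class UsageStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("usage-stream");
            // Micro-batch every 10 seconds.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "broker1:9092"); // hypothetical broker
            Set<String> topics = Collections.singleton("usage-events"); // hypothetical topic

            // Receiver-less "direct" stream: one RDD partition per Kafka partition.
            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            stream.map(record -> record._2())      // keep only the message payload
                  .filter(line -> !line.isEmpty()) // drop empty events
                  .print();                        // stand-in for the real downstream sink

            jssc.start();
            jssc.awaitTermination();
        }
    }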

TECHNICAL SKILLS:

Programming languages: C, Java, Scala

Web Languages: HTML, CSS

Frameworks: Hadoop, MapReduce, Hive, Pig, Spark, Kafka

J2EE technologies: JDBC, Servlets, JSP

Databases: Oracle DB, SQL Server, HBase, MongoDB, Cassandra

Operating Systems: Windows, Linux, CentOS, macOS

Tools/IDEs: Sqoop, Flume, Oozie, NetBeans, Eclipse

PROFESSIONAL EXPERIENCE:

Confidential, NJ

Hadoop / Spark Developer

Responsibilities:

  • Worked on a cloud platform built as a scalable distributed data solution using Hadoop on a 40-node AWS cluster, running analyses on terabytes of customer usage data on a daily basis.
  • Involved in creating end-to-end Spark applications for various data transformation activities.
  • Performed a series of ingestion jobs using Sqoop, Kafka, and a custom input adapter to move data from various sources into HDFS.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Created Spark jobs to identify trends in data usage by users.
  • Streamed data in real time using Spark with Kafka.
  • Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Used Spark for interactive queries and processing of streaming data, and integrated it with popular NoSQL databases to handle huge volumes of data.
  • Configured Kafka to read and write messages from external programs.
  • Converted Hive queries into Spark transformations using Spark RDDs, as in the sketch after this job entry.
  • Explored Spark to improve performance and optimize existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported data from different data sources into HDFS using Sqoop, applying the required transformations in Hive.
  • Exported the analyzed data to relational databases using Sqoop, so the BI team could visualize it and generate reports.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Scheduled and executed workflows in Oozie to run Hive and Spark jobs.
  • Monitored and managed the Hadoop cluster using Cloudera Manager.
  • Developed interactive shell scripts to schedule various data cleansing and data loading processes.

Environment: Cloudera Hadoop distribution, AWS services (clusters on cloud), HDFS, MapReduce, Sqoop, Kafka, Spark, Spark SQL, Hive, Cassandra, Linux, Java, Scala, Eclipse, Oracle, Tableau, UNIX shell scripting, PuTTY.
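
Below is a hedged sketch of the kind of Hive-to-Spark conversion described above, using the Spark 1.x HiveContext and DataFrame API from Java; the table name, column names, and partition filter are hypothetical.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.sum;

    public class DailyUsageReport {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("daily-usage"));
            HiveContext hive = new HiveContext(sc.sc());

            // Equivalent of: SELECT user_id, SUM(bytes_used) FROM customer_usage
            //                WHERE event_date = '2016-01-01' GROUP BY user_id;
            DataFrame daily = hive.table("customer_usage")           // hypothetical Hive table
                    .filter(col("event_date").equalTo("2016-01-01")) // prunes to one partition
                    .groupBy(col("user_id"))
                    .agg(sum(col("bytes_used")).alias("total_bytes"));

            // Persist the aggregate back to Hive for the BI/reporting layer.
            daily.write().mode("overwrite").saveAsTable("daily_usage_summary");
        }
    }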

Confidential, NC

Hadoop Developer

Responsibilities:

  • Gathered data requirements and identified sources for acquisition.
  • Performed development and ETL design in Hadoop.
  • Developed a custom MapReduce input format to read a specific data format.
  • Developed Hive queries and UDFs as per requirements.
  • Involved in extracting customers' big data from various data sources into Hadoop; this included data from mainframes and databases, as well as log data from servers.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Developed MapReduce programs to cleanse data in HDFS obtained from multiple sources and make it suitable for ingestion into the Hive schema for analysis; a sketch of such a cleansing mapper follows this job entry.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Monitored and debugged Hadoop jobs and applications running in production.
  • Used Solr for searching.
  • Worked on upgrading Cloudera from CDH to CDH.x.
  • Provided user support and application support on the Hadoop infrastructure.
  • Evaluated and compared different tools for test data management with Hadoop.
  • Helped the testing team with Hadoop application testing.

Environment: Hadoop v1.2.1, HDFS, MapReduce, Hive, Sqoop, Pig, Oracle, XML, CDH4.x, ZooKeeper, Oozie
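
A minimal sketch of the kind of map-only cleansing job described above; the expected field count and the delimiters are hypothetical assumptions about the source records.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleansing job: drop malformed records, normalize the delimiter.
    public class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

        private static final int EXPECTED_FIELDS = 7; // hypothetical record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1); // assume pipe-delimited logs
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleanse", "malformed").increment(1);
                return; // skip records that do not match the expected layout
            }
            // Re-emit as tab-delimited so the record lands cleanly in the Hive schema.
            context.write(NullWritable.get(), new Text(String.join("\t", fields)));
        }
    }

A map-only job like this (with the number of reducers set to zero) avoids a shuffle entirely, which suits record-level cleansing where no aggregation is needed.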

Confidential

Big Data Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in the design phase and delivered design documents.
  • Set up 3-node Hadoop clusters with IBM BigInsights.
  • Worked with highly unstructured and semi-structured data.
  • Extracted data from Oracle into HDFS using Sqoop (version 1.4.3) to store it and generate reports for visualization purposes.
  • Leveraged the Solr API to search user interaction data for relevant matches.
  • Designed the Solr schema and used the SolrJ client API for storing, indexing, and querying the schema fields; a SolrJ sketch follows this list.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile devices, using Apache Flume, and stored the data in HDFS for analysis.
  • Wrote extensive Pig (version 0.12.0) scripts to transform raw data into baseline data.
  • Developed Hive (version 0.12.0) scripts to analyze data, categorize mobile numbers into different segments, and offer promotions to customers based on their segment.
  • Developed UDFs in Java as needed for use in Pig and Hive queries; see the UDF sketch after the environment line below.
  • Worked on the Oozie workflow engine for job scheduling.
  • Created Hive tables and partitions and loaded the data for analysis using HiveQL queries.
  • Loaded data into HBase using bulk load and the HBase API.
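
A minimal SolrJ sketch of the indexing and querying flow described above, written against the newer HttpSolrClient builder API; the Solr URL, core name, and field names are hypothetical stand-ins for the schema designed on this project.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class InteractionIndexer {
        public static void main(String[] args) throws Exception {
            // Hypothetical Solr core and fields; the real schema held user-interaction events.
            SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/interactions").build();

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "evt-0001");
            doc.addField("user_id", "u-42");
            doc.addField("action", "page_view");
            solr.add(doc);
            solr.commit(); // make the document visible to searches

            // Query the index back for relevance matching.
            SolrQuery query = new SolrQuery("action:page_view");
            long hits = solr.query(query).getResults().getNumFound();
            System.out.println("matches: " + hits);
            solr.close();
        }
    }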

Environment: IBM BigInsights 2.1.2, Java, Hive, Pig, HBase, Sqoop, Flume, Oozie, Solr, shell scripting.
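
Below is a minimal example of the kind of Java UDF mentioned in the bullets above, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the prefix-based segmentation rule is a hypothetical placeholder for the project's real logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Scalar Hive UDF: bucket a mobile number into a promotion segment by prefix.
    public final class SegmentUdf extends UDF {
        public Text evaluate(Text mobile) {
            if (mobile == null) {
                return null; // pass NULLs through, as Hive expects
            }
            // Hypothetical rule standing in for the project's real segmentation logic.
            String segment = mobile.toString().startsWith("98") ? "premium" : "standard";
            return new Text(segment);
        }
    }

Once packaged, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a query.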

Confidential

Jr. Software Engineer

Responsibilities:

  • Involved in gathering business requirements, analyzing the project, and creating use cases.
  • Coordinated with the design team, business analysts, and end users of the system.
  • Designed and developed the front end using JSP, JavaScript, and HTML.
  • Programmed using the core Java language.
  • Worked with Solr for indexing data and used JSP for the web application.
  • Used JAXP (DOM, XSLT) and XSD for XML data generation and presentation.
  • Wrote JUnit test classes for the services and prepared documentation; a minimal example follows this job entry.
  • Provided support and bug fixing.

Environment: Java, JDBC, JSP, Servlets, HTML, JUnit, Java APIs, Design Patterns, MySQL, Eclipse IDE.
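
A minimal JUnit 4 test in the style described above; the discounted() helper is a hypothetical stand-in for one of the project's service methods.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class DiscountServiceTest {

        // Hypothetical stand-in for a service method under test.
        static double discounted(double listPrice, double rate) {
            return listPrice * (1.0 - rate);
        }

        @Test
        public void appliesDiscountToListPrice() {
            assertEquals(90.0, discounted(100.0, 0.10), 0.001);
        }
    }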
