
Hadoop / Spark Developer Resume


McLean, VA

SUMMARY

  • 5+ years of experience in the software industry, including 3 years of experience in the Hadoop ecosystem; worked in Agile environments.
  • Strong experience in all phases of the software development life cycle (SDLC), including planning, design, development and testing of software applications.
  • Experience in deploying and managing multi-node clusters with different Hadoop components (HDFS, YARN, Hive, Sqoop, Oozie, Flume, ZooKeeper, Spark, Impala) using Cloudera Manager and Hortonworks Ambari.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this list).
  • Very good at loading data into Spark SchemaRDDs and querying them using Spark SQL.
  • Experience in writing MapReduce programs from scratch according to requirements.
  • Experience in writing joins and sorting algorithms in MapReduce using Java (a reduce-side join sketch follows this list).
  • Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive and Pig.
  • Familiar with importing and exporting data between relational databases and HDFS/Hive using Sqoop.
  • Experience in using Flume, and knowledge of Kafka, to ingest data from web servers into HDFS.
  • Hands-on experience in extending Pig and Hive core functionality by writing custom UDFs (see the UDF sketch after this list).
  • Experience in handling different file formats such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML and flat files.
  • Good knowledge of NoSQL databases such as HBase and Cassandra.
  • Good knowledge of BI tools such as Tableau and of ETL tools such as Talend and Informatica.
  • Basic knowledge of machine learning and predictive analytics.
  • Hands-on experience in application development using Java, Python, RDBMS and Linux shell scripting.
  • Good experience using relational databases such as Oracle and SQL Server.
  • Good working knowledge of Amazon Web Services components such as EC2, EMR and S3.
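
As a point of reference for the Kafka/Spark Streaming integration above, here is a minimal Java sketch using the standard spark-streaming-kafka-0-10 direct stream; the broker address, consumer group, topic name and per-batch logic are illustrative placeholders, not code from any of the projects below:

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.common.serialization.StringDeserializer;
  import org.apache.spark.SparkConf;
  import org.apache.spark.streaming.Durations;
  import org.apache.spark.streaming.api.java.JavaInputDStream;
  import org.apache.spark.streaming.api.java.JavaStreamingContext;
  import org.apache.spark.streaming.kafka010.ConsumerStrategies;
  import org.apache.spark.streaming.kafka010.KafkaUtils;
  import org.apache.spark.streaming.kafka010.LocationStrategies;

  public class KafkaSparkStream {
    public static void main(String[] args) throws Exception {
      SparkConf conf = new SparkConf().setAppName("KafkaSparkStream");
      JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

      Map<String, Object> kafkaParams = new HashMap<>();
      kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder broker
      kafkaParams.put("key.deserializer", StringDeserializer.class);
      kafkaParams.put("value.deserializer", StringDeserializer.class);
      kafkaParams.put("group.id", "demo-group");            // placeholder group

      // Direct stream: each Kafka partition maps to one Spark partition.
      JavaInputDStream<ConsumerRecord<String, String>> stream =
          KafkaUtils.createDirectStream(
              ssc,
              LocationStrategies.PreferConsistent(),
              ConsumerStrategies.<String, String>Subscribe(
                  Collections.singletonList("events"), kafkaParams)); // placeholder topic

      // Stand-in for real processing: count records in each micro-batch.
      stream.map(ConsumerRecord::value)
            .foreachRDD(rdd -> System.out.println("batch size: " + rdd.count()));

      ssc.start();
      ssc.awaitTermination();
    }
  }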
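
The MapReduce joins mentioned above are commonly implemented as reduce-side joins. A compact Java sketch of that technique, assuming two comma-separated inputs (customers as id,name and orders as customerId,amount); the field layout is an assumption made for illustration:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class ReduceSideJoin {
    // Tags each customer record with "C|" so the reducer can tell the sides apart.
    public static class CustomerMapper extends Mapper<LongWritable, Text, Text, Text> {
      protected void map(LongWritable k, Text v, Context ctx)
          throws IOException, InterruptedException {
        String[] f = v.toString().split(",");            // id,name
        ctx.write(new Text(f[0]), new Text("C|" + f[1]));
      }
    }
    // Tags each order record with "O|".
    public static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {
      protected void map(LongWritable k, Text v, Context ctx)
          throws IOException, InterruptedException {
        String[] f = v.toString().split(",");            // customerId,amount
        ctx.write(new Text(f[0]), new Text("O|" + f[1]));
      }
    }
    // Joins the two tagged sides on the shared customer id.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
      protected void reduce(Text key, Iterable<Text> vals, Context ctx)
          throws IOException, InterruptedException {
        String name = null;
        List<String> amounts = new ArrayList<>();
        for (Text v : vals) {
          String s = v.toString();
          if (s.startsWith("C|")) name = s.substring(2);
          else amounts.add(s.substring(2));
        }
        if (name != null)
          for (String a : amounts) ctx.write(key, new Text(name + "," + a));
      }
    }
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "reduce-side join");
      job.setJarByClass(ReduceSideJoin.class);
      MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, CustomerMapper.class);
      MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, OrderMapper.class);
      job.setReducerClass(JoinReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);
      FileOutputFormat.setOutputPath(job, new Path(args[2]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }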
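
Custom Hive UDFs of the kind referenced above are small Java classes. A trivial sketch that normalizes phone numbers; the class name and cleaning rule are purely illustrative:

  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  // Strips non-digits from a phone number. Register in Hive with:
  //   ADD JAR udfs.jar;
  //   CREATE TEMPORARY FUNCTION clean_phone AS 'CleanPhone';
  public final class CleanPhone extends UDF {
    public Text evaluate(Text input) {
      if (input == null) return null;
      return new Text(input.toString().replaceAll("[^0-9]", ""));
    }
  }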

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka

Hadoop Platforms: Cloudera, Hortonworks, AWS

Programming Languages: C, Java, Python, Scala

Web Languages: HTML, CSS

Frameworks: Hadoop, MapReduce, Hive, Pig, Spark, Kafka

J2EE Technologies: JDBC, Servlets, JSP

Databases: Oracle, SQL Server, HBase, Cassandra

Operating Systems: Windows, Linux, CentOS, macOS

Tools/IDEs: Sqoop, Flume, Oozie, NetBeans, Eclipse

Streaming Technologies: Spark Streaming

PROFESSIONAL EXPERIENCE

Confidential, McLean, VA

Hadoop / Spark Developer

Responsibilities:

  • Designed and implemented data pipelines that launch several Hive- and Spark-equipped EMR clusters, read datasets from various data sources, perform analytics and transformations, and finally store the results for the application.
  • Responsible for implementing a generic framework to handle different data collection methodologies from the client's primary data sources, validate and transform the data using Spark, and load it into S3.
  • Responsible for providing a SQL engine over the S3 data lake by adopting the Parquet storage format with Spark SQL as the engine (see the sketch after this list).
  • Migrated historical data to S3 and developed a reliable mechanism for processing incremental updates.
  • Wrote a framework in Spark and Scala to validate syntax, correctness and data quality, ensuring the schema hasn't changed and no data is missed (a schema-check sketch also follows this list).
  • Optimized the EMR workloads for different types of data loads by choosing the right compression, cluster type, instance type, storage type and EMRFS settings in order to analyze data at low cost and high scalability.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
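
A minimal Java sketch of the Spark SQL-over-Parquet-on-S3 pattern described in this list; the bucket path, view name, columns and query are placeholders:

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  public class S3ParquetQuery {
    public static void main(String[] args) {
      SparkSession spark = SparkSession.builder().appName("S3ParquetQuery").getOrCreate();

      // Read Parquet directly from the S3 data lake; no database load required.
      Dataset<Row> events = spark.read().parquet("s3://datalake/events/"); // placeholder path
      events.createOrReplaceTempView("events");

      // Ad hoc SQL over the lake through Spark SQL.
      spark.sql("SELECT event_type, COUNT(*) AS cnt FROM events GROUP BY event_type")
           .show();

      spark.stop();
    }
  }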
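
And a sketch of the kind of schema/data-quality gate the validation framework performed; it is written in Java for consistency with the other sketches on this page (the project itself used Scala), and the expected schema and checks are assumptions:

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.types.DataTypes;
  import org.apache.spark.sql.types.StructType;

  public class SchemaCheck {
    public static void main(String[] args) {
      SparkSession spark = SparkSession.builder().appName("SchemaCheck").getOrCreate();

      // The schema downstream consumers expect; fields are illustrative.
      StructType expected = new StructType()
          .add("id", DataTypes.LongType)
          .add("event", DataTypes.StringType)
          .add("ts", DataTypes.TimestampType);

      Dataset<Row> batch = spark.read().parquet("s3://bucket/incoming/"); // placeholder path

      // Fail fast if the incoming schema has drifted from the contract.
      if (!batch.schema().equals(expected))
        throw new IllegalStateException("schema drift: " + batch.schema().treeString());

      // Simple completeness check: no nulls allowed in the key column.
      long badRows = batch.filter("id IS NULL").count();
      if (badRows > 0)
        throw new IllegalStateException(badRows + " rows missing id");

      spark.stop();
    }
  }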

Environment: AWS, S3, Sqoop, Kafka, Spark, Spark SQL, Hive, Linux, Java, Scala, IntelliJ, UNIX shell scripting, Python.

Confidential, CA

Hadoop Developer

Responsibilities:

  • Gathering data requirements and identifying sources for acquisition.
  • Performed development and ETL design in Hadoop.
  • Developed a custom MapReduce input format to read a specific data format.
  • Developed Hive queries, Python scripts and UDFs as per requirements.
  • Involved in extracting customers' big data from various data sources into Hadoop; this included data from mainframes and databases as well as log data from servers.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Developed MapReduce programs to cleanse data in HDFS obtained from multiple sources and make it suitable for ingestion into the Hive schema for analysis.
  • Implemented partitioning and bucketing in Hive for better organization of the data (see the DDL sketch after this list).
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Monitored and debugged Hadoop jobs/applications running in production.
  • Worked on providing user support and application support on Hadoop infrastructure.
  • Worked on evaluating and comparing different tools for test data management with Hadoop.
  • Helped the testing team with Hadoop application testing.
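
Hive partitioning and bucketing, as noted in this list, are declared in the table DDL. A minimal sketch that issues such DDL through the Hive JDBC driver; the host, table, columns and bucket count are placeholders:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;

  public class HivePartitionedTable {
    public static void main(String[] args) throws Exception {
      // Load the HiveServer2 JDBC driver (hive-jdbc jar must be on the classpath).
      Class.forName("org.apache.hive.jdbc.HiveDriver");
      Connection conn = DriverManager.getConnection(
          "jdbc:hive2://hiveserver:10000/default", "hive", ""); // placeholder host/credentials
      try (Statement stmt = conn.createStatement()) {
        // Partition by ingest date; bucket by user id for faster joins and sampling.
        stmt.execute(
            "CREATE TABLE IF NOT EXISTS clickstream ("
          + "  user_id STRING, url STRING, ts BIGINT) "
          + "PARTITIONED BY (dt STRING) "
          + "CLUSTERED BY (user_id) INTO 32 BUCKETS "
          + "STORED AS ORC");
      }
      conn.close();
    }
  }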

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, Oracle, CDH, Oozie

Confidential

Big Data Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, the HBase database and Sqoop.
  • Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in the design phase and delivered design documents.
  • Set up a 3-node Hadoop cluster with IBM BigInsights.
  • Worked with highly unstructured and semi-structured data.
  • Extracted data from Oracle into HDFS using Sqoop (version 1.4.3) for storage and for generating reports for visualization purposes.
  • Designed the Solr schema and used the SolrJ client API for storing, indexing and querying the schema fields (see the SolrJ sketch after this list).
  • Collected and aggregated large amounts of web log data from different sources such as web servers and mobile devices using Apache Flume, and stored the data in HDFS for analysis.
  • Wrote extensive Pig scripts to transform raw data into baseline data.
  • Developed Python and Hive scripts to analyze data, categorize mobile numbers into different segments, and offer promotions to customers based on those segments.
  • Developed UDFs in Java as and when necessary for use in Pig and Hive queries.
  • Worked on the Oozie workflow engine for job scheduling.
  • Created Hive tables and partitions and loaded the data for analysis using Python and HiveQL queries.
  • Loaded data into HBase using bulk load and the HBase API (see the sketch after this list).
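
A minimal SolrJ sketch of the store/index/query cycle mentioned above; the collection URL and field names are placeholders:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrInputDocument;

  public class SolrIndexDemo {
    public static void main(String[] args) throws Exception {
      // Point the client at the collection; URL is a placeholder.
      SolrClient solr = new HttpSolrClient.Builder(
          "http://solrhost:8983/solr/customers").build();

      // Index one document with illustrative fields.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "42");
      doc.addField("name", "example customer");
      solr.add(doc);
      solr.commit();

      // Query the indexed field back.
      QueryResponse rsp = solr.query(new SolrQuery("name:example*"));
      System.out.println(rsp.getResults().getNumFound() + " hits");
      solr.close();
    }
  }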
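
And a sketch of writing to HBase through the client API, as in the last bullet; the table name, column family and values are illustrative (bulk loading via HFiles is a separate path not shown here):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HBasePutDemo {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("customers"))) { // placeholder table
        // One row, one cell: rowkey, column family, qualifier, value.
        Put put = new Put(Bytes.toBytes("row-42"));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("segment"),
                      Bytes.toBytes("premium"));
        table.put(put);
      }
    }
  }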

Environment: CDH, Java, Hive, Pig, HBase, Sqoop, Flume, Oozie, Solr, Shell script.

Confidential

Java Developer

Responsibilities:

  • Involved in gathering business requirements, analyzing the project and creating use cases.
  • Coordinated with the design team, business analysts and end users of the system.
  • Programmed using the core Java language.
  • Extensively used the Java Collections Framework.
  • Used JAXP (DOM, XSLT), XSD for XML data generation and presentation.
  • Wrote JUnit test classes for the services and prepared documentation.
  • Provided support and bug fixes.

Environment: Java, JDBC, JSP, Servlets, HTML, JUnit, APIs, Design Patterns, MySQL, Eclipse IDE.
