
Hadoop/Spark Developer Resume


Ridgefield Park, NJ

SUMMARY

  • 6 years of IT experience in software analysis, design, development, testing, and implementation of Big Data, Hadoop, NoSQL, and Java/J2EE technologies.
  • 3+ years of hands-on experience with Big Data ecosystems, including Hadoop (1.0 and YARN), MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie, and Zookeeper.
  • Experience in analyzing data using Hive UDFs and UDTFs and custom MapReduce programs in Java.
  • Experience in using Pig as an ETL tool for transformations and pre-aggregations.
  • Experience in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Experience with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and Bzip2 (see the sketch after this list).
  • Experience with the enterprise versions of the Cloudera and Hortonworks distributions.
  • Hands-on experience with the Spark architecture and its integrations, including Spark SQL and the DataFrame and Dataset APIs.
  • Hands-on experience with real-time streaming from Kafka into HDFS.
  • Hands-on experience with Spark application development in Scala.
  • Ability to spin up AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Hands-on experience integrating Amazon Redshift with Spark.
  • Hands-on experience with NoSQL databases such as HBase and Cassandra, and relational databases such as Oracle and MySQL.
  • Strong experience with unit and system testing of Big Data and Spark applications.
  • Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC, RESTful APIs, and web technologies such as HTML, DHTML, XML, and JavaScript.
  • Experience with CRON, Shell, and Perl scripting, and with version control tools such as SVN and GitHub.
  • Experience with SDLC models such as Agile (Scrum) and Waterfall under CMMI guidelines.
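
As one hedged illustration of the format handling above: a minimal Spark sketch, assuming a hypothetical HDFS layout, that reads JSON input and rewrites it as Snappy-compressed Parquet. The paths and application name are illustrative, not taken from the original projects.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: convert JSON input to Snappy-compressed Parquet.
// The HDFS paths are hypothetical placeholders.
object FormatConversion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FormatConversion").getOrCreate()

    val events = spark.read.json("hdfs:///raw/events")   // schema inferred from the JSON

    events.write
      .option("compression", "snappy")                   // Snappy codec, as listed above
      .parquet("hdfs:///curated/events_parquet")

    spark.stop()
  }
}
```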

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Apache Ambari, and Cloudera Manager

Spark Components: Spark Core, Spark SQL (DataFrame and Dataset APIs), Spark Streaming, Scala, Apache Kafka

Cloud Infrastructure: AWS CloudFormation, Redshift, IAM, EC2-Classic, and EC2-VPC

Programming Languages: C, Java, Scala, Shell, Perl, PL/SQL

Databases: Oracle, Teradata, MySQL, HBase, Cassandra

Web Technologies: HTML, DHTML, CSS, XML, XSLT, and JavaScript

Java/J2EE Technologies: Java, J2EE, Servlets, JSP, JDBC, RESTful APIs

Enterprise Frameworks: Spring, Hibernate, Struts, MVC

IDEs & Command-Line Tools: Eclipse, NetBeans, IntelliJ, Cygwin, mRemoteNG, WinSCP

Testing & CASE Tools: JUnit, Rational ClearCase, Log4j, Ant, Maven, SBT, JFrog Artifactory

Learning Curriculum: Apache Flink, Apache Drill

PROFESSIONAL EXPERIENCE

Confidential, Ridgefield Park, NJ

Hadoop/Spark Developer

Responsibilities:

  • Participated in gathering and analyzing requirements and in designing technical documents for business requirements.
  • Worked on a large Hadoop cluster in a Kerberos environment, including KMS and KTS servers.
  • Loaded and transformed large sets of flat and semi-structured files, including XML.
  • Developed a Java API for converting semi-structured data to CSV files and loading them into Hadoop.
  • Orchestrated hundreds of Sqoop and Hive queries using Oozie workflows and coordinators.
  • Optimized Hive queries with partitioning, bucketing, map-side joins, and parallel execution, and applied only partitioning to Impala tables.
  • Integrated Tableau with Impala and published workbooks from Tableau Desktop to Tableau Server.
  • Published Tableau workbooks from multiple data sources and scheduled automated refreshes on Tableau Server.
  • Spun up Hadoop clusters in AWS using Cloudera Director.
  • Handled day-to-day operations such as job monitoring, submitting Cloudera support tickets, and granting cluster access to new users.
  • Granted access roles on databases and tables using Sentry.
  • Migrated Impala scripts to Spark SQL using the DataFrame and Dataset APIs (see the first sketch after this list).
  • Loaded data into HBase using Spark with Cloudera's Spark-on-HBase module (see the second sketch after this list).
  • Active member of the team that developed a POC for streaming data with Apache Kafka and Spark Streaming.
  • Imported data into Spark from Kafka topics using the Spark Streaming APIs (see the third sketch after this list).
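
A minimal sketch of the Impala-to-Spark SQL migration pattern described above, assuming a hypothetical sales.transactions table registered in the Hive metastore; the database, table, and column names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object ImpalaToSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ImpalaToSparkSql")
      .enableHiveSupport()            // read the same metastore tables Impala queries
      .getOrCreate()
    import spark.implicits._

    // DataFrame equivalent of a hypothetical Impala aggregate such as:
    //   SELECT txn_date, store_id, SUM(amount) FROM sales.transactions
    //   WHERE status = 'COMPLETE' GROUP BY txn_date, store_id
    val daily = spark.table("sales.transactions")
      .filter($"status" === "COMPLETE")
      .groupBy($"txn_date", $"store_id")
      .agg(sum($"amount").as("total_amount"))

    // Write back partitioned by date, mirroring a partitioned Hive/Impala table
    daily.write.mode("overwrite")
      .partitionBy("txn_date")
      .saveAsTable("sales.daily_totals")

    spark.stop()
  }
}
```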
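A sketch of an HBase load through the hbase-spark module's HBaseContext.bulkPut, the integration Cloudera ships as Spark-on-HBase; the table name, column family, and input record layout are assumptions for illustration.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseLoad"))
    val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())

    // Hypothetical CSV input with at least two fields: row key, payload
    val rdd = sc.textFile("hdfs:///data/events").map(_.split(','))

    // bulkPut turns each record into an HBase Put and writes it to the table
    hbaseContext.bulkPut[Array[String]](
      rdd,
      TableName.valueOf("events"),               // hypothetical table name
      (row: Array[String]) => {
        val put = new Put(Bytes.toBytes(row(0))) // first field as row key
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                      Bytes.toBytes(row(1)))
        put
      })
  }
}
```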
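A sketch of the Kafka-to-HDFS ingestion pattern using the spark-streaming-kafka-0-10 direct stream; the broker address, topic, group id, batch interval, and output path are placeholders, not values from the original project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",        // placeholder broker list
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each 30-second batch under a timestamped HDFS directory
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty) rdd.saveAsTextFile(s"hdfs:///landing/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```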

Environment: AWS, Amazon S3, Impala, Hive, HBase, Spark SQL, Shell, Cloudera Enterprise, Cloudera Director, Cloudera Navigator, Cloudera Manager, Sentry, Jira, SBT, and GitLab

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a sketch follows this list).
  • Loaded data into the Hadoop HDFS cluster using Sqoop from RDBMS servers such as Teradata and Netezza.
  • Performed daily Sqoop incremental imports scheduled through Oozie.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Optimized Hive queries using map-side joins, dynamic partitioning, and bucketing.
  • Executed Hive queries from the Hive command line on the Tez execution engine.
  • Implemented Hive generic UDFs to apply business logic to custom data types (see the second sketch after this list).
  • Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing data in HDFS.
  • Coordinated the Pig and Hive scripts using Oozie workflows.
  • Loaded data into HBase from HDFS.
  • Continuously monitored the Hadoop cluster using Ambari Metrics.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data, including Avro, sequence, and XML files.
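
The cleaning jobs above were written in Java; as a rough illustration, kept in Scala for consistency with the other sketches in this document, a map-only cleaning pass over pipe-delimited records might look like the following. The delimiter and expected field count are assumptions.

```scala
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Map-only cleaning pass: drop malformed records and trim each field.
// The '|' delimiter and the 12-field layout are illustrative assumptions.
class CleaningMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split('|')
    if (fields.length == 12 && fields.forall(_.trim.nonEmpty)) {
      context.write(NullWritable.get, new Text(fields.map(_.trim).mkString("|")))
    }
  }
}
```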
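A minimal generic-UDF sketch of the kind described above, here a hypothetical upper_udf that upper-cases a string column; the real UDFs operated on custom data types, which are not reproduced here.

```scala
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory
import org.apache.hadoop.io.Text

// Hypothetical generic UDF that normalizes a string argument to upper case
class UpperUdf extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(args: Array[ObjectInspector]): ObjectInspector = {
    inputOI = args(0).asInstanceOf[PrimitiveObjectInspector]
    PrimitiveObjectInspectorFactory.writableStringObjectInspector
  }

  override def evaluate(args: Array[DeferredObject]): AnyRef = {
    val v = args(0).get()
    if (v == null) null
    else new Text(inputOI.getPrimitiveJavaObject(v).toString.toUpperCase)
  }

  override def getDisplayString(children: Array[String]): String =
    s"upper_udf(${children.mkString(", ")})"
}
```

Such a class would be packaged into a jar and registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called in queries.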

Environment: Hadoop, Hortonworks, Big Data, HDFS, MapReduce, Tez, Sqoop, Oozie, Pig, Hive, Linux, Java, Eclipse.

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Involved in the design and development of the entire application; created UML diagrams (use case, class, sequence, and collaboration) based on the business requirements.
  • Designed and developed dynamic web pages using HTML and JSP.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring.
  • Used Spring IoC to inject Hibernate dependencies and used Hibernate annotations to design the model layer of the application (see the sketch after this list).
  • Used Oracle DB for writing SQL scripts and PL/SQL code for procedures and functions.
  • Wrote JUnit test cases to test the functionality of each method in the DAO layer; used CVS for version control; configured and deployed WebSphere Application Server.
  • Used Log4j for tracking errors and bugs in the project source code.
  • Prepared technical reports and documentation manuals for efficient program development.
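
A minimal sketch of the mapping pattern described above; the original work was in Java, but the idea carries over directly (shown in Scala to match the other sketches here). The entity, table, and column names are hypothetical.

```scala
import javax.persistence.{Column, Entity, GeneratedValue, GenerationType, Id, Table}
import org.hibernate.SessionFactory

// Hypothetical annotated entity; table and column names are illustrative only.
@Entity
@Table(name = "CUSTOMER")
class Customer {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  @Column(name = "CUSTOMER_ID")
  var id: java.lang.Long = _

  @Column(name = "NAME", nullable = false)
  var name: String = _
}

// DAO whose SessionFactory is wired in by the Spring IoC container.
class CustomerDao(val sessionFactory: SessionFactory) {
  // Assumes an active Spring-managed transaction around the call.
  def save(customer: Customer): Unit =
    sessionFactory.getCurrentSession.save(customer)
}
```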

Environment: JSP, HTML, Servlets, Hibernate, Spring IOC, Spring Framework, JavaScript, XML, JDBC, Oracle9i, PL/SQL, WebSphere, Eclipse, JUnit, CVS, Log4j

Confidential

Java Developer

Responsibilities:

  • Involved in developing various data flow diagrams, use case diagrams and sequence diagrams.
  • Involved in creating JSP pages and HTML Pages.
  • Used HTTP filters to process and filter requests and responses.
  • Worked extensively in JSP, HTML, JavaScript, and CSS to create the UI pages for the project.
  • Created JUnit test cases for unit testing and developed generic JS functions for validations.

Environment: Java 1.6, JSP, HTML, Eclipse, CSS, JavaScript, PL/SQL, Windows
