Hadoop/Spark Developer Resume
Ridgefield Park, NJ
SUMMARY
- 6 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies.
- 3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie and ZooKeeper.
- Experience in analyzing data using Hive UDFs, Hive UDTFs and custom MapReduce programs written in Java.
- Experience in using Pig as an ETL tool for transformations and pre-aggregations.
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Experience with data formats such as JSON, Avro, Parquet, RC and ORC, and compression codecs such as Snappy and bzip2.
- Experience with enterprise versions of Cloudera and Hortonworks distributions.
- Hands-on experience with Spark architecture and its components, including Spark SQL and the DataFrame and Dataset APIs (a brief sketch follows this summary).
- Hands-on experience with real-time streaming from Kafka into HDFS.
- Hands-on experience developing Spark applications in Scala.
- Ability to spin up AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Hands-on experience integrating Amazon Redshift with Spark.
- Hands-on experience with NoSQL databases such as HBase and Cassandra, and relational databases such as Oracle and MySQL.
- Strong experience with unit testing and system testing of Big Data and Spark applications.
- Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC, RESTful APIs and web technologies such as HTML, DHTML, XML and JavaScript.
- Experience with cron, shell and Perl scripting, and version control tools such as SVN and GitHub.
- Experience with SDLC models such as Agile (Scrum) and Waterfall under CMMI guidelines.
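Illustrative sketch (hypothetical, not from a specific project): a minimal Spark SQL job in Scala showing the DataFrame-style format handling summarized above; the application name, HDFS paths and column names are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession

object JsonToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-parquet-sketch") // placeholder app name
      .getOrCreate()

    // Read semi-structured JSON input (placeholder path)
    val events = spark.read.json("hdfs:///data/raw/events/")

    // Basic DataFrame transformations (placeholder columns)
    val cleaned = events
      .filter(events("event_type").isNotNull)
      .withColumnRenamed("ts", "event_timestamp")

    // Write the result as Parquet with Snappy compression
    cleaned.write
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet("hdfs:///data/curated/events/")

    spark.stop()
  }
}
```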
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Apache Ambari and Cloudera Manager
Spark Components: Spark Core, Spark SQL (Data Frames and Datasets API), Spark Streaming, Scala, Apache Kafka
Cloud Infrastructure: AWS CloudFormation, Redshift, IAM, EC2-Classic and EC2-VPC
Programming Languages: C, Java, Scala, Shell, Perl, PL/SQL
Databases: Oracle, Teradata, MySQL, HBase, Cassandra
Web Technologies: HTML, DHTML, CSS, XML, XSLT and JavaScript
Java/J2EE Technologies: Java, J2EE, Servlets, JSP, JDBC, RESTful APIs
Enterprise Frameworks: Spring, Hibernate, Struts, MVC
IDEs & Command Line Tools: Eclipse, NetBeans, IntelliJ, Cygwin, mRemoteNG, WinSCP
Testing & CASE Tools: JUnit, Rational ClearCase, Log4j, Ant, Maven, SBT, JFrog Artifactory
Learning Curriculum: Apache Flink, Apache Drill
PROFESSIONAL EXPERIENCE
Confidential, Ridgefield Park, NJ
Hadoop/Spark Developer
Responsibilities:
- Participated in gathering and analyzing requirements and designing technical documents for business requirements.
- Worked on a large Kerberized Hadoop cluster, including KMS and KTS servers.
- Loaded and transformed large sets of flat files and semi-structured files, including XML.
- Developed a Java API for converting semi-structured data to CSV files and loading them into Hadoop.
- Orchestrated hundreds of Sqoop and Hive queries using Oozie workflows and coordinators.
- Applied partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, and partitioning only on Impala tables.
- Integrated Tableau with Impala and published workbooks from Tableau Desktop to Tableau Server.
- Published Tableau workbooks from multiple data sources and scheduled automated refreshes on Tableau Server.
- Spun up Hadoop clusters in AWS using Cloudera Director.
- Handled part of the dev-ops workload, including daily job monitoring, submitting Cloudera support tickets and providing cluster access to new users.
- Granted access roles on databases and tables using Sentry.
- Migrated Impala scripts to Spark SQL using the DataFrame and Dataset APIs.
- Loaded data into HBase using Spark via the Cloudera Spark-on-HBase module.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Imported data into Spark from Kafka topics using the Spark Streaming APIs (a minimal sketch follows this list).
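A minimal sketch, assuming the spark-streaming-kafka-0-10 direct stream API, of the Kafka-to-HDFS ingestion referenced in the last two bullets; the broker address, topic, group id, batch interval and output path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(60)) // placeholder batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-poc",                  // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from a Kafka topic (placeholder topic name)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Keep only the message payloads and land them on HDFS per micro-batch
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/kafka/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```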
Environment: AWS, Amazon S3, Impala, Hive, HBase, Spark SQL, Shell, Cloudera Enterprise, Cloudera Director, Cloudera Navigator, Cloudera Manager, Sentry, Jira, SBT and GitLab
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Loaded data from RDBMS servers such as Teradata and Netezza into the Hadoop HDFS cluster using Sqoop.
- Performed daily Sqoop incremental imports scheduled through Oozie.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive queries using map-side joins, dynamic partitioning and bucketing.
- Executed Hive queries from the Hive command line on the Tez execution engine.
- Implemented Hive generic UDFs to apply business logic around custom data types (an illustrative sketch follows this list).
- Used Pig as an ETL tool for transformations, event joins and pre-aggregations before storing the data on HDFS.
- Coordinated the Pig and Hive scripts using Oozie workflows.
- Loaded the data into HBase from HDFS.
- Continuously monitored the Hadoop cluster using Ambari Metrics.
- Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro, sequence files and XML files.
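The generic UDFs above were written in Java per the bullets; purely as an illustrative sketch (shown in Scala for consistency with the other sketches, with a hypothetical function name and masking logic), a skeleton on the Hive GenericUDF API looks like this.

```scala
import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

// Hypothetical generic UDF: masks all but the last four characters of a string column.
class MaskAccountId extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentException("mask_account_id expects exactly one argument")
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get()
    if (raw == null) return null
    val value = inputOI.getPrimitiveJavaObject(raw).toString
    if (value.length <= 4) value
    else "*" * (value.length - 4) + value.takeRight(4)
  }

  override def getDisplayString(children: Array[String]): String =
    s"mask_account_id(${children.mkString(", ")})"
}
```

Once packaged into a jar, a class like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.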
Environment: Hadoop, Hortonworks, Big Data, HDFS, MapReduce, Tez, Sqoop, Oozie, Pig, Hive, Linux, Java, Eclipse.
Confidential
JAVA/J2EE Developer
Responsibilities:
- Involved in the design and development of the entire application. Created UML diagrams (use case, class, sequence, and collaboration) based on the business requirements.
- Designed and developed dynamic Web pages using HTML and JSP.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring.
- Used Spring IoC to inject Hibernate dependencies and used Hibernate annotations for the domain model of the applications.
- Used Oracle for writing SQL scripts and PL/SQL procedures and functions.
- Wrote JUnit test cases to test the functionality of each method in the DAO layer. Used CVS for version control. Configured and deployed the WebSphere Application Server.
- Used Log4j for tracking errors and bugs in the project source code.
- Prepared technical reports and documentation manuals for efficient program development.
Environment: JSP, HTML, Servlets, Hibernate, Spring IOC, Spring Framework, JavaScript, XML, JDBC, Oracle9i, PL/SQL, WebSphere, Eclipse, JUnit, CVS, Log4j
Confidential
Java Developer
Responsibilities:
- Involved in developing various data flow diagrams, use case diagrams and sequence diagrams.
- Involved in creating JSP pages and HTML Pages.
- Used HTTP filtering to perform filtering tasks on requests and responses.
- Worked extensively in JSP, HTML, JavaScript, and CSS to create the UI pages for the project.
- Created JUnit test cases for unit testing and developed generic JS functions for validations.
Environment: Java 1.6, JSP, HTML, Eclipse, CSS, JavaScript, PL/SQL, Windows