Hadoop/Spark Developer Resume
Ridgefield Park, NJ
SUMMARY
- 6 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies.
- 3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie and ZooKeeper.
- Experience in analyzing data using Hive UDFs, Hive UDTFs and custom MapReduce programs written in Java.
- Experience in using Pig as an ETL tool for transformations and pre-aggregations.
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Experience with data formats such as JSON, Avro, Parquet, RC and ORC, and compression codecs such as Snappy and bzip2.
- Experience with enterprise versions of Cloudera and Hortonworks distributions.
- Hands-on experience with Spark architecture and its components, including Spark SQL and the DataFrame and Dataset APIs (a brief sketch follows this summary).
- Hands-on experience with real-time streaming from Kafka into HDFS.
- Hands-on experience developing Spark applications in Scala.
- Ability to spin up AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Hands-on experience integrating Amazon Redshift with Spark.
- Hands-on experience with NoSQL databases such as HBase and Cassandra, and relational databases such as Oracle and MySQL.
- Strong experience with unit testing and system testing of Big Data and Spark applications.
- Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC, RESTful APIs and web technologies such as HTML, DHTML, XML and JavaScript.
- Experience with cron, shell and Perl scripting, and version control tools such as SVN and GitHub.
- Experience with SDLC models such as Agile (Scrum) and Waterfall under CMMI guidelines.
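Illustrative sketch (hypothetical, not from a specific project): a minimal Spark SQL job in Scala showing the DataFrame-style format handling summarized above; the application name, HDFS paths and column names are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession

object JsonToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-parquet-sketch") // placeholder app name
      .getOrCreate()

    // Read semi-structured JSON input (placeholder path)
    val events = spark.read.json("hdfs:///data/raw/events/")

    // Basic DataFrame transformations (placeholder columns)
    val cleaned = events
      .filter(events("event_type").isNotNull)
      .withColumnRenamed("ts", "event_timestamp")

    // Write the result as Parquet with Snappy compression
    cleaned.write
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet("hdfs:///data/curated/events/")

    spark.stop()
  }
}
```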
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Apache Ambari and Cloudera Manager
Spark Components: Spark Core, Spark SQL (Data Frames and Datasets API), Spark Streaming, Scala, Apache Kafka
Cloud Infrastructure: AWS CloudFormation, Redshift, IAM, EC2-Classic and EC2-VPC
Programming Languages: C, Java, Scala, Shell, Perl, PL/SQL
Databases: Oracle, Teradata, MySQL, HBase, Cassandra
Web Technologies: HTML, DHTML, CSS, XML, XSLT and JavaScript
Java/J2EE Technologies: Java, J2EE, Servlets, JSP, JDBC, RESTful APIs
Enterprise Frameworks: Spring, Hibernate, Struts, MVC
IDEs & Command Line Tools: Eclipse, NetBeans, IntelliJ, Cygwin, mRemoteNG, WinSCP
Testing & CASE Tools: JUnit, Rational ClearCase, Log4j, Ant, Maven, SBT, JFrog Artifactory
Learning Curriculum: Apache Flink, Apache Drill
PROFESSIONAL EXPERIENCE
Confidential, Ridgefield Park, NJ
Hadoop/Spark Developer
Responsibilities:
- Participated in gathering and analyzing requirements and designing technical documents for business requirements.
- Worked on a large Kerberized Hadoop cluster, including KMS and KTS servers.
- Loaded and transformed large sets of flat files and semi-structured files, including XML.
- Developed a Java API for converting semi-structured data to CSV files and loading them into Hadoop.
- Orchestrated hundreds of Sqoop and Hive queries using Oozie workflows and coordinators.
- Applied partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, and partitioning only on Impala tables.
- Integrated Tableau with Impala and published workbooks from Tableau Desktop to Tableau Server.
- Published Tableau workbooks from multiple data sources and scheduled automated refreshes on Tableau Server.
- Spun up Hadoop clusters in AWS using Cloudera Director.
- Handled part of the dev-ops workload, including daily job monitoring, submitting Cloudera support tickets and providing cluster access to new users.
- Granted access roles on databases and tables using Sentry.
- Migrated Impala scripts to Spark SQL using the DataFrame and Dataset APIs.
- Loaded data into HBase using Spark via the Cloudera Spark-on-HBase module.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Imported data into Spark from Kafka topics using the Spark Streaming APIs (a minimal sketch follows this list).
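A minimal sketch, assuming the spark-streaming-kafka-0-10 direct stream API, of the Kafka-to-HDFS ingestion referenced in the last two bullets; the broker address, topic, group id, batch interval and output path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(60)) // placeholder batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-poc",                  // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from a Kafka topic (placeholder topic name)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Keep only the message payloads and land them on HDFS per micro-batch
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/kafka/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```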
Environment: AWS, Amazon S3, Impala, Hive, HBase, Spark SQL, Shell, Cloudera Enterprise, Cloudera Director, Cloudera Navigator, Cloudera Manager, Sentry, Jira, SBT and GitLab
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Loaded data from RDBMS servers such as Teradata and Netezza into the Hadoop HDFS cluster using Sqoop.
- Performed daily Sqoop incremental imports scheduled through Oozie.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive queries using map-side joins, dynamic partitioning and bucketing.
- Executed Hive queries from the Hive command line on the Tez execution engine.
- Implemented Hive generic UDFs to apply business logic around custom data types (an illustrative sketch follows this list).
- Used Pig as an ETL tool for transformations, event joins and pre-aggregations before storing the data on HDFS.
- Coordinated the Pig and Hive scripts using Oozie workflows.
- Loaded the data into HBase from HDFS.
- Continuously monitored the Hadoop cluster using Ambari Metrics.
- Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro, sequence files and XML files.
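The generic UDFs above were written in Java per the bullets; purely as an illustrative sketch (shown in Scala for consistency with the other sketches, with a hypothetical function name and masking logic), a skeleton on the Hive GenericUDF API looks like this.

```scala
import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

// Hypothetical generic UDF: masks all but the last four characters of a string column.
class MaskAccountId extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentException("mask_account_id expects exactly one argument")
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get()
    if (raw == null) return null
    val value = inputOI.getPrimitiveJavaObject(raw).toString
    if (value.length <= 4) value
    else "*" * (value.length - 4) + value.takeRight(4)
  }

  override def getDisplayString(children: Array[String]): String =
    s"mask_account_id(${children.mkString(", ")})"
}
```

Once packaged into a jar, a class like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.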
Environment: Hadoop, Hortonworks, Big Data, HDFS, MapReduce, Tez, Sqoop, Oozie, Pig, Hive, Linux, Java, Eclipse.
Confidential
JAVA/J2EE Developer
Responsibilities:
- Involved in the design and development of the entire application. Created UML diagrams (use case, class, sequence, and collaboration) based on the business requirements.
- Designed and developed dynamic Web pages using HTML and JSP.
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring.
- Used Spring IoC to inject Hibernate dependencies and used Hibernate annotations for the domain model of the applications.
- Used Oracle for writing SQL scripts and PL/SQL procedures and functions.
- Wrote JUnit test cases to test the functionality of each method in the DAO layer. Used CVS for version control. Configured and deployed the WebSphere Application Server.
- Used Log4j for tracking errors and bugs in the project source code.
- Prepared technical reports and documentation manuals for efficient program development.
Environment: JSP, HTML, Servlets, Hibernate, Spring IOC, Spring Framework, JavaScript, XML, JDBC, Oracle9i, PL/SQL, WebSphere, Eclipse, JUnit, CVS, Log4j
Confidential
Java Developer
Responsibilities:
- Involved in developing various data flow diagrams, use case diagrams and sequence diagrams.
- Involved in creating JSP pages and HTML Pages.
- Used HTTP filtering to perform filtering tasks on requests and responses.
- Worked extensively in JSP, HTML, JavaScript, and CSS to create the UI pages for the project.
- Created JUnit test cases for unit testing and developed generic JS functions for validations.
Environment: Java 1.6, JSP, HTML, Eclipse, CSS, JavaScript, PL/SQL, Windows