Hadoop/Spark Developer Resume
San Francisco, CA
SUMMARY:
- 6 years of professional IT experience, including 3 years in the Big Data ecosystem and related technologies and 3 years as a Core Java developer.
- Experience providing and implementing solutions for Big Data applications across the Hadoop ecosystem.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the HDFS framework.
- Good understanding of MapReduce2 on the YARN framework, including the ResourceManager, NodeManager, and ApplicationMaster.
- Experience importing and exporting data between RDBMS and HDFS in different file formats using Sqoop.
- Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.
- Experience implementing Mappers, Reducers, Combiners, and Partitioners to process large datasets efficiently.
- Experience analyzing data using HiveQL.
- Experience writing Pig Latin scripts and working with the Grunt shell.
- Experience working with Spark RDDs, Spark SQL, and DataFrames using Scala.
- Good understanding of file formats such as Avro, Parquet, JSON, ORC, CSV, and SequenceFile.
- Worked extensively with Cloudera Distribution of Hadoop (CDH) 5.x.
- Good hands-on experience with core Java concepts.
- Strong working experience with Eclipse, MS SQL Server 2008, and SQL queries.
- Well-developed skills in testing, debugging, and troubleshooting a wide range of technical issues.
TECHNICAL SKILLS:
Big Data: Hadoop HDFS, MapReduce2, YARN, Hive, Pig, Flume, Scala, Apache Spark Core, Spark SQL, Sqoop, Impala, Oozie.
File Formats: Text, SequenceFile, JSON, ORC, Avro, and Parquet
Database: MySQL, SQL Server 2008 R2
Tools: SSMS, Maven
Hadoop Distribution: CDH 5.x
IDE: Eclipse
Programming Language: Java, Scala
Operating Systems: Windows 7/8, CentOS
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Hadoop/Spark Developer
Technologies: MapReduce, HDFS, Sqoop, Flume, Linux, Pig, Hive, Spark Core, Spark SQL, Oozie, Impala
Responsibilities:
- Created Sqoop jobs to import data from the transaction data mart (Oracle) into HDFS and Hive in text file format for further processing.
- Collected and aggregated large volumes of log data into HDFS using Flume.
- Created Sqoop jobs to export the analyzed data into relational databases for visualization and report generation by the BI team.
- Developed and optimized MapReduce programs in core Java on data imported into HDFS.
- Designed Hive internal and external tables in ORC format per business requirements; implemented Hive partitioning and bucketing to improve query performance (see the partitioned-table sketch after this list).
- Tuned Hive queries for efficient execution using techniques such as map joins, compressed map/reduce output, and parallel query execution.
- Developed Pig scripts to cleanse transaction and web log data and used HiveQL for web log analysis.
- Developed custom UDFs in Java to extend Hive and Pig functionality.
- Used Scala for Spark programming.
- Worked on converting existing Spark applications from the RDD API to DataFrames (see the RDD-to-DataFrame sketch after this list).
- Worked on converting Hive queries into Spark transformations using Spark SQL (see the Spark SQL sketch after this list).
- Experienced with the RDD architecture; implemented Spark operations on RDDs and optimized transformations and actions in Spark.
- Worked on Spark job optimizations (see the tuning sketch after this list).
- Used Oozie workflows and coordinators to schedule Sqoop, MapReduce, Pig, and Hive actions.
- Used Cloudera Manager to monitor jobs running on the CDH cluster.
- Worked extensively with Cloudera Distribution of Hadoop (CDH) 5.x.
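The partitioned-table sketch referenced in the Hive table bullet above. The role created these tables with HiveQL DDL; this is only a minimal Spark-side analogue using the DataFrame writer, with a hypothetical sales.transactions_staging source and illustrative partition/bucket columns.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedOrcTableSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so the resulting table is registered in the shared metastore.
    val spark = SparkSession.builder()
      .appName("partitioned-orc-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Write an ORC table partitioned by date and bucketed by store id,
    // mirroring the partition/bucket layout used to speed up Hive queries.
    // Table and column names are illustrative, not from the actual project.
    spark.table("sales.transactions_staging")
      .write
      .format("orc")
      .partitionBy("txn_date")
      .bucketBy(16, "store_id")
      .sortBy("store_id")
      .mode("overwrite")
      .saveAsTable("sales.transactions_orc")

    spark.stop()
  }
}
```

Note that Spark's bucketBy produces Spark-style bucket metadata rather than Hive's CLUSTERED BY layout, so the HiveQL DDL route remains the faithful version of what the role describes.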
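A minimal sketch of the RDD-to-DataFrame conversion mentioned above, assuming a hypothetical delimited transaction file layout (the Txn case class, its fields, and the HDFS path are placeholders). The idea is to turn opaque RDD parsing logic into DataFrame operators that Catalyst can optimize.

```scala
import org.apache.spark.sql.SparkSession

object RddToDataFrameSketch {
  // Hypothetical record layout for the delimited transaction files.
  case class Txn(id: Long, amount: Double, status: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-dataframe-sketch")
      .getOrCreate()
    import spark.implicits._

    // Legacy RDD-style logic: parse delimited lines into case class instances.
    val txnRdd = spark.sparkContext
      .textFile("hdfs:///data/txns/")                 // illustrative path
      .map(_.split(","))
      .map(f => Txn(f(0).toLong, f(1).toDouble, f(2)))

    // Same data as a DataFrame: declarative operators instead of opaque functions.
    txnRdd.toDF()
      .filter($"status" === "APPROVED")
      .groupBy($"status")
      .count()
      .show()

    spark.stop()
  }
}
```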
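A hedged sketch of converting a Hive query into Spark SQL, as described above: the same aggregation run once through spark.sql and once as DataFrame transformations against a Hive-enabled session. The sales.transactions table and its columns are illustrative, not taken from the actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveToSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read the existing Hive metastore tables.
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sql-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // The original HiveQL aggregation, executed as-is through Spark SQL.
    spark.sql(
      """SELECT txn_date, SUM(amount) AS total_amount
        |FROM sales.transactions
        |WHERE txn_date >= '2016-01-01'
        |GROUP BY txn_date""".stripMargin).show()

    // The same logic rewritten as DataFrame transformations.
    spark.table("sales.transactions")
      .where($"txn_date" >= "2016-01-01")
      .groupBy($"txn_date")
      .agg(sum($"amount").as("total_amount"))
      .show()

    spark.stop()
  }
}
```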
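A short sketch of the kind of Spark job tuning referred to above, assuming a hypothetical large fact table joined to a small dimension table: broadcast the small side to avoid shuffling the fact table, and persist a result that several actions reuse.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

object SparkJobTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-job-tuning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val txns   = spark.table("sales.transactions")   // large fact table (illustrative)
    val stores = spark.table("sales.store_dim")      // small dimension table (illustrative)

    // Broadcast the small dimension so the join does not shuffle the large fact table.
    val enriched = txns.join(broadcast(stores), Seq("store_id"))

    // Persist a result that several downstream actions reuse,
    // so the join is computed only once.
    enriched.persist(StorageLevel.MEMORY_AND_DISK)
    enriched.count()
    enriched.groupBy("region").count().show()        // "region" is an illustrative dimension column

    enriched.unpersist()
    spark.stop()
  }
}
```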
Confidential
Core Java/Hadoop Developer
Technologies: Core Java, SQL Server 2008, Postilion Real-time Framework, Python, JUnit, XML, Perforce, MySQL, HDFS, MapReduce, Hive, Sqoop
Responsibilities:
- Involved in the analysis, design, and development of system components.
- Worked as a Core Java developer, using Eclipse and SSMS to migrate Java and SQL code, respectively.
- Used the JUnit framework and a test harness for unit testing of Java components.
- Developed components using the Postilion Real-time Framework.
- Ingested data received from various relational database providers into HDFS for analysis and other big data operations.
- Wrote MapReduce jobs to transform large sets of structured, semi-structured, and unstructured data.
- Imported and exported data between MySQL and HDFS using Sqoop.
- Designed Hive internal and external tables and loaded data to and from the external tables.
- Served as a code reviewer, checking code for design quality, vulnerabilities, and scalability.
- Worked on automation to reduce manual effort.
- Prepared initial analysis documents, functional specifications, and the work breakdown structure (WBS).
- Prepared a Requirements Traceability Matrix (RTM) mapping functions/code to test cases.
- Mentored team members through regular knowledge-transfer sessions and trained new team members on core Java.
Confidential
Core Java Developer
Technologies: Core Java, SQL Server 2008
Responsibilities:
- Involved in different phases of the project life cycle, from requirements gathering to testing.
- Developed data transformation and manipulation classes using core Java.
- Created JUnit test cases and prepared setup manuals and user guides.