Sr. Hadoop / Spark Developer Resume

SUMMARY

7 years of experience as an IT professional in both technical and project management roles.
3+ years of experience as a developer on the Hadoop/Spark platform, building code and modules to address customer needs using Hive, Sqoop, Oozie and other Hadoop components.
Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of the MapReduce programming paradigm.
Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa, according to client requirements.
Experience in data analysis using HiveQL, Pig Latin and HBase.
Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra and MongoDB.
Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
Prepared standard coding guidelines and analysis and testing documentation.

TECHNICAL SKILLS

Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie

NoSQL Databases: HBase

Languages: Java, Scala

Cluster Management: Ambari

3rd Party Tools: SQL Developer, Toad, SourceTree, Git, Altova XMLSpy, WinSCP, PuTTY, Hue

Databases: DB2, Oracle, MySQL

Operating Systems: UNIX, Windows, Linux

PROFESSIONAL EXPERIENCE

Confidential

Sr. Hadoop / Spark Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
Developed Spark scripts using Scala shell commands as per requirements.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and for writing data back into the OLTP system through Sqoop.
Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and memory configuration.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

Hadoop Developer

Confidential

Responsibilities:

Analyzed the Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
Consumed data from a Kafka queue using Spark.
Configured different topologies for the Spark cluster and deployed them on a regular basis.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Loaded data from the Linux file system into HDFS.
Imported and exported data into HDFS and Hive using Sqoop.
Implemented partitioning, dynamic partitions and buckets in Hive.
Used reporting tools like Tableau, connected to Hive through the ODBC connector, to generate daily data reports.
Ran Hadoop streaming jobs to process terabytes of XML-format data.
Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
Responsible for loading data files from various external sources like Oracle and MySQL into a staging area in MySQL databases.
Actively involved in code review and bug fixing to improve performance.
Involved in development, building, testing, and deployment to the Hadoop cluster in distributed mode.
Processed raw data using Hive jobs and scheduled them in Oozie.
Good experience with Apache Storm on a Hortonworks cluster.
Created HBase tables to store various data formats of incoming data from different portfolios.
Created Pig Latin scripts to sort, group, join and filter enterprise-wide data.
Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.

Environment: Hadoop, HDFS, Pig, Hive, Sqoop, Kafka, Apache Spark, Shell Scripting, HBase, Kerberos, ZooKeeper, Ambari, Hortonworks, MySQL.

Mainframe Developer

Confidential

Responsibilities:

Improved user satisfaction and adoption rates by designing, coding, debugging, documenting, maintaining and modifying a number of apps and programs for online banking. Participated in Hadoop training and development as part of a cross-training program.
Worked on different mainframe-related technologies such as COBOL, JCL, DB2 and CICS.
Led the new banking application, which is used to create a new set of contracts in the banking system.
Prepared use cases, designed and developed object models and class diagrams.
