Sr. Hadoop / Spark Developer Resume
SUMMARY
- 7 years of extensive experience as an IT professional in both technical and project management roles.
- 3+ years of experience on the Hadoop/Spark platform as a developer, building code and modules to address customer needs using Hive, Sqoop, Oozie, and various Hadoop components.
- Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in migrating data with Sqoop from HDFS to relational database systems and vice versa, according to client requirements.
- Experience in data analysis using HiveQL, Pig Latin, and HBase.
- Experience using the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Experience in preparing standard coding guidelines and analysis and testing documentation.
TECHNICAL SKILLS
Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, ZooKeeper, and Oozie.
NoSQL Databases: HBase
Languages: Java, Scala
Cluster Management: Ambari
3rd Party Tools: SQL Developer, Toad, SourceTree, Git, Altova XMLSpy, WinSCP, PuTTY, Hue
Databases: DB2, Oracle, MySQL
Operating Systems: UNIX, Windows, Linux
PROFESSIONAL EXPERIENCE
Confidential
Sr. Hadoop / Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time.
- Developed Spark scripts using Scala shell commands as required.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and memory configuration.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Hadoop Developer
Confidential
Responsibilities:
- Worked on analyzing the Hadoop cluster using various big data analytics tools including Pig, Hive, and MapReduce.
- Consumed data from Kafka queues using Spark.
- Configured different topologies for the Spark cluster and deployed them on a regular basis.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Loaded data from the Linux file system into HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Used reporting tools such as Tableau, connected to Hive through the ODBC connector, to generate daily data reports.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Loaded data files from external sources such as Oracle and MySQL into a staging area in MySQL databases.
- Actively involved in code review and bug fixing to improve performance.
- Involved in development, building, testing, and deployment to the Hadoop cluster in distributed mode.
- Processed raw data using Hive jobs and scheduled them in Oozie.
- Gained good experience with Apache Storm on a Hortonworks cluster.
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
Environment: Hadoop, HDFS, Pig, Hive, Sqoop, Kafka, Apache Spark, Shell Scripting, HBase, Kerberos, ZooKeeper, Ambari, Hortonworks, MySQL.
Mainframe Developer
Confidential
Responsibilities:
- Improved user satisfaction and adoption rates by designing, coding, debugging, documenting, maintaining and modifying a number of apps and programs for online banking. Participated in Hadoop training and development as part of a cross-training program.
- Worked on different Mainframe related technologies such as COBOL, JCL, DB2 & CICS.
- Led work on the new banking application used to create a new set of contracts in the banking system.
- Prepared use cases, designed and developed object models and class diagrams.
