Sr. Spark and Hadoop Developer Resume

Jersey City, NJ

SUMMARY:

  • Over 8 years of extensive IT experience in all phases of the Software Development Life Cycle (SDLC), including 3+ years of strong experience with the Apache Hadoop ecosystem and Apache Spark.
  • Worked extensively with Hadoop distributions such as Cloudera and Hortonworks.
  • In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node, and MRv1/MRv2 concepts.
  • Experience in importing and exporting data between RDBMS servers such as MySQL, Oracle, and Teradata and HDFS/Hive using Sqoop.
  • Experience in ingesting data from FTP/SFTP servers using Flume.
  • Experience in developing Kafka consumers in Spark Scala applications.
  • Developed MapReduce programs in Java for data cleansing, filtering, and aggregation.
  • Experienced in analyzing data using Pig Latin scripts.
  • Experience in designing table partitioning and bucketing, and in optimizing Hive scripts using various performance utilities and techniques.
  • Experience in developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
  • Experience in designing tables and views for reporting using Impala.
  • Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
  • Experience in creating DStreams from sources such as Flume and Kafka and applying Spark transformations and actions to them (a sketch follows this list).
  • Rich experience in automating Sqoop and Hive queries using Oozie workflows.
  • Experience in scheduling jobs using the Oozie coordinator, Oozie bundles, and crontab.
  • Experience with AWS components such as Amazon EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
  • Experience with Azure components such as Azure SQL Database and Azure Data Factory.
  • Experienced in working with different file formats - Avro, Parquet, RC and ORC.
  • Experience in different compression techniques like Gzip, LZO, Snappy and Bzip2.
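
Below is a minimal sketch of the kind of Spark Streaming consumer described in the Kafka and DStream items above. The topic name, broker address, and comma-separated message format are illustrative assumptions not taken from this resume, and the code targets the spark-streaming-kafka-0-10 API.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickStreamJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickStreamJob")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Broker list, group id, and topic are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumers",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // DStream created directly from the Kafka topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clicks"), kafkaParams))

    // Typical transformations and an action: count events per page in each batch,
    // assuming the first comma-separated field of the message is the page.
    stream.map(record => (record.value.split(",")(0), 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```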

WORK EXPERIENCE:

Sr. Spark and Hadoop Developer

Confidential, Jersey City, NJ

Responsibilities:

  • Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
  • Developed Spark APIs to import data into HDFS from Teradata and created Hive tables (a sketch follows this list).
  • Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data from the Avro Hive tables into the Parquet tables (see the second sketch after this list).
  • Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from design, storage, and query perspectives.
  • Developed a Flume ETL job that reads data from an HTTP source and sinks it to HDFS.
  • Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed Kafka consumers in Scala to read data from Kafka topics.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Integrated Hive and Tableau Desktop reports and published to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
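
A minimal sketch of the Teradata-to-Hive import mentioned above. The host, credentials, and source/target table names are placeholders (the actual tables are not named in this resume), and the Teradata JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object TeradataImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TeradataImport")
      .enableHiveSupport()
      .getOrCreate()

    // JDBC read from Teradata into a DataFrame.
    val transactions = spark.read.format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=edw")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "edw.transactions")
      .option("user", "svc_user")
      .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
      .load()

    // Persist the imported data as a Hive table backed by files in HDFS.
    transactions.write.mode(SaveMode.Overwrite).saveAsTable("staging.transactions")

    spark.stop()
  }
}
```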
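And a sketch of the Avro-to-Parquet load with Snappy compression. Table and column names are illustrative, and the bucketBy call below uses Spark's own bucketing layout, which differs from the CLUSTERED BY buckets a pure Hive script would produce.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AvroToParquet")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Avro-backed staging table registered in the Hive metastore.
    val staged = spark.table("staging.transactions_avro")

    // Write a partitioned, bucketed, Snappy-compressed Parquet table.
    staged.write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .option("compression", "snappy")
      .partitionBy("txn_date")
      .bucketBy(32, "account_id")
      .saveAsTable("curated.transactions_parquet")

    spark.stop()
  }
}
```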

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Sr. Hadoop/Spark Developer

Confidential, San Francisco, CA

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) to develop the application.
  • Worked on analyzing the Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce, on EC2.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats such as JSON, CSV, and TSV.
  • Imported data from sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
  • Imported and exported data between environments such as MySQL and HDFS and deployed to production.
  • Used Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
  • Involved in developing Impala scripts for ad hoc queries.
  • Used Oozie workflow scheduler templates to manage jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC to migrate ETLs from Hive to Spark in a Spark-on-YARN environment (a sketch follows this list).
  • Actively participated in code reviews and meetings and resolved technical issues.
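
A minimal sketch of the Hive-to-Spark migration POC mentioned above: an illustrative Hive aggregation rewritten with the DataFrame API, runnable on YARN when submitted with --master yarn. The table and column names are placeholders, not tables from this resume.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit, sum}

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkPoc")
      .enableHiveSupport()
      .getOrCreate()

    // Original Hive ETL (illustrative):
    //   SELECT customer_id, event_date, COUNT(*) AS events, SUM(amount) AS total
    //   FROM raw.events
    //   GROUP BY customer_id, event_date;

    // Equivalent Spark DataFrame version.
    val summary = spark.table("raw.events")
      .groupBy("customer_id", "event_date")
      .agg(count(lit(1)).as("events"), sum("amount").as("total"))

    summary.write.mode("overwrite").saveAsTable("curated.daily_customer_summary")

    spark.stop()
  }
}
```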

Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.

Hadoop / Spark Developer

Confidential, Denver, CO

Responsibilities:

  • Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
  • Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
  • Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in loading data from UNIX file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Extensively worked with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
  • Implemented different analytical algorithms as MapReduce programs applied to data in HDFS.
  • Used Pig to perform data transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
  • Implemented partitions and buckets in Hive for optimization (a sketch follows this list).
  • Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
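
A sketch of the Hive partitioning and load pattern referenced above. The HiveQL is illustrative (table and column names are made up) and is wrapped in spark.sql only to keep the examples in this document in one language; the original work ran equivalent HiveQL in Hive itself, where bucketing would add a CLUSTERED BY ... INTO n BUCKETS clause to the DDL.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Partitioned Hive table; queries filtering on view_date prune partitions
    // instead of scanning the whole table.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.page_views (
        user_id  STRING,
        page     STRING,
        duration INT)
      PARTITIONED BY (view_date STRING)
      STORED AS ORC
    """)

    // Load structured data from a raw staging table, one partition per date.
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.page_views PARTITION (view_date)
      SELECT user_id, page, duration, view_date
      FROM staging.page_views_raw
    """)

    spark.stop()
  }
}
```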

Environment: Apache Hadoop, Cloudera, Hive, Pig, Sqoop, Zookeeper, HBase, Java, Oozie, Oracle, Teradata, and UNIX Shell Scripting.

Hadoop Developer

Confidential

Responsibilities:

  • Created MySQL database backups and tested the restore process in the test environment.
  • Implemented an authentication and authorization service using the Kerberos protocol.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig and Sqoop.
  • Developed complex queries using Hive and Impala.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
  • Implemented MapReduce jobs through Hive by querying the available data.
  • Experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
  • Experience monitoring and troubleshooting issues with hosts in the cluster regarding memory, CPU, OS, storage, and network.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Experience in scheduling the jobs through Oozie.
  • Migrated ETL jobs to Pig scripts that do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Involved in setting up and benchmarking Hadoop/HBase clusters for internal use.
  • Used the Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Performance tuning using partitioning and bucketing of Impala tables.
  • Exported result sets from Hive to MySQL using shell scripts (a sketch follows this list).
  • Actively involved in code reviews and bug fixing to improve performance.
  • Successful in creating and implementing complex code changes.
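
The export above was done with shell scripts; purely as an illustration of the same Hive-to-MySQL path, here is a plain-JDBC sketch in Scala. All host, credential, and table names are placeholders, and the hive-jdbc and MySQL connector jars are assumed to be on the classpath.

```scala
import java.sql.DriverManager

object HiveToMySqlExport {
  def main(args: Array[String]): Unit = {
    // Pull the Hive result set over HiveServer2 JDBC.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val hive = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "etl_user", "")
    val rows = hive.createStatement().executeQuery(
      "SELECT claim_id, claim_total FROM reports.daily_claim_summary")

    // Insert the rows into MySQL; a production job would batch the inserts.
    val mysql = DriverManager.getConnection(
      "jdbc:mysql://mysql-host:3306/reporting", "etl_user", sys.env.getOrElse("MYSQL_PWD", ""))
    val insert = mysql.prepareStatement(
      "INSERT INTO daily_claim_summary (claim_id, claim_total) VALUES (?, ?)")

    while (rows.next()) {
      insert.setString(1, rows.getString("claim_id"))
      insert.setDouble(2, rows.getDouble("claim_total"))
      insert.executeUpdate()
    }

    hive.close()
    mysql.close()
  }
}
```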

Environment: Hadoop, Cloudera 5.8, Java, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Kafka, Kerberos, Oozie, HBase, Talend, SQL, Spring, Linux, Eclipse, Windows 10/8.1/7.

Hadoop Developer

Confidential

Responsibilities:

  • Implemented a CDH3 Hadoop cluster on CentOS. Worked on installing the cluster, commissioning and decommissioning of data nodes, name node recovery, capacity planning, and slots configuration.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Imported unstructured data into HDFS using Flume.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Used the HBase Java API in a Java application (a sketch follows this list).
  • Automated the jobs that extract data from sources such as MySQL and push the result sets to the Hadoop Distributed File System.
  • Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
  • Responsible for managing data from multiple sources.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Java-based MapReduce.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
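
A minimal sketch of the HBase client usage mentioned above. The original work called the HBase Java API from Java; it is shown here in Scala for consistency with the other sketches, using the HTable-based client API of that Hadoop 1.x era. The table, column family, qualifier, and row key are placeholders.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseWriteRead {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath.
    val conf  = HBaseConfiguration.create()
    val table = new HTable(conf, "web_logs")

    // Write one cell: row key, column family "d", qualifier "url".
    val put = new Put(Bytes.toBytes("2012-01-15-0001"))
    put.add(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/index.html"))
    table.put(put)

    // Read the same cell back.
    val result = table.get(new Get(Bytes.toBytes("2012-01-15-0001")))
    val url = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("url")))
    println(s"url = $url")

    table.close()
  }
}
```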

Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Sqoop, Pig, Zookeeper, Java, ETL, SQL, CentOS.
