Sr. Spark and Hadoop Developer Resume
Jersey City, NJ
SUMMARY:
- Over 8 years of extensive IT experience in all phases of the Software Development Life Cycle (SDLC), including 3+ years of strong experience working on the Apache Hadoop ecosystem and Apache Spark.
- Worked extensively with Hadoop distributions such as Cloudera and Hortonworks.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, DataNode, and MapReduce v1 & v2 concepts.
- Experience in importing and exporting data between HDFS/Hive and RDBMS servers such as MySQL, Oracle, and Teradata using Sqoop.
- Experience in ingesting data from FTP/SFTP servers using Flume.
- Experience in developing Kafka consumers in Spark applications written in Scala.
- Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
- Experienced in analyzing data using Pig Latin scripts.
- Experience in designing table partitioning and bucketing and in optimizing Hive scripts using different performance utilities and techniques.
- Experience in developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
- Experience in designing tables and views for reporting using Impala.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Experience in creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them (see the sketch after this list).
- Rich experience in automating Sqoop and Hive queries using Oozie workflows.
- Experience in scheduling jobs using Oozie Coordinator, Oozie Bundle, and crontab.
- Experience with AWS components such as EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Experience with Azure components such as Azure SQL Database and Azure Data Factory.
- Experienced in working with different file formats: Avro, Parquet, RC, and ORC.
- Experience with different compression techniques such as Gzip, LZO, Snappy, and Bzip2.
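A minimal sketch of the kind of Spark Streaming consumer described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id, batch interval, and output path are illustrative placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamSketch")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches (illustrative)

    // Hypothetical broker list and consumer settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "resume-sketch",
      "auto.offset.reset"  -> "latest"
    )

    // DStream created from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // A transformation (map) followed by an output action (writing each batch to HDFS)
    stream.map(record => record.value)
          .saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```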
WORK EXPERIENCE:
Sr. Spark and Hadoop Developer
Confidential, Jersey City, NJ
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Developed a Spark API to import data into HDFS from Teradata and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression and then loaded data into the Parquet Hive tables from the Avro Hive tables (see the sketch after this list).
- Involved in running Hive scripts through Hive, Impala, Hive on Spark, and Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job to handle data from an HTTP source with an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates on Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumers in Scala for consuming data from Kafka topics.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts through Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them using the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
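A compact sketch of the partitioned Parquet-with-Snappy load described above, run here through Spark SQL with Hive support (the same HiveQL can be run directly in Hive or Impala); the tables and columns (sales_avro, sales_parquet, order_date, etc.) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetTableLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetTableLoadSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned target table stored as Parquet with Snappy compression.
    // Bucketing (CLUSTERED BY ... INTO n BUCKETS) would be declared in the Hive DDL;
    // it is left out of this Spark-driven sketch because Spark's insert path does not
    // populate Hive-compatible buckets.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_parquet (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      STORED AS PARQUET
      TBLPROPERTIES ('parquet.compression' = 'SNAPPY')
    """)

    // Load from the Avro staging table into the Parquet table, one partition per day
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE sales_parquet PARTITION (order_date)
      SELECT order_id, customer_id, amount, order_date
      FROM sales_avro
    """)

    spark.stop()
  }
}
```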
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
Sr. Hadoop / Spark Developer
Confidential, San Francisco, CA
Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce, on EC2.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Worked with different source data file formats such as JSON, CSV, and TSV.
- Imported data from various data sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
- Involved in developing Impala scripts for ad hoc queries.
- Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
- Involved in importing and exporting data from HBase using Spark.
- Involved in a POC for migrating ETLs from Hive to Spark in a Spark on YARN environment (see the sketch after this list).
- Actively participated in code reviews and meetings and resolved technical issues.
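A small sketch of what such a Hive-to-Spark migration POC can look like: a HiveQL aggregation re-expressed with the DataFrame API and submitted to YARN. The transactions and customer_totals tables and their columns are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    // Submitted with: spark-submit --master yarn --deploy-mode cluster ...
    val spark = SparkSession.builder()
      .appName("HiveToSparkPoc")
      .enableHiveSupport()
      .getOrCreate()

    // Original ETL expressed in HiveQL:
    //   SELECT customer_id, SUM(amount) AS total_amount
    //   FROM transactions
    //   WHERE txn_date >= '2016-01-01'
    //   GROUP BY customer_id
    val result = spark.table("transactions")
      .filter(col("txn_date") >= lit("2016-01-01"))
      .groupBy("customer_id")
      .agg(sum("amount").alias("total_amount"))

    // Write the result back to a Hive table for downstream reporting
    result.write.mode("overwrite").saveAsTable("customer_totals")

    spark.stop()
  }
}
```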
Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.
Hadoop / Spark Developer
Confidential, Denver, CO
Responsibilities:
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Involved in loading data from UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Extensively worked on creating combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs (Spark analogues are sketched after this list).
- Implemented different analytical algorithms as MapReduce programs to apply on top of HDFS data.
- Used Pig to perform data transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
- Implemented partitions and buckets in Hive for optimization.
- Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
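The optimizations above were done in Java MapReduce; purely as an illustration, the Scala/Spark sketch below shows the closest analogues of the same three techniques: map-side combining via reduceByKey, an explicit HashPartitioner controlling the shuffle, and a broadcast variable standing in for the distributed cache. The input path, output path, and lookup data are hypothetical.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object MrOptimizationAnalogues {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MrOptimizationAnalogues"))

    // Distributed-cache analogue: broadcast a small lookup map to every executor
    val countryLookup = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

    // Hypothetical input: tab-separated lines of (countryCode, amount)
    val events = sc.textFile("hdfs:///data/events/")
      .map(_.split("\t"))
      .collect { case Array(code, amount) => (code, amount.toDouble) }

    // Combiner analogue: reduceByKey aggregates map-side before the shuffle;
    // partitioning analogue: a HashPartitioner with 16 partitions shapes the shuffle output
    val totals = events
      .reduceByKey(new HashPartitioner(16), _ + _)
      .map { case (code, total) =>
        (countryLookup.value.getOrElse(code, code), total)
      }

    totals.saveAsTextFile("hdfs:///data/output/country_totals")
    sc.stop()
  }
}
```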
Environment: Apache Hadoop, Cloudera, Hive, Pig, Sqoop, Zookeeper, HBase, Java, Oozie, Oracle, Teradata, and UNIX Shell Scripting.
Hadoop Developer
Confidential
Responsibilities:
- Created MySQL database backups and tested the restore process in a test environment.
- Implemented authentication and authorization service using Kerberos authentication protocol.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig and Sqoop.
- Developed complex queries using Hive and Impala.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Implemented MapReduce jobs in Hive by querying the available data.
- Analyzed log files for Hadoop and ecosystem services and found root causes.
- Monitored and troubleshot issues with hosts in the cluster regarding memory, CPU, OS, storage, and network.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Scheduled jobs through Oozie.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
- Used the Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Performed performance tuning using partitioning and bucketing of Impala tables.
- Exported result sets from Hive to MySQL using shell scripts (see the sketch after this list).
- Actively involved in code reviews and bug fixing to improve performance.
- Successfully created and implemented complex code changes.
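The export itself was driven by shell scripts; as one illustrative alternative, the sketch below pushes a Hive result set into MySQL through Spark's JDBC writer. The query, connection URL, credentials, and table names are hypothetical placeholders.

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

object HiveToMySqlExport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToMySqlExport")
      .enableHiveSupport()
      .getOrCreate()

    // Result set produced by a Hive query
    val resultSet = spark.sql(
      "SELECT claim_id, status, COUNT(*) AS cnt FROM claims GROUP BY claim_id, status")

    // Hypothetical MySQL connection details
    val jdbcUrl = "jdbc:mysql://mysql-host:3306/reporting"
    val props = new Properties()
    props.setProperty("user", "report_user")
    props.setProperty("password", "********")
    props.setProperty("driver", "com.mysql.jdbc.Driver")

    // Push the Hive result set into a MySQL table
    resultSet.write.mode(SaveMode.Overwrite).jdbc(jdbcUrl, "claim_status_counts", props)

    spark.stop()
  }
}
```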
Environment: Hadoop, Cloudera 5.8, Java, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Kafka, Kerberos, Oozie, HBase, Talend, SQL, Spring, Linux, Eclipse, Windows 10/8.1/7.
Hadoop Developer
Confidential
Responsibilities:
- Implemented a CDH3 Hadoop cluster on CentOS. Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Involved in creating Hive tables, loading and analyzing data using hive queries.
- Involved in using the HBase Java API in a Java application (see the sketch after this list).
- Automated all the jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
- Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
- Responsible for managing data from multiple sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Java-based MapReduce.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
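A minimal sketch of HBase Java client API usage of the kind referenced above, written here in Scala against the modern (HBase 1.x+) client API; the user_profiles table, info column family, and row data are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()   // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("user_profiles"))

    try {
      // Write one row: rowkey "user123", column family "info", qualifier "email"
      val put = new Put(Bytes.toBytes("user123"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
        Bytes.toBytes("user123@example.com"))
      table.put(put)

      // Read the same cell back
      val result = table.get(new Get(Bytes.toBytes("user123")))
      val email = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email")))
      println(s"email = $email")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```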
Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Sqoop, Pig, Zookeeper, Java, ETL, SQL, CentOS.