Sr. Spark and Hadoop Developer Resume
Jersey City, NJ
SUMMARY:
- Over 8 years of extensive IT experience in all phases of the Software Development Life Cycle (SDLC), including 3+ years of strong experience working on the Apache Hadoop ecosystem and Apache Spark.
- Worked extensively with Hadoop distributions such as Cloudera and Hortonworks.
- In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, DataNode, and MapReduce v1 & v2 concepts.
- Experience in importing and exporting data between HDFS/Hive and RDBMS servers such as MySQL, Oracle, and Teradata using Sqoop.
- Experience in ingesting data from FTP/SFTP servers using Flume.
- Experience in developing Kafka consumers in Spark applications written in Scala.
- Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
- Experienced in analyzing data using Pig Latin scripts.
- Experience in designing table partitioning and bucketing and in optimizing Hive scripts using different performance utilities and techniques.
- Experience in developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
- Experience in designing tables and views for reporting using Impala.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Experience in creating DStreams from sources like Flume and Kafka and performing different Spark transformations and actions on them (see the sketch after this list).
- Rich experience in automating Sqoop and Hive queries using Oozie workflows.
- Experience in scheduling jobs using Oozie Coordinator, Oozie Bundle, and crontab.
- Experience with AWS components such as EC2 instances, S3 buckets, CloudFormation templates, and the Boto library.
- Experience with Azure components such as Azure SQL Database and Azure Data Factory.
- Experienced in working with different file formats: Avro, Parquet, RC, and ORC.
- Experience with different compression techniques such as Gzip, LZO, Snappy, and Bzip2.
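A minimal sketch of the kind of Spark Streaming consumer described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id, batch interval, and output path are illustrative placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamSketch")
    val ssc  = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches (illustrative)

    // Hypothetical broker list and consumer settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "resume-sketch",
      "auto.offset.reset"  -> "latest"
    )

    // DStream created from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // A transformation (map) followed by an output action (writing each batch to HDFS)
    stream.map(record => record.value)
          .saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```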
WORK EXPERIENCE:
Sr. Spark and Hadoop Developer
Confidential, Jersey City, NJ
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Developed a Spark API to import data into HDFS from Teradata and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression and then loaded data into the Parquet Hive tables from the Avro Hive tables (see the sketch after this list).
- Involved in running Hive scripts through Hive, Impala, Hive on Spark, and Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job to handle data from an HTTP source with an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates on Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumers in Scala for consuming data from Kafka topics.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts through Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them using the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
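A compact sketch of the partitioned Parquet-with-Snappy load described above, run here through Spark SQL with Hive support (the same HiveQL can be run directly in Hive or Impala); the tables and columns (sales_avro, sales_parquet, order_date, etc.) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetTableLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetTableLoadSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned target table stored as Parquet with Snappy compression.
    // Bucketing (CLUSTERED BY ... INTO n BUCKETS) would be declared in the Hive DDL;
    // it is left out of this Spark-driven sketch because Spark's insert path does not
    // populate Hive-compatible buckets.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales_parquet (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      STORED AS PARQUET
      TBLPROPERTIES ('parquet.compression' = 'SNAPPY')
    """)

    // Load from the Avro staging table into the Parquet table, one partition per day
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE sales_parquet PARTITION (order_date)
      SELECT order_id, customer_id, amount, order_date
      FROM sales_avro
    """)

    spark.stop()
  }
}
```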
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
Sr. Hadoop / Spark Developer
Confidential, San Francisco, CA
Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce, on EC2.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Worked with different source data file formats such as JSON, CSV, and TSV.
- Imported data from various data sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
- Involved in developing Impala scripts for ad hoc queries.
- Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
- Involved in importing and exporting data from HBase using Spark.
- Involved in a POC for migrating ETLs from Hive to Spark in a Spark on YARN environment (see the sketch after this list).
- Actively participated in code reviews and meetings and resolved technical issues.
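A small sketch of what such a Hive-to-Spark migration POC can look like: a HiveQL aggregation re-expressed with the DataFrame API and submitted to YARN. The transactions and customer_totals tables and their columns are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    // Submitted with: spark-submit --master yarn --deploy-mode cluster ...
    val spark = SparkSession.builder()
      .appName("HiveToSparkPoc")
      .enableHiveSupport()
      .getOrCreate()

    // Original ETL expressed in HiveQL:
    //   SELECT customer_id, SUM(amount) AS total_amount
    //   FROM transactions
    //   WHERE txn_date >= '2016-01-01'
    //   GROUP BY customer_id
    val result = spark.table("transactions")
      .filter(col("txn_date") >= lit("2016-01-01"))
      .groupBy("customer_id")
      .agg(sum("amount").alias("total_amount"))

    // Write the result back to a Hive table for downstream reporting
    result.write.mode("overwrite").saveAsTable("customer_totals")

    spark.stop()
  }
}
```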
Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza and UNIX Shell Scripting.
Hadoop / Spark Developer
Confidential, Denver, CO
Responsibilities:
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Involved in loading data from UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Extensively worked on creating combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs (Spark analogues are sketched after this list).
- Implemented different analytical algorithms as MapReduce programs to apply on top of HDFS data.
- Used Pig to perform data transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
- Implemented partitions and buckets in Hive for optimization.
- Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
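The optimizations above were done in Java MapReduce; purely as an illustration, the Scala/Spark sketch below shows the closest analogues of the same three techniques: map-side combining via reduceByKey, an explicit HashPartitioner controlling the shuffle, and a broadcast variable standing in for the distributed cache. The input path, output path, and lookup data are hypothetical.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object MrOptimizationAnalogues {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MrOptimizationAnalogues"))

    // Distributed-cache analogue: broadcast a small lookup map to every executor
    val countryLookup = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

    // Hypothetical input: tab-separated lines of (countryCode, amount)
    val events = sc.textFile("hdfs:///data/events/")
      .map(_.split("\t"))
      .collect { case Array(code, amount) => (code, amount.toDouble) }

    // Combiner analogue: reduceByKey aggregates map-side before the shuffle;
    // partitioning analogue: a HashPartitioner with 16 partitions shapes the shuffle output
    val totals = events
      .reduceByKey(new HashPartitioner(16), _ + _)
      .map { case (code, total) =>
        (countryLookup.value.getOrElse(code, code), total)
      }

    totals.saveAsTextFile("hdfs:///data/output/country_totals")
    sc.stop()
  }
}
```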
Environment: Apache Hadoop, Cloudera, Hive, Pig, Sqoop, Zookeeper, HBase, Java, Oozie, Oracle, Teradata, and UNIX Shell Scripting.
Hadoop Developer
Confidential
Responsibilities:
- Created MySQL database backups and tested the restore process in a test environment.
- Implemented authentication and authorization service using Kerberos authentication protocol.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig and Sqoop.
- Developed complex queries using Hive and Impala.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Implemented MapReduce jobs in Hive by querying the available data.
- Analyzed log files for Hadoop and ecosystem services and found root causes.
- Monitored and troubleshot issues with hosts in the cluster regarding memory, CPU, OS, storage, and network.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Scheduled jobs through Oozie.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
- Used the Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Performed performance tuning using partitioning and bucketing of Impala tables.
- Exported result sets from Hive to MySQL using shell scripts (see the sketch after this list).
- Actively involved in code reviews and bug fixing to improve performance.
- Successfully created and implemented complex code changes.
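The export itself was driven by shell scripts; as one illustrative alternative, the sketch below pushes a Hive result set into MySQL through Spark's JDBC writer. The query, connection URL, credentials, and table names are hypothetical placeholders.

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

object HiveToMySqlExport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToMySqlExport")
      .enableHiveSupport()
      .getOrCreate()

    // Result set produced by a Hive query
    val resultSet = spark.sql(
      "SELECT claim_id, status, COUNT(*) AS cnt FROM claims GROUP BY claim_id, status")

    // Hypothetical MySQL connection details
    val jdbcUrl = "jdbc:mysql://mysql-host:3306/reporting"
    val props = new Properties()
    props.setProperty("user", "report_user")
    props.setProperty("password", "********")
    props.setProperty("driver", "com.mysql.jdbc.Driver")

    // Push the Hive result set into a MySQL table
    resultSet.write.mode(SaveMode.Overwrite).jdbc(jdbcUrl, "claim_status_counts", props)

    spark.stop()
  }
}
```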
Environment: Hadoop, Cloudera 5.8, Java, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Kafka, Kerberos, Oozie, HBase, Talend, SQL, Spring, Linux, Eclipse, Windows 10/8.1/7.
Hadoop Developer
Confidential
Responsibilities:
- Implemented a CDH3 Hadoop cluster on CentOS. Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Involved in creating Hive tables, loading and analyzing data using hive queries.
- Involved in using the HBase Java API in a Java application (see the sketch after this list).
- Automated all the jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
- Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
- Responsible for managing data from multiple sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Java-based MapReduce.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
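A minimal sketch of HBase Java client API usage of the kind referenced above, written here in Scala against the modern (HBase 1.x+) client API; the user_profiles table, info column family, and row data are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()   // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("user_profiles"))

    try {
      // Write one row: rowkey "user123", column family "info", qualifier "email"
      val put = new Put(Bytes.toBytes("user123"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
        Bytes.toBytes("user123@example.com"))
      table.put(put)

      // Read the same cell back
      val result = table.get(new Get(Bytes.toBytes("user123")))
      val email = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email")))
      println(s"email = $email")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```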
Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Sqoop, Pig, Zookeeper, Java, ETL, SQL, CentOS.