
Senior Hadoop Developer Resume


Pottsville, PA

SUMMARY:

  • Skilled IT professional with 8+ years of experience, with hands-on expertise in Big Data processing using Hadoop, Hadoop ecosystem implementation, maintenance, ETL, and Big Data analysis operations.
  • Experience in installing, configuring, and maintaining Hadoop clusters.
  • Knowledge of administrative tasks such as installing Hadoop (on Ubuntu) and ecosystem components such as Hive, Pig, and Sqoop.
  • Good working knowledge of Elasticsearch and Spark Streaming.
  • Good knowledge of YARN configuration.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data-flow language), and custom MapReduce programs in Java (a minimal mapper/reducer sketch follows this summary).
  • Experience with NiFi processors, processor groups, and process-flow management concepts.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience in using the ZooKeeper distributed coordination service for high availability.
  • Experience in migrating data from RDBMS to HDFS and Hive using Sqoop, converting SQL to HiveQL, writing UDFs, and scheduling Oozie jobs.
  • Strong Teradata SQL and ANSI SQL coding skills, with extensive work experience in Teradata data warehousing.
  • Worked as an SME for Java, Big Data, and Splunk technologies.
  • Experience in writing MapReduce programs and using the Apache Hadoop API to analyze Big Data.
  • Involved in ingesting data from various databases, including Teradata (sales data warehouse), Oracle, DB2, and SQL Server, using Sqoop.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Extensive experience with SQL, PL/SQL, and database concepts.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
  • Good understanding of XML methodologies (XML, XSL, XSD), including Web Services and SOAP.
  • Expertise in Waterfall and Agile software development models, and in project planning using Microsoft Project and JIRA.
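
To illustrate the custom MapReduce work noted above, the following is a minimal Java mapper/reducer sketch: a hypothetical word-count style job whose class and field names are illustrative, not taken from any specific engagement.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TokenCount {

        // Mapper: emits (token, 1) for every token in an input line.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts emitted for each token.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }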

TECHNICAL SKILLS:

Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)

Hadoop Ecosystem: Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie

NoSQL Databases: HBase

Programming Languages: Java, C, C++, Linux shell scripting

Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML

Databases: MySQL, SQL, Oracle, SQL Server

Software Engineering: UML, Object Oriented Methodologies, Scrum, Agile methodologies

Operating System: Linux, Windows 7, Windows 8, XP

IDE Tools: Eclipse, Rational Rose

PROFESSIONAL EXPERIENCE:

Confidential, Pottsville, PA

Senior Hadoop Developer

RESPONSIBILITIES:

  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Implemented Spark RDD transformations and actions.
  • Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core.
  • Used Hive queries in Spark SQL for analyzing and processing the data.
  • Used Apache NiFi to ingest data from IBM MQ message queues.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Used Scala to perform transformations and apply business logic.
  • Implemented partitioning, dynamic partitions, indexing, and bucketing in Hive.
  • Loaded the dataset into Hive for ETL operations.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Used Spark DataFrame operations to perform the required data validations and analytics on Hive data (see the Spark SQL sketch after this list); streamed data from the data source using Kafka.
  • Developed custom processors in Java (built with Maven) to add functionality to Apache NiFi for additional tasks (a processor skeleton follows this section).
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for high data volumes.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Analyzed the SQL scripts and designed solutions to implement them using PySpark.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
  • Involved in moving all log files generated from various sources into HDFS for further processing through Flume.
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with Amazon Simple Storage Service (S3) for storage.
  • Worked with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark Streaming.
  • Developed a Flume ETL job with an HTTP source and an HDFS sink.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Created detailed AWS security groups, which behave as virtual firewalls controlling the traffic allowed to reach one or more EC2 instances.
  • Worked with Azure Blob storage for documents, media files, and cloud objects.
  • Utilized the in-memory processing capability of Apache Spark to process data with Spark SQL and Spark Streaming, using PySpark scripts.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which invoke MapReduce jobs in the backend.
  • Used Apache NiFi to uncompress JSON files and move them from the local file system to HDFS.
  • Imported and exported data into HDFS and Hive using Kafka.
  • Developed a data pipeline using Sqoop to ingest customer behavioral data into HDFS for analysis.
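
As a sketch of the Spark SQL and DataFrame validation work described in this list, the snippet below reads a Hive table through Spark SQL and flags invalid rows. It assumes Spark 2.x with Hive support; the database, table, and column names (sales_db.orders, order_amount) are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class OrderValidation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("OrderValidation")
                    .enableHiveSupport() // read Hive tables through the metastore
                    .getOrCreate();

            // Run a HiveQL query through Spark SQL.
            Dataset<Row> orders = spark.sql(
                    "SELECT order_id, order_amount FROM sales_db.orders");

            // Simple validation: flag rows with non-positive amounts.
            Dataset<Row> invalid = orders.filter("order_amount <= 0");
            System.out.println("Invalid rows: " + invalid.count());

            spark.stop();
        }
    }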

ENVIRONMENT: Hadoop, MapReduce, YARN, Hive, NiFi, Pig, HBase, Oozie, Sqoop, Azure, Flume, Talend, ETL, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.
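
For the custom NiFi processors mentioned above, a minimal Java processor skeleton might look like the following; the cleansing step is reduced to tagging an attribute, and the class and relationship names are illustrative.

    import java.util.Collections;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class CleanseProcessor extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .description("FlowFiles that were cleansed")
                .build();

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session)
                throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            // Real cleansing logic would rewrite the FlowFile content here;
            // this sketch only marks the file as processed.
            flowFile = session.putAttribute(flowFile, "cleansed", "true");
            session.transfer(flowFile, REL_SUCCESS);
        }
    }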

Confidential - Cox Automotive, Atlanta, GA

Senior Hadoop Developer

RESPONSIBILITIES:

  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in writing MapReduce jobs.
  • Ingested data using Sqoop and HDFS put/copyFromLocal commands.
  • Used Pig for transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Used Apache NiFi to copy data from the local file system to HDP.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked extensively on importing metadata into Hive and migrating existing tables and applications to work with Hive and the AWS cloud.
  • Experienced in migrating to Scala to minimize query response time.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Exported result sets from Hive to MySQL using shell scripts.
  • Configured Hive with a shared metastore in MySQL and used Sqoop to migrate data into external Hive tables from different RDBMS sources (Oracle, Teradata, and DB2) for data warehousing.
  • Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the necessary transformations.
  • Used MongoDB to store big data and applied the aggregation operations match, sort, and group in MongoDB.
  • Designed and created ETL jobs in Talend to load large volumes of data into MongoDB, the Hadoop ecosystem, and relational databases.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Worked on reading multiple data formats on HDFS using PySpark.
  • Created databases on Azure SQL Server.
  • Involved in unit testing; delivered unit test plans and results documents using JUnit and MRUnit.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables.
  • Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive (a minimal UDF sketch follows this list).
  • Worked with NiFi to manage the flow of data from source to HDFS.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the batching sketch after this section).
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Involved in developing custom NiFi processors to ingest data into Teradata.
  • Involved in processing ingested raw data using MapReduce, Apache Pig, and Hive.
  • Involved in pivoting HDFS data from rows to columns and columns to rows.
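
A minimal sketch of a Hive UDF like those described above, using Hive's classic (simple) UDF API; the trim-and-uppercase behavior is a hypothetical example.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Usage after packaging into a JAR:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    public class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }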

ENVIRONMENT: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Azure, Oozie, Sqoop, Flume, Talend, ETL, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.
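
To illustrate how Spark Streaming divides a stream into batches, as mentioned in the list above: a minimal Java sketch with a 10-second batch interval. The socket source and port are stand-ins for the real feed.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class BatchingStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("BatchingStream");
            // Every 10 seconds of received records becomes one batch (an RDD).
            JavaStreamingContext ssc =
                    new JavaStreamingContext(conf, Durations.seconds(10));

            JavaDStream<String> lines = ssc.socketTextStream("localhost", 9999);
            lines.foreachRDD(rdd ->
                    System.out.println("Records in this batch: " + rdd.count()));

            ssc.start();
            ssc.awaitTermination();
        }
    }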

Confidential, FL

Hadoop Developer

RESPONSIBILITIES:

  • Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
  • Responsible for installing, configuring, and managing a Hadoop cluster spanning multiple racks.
  • Developed Pig Latin scripts to analyze large data sets in areas where extensive coding needed to be reduced.
  • Used the Sqoop tool to extract data from relational databases into Hadoop.
  • Involved in performance enhancement and optimization of the code by writing custom comparators and combiner logic.
  • Involved in developing stored procedures for fetching data from Greenplum and created workflows using Apache NiFi.
  • Built Spark applications and deployed them on the cluster.
  • Worked closely with the data warehouse architect and business intelligence analysts to develop solutions.
  • Developed the presentation layer using JSP, HTML, and CSS, with client-side validations in JavaScript.
  • Collaborated with the ETL/Informatica team to determine the data models and UI designs needed to support Cognos reports.
  • Developed web crawler code to obtain raw product-review data and performed data cleansing in Python.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, PySpark, and Scala.
  • Used Eclipse for Java/J2EE application development, JBoss as the application server, Node.js for standalone UI testing, Oracle as the backend, Git for version control, and Ant for build scripts.
  • Involved in coding, code reviews, and JUnit testing; prepared and executed unit test cases.
  • Responsible for performing peer code reviews, troubleshooting issues, and maintaining status reports.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which invoke MapReduce jobs in the backend.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs, and YARN.
  • Involved in identifying possible ways to improve system efficiency; involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit (an MRUnit sketch follows this list).
  • Prepared daily and weekly project status reports and shared them with the client.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
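
A sketch of the MRUnit unit testing mentioned above, assuming a word-count style mapper like the TokenCount.TokenMapper sketched under SUMMARY; the input record and expected output pairs are illustrative.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class TokenMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new TokenCount.TokenMapper());
        }

        @Test
        public void emitsOneCountPerToken() throws Exception {
            // The driver feeds one input record and verifies the emitted pairs.
            mapDriver.withInput(new LongWritable(0), new Text("hadoop hive"))
                     .withOutput(new Text("hadoop"), new IntWritable(1))
                     .withOutput(new Text("hive"), new IntWritable(1))
                     .runTest();
        }
    }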

ENVIRONMENT: Apache Hadoop, Java (JDK 1.7), Oracle, NiFi, MySQL, Hive, Pig, Sqoop, Linux, CentOS, JUnit, MRUnit, Cloudera

Confidential, Denver, CO

Java Developer / Hadoop Developer

RESPONSIBILITIES:

  • Experience in administering, installing, upgrading, and managing CDH3, Pig, Hive, and HBase.
  • Architected and implemented the product platform, including all data transfer, storage, and processing from the data center to Hadoop file systems.
  • Experienced in defining job flows.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Implemented a CDH3 Hadoop cluster on CentOS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Wrote custom MapReduce scripts in Java for data processing.
  • Imported and exported data into HDFS and Hive using Sqoop, and used Flume to extract data from multiple sources.
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system into HDFS.
  • Created Hive tables to store data in HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection, permission checks, and performance analysis.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used Flume to channel data from different sources into HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios (a client sketch follows this list).
  • Implemented business logic using Pig scripts; wrote custom Pig UDFs to analyze data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Exported the analyzed data to relational databases using Sqoop for visualization and for generating reports for the BI team.
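
A minimal sketch of writing a row to an HBase table like those described above. The table, column family, and row key are hypothetical, and the snippet uses the HBase 1.x client API for clarity (the CDH3-era HTable API differs slightly).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PortfolioWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("portfolio_data"))) {

                // One row per customer; variable-format payloads go into one family.
                Put put = new Put(Bytes.toBytes("customer#1001"));
                put.addColumn(Bytes.toBytes("pii"), Bytes.toBytes("payload"),
                              Bytes.toBytes("{\"name\":\"...\"}"));
                table.put(put);
            }
        }
    }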

ENVIRONMENT: Hadoop, MapReduce, Hive, HBase, Flume, Pig, ZooKeeper, Java, ETL, SQL, CentOS, Eclipse.

Confidential, Rochester, MN

Java Developer

RESPONSIBILITIES:

  • Involved in analysis, design, and coding in a J2EE environment.
  • Implemented MVC architecture using Struts, JSP, and EJBs.
  • Used core Java concepts in the application, such as multithreaded programming and thread synchronization (wait, notify, join).
  • Designed and programmed the presentation layer with HTML, XML, XSL, JSP, JSTL, and Ajax.
  • Created cross-browser-compatible and standards-compliant CSS-based page layouts.
  • Worked on Hibernate object/relational mapping according to the database schema.
  • Designed, developed, and implemented the business logic required for the security presentation controller.
  • Used JSP and Servlet coding in the J2EE environment (a servlet sketch follows this list).
  • Designed XML files to implement most of the wiring needed for Hibernate annotations and Struts configurations.
  • Responsible for developing the forms containing employee details and for generating reports and bills.
  • Developed Web Services for data transfer between client and server using Apache Axis, SOAP, and WSDL.
  • Involved in designing class and dataflow diagrams using UML in Rational Rose.
  • Created and modified stored procedures, functions, triggers, and complex SQL commands using PL/SQL.
  • Involved in designing ERDs (entity relationship diagrams) for the relational database.
  • Developed shell scripts in UNIX and procedures using SQL and PL/SQL to process data from input files and load it into the database.
  • Used CVS for maintaining the source code; designed, developed, and deployed on WebLogic Server.
  • Performed unit testing on the applications developed.
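
A minimal sketch of the kind of servlet described above; the employee-report handler and its request parameter are hypothetical.

    import java.io.IOException;
    import java.io.PrintWriter;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class EmployeeReportServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String employeeId = request.getParameter("employeeId");
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            // A production handler would look up the employee and forward to a JSP;
            // this sketch just echoes the request parameter.
            out.println("<html><body>Report for employee " + employeeId + "</body></html>");
        }
    }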

ENVIRONMENT: Java (JDK 1.6), J2EE, JSP, Servlet, Hibernate, JavaScript, JDBC, Oracle 10g, UML, Rational Rose, SOAP, WebLogic Server, JUnit, PL/SQL, CSS, HTML, XML, Eclipse
