Hadoop Developer Resume Durham, NC - Hire IT People

SUMMARY

9+ years of experience in analysis, design, development, debugging and deployment of various software applications, including 5+ years of experience in the Hadoop ecosystem and Big Data analytics.
IBM certification in Big Data & Hadoop Development.
Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
Hands-on experience in installing, configuring and using ecosystem components like Hadoop, MapReduce, HDFS, HBase, Oozie, Sqoop, Flume, Pig & Hive.
Experience in analyzing data using Pig Latin, HQL, HBase and custom MapReduce programs in Java.
Extending Hive and Pig core functionality by writing custom UDFs.
Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using an EC2 instance and S3 configurations.
Skilled in developing MapReduce programs using the Hadoop Java API and in using Hive and Pig to perform data analysis, data cleansing and data transformation.
Loaded log data into HDFS using Flume and Kafka, and performed ETL integrations.
Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
Experience configuring Kerberos and integrating it with directory services.
Experience in working with different data sources like flat files, XML files, log files and databases.
Excellent understanding of object-oriented design methodology and core Java concepts such as multi-threading, exception handling, generics, annotations, collections and I/O.
Good understanding of NoSQL databases and hands-on experience in writing applications on NoSQL databases.
Experience in writing shell scripts.
Worked with different Python libraries like pandas, NumPy, boto3, web etc.
Experience in collecting business requirements, writing functional requirements and test cases, and creating technical design documents with UML (use case, class, sequence and collaboration diagrams).
Excellent communication, interpersonal and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management and customers.

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Zookeeper, Sqoop, Oozie, Flume, HBase, Avro, Spark

Databases: Oracle 7/8/8i/9i/10g/11g/12c, MySQL 5.6.16/5.6.20, SQL Server 2000/2005/2008/2012/2014, PostgreSQL

NoSQL Databases: HBase, MongoDB, Cassandra

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Programming Languages: Python, Java, SQL, PL/SQL, Scala, Unix shell scripts

Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX

Tools Used: Eclipse, PuTTY, SuperPuTTY, MS Office, Crystal Reports, Microsoft Visio

PROFESSIONAL EXPERIENCE

Confidential, Durham, NC

Hadoop Developer

Responsibilities:

Working on Cloudera Hadoop clusters.
Set up keytabs for Kerberos authentication of scheduled applications on Hadoop clusters.
Working on a POC to migrate existing applications from the Cloudera Hadoop cluster to EMR.
Extracted data from SQL Developer into HDFS using Sqoop and scheduled an incremental load to HDFS.
Designed and developed a data pipeline to ingest data from different sources into HDFS.
Working extensively with Impala to prepare input data for applications and to ingest application output into Impala for analytics. Created and dropped temporary Impala tables on the fly to load data into Parquet tables.
Used Python data scripts to create CSV files for appending to tables in Impala.
Developed RDDs using Python and coded Python applications for business requirements.
Submitted Spark jobs using the spark-submit command.
Created Spark RDDs and DataFrames, applied transformations and actions, and converted RDDs to DataFrames.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
Implemented schema extraction for Parquet and Avro file formats in Hive.
Implemented partitioning, dynamic partitions and buckets in Hive.
Developed Hive queries to process the data and generate data cubes for visualization.
Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer and Auto Scaling groups.
Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
Developed and enhanced Python code to generate genotyping reports from different data inputs.
Migrating data from traditional databases to the Hadoop cluster using custom Python scripts.
Developed Unix shell scripts and Python scripts for scheduling and automating the job flow.
Providing support for the production applications.
Participated in all the stages of software development lifecycle including design, development, implementation, and testing.

Environment: CDH5.9.1, Hadoop 2.6.0, YARN, HDFS, Spark 1.6.0, Sqoop 1.99.5, Hive 1.1.1, Pig 0.14.0, HBase 1.1.2, Zookeeper 3.4.6, Oracle 11g/10g, Java 1.8, Python 2.7.5, PyCharm, PuTTY

Confidential, Wilmington, Delaware

Hadoop/Spark Consultant

Responsibilities:

Worked on clusters of different sizes on Cloudera and Hortonworks distributions.
Created Hive and Tez views and provided permissions to users and AD groups on the Ambari server.
Created Hive internal/external tables with proper static and dynamic partitions and worked with them using HQL.
Wrote Hive queries for data analysis to meet business requirements.
Tuned performance using partitioning and bucketing of Hive tables.
Created UDFs for Hive and Impala.
Created RDDs and DataFrames for the required input data and performed data transformations using Spark with Scala.
Imported data from different sources such as HDFS and HBase into Spark RDDs.
Developed Spark scripts using the Scala shell as per requirements.
Used a Spark cluster to manipulate RDDs (Resilient Distributed Datasets) and applied the concepts of RDD partitions.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Assisted in the upgrade, configuration and maintenance of various Hadoop ecosystem components like Pig, Hive and HBase.
Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
Involved in HDFS maintenance and loading data using Sqoop; responsible for managing data coming from different sources.
Experience working with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
Experience managing and reviewing the Hadoop log files.
Wrote Pig Latin scripts that define a schema, create new relations, and perform joins, sorting, filtering and grouping on large data sets.
Monitored workload and job performance and performed capacity planning using Cloudera Manager.
Used Flume to move large amounts of log data from many different sources to a centralized data store.
Developed Sqoop scripts to transfer data between HDFS and the MySQL database.
Provided Hue access on user request.
Responsible for maintaining Voltage security at the cluster level and the Voltage server level, based on an understanding of the encryption and decryption process.
Created a backup script for Voltage servers using Bash and set up crontab jobs.
Provided L2 support for different applications, managing and reviewing log files for troubleshooting and meeting SLAs on time.
Sent and received handovers to and from the offshore team, following the Global Delivery Model.

Environment: CDH5.5, Hadoop 2.7.0, YARN, HDFS, Spark 1.4.1, Sqoop 1.99.5, Hive 1.1.1, Flume 1.6.0, Oozie 4.1.0, HDP 2.0, Pig 0.14.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.6, Jenkins 1.6, Oracle 11g/10g, MySQL 5.6.2, Java 1.6, SuperPuTTY, Scala IDE, Voltage 6.3

Confidential, Atlanta, GA

Hadoop/Spark Consultant

Responsibilities:

Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
Used Spark SQL to analyze the data.
Used Scala to write code for all Spark use cases.
Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs and YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Worked with Spark SQL on different data formats such as JSON and Parquet.
Developed Spark scripts using Scala shell commands as per requirements.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV and JSON.
Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.

Environment: CDH5.5, Hadoop 2.6.0, YARN, HDFS, Spark 1.4.1, Sqoop 1.99.5, Hive 1.1.1, Flume 1.6.0, Oozie 4.1.0, Pig 0.14.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.6, Oracle 11g/10g, MySQL 5.6.2, Java 1.6

Confidential, Atlanta, GA

Java/Hadoop Developer

Responsibilities:

Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig and Hive programs.
Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
Extensive experience in writing HDFS & Pig Latin commands.
Developed UDFs to provide custom Hive and Pig capabilities and apply business logic to the data.
Created Hive internal/external tables with proper static and dynamic partitions.
Used Hive to analyze unified historical data in HDFS to identify issues and behavioral patterns.
Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
Experience with NoSQL databases such as HBase.
Monitored workload and job performance and performed capacity planning using Cloudera Manager.
Installed the Oozie workflow engine to run multiple MapReduce, Hive, Zookeeper and Pig jobs that run independently based on time and data availability.
Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
Performed file system management and monitoring on Hadoop log files.
Implemented partitioning, dynamic partitions and buckets in Hive.
Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
Created an e-mail notification service that notifies the team that requested the data upon job completion.

Environment: Hadoop 2.3.0, HDFS, MapReduce, CDH5, Hive 0.12.0, Pig 0.12.0, HBase 0.98.1, Sqoop 1.4.3, Flume 1.4.0, Oozie 4.1.0, Zookeeper 3.4.5, MySQL 5.6.16, Java 1.6

Confidential, Norcross, GA

Software Engineer

Responsibilities:

Actively involved in all phases of the Software Development Life Cycle (SDLC) of the application: requirements gathering, design analysis and code development.
Extensively used core Java concepts such as OOP and exception handling.
Developed Records using Java, HTML, CSS, JSP, Servlets and MySQL.
Worked in the Eclipse IDE to write the code and integrate the application.
Developed Information System using J2EE and MySQL.
Responsible for developing the Struts configuration file and Action classes for handling HTTP requests from the front-end components, applying OOAD concepts.
Used Hibernate as the object-relational mapping tool for persisting Java objects.
Developed the front-end for faculty home pages using Dreamweaver.
Worked on documenting the project and analyzing the requirements of the project.
Tested the application for various inputs.
Wrote client-side scripts in JavaScript for user signup, administrator logon and updating user profiles.
Involved in code reviews and checking that coding standards are followed.

Environment: Java/J2EE, JSP, Struts, Hibernate, Servlets, MySQL, SQL, PL/SQL, Macromedia Dreamweaver, Apache Tomcat, JavaScript, HTML, Maven
