Hadoop Developer Resume
Houston, TX
PROFESSIONAL SUMMARY:
- 7+ years of professional experience in IT, including 4 years of comprehensive experience working with Apache Hadoop ecosystem components, Spark Streaming, and Amazon Web Services (AWS).
- Worked on ingesting, reconciling, compacting, migrating, and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
- Proficient in writing Spark applications (Spark SQL, machine learning, and graph processing) using Scala, Java, Python, and R.
- In-depth working knowledge of Hadoop architecture and components such as HDFS, YARN, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, and Flume.
- Hands-on experience with AWS components such as EC2, EMR, S3, and CloudWatch.
- Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce.
- Good working knowledge of HDFS design, daemons, and HDFS High Availability (HA).
- Experience in implementation of complete Big Data solutions, including data acquisition, storage, transformation, and analysis.
- Worked on big data projects such as streaming analytics and data consolidation.
- Worked on real-time data integration and migration using Kafka, Spark, and HBase.
- Implemented MLlib functions for training and building linear models with Spark Streaming.
- Experience in analyzing large-scale data to identify new analytics, insights, trends and relationships with a strong focus on data clustering.
- Experience processing semi-structured data (XML and JSON) in Hive/Impala.
- Experience in writing Hive UDFs, UDTFs, and UDAFs (see the sketch at the end of this summary).
- Good working experience on Hadoop Distributions like Cloudera and Hortonworks.
- Good working experience in creating event-processing data pipelines using Flume, Kafka, and Storm.
- Expertise in data transformation and analysis using Spark, Pig, and Hive.
- Compiled and configured Apache Tez for Hive and Pig to achieve better response times than plain MapReduce jobs.
- Experience in importing and exporting terabytes of data using Sqoop between HDFS and relational database systems (RDBMS).
- Experience in analyzing data using CQL (Cassandra Query Language), HiveQL, and Pig Latin programs.
- Supported issue resolution during testing and in production.
- Experience in implementing custom partitioners and combiners for effective data distribution.
- Experience in writing simple to complex ad hoc Pig scripts with CPU- and memory-efficient UDFs.
- Experience in writing simple to complex ad hoc Hive scripts.
- Experience in writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
- Good working experience with serialization and compression formats such as Avro, Snappy, and LZO.
- Good working experience in configuring simple to complex workflows using Oozie.
- Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Proficient with IDEs including Eclipse Galileo and IBM Rational Application Developer (RAD), and with VMware environments.
- Worked on operating systems including UNIX/Linux, Windows XP, and Windows 2000.
- Quick to master new concepts, capable of working in a group as well as independently, with excellent communication skills.
- Good working experience with Java, JDBC, Collections, JSP, JSON, REST, XML, SQL, UNIX, and Eclipse.
- Developed and maintained web applications running on Apache Web server.
- Experience of working in Agile Software Development environment.
- Implemented automatic workflows and job scheduling using Oozie, Zookeeper and Ambari.
- Integrated Teradata, MongoDB, Cassandra, Salesforce (SFDC) with HDFS using HBase.
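Illustrative only: a minimal sketch of the kind of Hive UDF mentioned above (classic org.apache.hadoop.hive.ql.exec.UDF API); the class name and the date formats are hypothetical and not taken from any specific project.

    // Hypothetical Hive UDF: normalize a date string such as "12/31/2016" to "2016-12-31".
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class NormalizeDateUDF extends UDF {
        private final SimpleDateFormat in = new SimpleDateFormat("MM/dd/yyyy");
        private final SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");

        public Text evaluate(Text raw) {
            if (raw == null) {
                return null;
            }
            try {
                return new Text(out.format(in.parse(raw.toString().trim())));
            } catch (ParseException e) {
                return null; // unparseable values become null rather than failing the query
            }
        }
    }
    // In Hive (names are placeholders):
    //   ADD JAR normalize-date-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_date AS 'NormalizeDateUDF';
    //   SELECT normalize_date(order_dt) FROM orders;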
TECHNICAL SKILLS:
Hadoop/Big Data: Spark, Flume, Kafka, Hive, HBase, Pig, HDFS, MapReduce, Python, Sqoop, ZooKeeper, Oozie, Storm, Tez, Impala, Ambari
AWS Components: EC2, EMR, S3, RDS, CloudWatch
Languages/Technologies: Core Java, Scala, JDBC, JUnit, C, C++, XML, SQL, Shell Script
Operating Systems: Linux, Windows, CentOS, Ubuntu, RHEL
Databases: MySQL, Oracle 11g/10g/9i, MS SQL Server, HBase, Cassandra, MongoDB
Web Technologies: DHTML, HTML, XHTML, XML, XPath, XSD, CSS, JavaScript
Tools: WinSCP, Wireshark, JIRA, IBM Tivoli
Scripting Languages: PHP
Others: HTML, XML
WORK EXPERIENCE:
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Worked on Cloudera distribution of Hadoop
- Implemented a data interface to retrieve customer information through REST APIs, pre-process the data using MapReduce, and store it in HDFS.
- Used Talend and Sqoop to import/export data between RDBMS and HDFS.
- Configured Oozie workflows to automate data flow, preprocessing, and cleansing tasks using Hadoop actions.
- Implemented MapReduce programs to find the top ATM failure locations reported by different tracking devices (see the sketch after this section).
- Used CQL with the Cassandra Java API to retrieve data from Cassandra tables.
- Implemented optimized joins in MapReduce programs to analyze different data sets.
- Worked on optimizing the shuffle and sort phase of MapReduce jobs.
- Wrote business logic in Hive UDFs to support ad hoc queries on structured data.
- Experience with Hive DDL and the Hive Query Language (HiveQL).
- Worked on dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data.
- Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers and Kafka brokers.
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for the daily loads.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Experienced in handling Avro and JSON data in Hive using Hive SerDes.
- Good understanding of the REST architectural style and its application to high-performing, globally used web sites.
- Wrote MapReduce programs and Pig scripts encoding the conditions that separate fraudulent claims.
- Performed MongoDB performance tuning and indexing using utilities such as mongostat and mongotop.
- Migrated MongoDB systems from non-SSL to SSL authentication using certificates.
- Migrated ETL operations into Hadoop system using Pig Latin scripts.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Worked on test scripts to support test driven development and continuous integration.
- Responsible to manage data coming from different sources.
- Experience in managing and reviewing Hadoop log files.
- Managed and scheduled jobs on the Hadoop cluster and monitored it using Ganglia.
- Used Pig as an ETL tool to perform transformations and event joins, filter bot traffic, and run pre-aggregations before storing the data in HDFS.
- Wrote build scripts using Maven and worked with continuous integration systems such as Jenkins.
Environment: Hadoop, Hive, MapReduce, HDFS, Pig, Sqoop, Maven, Jenkins, Java (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Linux
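Illustrative only: a minimal sketch of the MapReduce failure-location count referenced above; the CSV layout (location in the third field) and class names are assumptions, and picking the "top" locations would be a small follow-up sort over this output.

    // Hypothetical MapReduce job: count ATM failure events per location.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class AtmFailureCount {

        public static class FailureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text location = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    location.set(fields[2].trim()); // assumed location column
                    ctx.write(location, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable v : values) {
                    total += v.get();
                }
                ctx.write(key, new IntWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "atm-failure-count");
            job.setJarByClass(AtmFailureCount.class);
            job.setMapperClass(FailureMapper.class);
            job.setCombinerClass(SumReducer.class); // safe as a combiner for a pure count
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }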
Confidential, Carrollton, TX
Hadoop Developer
Responsibilities:
- Utilized DevOps principles to ensure operational excellence before deploying to production.
- Supported issue resolution during testing and in production.
- Developed scripts and batch jobs to schedule various Hadoop programs using Oozie and ZooKeeper.
- Implemented Spark Streaming jobs over varied data using optimized, performance-tuned techniques (see the sketch after this section).
- Worked with the Teradata analysis team to gather business requirements.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Optimized Hive queries with compact and bitmap indexes for quick lookups within tables.
- Used Java UDFs to tune Hive and Pig performance by controlling the underlying MapReduce stages.
- Used Java APIs such as machine learning library functions and graph algorithms for training and scoring linear models in Spark Streaming.
- Worked on streaming analytics and data consolidation projects for the Connect Home product.
- Operated the cluster on AWS using EC2, EMR, S3, and CloudWatch.
- Imported structured data with Sqoop, loading data from MySQL and Oracle to HDFS and vice versa on a regular basis.
- Ingested unstructured data with Kafka, moving data from Teradata, Cassandra, MongoDB, and HBase and vice versa on a regular basis.
- Implemented unit tests in Java for Pig and Hive applications.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Involved in source system analysis, data analysis, and data modeling to ETL (Extract, Transform and Load).
- Written Spark programs to model data for extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV& other compressed file formats.
- Developed Pig Latin scripts to extract data from web server output file to load into HDFS.
- Developed Pig UDFs to pre-process data for analysis.
- Worked on Druid with ANSI SQL and Presto to increase query performance on core machines.
- Created and modified shell scripts for scheduling various data-cleansing scripts and the ETL loading process.
- Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required formats and processed the results.
- Developed Spark code using Scala and Spark -SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Storm as an automatic retry mechanism for re-downloading and re-processing data whenever a transfer failed.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Utilized the Agile Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
- Integrated and migrated Kafka, Spark, and HBase for streaming analytics on top of the Amazon Web Services (AWS) platform for a data warehouse (DWH) application.
- Worked on ingesting, reconciling, compacting, migrating and purging base table and incremental table data using Hive and HBase and job scheduling through Oozie.
- Prepared developer (unit) test cases and executed developer testing.
- Documented operational problems by following standards and procedures, using the issue-tracking tool JIRA.
- Created 50 buckets for each Hive ORC table, clustered by client ID, for better performance while updating the tables.
- Transported data to HBase using Flume.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: CDH5, MapReduce, HDFS, Hive, Sqoop, Pig, Linux, XML, MySQL, MySQL Workbench, PL/SQL, SQL connector
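Illustrative only: a minimal sketch of a direct Kafka stream in Spark Streaming, the pattern the bullets above refer to; the broker address, topic, and group id are placeholders, and the per-batch count stands in for the real processing.

    // Hypothetical Spark Streaming job: read events from a Kafka topic and count them per batch.
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class StreamingEventCount {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("streaming-event-count");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");     // placeholder
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "event-count-group");         // placeholder
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("events"), kafkaParams));       // placeholder topic

            // For each 10-second micro-batch, log how many records arrived.
            stream.foreachRDD((JavaRDD<ConsumerRecord<String, String>> rdd) ->
                System.out.println("records in batch: " + rdd.count()));

            jssc.start();
            jssc.awaitTermination();
        }
    }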
Confidential, New York City, NY
Hadoop Developer
Responsibilities:
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed simple and complex MapReduce jobs using Hive and Pig for a data warehouse (DWH) application.
- Loaded and transformed large sets of structured, semi structured and unstructured data.
- Developed multiple Spark jobs in PySpark for data cleaning and preprocessing.
- Analyzed large data sets by running Hive queries and Pig scripts on top of the Amazon Web Services (AWS) platform.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
- Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required formats and processed the results.
- Created 30 buckets for each Hive table, clustered by client ID, for better performance while updating the tables.
- Worked with a DB manager tool and Druid, using SQL execution engines such as Hive and Presto.
- Responsible for managing data from multiple data sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
- Supported issue resolution during testing and in production.
- Developed merge jobs in Python to extract and load data from MySQL databases to HDFS.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
- Developed Python scripts to monitor the health of MongoDB databases and perform ad hoc backups using mongodump and mongorestore.
- Used Java APIs such as machine learning library functions and graph algorithms for training and scoring linear models in Spark Streaming.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization (see the sketch after this section).
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Followed agile methodologies and used Java APIs to make the most efficient use of network calls.
- Implemented layer security through user-defined exceptions built on Java core libraries.
- Extracted the data from other data sources into HDFS using Sqoop
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get/copyToLocal commands.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Expert in importing and exporting data into HDFS using Sqoop and Flume.
- Used Sqoop to migrate data back and forth between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data.
Environment: Hadoop, HDFS, Spark, Pig, Hive, Sqoop, HBase, MySQL, Python, Tez
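Illustrative only: a minimal sketch of the combiner/partitioner optimization mentioned above; treating the client ID as a Text key and using a sum-style combiner are assumptions made for the example.

    // Hypothetical custom partitioner: route records for the same client ID to the same reducer,
    // so per-client aggregation happens in one place. A combiner (any associative Reducer,
    // e.g. a local sum) then cuts the data shuffled across the network.
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class ClientIdPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text clientId, IntWritable value, int numPartitions) {
            // Mask off the sign bit so the modulo result is never negative.
            return (clientId.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Wiring in the job driver (assumed job and reducer classes):
    //   job.setPartitionerClass(ClientIdPartitioner.class);
    //   job.setCombinerClass(SumReducer.class);  // local aggregation before the shuffle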
Confidential, Wilmington, DE
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Built Tez from source and ran Hive query jobs on the Tez execution engine rather than MapReduce for better performance (see the sketch after this section).
- Developed Pig UDFs to pre-process data for analysis in a data warehouse (DWH) application.
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop and Flume.
- Wrote Storm spouts and bolts to collect real-time customer data from Kafka brokers, process it, and store it in HBase.
- Analyzed log files and processed them through Flume.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Responsible for loading the customer's data and event logs from Oracle database, Teradata into HDFS using Sqoop
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Wrote MapReduce jobs to report the number of activities created per day from data dumped from multiple sources, writing the output back to HDFS.
- Gained performance using Java APIs by manually handling inbound and outbound transfers of different files.
- Implemented predictive modeling and covered it with JUnit test cases.
- Exported analyzed data using Sqoop to generate reports.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Developed Hive queries for the analysts.
- Provided cluster coordination services through ZooKeeper.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
Environment: UNIX Scripting, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata and Eclipse
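Illustrative only: a minimal sketch of running a Hive query on the Tez execution engine over the HiveServer2 JDBC driver, as described above; the connection URL, credentials, table, and query are placeholders.

    // Hypothetical Hive-on-Tez query over the HiveServer2 JDBC driver.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HiveOnTezQuery {
        public static void main(String[] args) throws ClassNotFoundException, SQLException {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2-host:10000/default", "hive", ""); // placeholders
                 Statement stmt = conn.createStatement()) {

                // Run this session's queries on Tez instead of classic MapReduce.
                stmt.execute("SET hive.execution.engine=tez");

                try (ResultSet rs = stmt.executeQuery(
                         "SELECT event_type, COUNT(*) FROM web_logs GROUP BY event_type")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }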
Confidential
Java Developer
Responsibilities:
- End-to-end design of critical core Java components using Java Collections and multithreading.
- Developed multiple business reports with quick turnaround times.
- Created a program that notifies the operations team, within a few seconds, of downtime at any of the 250 pharmacies on the AP network (see the sketch after this section).
- Analyzed different database schemas (transactional and data warehouse) to build extensive business reports using SQL and joins.
- Created an interface using JSP, Servlets, and the MVC Struts architecture for the pharmacy team to resolve stuck orders in different pharmacies.
- Performance-tuned the IMS report, fixing memory leaks and applying Java best practices to boost the performance and reliability of the application.
Environment: Java 1.5, J2EE 1.4 (JSP, Servlets, Java Beans, JDBC, multithreading), Oracle 10g, PL/SQL, HTML, Eclipse, UNIX/Linux (Shell & Perl scripting), SQL
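Illustrative only: a simplified sketch of the pharmacy downtime-notification idea described above; the health-check URLs, polling interval, and alert hook are assumptions, not the original implementation.

    // Hypothetical pharmacy downtime monitor: poll each pharmacy health-check URL on a
    // fixed schedule and alert the operations team as soon as one stops responding.
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PharmacyMonitor {
        private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        private final List<String> pharmacyUrls; // e.g. 250 health-check URLs

        public PharmacyMonitor(List<String> pharmacyUrls) {
            this.pharmacyUrls = pharmacyUrls;
        }

        public void start() {
            // Check every pharmacy every 5 seconds (placeholder interval).
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    checkAll();
                }
            }, 0, 5, TimeUnit.SECONDS);
        }

        private void checkAll() {
            for (String url : pharmacyUrls) {
                if (!isUp(url)) {
                    notifyOperations(url); // assumed hook for email/pager integration
                }
            }
        }

        private boolean isUp(String url) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false;
            }
        }

        private void notifyOperations(String url) {
            System.err.println("DOWN: " + url); // placeholder for the real notification path
        }
    }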
Confidential
Associate Java Developer
Responsibilities:
- Developed Servlets and JavaServer Pages (JSP).
- Enhanced the system according to customer requirements.
- Created test case scenarios for functional testing.
- Used JavaScript validation in JSP pages.
- Wrote pseudo-code for stored procedures.
- Developed PL/SQL queries to generate reports based on client requirements.
- Helped design the database tables for optimal storage of data.
- Coded JDBC calls in the Servlets to access the Oracle database tables (see the sketch after this section).
- Responsible for integration, unit, system, and stress testing across all phases of the project.
- Prepared a final guideline document to serve as a tutorial for users of the application.
Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX
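Illustrative only: a minimal sketch of the Servlet-plus-JDBC pattern referenced above; the JNDI data source name, table, and columns are placeholders.

    // Hypothetical servlet: look up an order status in an Oracle table via a pooled JDBC DataSource.
    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class OrderStatusServlet extends HttpServlet {
        private DataSource dataSource;

        public void init() throws ServletException {
            try {
                // Container-managed connection pool (placeholder JNDI name).
                dataSource = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/OrdersDB");
            } catch (NamingException e) {
                throw new ServletException("DataSource lookup failed", e);
            }
        }

        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String orderId = req.getParameter("orderId");
            resp.setContentType("text/plain");
            Connection conn = null;
            PreparedStatement ps = null;
            ResultSet rs = null;
            try {
                conn = dataSource.getConnection();
                ps = conn.prepareStatement("SELECT status FROM orders WHERE order_id = ?");
                ps.setString(1, orderId);
                rs = ps.executeQuery();
                resp.getWriter().println(rs.next() ? rs.getString("status") : "NOT FOUND");
            } catch (SQLException e) {
                resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
            } finally {
                try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
                try { if (ps != null) ps.close(); } catch (SQLException ignored) { }
                try { if (conn != null) conn.close(); } catch (SQLException ignored) { }
            }
        }
    }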