
Hadoop Developer Resume


Houston, TX

PROFESSIONAL SUMMARY:

  • 7+ years of professional experience in IT, including 4 years of comprehensive experience working with Apache Hadoop ecosystem components, Spark Streaming, and Amazon Web Services (AWS).
  • Worked on ingesting, reconciling, compacting, migrating, and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
  • Proficient in writing Spark applications (Spark SQL, machine learning, and graph processing) using Scala, Java, Python, and R.
  • In-depth knowledge of Hadoop architecture and components such as HDFS, YARN, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, and Flume.
  • Hands-on experience with AWS components such as EC2, EMR, S3, and CloudWatch.
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce.
  • Solid working knowledge of HDFS design, daemons, and HDFS High Availability (HA).
  • Experience in implementation of complete Big Data solutions, including data acquisition, storage, transformation, and analysis.
  • Worked on big data projects such as streaming analytics and data consolidation.
  • Worked on real-time data integration and migration using Kafka, Spark, and HBase.
  • Implemented MLlib functions for training and building linear models with Spark Streaming (see the sketch after this list).
  • Experience in analyzing large-scale data to identify new analytics, insights, trends and relationships with a strong focus on data clustering.
  • Experience processing semi-structured data (XML and JSON) in Hive and Impala.
  • Experience writing Hive UDFs, UDTFs, and UDAFs.
  • Good working experience on Hadoop Distributions like Cloudera and Hortonworks.
  • Good working experience creating event-processing data pipelines using Flume, Kafka, and Storm.
  • Expertise in data transformation and analysis using Spark, Pig, and Hive.
  • Compiled and configured Apache Tez as the execution engine for Hive and Pig to achieve better response times than plain MapReduce jobs.
  • Experience importing and exporting terabytes of data using Sqoop between HDFS and relational database systems (RDBMS).
  • Experience analyzing data using Cassandra CQL, HiveQL, and Pig Latin programs.
  • Provided support for issues during testing and in production.
  • Experience implementing custom partitioners and combiners for effective data distribution.
  • Experience writing simple to complex ad-hoc Pig scripts with CPU- and memory-efficient UDFs.
  • Experience writing simple to complex ad-hoc Hive scripts.
  • Experience writing shell scripts to dump shared data from MySQL and Oracle servers into HDFS.
  • Good working experience with serialization and compression formats such as Avro, Snappy, and LZO.
  • Good working experience in configuring simple to complex workflows using Oozie.
  • Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Proficient with IDEs including Eclipse Galileo and IBM Rational Application Developer (RAD), as well as VMware environments.
  • Worked on different operating systems like UNIX/Linux, Windows XP, and Windows 2K
  • Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
  • Good working experience with Java, JDBC, Collections, JSP, JSON, REST, XML, SQL, UNIX, and Eclipse.
  • Developed and maintained web applications running on Apache Web server.
  • Experience working in an Agile software development environment.
  • Implemented automatic workflows and job scheduling using Oozie, Zookeeper and Ambari.
  • Integrated Teradata, MongoDB, Cassandra, Salesforce (SFDC) with HDFS using HBase.
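
Below is a minimal PySpark sketch of the kind of Spark Streaming linear-model training referred to above. The HDFS directories, feature count, and comma-separated record layout are assumptions made purely for illustration, not the original project's setup.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint, StreamingLinearRegressionWithSGD

sc = SparkContext(appName="streaming-linear-model")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

def parse(line):
    # Assumed record layout: label,feature1,feature2,feature3
    fields = [float(x) for x in line.split(",")]
    return LabeledPoint(fields[0], Vectors.dense(fields[1:]))

# Hypothetical HDFS directories watched for new training/scoring files.
training = ssc.textFileStream("hdfs:///data/linear/train").map(parse)
scoring = ssc.textFileStream("hdfs:///data/linear/score").map(parse)

model = StreamingLinearRegressionWithSGD()
model.setInitialWeights(Vectors.dense([0.0, 0.0, 0.0]))

model.trainOn(training)  # update the weights on every micro-batch
model.predictOnValues(
    scoring.map(lambda lp: (lp.label, lp.features))
).pprint()               # print (label, prediction) pairs per batch

ssc.start()
ssc.awaitTermination()
```

The model re-trains on each micro-batch that lands in the training directory and scores whatever arrives in the scoring directory.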

TECHNICAL SKILLS:

Hadoop/Big Data: Spark, Flume, Kafka, Hive, HBase, Pig, HDFS, MapReduce, Python, Sqoop, ZooKeeper, Oozie, Storm, Tez, Impala, Ambari

AWS Components: EC2, EMR, S3, RDS, CloudWatch

Languages/Technologies: Core Java, Scala, JDBC, JUnit, C, C++, XML, SQL, Shell Script

Operating Systems: Linux, Windows, Centos, Ubuntu, RHEL

Databases: MySQL, Oracle 11g/10g/9i, MS SQL Server, HBase, Cassandra, MongoDB

Web Technologies: DHTML, HTML, XHTML, XML, XPath, XSD, CSS, JavaScript

Tools: WinSCP, Wireshark, JIRA, IBM Tivoli

Scripting Languages: PHP

Others: HTML, XML

WORK EXPERIENCE:

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

  • Worked on Cloudera distribution of Hadoop
  • Implemented a data interface to retrieve customer information through REST APIs, pre-process the data with MapReduce, and store it in HDFS.
  • Used Talend and Sqoop to import/export data between RDBMSs and HDFS.
  • Configured Oozie workflows to automate data-flow, preprocessing, and cleaning tasks using Hadoop actions.
  • Implemented MapReduce programs to identify the top failure locations of ATMs using data from different tracking devices.
  • Used Cassandra CQL with the Java API to retrieve data from Cassandra tables.
  • Implemented optimized joins in MapReduce programs to perform analysis on different data sets.
  • Worked on optimizing the shuffle and sort phase of MapReduce jobs.
  • Wrote business logic as Hive UDFs to support ad-hoc queries on structured data.
  • Experience with Hive DDL and the Hive Query Language (HiveQL).
  • Worked on dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data.
  • Transferred data from different sources into HDFS using Kafka producers, consumers, and brokers (see the sketch after this list).
  • Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for the daily imports.
  • Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
  • Handled Avro and JSON data in Hive using Hive SerDes.
  • Good knowledge and understanding of REST architecture style and its application to well performing web sites for global usage.
  • Wrote MapReduce programs and Pig scripts specifying the conditions used to separate fraudulent claims.
  • Performed performance tuning and devised indexing strategies using MongoDB utilities such as mongostat and mongotop.
  • Migrated MongoDB systems from no-SSL authentication to SSL authentication using certificates.
  • Migrated ETL operations into Hadoop system using Pig Latin scripts.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Worked on test scripts to support test driven development and continuous integration.
  • Responsible for managing data coming from different sources.
  • Experience in managing and reviewing Hadoop log files.
  • Managed and scheduled jobs on a Hadoop cluster and monitored it using Ganglia.
  • Used Pig as an ETL tool for transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Wrote build scripts using Maven and set up continuous integration with Jenkins.
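
A minimal sketch of the Kafka-to-HDFS transfer described above, assuming the kafka-python and HdfsCLI (hdfs) client libraries; the broker address, topic name, batch size, and HDFS paths are hypothetical.

```python
import time
from kafka import KafkaConsumer   # kafka-python client
from hdfs import InsecureClient   # WebHDFS client (HdfsCLI)

# Hypothetical endpoints for illustration.
consumer = KafkaConsumer(
    "customer-events",
    bootstrap_servers="broker1:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: v.decode("utf-8"),
)
hdfs = InsecureClient("http://namenode:50070", user="hadoop")

BATCH_SIZE = 1000
batch = []

for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Land each batch as a new timestamped file in an HDFS landing directory.
        path = "/data/landing/customer_events_%d.json" % int(time.time())
        hdfs.write(path, data="\n".join(batch), encoding="utf-8")
        batch = []
```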

Environment: Hadoop, Hive, MapReduce, HDFS, Pig, Sqoop, Maven, Jenkins, Java 6 (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Linux

Confidential, Carrollton, TX

Hadoop Developer

Responsibilities:

  • Utilized DevOps principles and components to ensure operational excellence before deploying to production.
  • Provided support for issues during testing and in production.
  • Developed scripts and batch jobs to schedule various Hadoop programs using Oozie and ZooKeeper.
  • Implemented Spark Streaming on all kinds of data using optimized, performance-tuned techniques.
  • Worked with the Teradata analysis team to gather business requirements.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Optimized Hive queries using compact and bitmap indexes for quick lookups inside tables.
  • Used Java UDFs for performance tuning in Hive and Pig by manually driving the underlying MapReduce stages.
  • Used Java APIs such as machine learning library (MLlib) functions and graph algorithms for training and predicting with linear models in Spark Streaming.
  • Worked on streaming analytics and data consolidation projects for the Connect Home product.
  • Operated the cluster on AWS using EC2, EMR, S3, and CloudWatch.
  • Imported structured data using Sqoop between MySQL/Oracle and HDFS on a regular basis.
  • Moved unstructured data using Kafka between Teradata, Cassandra, MongoDB, and HBase on a regular basis.
  • Implemented unit tests in Java for Pig and Hive applications.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Involved in source system analysis, data analysis, and data modeling for ETL (Extract, Transform, and Load).
  • Wrote Spark programs to model data for extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
  • Developed Pig Latin scripts to extract data from web server output file to load into HDFS.
  • Developed Pig UDFs to pre-process data for analysis.
  • Worked on Druid with ANSI SQL and Presto to increase query performance on core machines.
  • Created and modified shell scripts for scheduling various data-cleansing scripts and ETL loading processes.
  • Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required date formats and processed the results.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used Storm as an automatic retry mechanism to re-attempt downloading and manipulating data after transient failures.
  • Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
  • Utilized the Agile Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
  • Integrated and migrated Kafka, Spark, and HBase for streaming analytics on top of the Amazon Web Services (AWS) platform for a data warehouse (DWH) application.
  • Worked on ingesting, reconciling, compacting, migrating, and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
  • Prepared developer (unit) test cases and executed developer testing.
  • Designed and documented operational problems following standards and procedures, using the issue-tracking tool JIRA.
  • Created 50 buckets for each Hive ORC table, clustered by client ID, for better performance while updating the tables (see the sketch after this list).
  • Transported data to HBase using Flume.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
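
A minimal sketch of the bucketed-ORC DDL behind the 50-bucket layout mentioned above. It is issued here through spark.sql purely for illustration (the same statement could equally run in the Hive CLI or beeline); the database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# Enable Hive support so the DDL is registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("create-bucketed-orc-table")
         .enableHiveSupport()
         .getOrCreate())

# Cluster rows by client_id into 50 buckets so updates that touch a single
# client only need to rewrite the relevant bucket files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dwh.client_transactions (
        client_id   BIGINT,
        txn_amount  DOUBLE,
        txn_ts      TIMESTAMP
    )
    CLUSTERED BY (client_id) INTO 50 BUCKETS
    STORED AS ORC
""")
```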

Environment: CDH5, MapReduce, HDFS, Hive, Sqoop, Pig, Linux, XML, MySQL, MySQL Workbench, PL/SQL, SQL connector

Confidential, New York City, NY

Hadoop Developer

Responsibilities:

  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig for a data warehouse (DWH) application.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed multiple Spark jobs in PySpark for data cleaning and preprocessing.
  • Analyzed large data sets by running Hive queries and Pig scripts on top of the Amazon Web Services (AWS) platform.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
  • Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required date formats and processed the results.
  • Created 30 buckets for each Hive table, clustered by client ID, for better performance while updating the tables.
  • Worked with the DB Manager tool and Druid, using SQL execution engines such as Hive and Presto.
  • Responsible for managing data from multiple data sources.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
  • Provided support for issues during testing and in production.
  • Developed merge jobs in Python to extract and load data from MySQL databases to HDFS.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
  • Developed Python scripts to monitor the health of MongoDB databases and perform ad-hoc backups using mongodump and mongorestore (see the sketch after this list).
  • Used Java APIs such as machine learning library (MLlib) functions and graph algorithms for training and predicting with linear models in Spark Streaming.
  • Experience optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
  • Followed agile methodologies and used Java APIs to make the most efficient use of network calls.
  • Implemented layer security through user-defined exceptions built on core Java libraries.
  • Extracted data from other data sources into HDFS using Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get/copyToLocal commands.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Expert in importing and exporting data to and from HDFS using Sqoop and Flume.
  • Used Sqoop to migrate data back and forth between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data.
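
A minimal sketch of the kind of MongoDB health-check and ad-hoc backup script described above, assuming the pymongo driver and the mongodump CLI on the path; the connection string and backup directory are hypothetical.

```python
import subprocess
import time

from pymongo import MongoClient

MONGO_URI = "mongodb://localhost:27017"   # hypothetical connection string
BACKUP_DIR = "/backups/mongo"             # hypothetical local backup directory

client = MongoClient(MONGO_URI)

# serverStatus exposes basic health metrics: connections, memory, opcounters.
status = client.admin.command("serverStatus")
print("current connections:", status["connections"]["current"])
print("resident memory (MB):", status["mem"]["resident"])

# Ad-hoc backup: shell out to mongodump, writing to a timestamped directory.
out_dir = "%s/dump_%d" % (BACKUP_DIR, int(time.time()))
subprocess.run(["mongodump", "--uri", MONGO_URI, "--out", out_dir], check=True)
```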

Environment: Hadoop, HDFS, Spark, Pig, Hive, Sqoop, HBase, MySQL, Python, Tez

Confidential, Wilmington, DE

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Prepared a Tez build from source code and ran Hive query jobs on the Tez execution engine rather than MapReduce for better performance.
  • Developed Pig UDFs to pre-process data for analysis in a data warehouse (DWH) application.
  • Involved in loading data from the Linux file system to HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop and Flume.
  • Wrote Storm spouts and bolts to collect real-time customer data streams from Kafka brokers, process them, and store the results in HBase.
  • Analyzed log files and processed them through Flume.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Developed MapReduce jobs for log analysis, recommendations, and analytics (see the sketch after this list).
  • Responsible for loading customers' data and event logs from Oracle and Teradata into HDFS using Sqoop.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day during dumps from multiple sources, with the output written back to HDFS.
  • Gained performance by using Java APIs for manual inbound and outbound handling of different files.
  • Implemented predictive modeling, validated with JUnit test cases.
  • Exported analyzed data out of HDFS using Sqoop for report generation.
  • Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
  • Developed Hive queries for the analysts.
  • Provided cluster coordination services through ZooKeeper.
  • Experience optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
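
A minimal Hadoop Streaming sketch in the spirit of the log-analysis MapReduce jobs mentioned above; the access-log layout (status code in the ninth space-separated field), file names, and HDFS paths are assumptions for illustration.

```python
#!/usr/bin/env python
"""Hadoop Streaming job: count requests per HTTP status code.

Example submission (jar location and paths are assumptions):
  hadoop jar hadoop-streaming.jar \
      -input /logs/access -output /logs/status_counts \
      -mapper "python log_status.py map" \
      -reducer "python log_status.py reduce" \
      -file log_status.py
"""
import sys

def mapper():
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 8:
            # Assumed access-log layout: the status code is the 9th field.
            print("%s\t1" % fields[8])

def reducer():
    # Hadoop Streaming delivers mapper output sorted by key.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```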

Environment: UNIX Scripting, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata and Eclipse

Confidential

Java Developer

Responsibilities:

  • End to End designing of Critical Core Java Components using Java Collections and Multithreading.
  • Developed multiple reports for the business with quick turnaround times.
  • Created a program to notify the operational team within a few seconds of downtime at any of the 250 pharmacies on the AP network.
  • Analyzed different database schemas (transactional and data warehouse) to build extensive reports for the business using SQL and joins.
  • Created an interface using JSP, Servlet and MVC Struts architecture for pharmacy team to resolve stuck orders in different pharmacies.
  • Performance-tuned the IMS report by fixing memory leaks and applying Java best practices to boost the performance and reliability of the application.

Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX, J2EE (JSP, Servlets, Java Beans, JDBC, Multi-Threading), Linux (Shell & Perl scripting), and SQL

Confidential

Associate Java Developer

Responsibilities:

  • Developed Servlets and Java Server Pages (JSP)
  • Enhanced the system according to customer requirements.
  • Created test case scenarios for functional testing.
  • Used JavaScript validation in JSP pages.
  • Wrote pseudo-code for stored procedures.
  • Developed PL/SQL queries to generate reports based on client requirements.
  • Helped design the database tables for optimal storage of data.
  • Coded JDBC calls in the Servlets to access the Oracle database tables.
  • Responsible for integration, unit testing, system testing, and stress testing across all phases of the project.
  • Prepared a final guideline document to serve as a tutorial for the users of the application.

Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX
