Hadoop Developer Resume
Houston, TX
PROFESSIONAL SUMMARY:
- 7+ years of professional experience in IT, including 4 years of comprehensive experience working with Apache Hadoop ecosystem components, Spark Streaming, and Amazon Web Services (AWS).
- Worked on ingesting, reconciling, compacting, migrating, and purging base-table and incremental-table data using Hive and HBase, with job scheduling through Oozie.
- Proficient in writing Spark applications (Spark SQL, machine learning, and graph processing) using Scala, Java, Python, and R.
- In-depth working knowledge of Hadoop architecture and components such as HDFS, YARN, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, and Flume.
- Hands-on experience with AWS components such as EC2, EMR, S3, and CloudWatch.
- Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce.
- Good working knowledge of HDFS design, daemons, and HDFS High Availability (HA).
- Experience in implementation of complete Big Data solutions, including data acquisition, storage, transformation, and analysis.
- Worked on big data projects such as streaming analytics and data consolidation.
- Worked on real-time data integration and migration using Kafka, Spark, and HBase.
- Implemented MLlib functions for training and building linear models with Spark Streaming.
- Experience in analyzing large-scale data to identify new analytics, insights, trends and relationships with a strong focus on data clustering.
- Experience processing semi-structured data (XML and JSON) in Hive/Impala.
- Experience in writing Hive UDFs, UDTFs, and UDAFs (see the sketch at the end of this summary).
- Good working experience on Hadoop Distributions like Cloudera and Hortonworks.
- Good working experience in creating event-processing data pipelines using Flume, Kafka, and Storm.
- Expertise in data transformation and analysis using Spark, Pig, and Hive.
- Compiled and configured Apache Tez for Hive and Pig to achieve better response times than plain MapReduce jobs.
- Experience in importing and exporting terabytes of data using Sqoop between HDFS and relational database systems (RDBMS).
- Experience in analyzing data using CQL (Cassandra Query Language), HiveQL, and Pig Latin programs.
- Supported issue resolution during testing and in production.
- Experience in implementing custom partitioners and combiners for effective data distribution.
- Experience in writing simple to complex ad hoc Pig scripts with CPU- and memory-efficient UDFs.
- Experience in writing simple to complex ad hoc Hive scripts.
- Experience in writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
- Good working experience with serialization and compression formats such as Avro, Snappy, and LZO.
- Good working experience in configuring simple to complex workflows using Oozie.
- Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Proficient with IDEs including Eclipse Galileo and IBM Rational Application Developer (RAD), and with VMware environments.
- Worked on operating systems including UNIX/Linux, Windows XP, and Windows 2000.
- Quick to master new concepts, capable of working in a group as well as independently, with excellent communication skills.
- Good working experience with Java, JDBC, Collections, JSP, JSON, REST, XML, SQL, UNIX, and Eclipse.
- Developed and maintained web applications running on Apache Web server.
- Experience of working in Agile Software Development environment.
- Implemented automatic workflows and job scheduling using Oozie, Zookeeper and Ambari.
- Integrated Teradata, MongoDB, Cassandra, Salesforce (SFDC) with HDFS using HBase.
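Illustrative only: a minimal sketch of the kind of Hive UDF mentioned above (classic org.apache.hadoop.hive.ql.exec.UDF API); the class name and the date formats are hypothetical and not taken from any specific project.

    // Hypothetical Hive UDF: normalize a date string such as "12/31/2016" to "2016-12-31".
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class NormalizeDateUDF extends UDF {
        private final SimpleDateFormat in = new SimpleDateFormat("MM/dd/yyyy");
        private final SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd");

        public Text evaluate(Text raw) {
            if (raw == null) {
                return null;
            }
            try {
                return new Text(out.format(in.parse(raw.toString().trim())));
            } catch (ParseException e) {
                return null; // unparseable values become null rather than failing the query
            }
        }
    }
    // In Hive (names are placeholders):
    //   ADD JAR normalize-date-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_date AS 'NormalizeDateUDF';
    //   SELECT normalize_date(order_dt) FROM orders;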
TECHNICAL SKILLS:
Hadoop/Big Data: Spark, Flume, Kafka, Hive, HBase, Pig, HDFS, MapReduce, Python, Sqoop, ZooKeeper, Oozie, Storm, Tez, Impala, Ambari
AWS Components: EC2, EMR, S3, RDS, CloudWatch
Languages/Technologies: Core Java, Scala, JDBC, JUnit, C, C++, XML, SQL, Shell Script
Operating Systems: Linux, Windows, CentOS, Ubuntu, RHEL
Databases: MySQL, Oracle 11g/10g/9i, MS SQL Server, HBase, Cassandra, MongoDB
Web Technologies: DHTML, HTML, XHTML, XML, XPath, XSD, CSS, JavaScript
Tools: WinSCP, Wireshark, JIRA, IBM Tivoli
Scripting Languages: PHP
Others: HTML, XML
WORK EXPERIENCE:
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Worked on Cloudera distribution of Hadoop
- Implemented a data interface to retrieve customer information through REST APIs, pre-process the data using MapReduce, and store it in HDFS.
- Used Talend and Sqoop to import/export data between RDBMS and HDFS.
- Configured Oozie workflows to automate data flow, preprocessing, and cleansing tasks using Hadoop actions.
- Implemented MapReduce programs to find the top ATM failure locations reported by different tracking devices (see the sketch after this section).
- Used CQL with the Cassandra Java API to retrieve data from Cassandra tables.
- Implemented optimized joins in MapReduce programs to analyze different data sets.
- Worked on optimizing the shuffle and sort phase of MapReduce jobs.
- Wrote business logic in Hive UDFs to support ad hoc queries on structured data.
- Experience with Hive DDL and the Hive Query Language (HiveQL).
- Worked on dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data.
- Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers and Kafka brokers.
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for the daily loads.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Experienced in handling Avro and JSON data in Hive using Hive SerDes.
- Good understanding of the REST architectural style and its application to high-performing, globally used web sites.
- Wrote MapReduce programs and Pig scripts encoding the conditions that separate fraudulent claims.
- Performed MongoDB performance tuning and indexing using utilities such as mongostat and mongotop.
- Migrated MongoDB systems from non-SSL to SSL authentication using certificates.
- Migrated ETL operations into Hadoop system using Pig Latin scripts.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Worked on test scripts to support test driven development and continuous integration.
- Responsible to manage data coming from different sources.
- Experience in managing and reviewing Hadoop log files.
- Managed and scheduled jobs on the Hadoop cluster and monitored it using Ganglia.
- Used Pig as an ETL tool to perform transformations and event joins, filter bot traffic, and run pre-aggregations before storing the data in HDFS.
- Wrote build scripts using Maven and worked with continuous integration systems such as Jenkins.
Environment: Hadoop, Hive, MapReduce, HDFS, Pig, Sqoop, Maven, Jenkins, Java (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Linux
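Illustrative only: a minimal sketch of the MapReduce failure-location count referenced above; the CSV layout (location in the third field) and class names are assumptions, and picking the "top" locations would be a small follow-up sort over this output.

    // Hypothetical MapReduce job: count ATM failure events per location.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class AtmFailureCount {

        public static class FailureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text location = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    location.set(fields[2].trim()); // assumed location column
                    ctx.write(location, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable v : values) {
                    total += v.get();
                }
                ctx.write(key, new IntWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "atm-failure-count");
            job.setJarByClass(AtmFailureCount.class);
            job.setMapperClass(FailureMapper.class);
            job.setCombinerClass(SumReducer.class); // safe as a combiner for a pure count
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }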
Confidential, Carrollton, TX
Hadoop Developer
Responsibilities:
- Utilized DevOps principles to ensure operational excellence before deploying to production.
- Supported issue resolution during testing and in production.
- Developed scripts and batch jobs to schedule various Hadoop programs using Oozie and ZooKeeper.
- Implemented Spark Streaming jobs over varied data using optimized, performance-tuned techniques (see the sketch after this section).
- Worked with the Teradata analysis team to gather business requirements.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Optimized Hive queries with compact and bitmap indexes for quick lookups within tables.
- Used Java UDFs to tune Hive and Pig performance by controlling the underlying MapReduce stages.
- Used Java APIs such as machine learning library functions and graph algorithms for training and scoring linear models in Spark Streaming.
- Worked on streaming analytics and data consolidation projects for the Connect Home product.
- Operated the cluster on AWS using EC2, EMR, S3, and CloudWatch.
- Imported structured data with Sqoop, loading data from MySQL and Oracle to HDFS and vice versa on a regular basis.
- Ingested unstructured data with Kafka, moving data from Teradata, Cassandra, MongoDB, and HBase and vice versa on a regular basis.
- Implemented unit tests in Java for Pig and Hive applications.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Involved in source system analysis, data analysis, and data modeling to ETL (Extract, Transform and Load).
- Written Spark programs to model data for extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV& other compressed file formats.
- Developed Pig Latin scripts to extract data from web server output file to load into HDFS.
- Developed Pig UDFs to pre-process data for analysis.
- Worked on Druid with ANSI SQL and Presto to increase query performance on core machines.
- Created and modified shell scripts for scheduling various data-cleansing scripts and the ETL loading process.
- Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required formats and processed the results.
- Developed Spark code using Scala and Spark -SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Storm as an automatic retry mechanism for re-downloading and re-processing data whenever a transfer failed.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Utilized the Agile Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
- Integrated and migrated Kafka, Spark, and HBase for streaming analytics on top of the Amazon Web Services (AWS) platform for a data warehouse (DWH) application.
- Worked on ingesting, reconciling, compacting, migrating and purging base table and incremental table data using Hive and HBase and job scheduling through Oozie.
- Prepared developer (unit) test cases and executed developer testing.
- Documented operational problems by following standards and procedures, using the issue-tracking tool JIRA.
- Created 50 buckets for each Hive ORC table, clustered by client ID, for better performance while updating the tables.
- Transported data to HBase using Flume.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: CDH5, MapReduce, HDFS, Hive, Sqoop, Pig, Linux, XML, MySQL, MySQL Workbench, PL/SQL, SQL connector
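Illustrative only: a minimal sketch of a direct Kafka stream in Spark Streaming, the pattern the bullets above refer to; the broker address, topic, and group id are placeholders, and the per-batch count stands in for the real processing.

    // Hypothetical Spark Streaming job: read events from a Kafka topic and count them per batch.
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class StreamingEventCount {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("streaming-event-count");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");     // placeholder
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "event-count-group");         // placeholder
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("events"), kafkaParams));       // placeholder topic

            // For each 10-second micro-batch, log how many records arrived.
            stream.foreachRDD((JavaRDD<ConsumerRecord<String, String>> rdd) ->
                System.out.println("records in batch: " + rdd.count()));

            jssc.start();
            jssc.awaitTermination();
        }
    }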
Confidential, New York City, NY
Hadoop Developer
Responsibilities:
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed simple and complex MapReduce jobs using Hive and Pig for a data warehouse (DWH) application.
- Loaded and transformed large sets of structured, semi structured and unstructured data.
- Developed multiple Spark jobs in PySpark for data cleaning and preprocessing.
- Analyzed large data sets by running Hive queries and Pig scripts on top of the Amazon Web Services (AWS) platform.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
- Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required formats and processed the results.
- Created 30 buckets for each Hive table, clustered by client ID, for better performance while updating the tables.
- Worked with a DB manager tool and Druid, using SQL execution engines such as Hive and Presto.
- Responsible for managing data from multiple data sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
- Supported issue resolution during testing and in production.
- Developed merge jobs in Python to extract and load data from MySQL databases to HDFS.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
- Developed Python scripts to monitor the health of MongoDB databases and perform ad hoc backups using mongodump and mongorestore.
- Used Java APIs such as machine learning library functions and graph algorithms for training and scoring linear models in Spark Streaming.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization (see the sketch after this section).
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Followed agile methodologies and used Java APIs to make the most efficient use of network calls.
- Implemented layer security through user-defined exceptions built on Java core libraries.
- Extracted the data from other data sources into HDFS using Sqoop
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get/copyToLocal commands.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Expert in importing and exporting data into HDFS using Sqoop and Flume.
- Used Sqoop to migrate data back and forth between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data.
Environment: Hadoop, HDFS, Spark, Pig, Hive, Sqoop, HBase, MySQL, Python, Tez
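Illustrative only: a minimal sketch of the combiner/partitioner optimization mentioned above; treating the client ID as a Text key and using a sum-style combiner are assumptions made for the example.

    // Hypothetical custom partitioner: route records for the same client ID to the same reducer,
    // so per-client aggregation happens in one place. A combiner (any associative Reducer,
    // e.g. a local sum) then cuts the data shuffled across the network.
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class ClientIdPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text clientId, IntWritable value, int numPartitions) {
            // Mask off the sign bit so the modulo result is never negative.
            return (clientId.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Wiring in the job driver (assumed job and reducer classes):
    //   job.setPartitionerClass(ClientIdPartitioner.class);
    //   job.setCombinerClass(SumReducer.class);  // local aggregation before the shuffle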
Confidential, Wilmington, DE
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Built Tez from source and ran Hive query jobs on the Tez execution engine rather than MapReduce for better performance (see the sketch after this section).
- Developed Pig UDFs to pre-process data for analysis in a data warehouse (DWH) application.
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop and Flume.
- Wrote Storm spouts and bolts to collect real-time customer data from Kafka brokers, process it, and store it in HBase.
- Analyzed log files and processed them through Flume.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Responsible for loading the customer's data and event logs from Oracle database, Teradata into HDFS using Sqoop
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Wrote MapReduce jobs to report the number of activities created per day from data dumped from multiple sources, writing the output back to HDFS.
- Gained performance using Java APIs by manually handling inbound and outbound transfers of different files.
- Implemented predictive modeling and covered it with JUnit test cases.
- Exported analyzed data using Sqoop to generate reports.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Developed Hive queries for the analysts.
- Provided cluster coordination services through ZooKeeper.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
Environment: UNIX Scripting, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Teradata and Eclipse
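Illustrative only: a minimal sketch of running a Hive query on the Tez execution engine over the HiveServer2 JDBC driver, as described above; the connection URL, credentials, table, and query are placeholders.

    // Hypothetical Hive-on-Tez query over the HiveServer2 JDBC driver.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HiveOnTezQuery {
        public static void main(String[] args) throws ClassNotFoundException, SQLException {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2-host:10000/default", "hive", ""); // placeholders
                 Statement stmt = conn.createStatement()) {

                // Run this session's queries on Tez instead of classic MapReduce.
                stmt.execute("SET hive.execution.engine=tez");

                try (ResultSet rs = stmt.executeQuery(
                         "SELECT event_type, COUNT(*) FROM web_logs GROUP BY event_type")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                    }
                }
            }
        }
    }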
Confidential
Java Developer
Responsibilities:
- End-to-end design of critical core Java components using Java Collections and multithreading.
- Developed multiple business reports with quick turnaround times.
- Created a program that notifies the operations team, within a few seconds, of downtime at any of the 250 pharmacies on the AP network (see the sketch after this section).
- Analyzed different database schemas (transactional and data warehouse) to build extensive business reports using SQL and joins.
- Created an interface using JSP, Servlets, and the MVC Struts architecture for the pharmacy team to resolve stuck orders in different pharmacies.
- Performance-tuned the IMS report, fixing memory leaks and applying Java best practices to boost the performance and reliability of the application.
Environment: Java 1.5, J2EE 1.4 (JSP, Servlets, Java Beans, JDBC, multithreading), Oracle 10g, PL/SQL, HTML, Eclipse, UNIX/Linux (Shell & Perl scripting), SQL
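Illustrative only: a simplified sketch of the pharmacy downtime-notification idea described above; the health-check URLs, polling interval, and alert hook are assumptions, not the original implementation.

    // Hypothetical pharmacy downtime monitor: poll each pharmacy health-check URL on a
    // fixed schedule and alert the operations team as soon as one stops responding.
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PharmacyMonitor {
        private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        private final List<String> pharmacyUrls; // e.g. 250 health-check URLs

        public PharmacyMonitor(List<String> pharmacyUrls) {
            this.pharmacyUrls = pharmacyUrls;
        }

        public void start() {
            // Check every pharmacy every 5 seconds (placeholder interval).
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    checkAll();
                }
            }, 0, 5, TimeUnit.SECONDS);
        }

        private void checkAll() {
            for (String url : pharmacyUrls) {
                if (!isUp(url)) {
                    notifyOperations(url); // assumed hook for email/pager integration
                }
            }
        }

        private boolean isUp(String url) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false;
            }
        }

        private void notifyOperations(String url) {
            System.err.println("DOWN: " + url); // placeholder for the real notification path
        }
    }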
Confidential
Associate Java Developer
Responsibilities:
- Developed Servlets and JavaServer Pages (JSP).
- Enhanced the system according to customer requirements.
- Created test case scenarios for functional testing.
- Used JavaScript validation in JSP pages.
- Wrote pseudo-code for stored procedures.
- Developed PL/SQL queries to generate reports based on client requirements.
- Helped design the database tables for optimal storage of data.
- Coded JDBC calls in the Servlets to access the Oracle database tables (see the sketch after this section).
- Responsible for integration, unit, system, and stress testing across all phases of the project.
- Prepared a final guideline document to serve as a tutorial for users of the application.
Environment: Java 1.5, Servlets, J2EE 1.4, JDBC, Oracle 10g, PL/SQL, HTML, JSP, Eclipse, UNIX
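Illustrative only: a minimal sketch of the Servlet-plus-JDBC pattern referenced above; the JNDI data source name, table, and columns are placeholders.

    // Hypothetical servlet: look up an order status in an Oracle table via a pooled JDBC DataSource.
    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class OrderStatusServlet extends HttpServlet {
        private DataSource dataSource;

        public void init() throws ServletException {
            try {
                // Container-managed connection pool (placeholder JNDI name).
                dataSource = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/OrdersDB");
            } catch (NamingException e) {
                throw new ServletException("DataSource lookup failed", e);
            }
        }

        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String orderId = req.getParameter("orderId");
            resp.setContentType("text/plain");
            Connection conn = null;
            PreparedStatement ps = null;
            ResultSet rs = null;
            try {
                conn = dataSource.getConnection();
                ps = conn.prepareStatement("SELECT status FROM orders WHERE order_id = ?");
                ps.setString(1, orderId);
                rs = ps.executeQuery();
                resp.getWriter().println(rs.next() ? rs.getString("status") : "NOT FOUND");
            } catch (SQLException e) {
                resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
            } finally {
                try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
                try { if (ps != null) ps.close(); } catch (SQLException ignored) { }
                try { if (conn != null) conn.close(); } catch (SQLException ignored) { }
            }
        }
    }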