Hadoop Admin Resume
NC
SUMMARY
- 7+ years of professional work experience with Hadoop, HDFS, MapReduce, the Hadoop ecosystem (Pig, Hive, HBase), and Java development.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice-versa.
- Experience with installation, configuration, support, and management of Big Data and the underlying infrastructure of the Hadoop cluster.
- Extensive programming and software development experience, with skills in data analysis, design, development, testing, and deployment of software systems from development through production, with emphasis on the object-oriented paradigm.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Oozie, Zookeeper, and Flume with CDH 4 and 5 distributions, and EC2 cloud computing with Amazon Web Services (AWS).
- Good exposure to the Apache Hadoop MapReduce framework, Pig scripting, and HDFS.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
- Extended Hive and Pig core functionality with custom UDFs; very good working knowledge of Pentaho ETL, IBM BigInsights, and Cassandra.
- Good understanding of Talend ETL and knowledge of Talend Open Studio.
- Experience with all flavors of Hadoop distributions, which includes Cloudera, Hortonworks and MapR.
- Good understanding of NoSQL databases and hands on experience in writing applications on NoSQL databases like HBase and Cassandra.
- Database Developer: Hands on experience with Database Design, SQL and PL/SQL.
- Strong development experience in message-oriented and service-oriented technologies such as WSDL/SOAP web services (SOA) and RESTful APIs.
- Excellent working knowledge of application servers such as WebSphere, WebLogic, JBoss, and Apache Tomcat.
- Experience in exporting/importing data using Sqoop between HDFS and relational database systems/mainframes. Used the Hadoop Streaming utility to run MapReduce jobs.
- Experienced in Core Java, Servlets, JSP, Struts, Spring, Hibernate, JDBC and Web Service.
- Experience in design and development of Java web services using XML, SOAP, WSDL, and UDDI based on SOA, with an excellent understanding of XML technologies (XSD, XSL, SAX, DOM, JAXB 2.0).
- Knowledge of developing Hadoop Streaming MapReduce jobs using Python.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall and Agile methodologies.
- Extensively worked in Agile, TDD, and Scrum development methodologies.
TECHNICAL SKILLS
BIG DATA: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, NoSQL, Flume, Zookeeper, Oozie, Impala.
WEB TECHNOLOGIES: Java, J2EE, JSP, Servlets, Struts, Hibernate, Spring, Spring MVC, Spring DAO, Spring Security, RMI, JDBC, JMS, DHTML, XML, XSLT, Spring WS, Drools, JBoss Enterprise Portal, JBoss Seam, JSTL, EJB, Web Services, JSF, RichFaces, BIRT Reports, Crystal Reports, HTML 5, CSS, Ajax, SOAP, JavaScript.
LANGUAGES: Java, PL/SQL.
FRAMEWORKS: Hadoop, HDFS, Map Reduce, Pig, Hive.
DATABASES: SQL Server, MySQL, DB2, Oracle.
OPERATING SYSTEMS: Windows, Linux, Unix (Sun Solaris), Ubuntu.
VERSION CONTROL: GitHub, SVN.
DEVELOPMENT TOOLS: Eclipse, SOAP UI, HP QTP, File Aid, QMF, Spufi, Visual Source Safe, ENDEVOR, XPEDITOR, Test Director, Team Forge.
OTHER TOOLS: SQL Developer, Maven, ANT, Log4J, Junit.
DOMAIN KNOWLEDGE: Health Care, Retail, People Systems, Finance.
PROFESSIONAL EXPERIENCE
Confidential, Des Moines, IA
Sr. Hadoop Developer
Responsibilities:
- Worked on the proof-of-concept for Apache Hadoop framework initiation.
- Developed complex Map Reduce programs in Java for Data Analysis on different data formats.
- Developed MapReduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
- Developed a secondary sort implementation to receive sorted values on the reduce side and improve MapReduce performance.
- Documented all Extract, Transform, and Load (ETL) processes; designed, developed, validated, and deployed Talend ETL processes for the data warehouse team using Pig and Hive on Hortonworks Hadoop.
- Optimized Talend jobs.
- Loaded log data into HDFS using Flume and Kafka.
- Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter for MapReduce computations to handle custom business requirements.
- Experience with distributed systems, MapReduce systems, data modeling, and Big Data systems.
- Responsible for performing extensive data validation using Hive.
- Implemented MapReduce programs to classify data into different categories based on record type.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Performed data analysis using Hive and Pig.
- Worked on tuning Hive and Pig scripts to improve performance.
- Experience with Spark for streaming data analysis.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Involved in loading the generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Implemented Hive generic UDFs to implement business logic (see the sketch after this list).
- Involved in using the distributed cache, joins, and filtering.
- Configured build scripts for multi module projects with Maven.
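The Hive generic UDFs mentioned above could look roughly like the minimal sketch below. It is illustrative only: the class name, the trim/upper-case logic standing in for the business rules, and the function name are assumptions, not details of the actual project.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Hypothetical generic UDF that normalizes a record-type code, standing in
// for the project-specific business logic.
public class NormalizeRecordType extends GenericUDF {

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("normalize_record_type expects exactly one argument");
        }
        // The UDF returns a string regardless of the input's primitive type.
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        if (value == null) {
            return null;
        }
        return new Text(value.toString().trim().toUpperCase());
    }

    @Override
    public String getDisplayString(String[] children) {
        return "normalize_record_type(" + children[0] + ")";
    }
}
```

After packaging the class in a JAR, it would be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries (function and JAR names here are hypothetical).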
Environment: HDFS, Map Reduce, Hive, Kafka, Sqoop, Spark, Flume, Zookeeper, Oozie, HFiles, HBase, Hadoop, CDH4, Java, Linux, Maven, Oracle 11g/10g, SVN, JDK 1.7, JSP, Agile, ETL, Crunch API, HTML, XML, JavaScript, Toad 9.6, UNIX Shell Scripting.
Confidential, Rochester, MN
Sr. Hadoop Developer
Responsibilities:
- Worked on the proof-of-concept for Apache Hadoop framework initiation.
- Experience in HDFS, MapReduce and Hadoop Framework.
- Configured the Hadoop Cluster in Local (Standalone), Pseudo-Distributed, Fully-Distributed Mode.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day during a dump from multiple sources; the output was written back to HDFS (see the sketch after this list).
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Development of Hadoop MapReduce programs.
- Experience in writing MapReduce jobs in Python for some complicated queries.
- Moved all log information into HDFS.
- Retrieved data using HQL from Hive.
- Identified insurance products referred to customers.
- Grouped identical insurance products by analyzing the messages.
- Wrote MapReduce code to convert semi-structured data to structured data.
- Developed a framework that creates external and managed tables in batch based on metadata files.
- Analyzed data using RStudio.
- Successfully designed and developed a solution for speeding up a SQL job using the Hadoop MapReduce framework; processing time was reduced from hours to minutes.
- Involved in migrating the data from development cluster to QA cluster and from there to production cluster.
- Created developer unit test plans and executed unit testing on the development cluster.
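For illustration, a minimal sketch of the kind of per-day activity-count MapReduce job described in the reporting bullet above. The tab-delimited layout and the assumption that the first field holds the activity date are hypothetical, not details from the engagement.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: assumes tab-delimited input whose first field is a yyyy-MM-dd
// activity date; emits (date, 1) for every record seen.
public class ActivityCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text day = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", -1);
        if (fields.length > 0 && !fields[0].isEmpty()) {
            day.set(fields[0]);
            context.write(day, ONE);
        }
    }
}

// Reducer: sums the per-day counts, producing one report line per day,
// which the job's output format writes back to HDFS.
class ActivityCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable count : values) {
            total += count.get();
        }
        context.write(key, new IntWritable(total));
    }
}
```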
Environment: JDK 1.6, Struts 1.3, JSP, Agile, ETL, Crunch API, HTML, JavaScript, Cloudera distribution of Hadoop, Shell, Linux, Pig, Hive (HQL), MapReduce, HBase, Sqoop, Oozie, Ganglia and Flume.
Confidential, NC
Hadoop Admin
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning (see the sketch after this list).
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Implemented NameNode backup using NFS for high availability.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Created Hive tables and was involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Involved in migration of ETL processes from Oracle to Hive to test ease of data manipulation.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive External tables and loaded the data in to tables and query data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
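As a rough illustration of the Java data-cleaning jobs mentioned in this section, the map-only sketch below drops malformed records and tracks what was discarded with counters. The comma delimiter, expected field count, and counter names are assumptions made for the example only.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical map-only cleaning step: records that do not have the expected
// number of comma-separated fields are dropped and counted.
public class CleanRecordsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 12; // illustrative value

    enum CleaningCounters { MALFORMED_RECORDS, VALID_RECORDS }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter(CleaningCounters.MALFORMED_RECORDS).increment(1);
            return;
        }
        context.getCounter(CleaningCounters.VALID_RECORDS).increment(1);
        // Valid records are passed through unchanged; setting the number of
        // reduce tasks to zero makes this a map-only job.
        context.write(value, NullWritable.get());
    }
}
```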
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
Confidential
Hadoop / Java developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of xml format data.
- Load and transform large sets of structured, semi structured and unstructured data.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs.
- Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
Environment: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hadoop distribution of Cloudera, DataStax, Flat files, Oracle 11g/10g, PL/SQL, SQL PLUS, UNIX Shell Scripting, Autosys r11.0.
Confidential
Java developer
Responsibilities:
- Used JBoss EAP 6.0.1 to deploy and configure the CG application.
- Used a Confluence repository for saving Customer Gateway documents and files.
- Worked on the REST API; the Customer Gateway application uses HTTP basic authentication for its entire set of APIs.
- Developed a RESTful web service using the Jersey framework to expose recent user activity as a service (see the sketch after this list).
- Moved current functionality that was connecting to the database using JPA to access data through REST calls.
- Worked with JDBC to create and close database connections.
- Worked on a REST client that uses an HTTP client.
- Socket programming experience with Python.
- Used application/xml so the API supports both inbound and outbound data bindings.
- Worked on SOAP 1.2 web services to consume and produce data for the external Cox Communications system.
- Worked with the Collections framework (Map/List) to set and get query parameters (CG).
- Worked with Oracle 10g for storing and retrieving data from the database.
- Worked on UNIX machines to deploy and configure the JBoss EAP server for the CG application.
- Worked on server tuning to increase the heap size on UNIX machines.
- Managed an offshore team implementing the Customer Gateway design/architecture.
- Worked with PL/SQL to query and fetch data from the database.
- Implemented the web layer using JSF and ICEfaces.
- Implemented business layer using Spring MVC.
- Implemented Getting Reports based on start date using HQL.
- Implemented Session Management using Session Factory in Hibernate.
- Developed the DOs and DAOs using Hibernate.
- Implemented a SOAP web service to validate ZIP codes using Apache Axis.
- Wrote complex queries, PL/SQL Stored Procedures, Functions and Packages to implement Business Rules.
- Wrote a PL/SQL program to send email to a group from the backend.
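A minimal, hypothetical sketch of a Jersey (JAX-RS) resource along the lines of the recent-activity service described above; the path, bean, and field names are illustrative and not taken from the Customer Gateway application.

```java
import java.util.ArrayList;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical resource exposing a user's recent activity as application/xml.
@Path("/users/{userId}/recent-activity")
public class RecentActivityResource {

    // JAXB bean that Jersey marshals to XML automatically.
    @XmlRootElement
    public static class RecentActivity {
        @XmlElement
        public String userId;
        @XmlElement
        public List<String> events = new ArrayList<String>();
    }

    @GET
    @Produces(MediaType.APPLICATION_XML)
    public RecentActivity recentActivity(@PathParam("userId") String userId) {
        // In the real application this data came from backend REST calls
        // (which replaced the earlier JPA access); stubbed here for brevity.
        RecentActivity activity = new RecentActivity();
        activity.userId = userId;
        activity.events.add("logged-in");
        activity.events.add("updated-profile");
        return activity;
    }
}
```

With HTTP basic authentication handled by the container or a request filter, a client would GET /users/{userId}/recent-activity and receive an XML representation of the bean.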
Environment: Core Java, Python, J2EE, Log4J, JUnit, JSF, Git, SOA, SQL, REST, JIRA, Apache Tomcat, JSP, JSTL, CSS, GWT, CVS, Servlets, Struts, DB2, PL/SQL, Oracle JDBC, MVC, HTML, DHTML, JavaScript, AJAX, jQuery, Web Services, Hibernate, JBoss EAP 6.0.1, Oracle 10g, UNIX.