Hadoop Developer Resume
New York, NY
SUMMARY
- Qualified IT professional with 8+ years of experience, including around 4 years as a Hadoop consultant and 4 years as a Java developer.
- Excellent understanding of Hadoop architecture and Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Proficient in installing, configuring, migrating, and upgrading data across Hadoop MapReduce, Hive, HDFS, HBase, Sqoop, Oozie, Pig, Cloudera, ZooKeeper, Flume, and Cassandra.
- Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Good working knowledge of the Apache Spark ecosystem, including Spark Streaming.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience with NoSQL databases such as HBase, with knowledge of Cassandra and MongoDB.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Proficient in adding ZooKeeper, Cassandra, and Flume to an existing Hadoop cluster.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in assessing Hadoop security requirements and integrating clusters with Kerberos authentication and authorization infrastructure.
- Experience in big data analysis using Pig and Hive, with a working understanding of Sqoop.
- Good understanding of HDFS design, daemons, federation, and high availability (HA).
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (see the Hive UDF sketch after this list).
- Good experience in implementing and setting up standards and processes for Hadoop based application design and implementation.
- Experience with middleware architectures built on Sun Java technologies such as J2EE, JSP, and Servlets, and with application servers such as WebSphere and WebLogic.
- Experience in object-oriented programming with Java and Core Java.
- Experience in creating web-based applications using JSP and Servlets.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
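As a concrete illustration of the custom UDF experience above, below is a minimal sketch of a Hive UDF in Java. The class name NormalizeText, the trim/lower-case rule, and the function name are hypothetical placeholders, and the sketch assumes the classic org.apache.hadoop.hive.ql.exec.UDF API.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: a Hive UDF that trims and lower-cases a string column.
public final class NormalizeText extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null; // pass NULLs through rather than failing the query
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Packaged into a JAR, such a class would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.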
TECHNICAL SKILLS:
Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, Storm, Talend, Ganglia
Operating System: Windows, Unix, Linux
Languages: Java, J2EE, SQL, PL/SQL, Shell Script
Project Management / Tools: MS Project, MS Office, TFS, HP Quality Center Tool
Front-End: HTML, JSTL, DHTML, JavaScript, CSS, XML, XSL, XSLT
Databases: MySQL, Oracle 11g/10g/9i, SQL Server
NoSQL Databases: HBase, Cassandra
File System: HDFS
Reporting Tools: Jasper Reports, Tableau
IDE Tools: Eclipse, NetBeans
Application Server: Apache Tomcat, WebLogic
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
HADOOP DEVELOPER
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Worked with HiveQL on big data logs to perform trend analysis of user behavior across various online modules.
- Passionate about working on the most cutting-edge Big Data technologies.
- Developed Pig scripts for analyzing large data sets in the HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Involved in migrating HiveQL into Impala to minimize query response time.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked with sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Imported data from mainframe datasets to HDFS using Sqoop. Also handled importing data from various data sources (Oracle, DB2, Cassandra, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
- Implemented daily cron-style Oozie coordinator jobs that automate parallel loading of data into HDFS.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, for comparison with historical data.
- Involved in loading data from Teradata database into HDFS using Sqoop queries.
- Involved in submitting and tracking MapReduce jobs using Job Tracker.
- Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
- Exported data to Tableau and to Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (see the Pig UDF sketch after this list).
- Implemented Hive generic UDFs to encapsulate business logic.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
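Below is a minimal sketch of a Pig UDF of the kind described in this list, assuming the standard org.apache.pig.EvalFunc API; the class name and the upper-casing logic are illustrative placeholders for the actual business logic.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical example: a Pig UDF that upper-cases its first input field.
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // propagate nulls rather than failing the task
        }
        return ((String) input.get(0)).toUpperCase();
    }
}
```

In a Pig script, the JAR containing this class would be registered with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.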
Environment: Apache Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, Tableau.
Confidential, Glendale, AZ
Hadoop Consultant
Responsibilities:
- Installed and configured Cloudera Hadoop on a 100 node cluster.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning and processing (see the MapReduce sketch after this list).
- Developed data pipeline using Sqoop, Hive, Pig and Java MapReduce to ingest claim and policy histories into HDFS for analysis.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Responsible for architecting Hadoop clusters with CDH3.
- Imported and exported data into HDFS and Hive using Sqoop.
- Worked on NoSQL databases including HBase and ElasticSearch.
- Performed cluster coordination through ZooKeeper.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Installed and configured Hive and wrote Hive UDFs.
- Performed data analysis in Hive by creating tables, loading them with data, and querying them with HiveQL.
- Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Developed shell scripts to pull data from third-party systems into the Hadoop file system.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
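Below is a minimal sketch of a map-only data-cleaning job like those described in this list, assuming the Hadoop 2.x org.apache.hadoop.mapreduce API; the pipe delimiter, expected field count, and class names are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical map-only job that drops malformed pipe-delimited records.
public class CleanRecordsJob {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // assumed schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Keep only rows with the expected number of delimited fields.
            if (value.toString().split("\\|", -1).length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only: cleaned rows go straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```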
Environment: Hadoop, MapReduce, HDFS, Flume, Cassandra, Sqoop, Pig, HBase, Hive, ZooKeeper, Cloudera, Oozie, ElasticSearch, NoSQL, UNIX/Linux.
Confidential, Houston, TX
Hadoop Consultant
Responsibilities:
- Obtained requirement specifications from the SMEs and business analysts in the BR and SR meetings for the corporate workplace project. Interacted with business users to build sample report layouts.
- Involved in writing the HLDs along with the RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
- Installed and configured Cloudera Hadoop CDH4 via Cloudera Manager in pseudo-distributed mode and in cluster mode as a proof of concept.
- Created MapReduce jobs using Hive and Pig queries.
- Extensively used Pig for data cleansing.
- Developed Pig UDFs to pre-process the data for analysis.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in configuring Sqoop to map SQL types to appropriate Java classes.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Collected the past five years of TPSS data from Teradata and pushed it into HDFS using Sqoop.
- Involved in Unit testing, System Integration testing and UAT post development.
Environment: Hadoop, Oracle, Cloudera Hadoop CDH4, HiveQL, Pig Latin, MapReduce, HDFS, HBase, ZooKeeper, Oozie, PL/SQL, SQL*Plus, Windows, UNIX, Shell Scripting.
Confidential, Carmel, IN
Java Developer
Responsibilities:
- Gathered system requirements for the project.
- Prepared the detailed design document for the project by developing business process flows, requirements definitions, use cases, and the object model.
- Designed and implemented a GUI framework for Swing: developers define actions and popup menus in XML, and the framework builds the graphical components.
- Designed the class diagrams and sequence diagrams.
- Developed the presentation layer and GUI framework using JSP, with client-side validations in JavaScript.
- Used MVC architecture.
- Created the test plan and developed and coded test classes and test cases.
- Executed test cases in JBuilder.
- Fixed defects and handled client communication and query resolution.
- Used IBM ClearCase for version control and workspace management.
- Testing of the product: Unit Testing, Regression Testing, and Integration Testing.
- Used Eclipse as the IDE and Struts Framework for developing the application.
- Developed the JSPs for the application.
- Created Struts-config file and resource bundles for Distribution module using Struts Framework.
- Implemented ActionForm and Action classes for the entire Reports module using the Struts framework (see the Action sketch after this list).
- Worked with core Java on multithreading, arrays, and GUI (AWT).
- Used Oracle 8i as the database and wrote SQL.
- Deployed the application to the Tomcat server.
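Below is a minimal sketch of a Struts 1.x Action of the kind used in the Reports module above; the class name, forward name, request attribute, and data-access call are hypothetical placeholders.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Hypothetical Struts 1.x Action that prepares data for a report JSP.
public class ReportAction extends Action {
    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception {
        // Expose the rows to the view, then forward to the JSP mapped
        // under "success" in struts-config.xml.
        request.setAttribute("reportRows", loadReportRows());
        return mapping.findForward("success");
    }

    private java.util.List<String> loadReportRows() {
        return java.util.Collections.emptyList(); // placeholder for the real data-access call
    }
}
```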
Environment: Java, J2SE, Struts, Servlets, JSP, Tomcat, Eclipse, Oracle 8i, XML, HTML/DHTML, JBuilder, ClearCase.
Confidential, Pittsburgh, PA
J2EE Developer
Responsibilities:
- Worked on designing and developing the web application user interface and implemented its related functionality in Java/J2EE for the product.
- Designed and developed applications using JSP, Servlets and HTML.
- Used the Hibernate ORM module as an object-relational mapping tool for back-end operations.
- Provided Hibernate configuration and mapping files, and was involved in integrating Struts with the Hibernate libraries.
- Extensively used Java multithreading for downloading files from URLs (see the sketch after this list).
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Developed Web Service client interface for invoking the methods using SOAP.
- Created navigation component that reads the next page details from an XML config file.
- Developed applications with HTML, JSP and Tag libraries.
- Developed required stored procedures and database functions using PL/SQL.
- Developed, Tested and debugged various components in WebLogic Application Server.
- Used XML, XSL for Data presentation, Report generation and customer feedback documents.
- Implemented Logging framework using Log4J.
- Involved in code review and documentation review of technical artifacts.
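Below is a minimal sketch of the multithreaded download pattern mentioned in this list, assuming Java 7+ for try-with-resources; the URLs and target file names are placeholders.

```java
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Hypothetical sketch: each file downloads on its own thread so one slow
// URL does not block the others.
public class ParallelDownloader {

    public static void main(String[] args) throws InterruptedException {
        String[] urls = { "http://example.com/a.dat", "http://example.com/b.dat" };
        Thread[] workers = new Thread[urls.length];
        for (int i = 0; i < urls.length; i++) {
            final String url = urls[i];
            final String target = "download-" + i + ".dat";
            workers[i] = new Thread(new Runnable() {
                public void run() { download(url, target); }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait for every download to finish
        }
    }

    private static void download(String url, String target) {
        // Stream the URL's bytes to a local file in fixed-size chunks.
        try (InputStream in = new URL(url).openStream();
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (Exception e) {
            System.err.println("Failed to fetch " + url + ": " + e.getMessage());
        }
    }
}
```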
Environment: Java, Servlets, JSP, Hibernate, XML, XSL, Tomcat, WebLogic, Rational Rose, Eclipse, Log4J, and Windows XP.