Sr. Hadoop Developer Resume
Plantation, FL
SUMMARY
- A motivated, results-driven professional with 8+ years of experience in software development, including 3+ years of extensive hands-on work with Big Data/Hadoop. Actively involved in the analysis and implementation of emerging technologies in the Big Data ecosystem and NoSQL technologies across verticals such as finance, healthcare and insurance.
- 5 years of experience across the full development life cycle of Java/J2EE application and web development.
- Proficient in processing large sets of structured, semi-structured and unstructured data for data mining using optimized MapReduce programs, Pig scripts and Hive queries.
- Responsible for creating complex MapReduce programs by customizing the framework at various levels.
- Good knowledge of writing custom UDFs, UDAFs and UDTFs to extend Hive and Pig Latin functionality.
- Experience in loading log data and unstructured data from multiple sources into HDFS using Flume.
- Performed real-time event processing of data from multiple servers in the organization using Apache Storm integrated with Apache Kafka.
- Extensive experience in Spark Streaming and in implementing solutions with Spark machine learning libraries in Scala.
- Expertise in NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience optimizing Cassandra for writes and pre-computing aggregations to produce various statistics.
- Involved in installing and maintaining a 36-node MongoDB cluster with replication and sharding enabled.
- Involved in data modeling and in designing the indexing model for MongoDB.
- Performed data modeling to connect data stored in Cassandra to the data processing layers, and wrote queries in CQL.
- Experience using Sqoop to import and export data between relational database systems and HDFS, HBase and Hive.
- Extensive experience with Oozie for designing, monitoring and scheduling both time-driven and data-driven automated job workflows.
- Hands-on experience with Puppet for automating Hadoop installations and for configuring and maintaining clusters.
- Experience using Cloudera Manager, Apache Ambari, Ganglia and Nagios to monitor jobs running on the cluster.
- Expert in implementing advanced procedures such as text analytics and processing in Scala, using the in-memory computing capabilities of Apache Spark.
- Experienced in using Spark Streaming to ingest data from multiple data sources into HDFS.
- Experience with Flume and Apache Kafka in creating data pipelines to ingest browsing data into HBase/HDFS for analysis.
- Working experience in creating, configuring and monitoring Hadoop clusters on EC2 and VMs with Hortonworks Data Platform 2.1 and 2.2, and with CDH3 and CDH4 using Cloudera Manager.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Worked on all phases of the data warehouse development lifecycle, including ETL design and implementation, and on support of new and existing applications.
- Extensive experience designing and developing enterprise applications using Java, J2EE technologies, JavaScript, Struts, Hibernate, EJB and the Spring Framework.
- Widely used web/application servers such as WebLogic, WebSphere 6.x, JBoss and Tomcat for deploying builds, server configuration, performance tuning, troubleshooting and maintenance.
- Extensive experience integrating RDBMS with enterprise applications, writing SQL queries, stored procedures, functions and triggers using Oracle 9i/10g/11g, IBM DB2 and MySQL.
- Strong understanding of Agile (Scrum) and Waterfall SDLC methodologies.
- Experience in developing web-based user interfaces using ExtJS, JavaScript, jQuery, CSS, HTML, HTML5 and XHTML.
TECHNICAL SKILLS
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Storm, Spark, Scala, Avro, MRUnit, Solr.
Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP, WSDL
Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts.
NoSQL Databases: MongoDB, Cassandra, HBase
Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, SOAP, WSDL
Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.
Tools: Eclipse, IntelliJ, PuTTY, WinSCP
Operating Systems: Red Hat, Windows 7/8, Windows Server 2008/2003, Mac OS.
ETL Tools: Informatica, Pentaho.
Testing: Hadoop testing (MRUnit, Mockito), Hive testing, Quality Center (QC)
Application/Web Servers: IBM WebSphere 5.1.2/5.0/4.0/3.5, WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.
Monitoring and Reporting Tools: Ganglia, Nagios, custom shell scripts.
Version Control: SVN, CVS, Git
PROFESSIONAL EXPERIENCE
Confidential, Plantation, FL
Sr. Hadoop Developer
Environment: Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), Pig, Flume, HBase, Sqoop, Oozie, DB2, Teradata, Apache Spark, Apache Kafka, Scala, Storm, Solr, REST, Jersey, Linux, XML.
Responsibilities:
- Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
- Involved in installing and configuring HDFS and Hadoop MapReduce, and developed several MapReduce operations in Java for data preprocessing.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig and Hive programs.
- Experience in using Spark SQL to implement custom joins that create tables containing item records.
- Designed and implemented a Spark-based, large-scale parallel relation-learning system.
- Experienced in bulk loading data into HBase with MapReduce by generating HFiles directly and loading them.
- Experienced in creating custom Flume sources and sinks to support client data APIs.
- Collected and aggregated large amounts of log data from multiple sources and integrated it into HDFS using Flume.
- Integrated data from multiple databases such as DB2 and Teradata into the Hadoop cluster and used Hive-HBase integration to analyze the data.
- Involved in developing REST web services in Java using the HBase native API and Jersey to query data from HBase.
- Used HBase as a real-time data storage and analytics platform; reports generated from HBase were used as feedback for the production system.
- Experience in developing a data pipeline using Kafka and Storm to store data into HDFS.
- Experienced in collecting real-time data from Kafka using Spark Streaming, performing transformations and aggregations on the fly to build the common learner data model, and persisting the data into HBase.
- Implemented POCs to migrate iterative MapReduce programs to Spark transformations using Spark and Scala.
- Involved in installing and configuring Hive, and wrote various Hive user-defined functions for categorization.
- Used Hive as the core database for the data warehouse, tracking and analyzing all data usage across our network.
- Experienced with Solr for indexing and search operations, and configured Solr by modifying the schema.xml file to meet our requirements.
- Used Oozie to coordinate and automate the flow of jobs in the cluster.
- Experienced in managing and reviewing Hadoop log files.
- Worked with different file formats such as text files, SequenceFiles, Avro and Record Columnar (RC) files.
- Explored and used Hadoop ecosystem features and architectures.
Confidential, Austin, TX
Hadoop Developer
Environment: Cloudera Hadoop, MapReduce, HDFS, Hive, Java, Pig, MongoDB, Cassandra, JSON, XML, HBase, Sqoop, Oozie, Shell Scripts, Apache Crunch, Apache Spark, Apache Storm, MRUnit, Mockito, Netcat, HTTP, Linux.
Responsibilities:
- Implemented performance optimizations in Hive using distributed cache, partitioning, bucketing and map-side joins.
- Experience in automating tasks on Hive data using UNIX shell scripts.
- Used Hive for data mining, internal log analysis and ad hoc queries.
- Implemented Pig Latin scripts to describe structural and semantic conversions between data contexts.
- Experience using Pig loaders to parse JSON and XML files, and used regular expressions in Pig to extract useful information from Pig relations.
- Experienced in using Apache Flume for log file aggregation and processing.
- Experienced in designing and configuring Flume agents to collect data from network proxy servers and store it in HDFS.
- Used Flume to ingest data from Netcat and HTTP sources into HDFS for processing.
- Experience in developing applications using find queries and aggregations in MongoDB.
- Experience in using the MongoDB MapReduce connector to run MapReduce programs on data residing in MongoDB for some user stories.
- Expertise in developing MapReduce programs that implement various data processing logic by customizing the framework at various levels.
- Experience configuring spouts and bolts in various Storm topologies and validating data in the bolts.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Worked on connecting to a 5-node Cassandra cluster from Java using the DataStax Java Driver and developed a web application for searching.
- Involved in configuring a 36-node MongoDB cluster with data replication and hash-based sharding.
- Expert in MRUnit and Mockito for implementing test classes for MapReduce programs.
- Involved in Hive testing using custom-written shell scripts.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning and slots configuration.
- Experienced in using Apache Crunch for data cleaning and processing.
Confidential, Leesburg, GA
Hadoop Developer
Environment: Cloudera Hadoop, MapReduce, HDFS, Hive, Java, Pig, Flume, HBase, Sqoop, Oozie, Shell Scripts, Cron, Linux, XML.
Responsibilities:
- As a Big Data developer, implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, the MapReduce framework, HBase, Hive, Oozie, Flume and Sqoop.
- Hands-on experience in designing and implementing Big Data processing to enable real-time analytics, event detection and notification for data in motion.
- Involved in developing various MapReduce programs to implement transformations and filtering according to user stories.
- Experienced in developing applications to process, cleanse and report on data using analytics platforms such as Hadoop and various NoSQL databases.
- Experienced in processing server, application and user log files using Hive in combination with Pig.
- Experience in using Pig to sort and prepare our data before it is handed off to our Java MapReduce jobs.
- Implemented Hive queries to pre-process and analyze streaming data by imposing a read-only structure on it.
- Experience in using Oozie workflows to organize and sequence multiple Hive queries.
- Responsible for migrating ETL scripts into the Hadoop framework using Hive, Pig and MapReduce programs where necessary.
- Experience in automating migrated ETL applications using Oozie workflows, with error handling in shell scripts.
- Experienced in collecting and aggregating large amounts of log data using Apache Flume, with HDFS as the staging layer for further analysis.
- Developed shell scripts and automated them using the cron job scheduler.
- Involved in commissioning and decommissioning Hadoop nodes, monitoring and troubleshooting the cluster, and managing and reviewing data backups and Hadoop log files.
- Experience in developing scripts for Sqoop ingestion and Hadoop copy-merge.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
Confidential, Mayfield, OH
Sr. Java/J2EE Developer
Environment: J2EE, Spring Framework, Spring MVC, Hibernate, JSP, Servlets, JDBC, AJAX, jQuery, JavaScript, Oracle 10g, IBM RAD, Tomcat 7, CVS, JUnit.
Responsibilities:
- Played a key role in the design and development of an enterprise application using J2EE technologies and the Spring Framework, following a Service-Oriented Architecture (SOA).
- Implemented Spring beans using IoC and transaction management features to handle transactions and business logic.
- Participated in the production deployment and change management process.
- Worked on all modules of the application, including front-end presentation logic developed with Tiles, JSP, JSTL and JavaScript, business objects developed as POJOs, and the data access layer built on the Hibernate framework.
- Created and injected Spring services, Spring controllers and DAOs to achieve dependency injection and wire business-class objects.
- Used Apache Axis as the web service framework for creating and deploying web service clients using SOAP and WSDL.
- Developed various generic JavaScript functions for validation.
- Used AJAX extensively to implement front-end/user interface features in the application.
- Designed and developed PL/SQL blocks and stored procedures in the DB2 database.
- Followed Test-Driven Development, writing detailed JUnit tests for each piece of functionality before implementing it.
- Developed and implemented several test cases using the JUnit framework.
- Used Ant scripts to build and deploy the applications on the Tomcat server.
- Used the Log4j utility to generate run-time logs.
- Used CVS for project and version management.
- Involved in troubleshooting technical issues, conducting code reviews and enforcing best practices.
Confidential, Pittsburgh, PA
Sr. Java/J2EE Developer
Environment: J2EE, EJB, Struts Framework, JSP, Servlets, REST, JDBC, AJAX, jQuery, JavaScript, PL/SQL, Oracle 10g, WebSphere, Ant, JUnit.
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development and unit testing.
- Involved in developing UI pages using JSP, JavaScript, HTML/DHTML and Ajax.
- Developed DispatchActions, ActionForms and custom tag libraries in the Struts framework.
- Loaded external data using RESTful web services and managed the XML data.
- Extensively applied various design patterns such as MVC-2, Front Controller, Factory, Singleton, Business Delegate, Session Façade, Service Locator and DAO throughout the application for a clear and manageable distribution of roles.
- Developed interfaces using stateless EJB session beans and message-driven beans for both synchronous and asynchronous communication.
- Developed components and adapters of the integration framework using stateless session EJBs.
- Actively used the configuration management tool CVS to manage the code.
- Involved in the initial design, creating use case diagrams, sequence diagrams and class diagrams using Rational Rose.
- Set up the WebSphere application server and used Ant to build the application and deploy it to WebSphere.
- Wrote PL/SQL queries to access data from the Oracle database.
Confidential
Java/J2EE Developer
Environment: J2EE, EJB, Struts Framework, Hibernate, JSP, Servlets, REST, JDBC, AJAX, jQuery, JavaScript, XML, SAX, DOM, PL/SQL, Oracle 10g, WebLogic, Maven, JUnit.
Responsibilities:
- Applied the MVC design pattern with JSP as the view, the Struts ActionServlet as the controller and EJB session beans as the model, deployed on WebLogic Server.
- Developed the business logic in the Java back end using the Struts Framework.
- Used Hibernate to fetch data from the Oracle database.
- Used the WSAD/Eclipse development environment for building Enterprise JavaBeans.
- Worked in a Linux environment to run batch jobs and used Maven to build the application.
- Used JavaScript for client-side validation.
- Parsed XML data using SAX and DOM parsers.
- Created UML diagrams (use case, class, sequence and collaboration) based on the business requirements.
- Implemented the back-end business logic for registering new users and managing user-related functionality.
- Used CVS for version control.
- Used Log4j and JUnit to log and unit test the functionality.
