Hadoop Developer Resume
Richardson, TX
EXPERIENCE SUMMARY:
- Around 9 years of IT experience in software development, including 4+ years of experience in Big Data/Hadoop and NoSQL technologies across domains such as Automobile, Finance, Insurance, Healthcare, and Telecom.
- 4 years of experience in Hadoop working environments, including MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, YARN, Cassandra, Kafka, Spark Core, and Flume.
- Solid understanding of Hadoop Distributed File System.
- Good experience with MapReduce (MR), Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark Streaming, and ZooKeeper for data extraction, processing, storage, and analysis.
- In-depth understanding of how MapReduce works and of the Hadoop infrastructure.
- In-depth understanding of the Hadoop architecture and its components, such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce concepts.
- Experience in developing custom MapReduce programs in Java using Apache Hadoop for analyzing Big Data (a minimal sketch follows this summary).
- Extensively worked on Hive for ETL transformations and optimized Hive queries.
- Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
- Successfully loaded data into Hive and HDFS from MongoDB, Cassandra, and HBase.
- Extended Hive and Pig core functionality with custom User-Defined Functions (UDF), User-Defined Table-Generating Functions (UDTF), and User-Defined Aggregate Functions (UDAF).
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed Pig Latin scripts for data cleansing and transformation.
- Used Flume to channel data from different sources to HDFS.
- Job workflow scheduling and monitoring using tools like Oozie.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems and NoSQL sources.
- Good experience with the Cloudera, Hortonworks, and Apache Hadoop distributions.
- Worked with relational database systems (RDBMS) such as MySQL, MS SQL Server, and Oracle, and NoSQL database systems like HBase and Cassandra.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Used shell scripting to move log files into HDFS.
- Good understanding of real-time data processing using Spark SQL.
- Imported data from different sources like HDFS and HBase into Spark RDDs.
- Experienced with different file formats like CSV, Text files, Sequence files, XML, JSON and Avro files.
- Good knowledge of data modeling and data mining to model data as per business requirements.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Good knowledge of Python and Bash scripting.
- Developed simple-to-complex MapReduce streaming jobs in Python that integrate with Hive and Pig.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
- Streamed real-time data using Spark with Kafka and stored the streamed data to HDFS using Scala.
- Involved in creating database objects like tables, views, procedures, triggers, and functions using Confidential-SQL to provide definition and structure and to maintain data efficiently.
- Expert in data visualization with Tableau, creating complex and innovative dashboards.
- Generated ETL reports using Tableau and created statistical dashboards for analytics.
- Reported and classified bugs, and played a major role in carrying out different types of testing: smoke, functional, integration, system, data comparison, and regression.
- Experience in creating master test plans, test cases, test result reports, requirements traceability matrices, and status reports, and submitting them to project management.
- Strong hands-on experience with MVC frameworks, including Spring MVC.
- Good at designing and developing data access layer modules with the Hibernate framework for new functionality.
- Extensive experience working with IDEs like Eclipse, NetBeans, and EditPlus.
- Working knowledge of Agile and waterfall development models.
- Working experience in all SDLC Phases.
- Extensively used Java and J2EE technologies such as Core Java, JavaBeans, Servlets, JSP, Spring, Hibernate, JDBC, JSON, and design patterns.
- Experienced in Application Development using Java, J2EE, JSP, Servlets, RDBMS, Tag Libraries, JDBC, Hibernate, XML and Linux shell scripting.
- Worked with version control, bug tracking, and code review systems such as CVS, ClearCase, and Jira.
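Illustrative sketch (hypothetical class names, field position, and paths): a minimal custom MapReduce program in Java of the kind referenced above, counting occurrences of the first tab-separated field in text records.

    // Minimal field-count job against the Hadoop MapReduce API (hypothetical names and paths).
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class FieldCountJob {

        // Mapper: emits (firstField, 1) for every input line.
        public static class FieldMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 0) {
                    outKey.set(fields[0]);
                    context.write(outKey, ONE);
                }
            }
        }

        // Reducer: sums the counts per key.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "field-count");
            job.setJarByClass(FieldCountJob.class);
            job.setMapperClass(FieldMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }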
TECHNICAL SKILLS:
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Storm, and Avro
Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP, WSDL
Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts.
NoSQL Databases: MongoDB, Cassandra, HBase
Databases: Oracle 11g/10g, DB2, MS SQL Server, MySQL, Teradata.
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, SOAP
Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2, Spring XD, Spring Boot
Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP
Operating Systems: Ubuntu (Linux), Windows 95/98/2000/XP, Mac OS, Red Hat
ETL Tools: Informatica, Pentaho.
Testing: Hadoop Testing, Hive Testing, Quality Center (QC)
Monitoring and Reporting tools: Ganglia, Nagios, Custom Shell scripts.
PROFESSIONAL EXPERIENCE:
Confidential, Richardson, TX
Hadoop Developer
Responsibilities:
- Imported data from different relational data sources, such as RDBMS and Teradata, into HDFS using Sqoop.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented incremental imports into Hive tables.
- Used the REST API to access HBase data and perform analytics.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala (see the sketch after this list).
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Worked with cloud services like Amazon Web Services (AWS).
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Phase I to collect data for real-time analytics, moving it through Kafka and Storm into Cassandra.
- Worked on writing Storm topologies and saving the records to Cassandra.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Developed a prototype for parsing high-volume XML files using Hadoop and storing the output in HDFS for Ab Initio.
- Used different Talend Hadoop components such as Hive, Pig, and Spark.
- Deployed a Hadoop cluster via Cloudera’s Enterprise Data Hub (EDH) on AWS.
- Imported data from different sources like HDFS and HBase into Spark RDDs; developed a data pipeline using Kafka and Storm to store data in HDFS and performed real-time analysis on the incoming data.
- Committer on Spring Batch, Spring Hadoop and Spring XD.
- Successfully integrated Hive tables and MongoDB collections and developed a web service that queries MongoDB collections and returns the required data to the web UI.
- Experienced in managing and reviewing the Hadoop log files.
- Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using Amazon S3 for input and output.
- Involved in the Data Asset Inventory to gather, analyze, and document business requirements, functional requirements, and data specifications for Member Retention from SQL and Hadoop sources.
- Worked on solving performance issues and query limits for workbooks connected to a live database by using the data extract option in Tableau.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked on different file formats like sequence files, XML files, and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Worked on Oozie workflow engine for job scheduling.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
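A simplified sketch of the MapReduce-to-Spark conversion referenced above, written against Spark's Java API (the project work itself used Scala); the input and output paths and the tab delimiter are placeholders.

    // Count records per key with Spark RDDs instead of a MapReduce job (placeholder paths).
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class MapReduceToSpark {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("field-count-spark");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile("hdfs:///data/input");       // was the MR input path
            JavaPairRDD<String, Integer> counts = lines
                    .mapToPair(line -> new Tuple2<>(line.split("\t")[0], 1)) // map phase
                    .reduceByKey((a, b) -> a + b);                           // reduce phase

            counts.saveAsTextFile("hdfs:///data/output");                    // was the MR output path
            sc.stop();
        }
    }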
Environment: CDH 5.3, MapReduce, Hive 0.14, Spark 1.4.1, Oozie, Sqoop, Pig 0.11, Java, REST API, Maven, MRUnit, JUnit, Tableau, Cloudera, Python.
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Involved in creating a data lake by extracting customer data from various data sources into HDFS.
- Used Sqoop to load data from Oracle Database into HDFS.
- Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
- Involved in creating Hive tables as per requirement defined with appropriate static and dynamic partitions.
- Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
- Involved in production Hadoop cluster setup, administration, maintenance, monitoring and support.
- Streamed real-time data using Spark with Kafka and stored the streamed data to HDFS using Scala.
- Successfully integrated Hive tables and MongoDB collections and developed a web service that queries MongoDB collections and returns the required data to the web UI.
- Worked with cloud services like Amazon Web Services (AWS).
- Handled the logical implementation of, and interaction with, HBase (see the client-level sketch after this list).
- Provided cluster coordination services through ZooKeeper.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
- Developed MapReduce jobs to automate the transfer of data to and from HBase.
- Created data queries and reports using QlikView and Excel, including custom queries/reports designed for qualification verification and information sharing.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume to collect the entire web log from the online ad servers and push it into HDFS.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Extensively used Core Java, Servlets, JSP, and XML.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Back-end Java developer for the Data Management Platform (DMP), building RESTful APIs so that other groups could build dashboards.
- Responsible for building scalable distributed data solutions using Hortonworks.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Worked closely with architect and clients to define and prioritize use cases and develop APIs.
- Involved in monitoring job performance, capacity planning, and workload using Cloudera Manager.
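A client-level sketch of the kind of HBase interaction mentioned above, using the classic HTable API from that HBase generation; the table, column family, qualifier, and row key are hypothetical.

    // Write and read a single cell with the HBase client API (hypothetical table/row names).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
            HTable table = new HTable(conf, "clickstream");

            // Put one cell: rowkey -> cf:page = value
            Put put = new Put(Bytes.toBytes("user123#20150101"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("page"), Bytes.toBytes("/home"));
            table.put(put);

            // Get it back
            Get get = new Get(Bytes.toBytes("user123#20150101"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("page"));
            System.out.println(Bytes.toString(value));

            table.close();
        }
    }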
Environment: Hadoop, Pig 0.10, Sqoop, Oozie, MapReduce, HDFS, HBase, Hive 0.10, Core Java, Eclipse, QlikView, Flume, Cloudera, Hortonworks, Oracle 10g, UNIX shell scripting, Cassandra.
Confidential, Penfield, NY
Hadoop Developer
Responsibilities:
- Worked on Hadoop cluster (CDH 5) with 30 nodes.
- Worked with 90 TB of structured and semi-structured data with a replication factor of 3.
- Extracted the data from Oracle, MySQL, and SQL server databases into HDFS using Sqoop.
- Extracted data from web logs and social media using Flume and loaded it into HDFS.
- Created jobs in Sqoop with incremental load and populated Hive tables.
- Developed software to process, cleanse, and report on vehicle data using analytics and REST APIs with Java, Scala, and Akka (an asynchronous programming framework).
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports (see the producer sketch after this list).
- Worked with cloud services like Amazon Web Services (AWS).
- Involved in developing the Asset Tracking project, collecting real-time vehicle location data from a JMS queue using IBM Streams and processing it in Vehicle Tracking using ESRI GIS mapping software, Scala, and the Akka actor model.
- Involved in developing web services using REST, the HBase native API, and the Big SQL client to query data from HBase.
- Experienced in developing Hive queries in the Big SQL client for various use cases.
- Involved in developing several shell scripts and automating them using the cron job scheduler.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
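A minimal sketch of publishing records to Kafka for the real-time import described above; the broker address, topic name, key, and payload are placeholders.

    // Publish vehicle events to a Kafka topic (placeholder broker, topic, and message).
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class VehicleEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("vehicle-events", "vin-123", "{\"lat\":40.7,\"lon\":-74.0}"));
            producer.close();
        }
    }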
Environment: Hadoop 1.x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX shell scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM Big SQL, Java
Confidential, Memphis, TN
Hadoop Developer
Responsibilities:
- Worked with approximately 100 TB of structured and semi-structured data with a replication factor of 3.
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL queries to query and search for particular strings in Hive tables in HDFS.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Experienced in developing customized UDFs in Java to extend Hive and Pig Latin functionality (see the UDF sketch after this list).
- Created HBase tables to store various data formats of data coming from different portfolios.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Experienced with Solr for indexing and search.
- Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to determine which one better suits the current requirements.
- Used File System check (FSCK) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
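A minimal sketch of the kind of custom Hive UDF described above; the class name and normalization logic are hypothetical. Packaged as a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.

    // Simple Hive UDF that trims and lower-cases a string column (hypothetical example).
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeString extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }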
Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, Kafka, HBase, Cassandra, Cloudera Distribution, Oozie, Ambari, Ganglia, Yarn, Shell scripting
Confidential, Plano, TX
Java/J2EE/Hadoop Developer
Responsibilities:
- Participated in requirement gathering and converting the requirements into technical specifications.
- Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Extensively worked on the user interface for a few modules using JSPs, JavaScript, and Ajax.
- Created business logic using Servlets and POJOs and deployed them on WebLogic Server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Web services for the data maintenance and structures.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Responsible for managing data coming from different sources.
- Developed MapReduce algorithms.
- Gained good experience with NoSQL databases.
- Involved in loading data from the UNIX file system to HDFS (see the sketch after this list).
- Installed and configured Hive and wrote Hive UDFs.
- Integrated Hadoop with Solr and implemented search algorithms.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 10g database.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in creating templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using SOAP.
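A short sketch of loading files from the UNIX file system into HDFS with the Hadoop FileSystem API; the local and HDFS paths are placeholders.

    // Copy a local log file into HDFS (placeholder paths).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml/hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path("/var/log/app/app.log"),
                                 new Path("/data/raw/logs/app.log"));
            fs.close();
        }
    }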
Environment: Hive 0.7.1, Apache Solr 3.x, HBase 0.90.x/0.20.x, JDK 1.5, Struts 1.3, WebSphere 6.1, HTML, XML, JavaScript, JUnit 3.8, Oracle 10g, Amazon Web Services.
Confidential, McLean, VA
Java/J2EE Developer
Responsibilities:
- Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications
- Gathered and analyzed information for developing, supporting, and modifying existing web applications based on prioritized business needs
- Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA)
- Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages
- Played a key role in designing the presentation tier components by customizing the Spring framework components, which includes configuring web modules, request processors, error handling components, etc.
- Implemented the Web Services functionality in the application to allow external applications to access data
- Used Apache Axis as the Web Service framework for creating and deploying Web Service Clients using SOAP and WSDL
- Worked on Spring to develop different modules to assist the product in handling different requirements
- Developed validation using Spring's Validator interface and used Spring Core and Spring MVC to develop the applications and access data
- Implemented Spring beans using IoC and transaction management features to handle transactions and business logic (see the sketch after this list)
- Designed and developed different PL/SQL blocks and stored procedures in the DB2 database
- Involved in writing the DAO layer using Hibernate to access the database
- Involved in deploying and testing the application using Websphere Application Server
- Developed and implemented several test cases using JUnit framework
- Involved in troubleshoot technical issues, conduct code reviews, and enforce best practices
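An illustrative sketch of a Spring bean wired through IoC with declarative transaction management, as described above; the service, DAO, and method names are hypothetical.

    // Service bean with constructor injection and a transactional method (hypothetical names).
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    interface AccountDao {
        void debit(long accountId, double amount);
        void credit(long accountId, double amount);
    }

    @Service
    public class AccountService {

        private final AccountDao accountDao;   // DAO implemented with Hibernate elsewhere

        @Autowired
        public AccountService(AccountDao accountDao) {
            this.accountDao = accountDao;
        }

        @Transactional
        public void transfer(long fromId, long toId, double amount) {
            accountDao.debit(fromId, amount);
            accountDao.credit(toId, amount);    // both operations commit or roll back together
        }
    }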
Environment: Java SE 6, J2EE 6, JSP 2.1, Servlets 2.5, JavaScript, IBM WebSphere 7, DB2, HTML, XML, Spring 3, Hibernate 3, JUnit, Windows 7, Eclipse 3.5
Confidential, Seattle, WA
Java/J2EE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed the UI layer logic of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- CSS and JavaScript were used to build rich internet pages.
- Followed the Agile Scrum methodology for the development process.
- Created design specifications for application development covering the front end and back end using design patterns.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Developed the application by using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
- Used Spring IoC to inject values for dynamic parameters.
- Used the JUnit testing framework for unit-level testing.
- Actively involved in code review and bug fixing for improving the performance.
- Documented application for its functionality and its enhanced features.
- Created connections through JDBC and used JDBC statements to call stored procedures (see the sketch below).
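A brief sketch of calling a stored procedure over JDBC as mentioned above; the connection URL, credentials, and procedure name are placeholders.

    // Call a stored procedure through JDBC (placeholder URL, credentials, and procedure).
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class StoredProcCall {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:orcl", "app_user", "secret")) {
                CallableStatement cs = conn.prepareCall("{call update_customer_status(?, ?)}");
                cs.setLong(1, 42L);            // customer id
                cs.setString(2, "ACTIVE");     // new status
                cs.execute();
                cs.close();
            }
        }
    }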
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.
Confidential
Application Developer
Responsibilities:
- Developed the application under the JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
- Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
- Developed the application server persistence layer using JDBC and SQL.
- Used JDBC to connect the web applications to Databases.
- Implemented a test-first unit testing approach driven by JUnit.
- Developed and utilized J2EE services and JMS components for messaging communication in WebLogic (see the sketch after this list).
- Configured the development environment using the WebLogic application server for developers' integration testing.
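A minimal sketch of sending a message to a JMS queue as described above; the JNDI names and message payload are placeholders.

    // Send a text message to a JMS queue looked up via JNDI (placeholder JNDI names).
    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class OrderMessageSender {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();   // uses the container's JNDI environment
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/OrderConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            TextMessage message = session.createTextMessage("{\"orderId\": 1001}");
            producer.send(message);

            connection.close();
        }
    }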
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JMS, log4j, JUnit, Servlets, MVC