Hadoop Developer Resume
Richardson, TX
SUMMARY
- 8+ years of IT experience in software development, including 4+ years of experience in Big Data/Hadoop and NoSQL technologies across domains such as automobile, finance, insurance, healthcare, and telecom.
- 4 years of experience in Hadoop working environments, including MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, YARN, Cassandra, Kafka, Spark, and Flume.
- Solid understanding of Hadoop Distributed File System.
- Good experience with MapReduce (MR), Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, and ZooKeeper for data extraction, processing, storage, and analysis.
- In-depth understanding of how MapReduce works and of the Hadoop infrastructure.
- In-depth understanding of Hadoop architecture and its components, such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce concepts.
- Experience in developing custom MapReduce Programs in Java using Apache Hadoop for analyzing Big Data.
- Extensively worked on Hive for ETL transformations and optimized Hive queries.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Successfully loaded data into Hive and HDFS from MongoDB, Cassandra, and HBase.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregate Functions (UDAF); a minimal UDF sketch appears at the end of this summary.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed Pig Latin scripts for data cleansing and transformation.
- Used Flume to channel data from different sources to HDFS.
- Scheduled and monitored job workflows using tools such as Oozie.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX file systems and NoSQL sources.
- Good experience with the Cloudera, Hortonworks, and Apache Hadoop distributions.
- Worked with relational database systems (RDBMS) such as MySQL, MSSQL, Oracle and NoSQL database systems like HBase and Cassandra.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Used Shell Scripting to move log files into HDFS.
- Good understanding of real-time data processing using Spark.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with different file formats, including CSV, text, SequenceFile, XML, JSON, and Avro.
- Good knowledge of data modeling and data mining to model data per business requirements.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Good knowledge of Python and Bash scripting.
- Developed simple to complex MapReduce streaming jobs in Python, integrated with Hive and Pig.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
- Streamed real-time data using Spark with Kafka and stored the streamed data in HDFS using Scala.
- Involved in creating database objects like tables, views, procedures, triggers and functions using T-SQL to provide definition, structure and to maintain data efficiently.
- Expert in Data Visualization development using Tableau to create complex and innovative dashboards.
- Generated ETL reports using Tableau and created statistics dashboards for Analytics.
- Reported and classified bugs and played a major role in carrying out different types of tests, such as smoke, functional, integration, system, data comparison, and regression testing.
- Hands-on experience with the Microsoft Cortana Intelligence Suite, Microsoft R Server, Stream Analytics, and cognitive/artificial intelligence services.
- Experience in creating master test plans, test cases, test result reports, and requirements traceability matrices, and in creating and submitting status reports to project management.
- Strong hands-on experience with MVC frameworks, including Spring MVC.
- Skilled in designing and developing data access layer modules with the Hibernate framework for new functionality.
- Extensive experience working with IDEs such as Eclipse, NetBeans, and EditPlus.
- Working knowledge of Agile and waterfall development models.
- Working experience in all SDLC Phases.
- Extensively used Java and J2EE technologies such as Core Java, JavaBeans, Servlets, JSP, Spring, Hibernate, JDBC, JSON, and design patterns.
- Experienced in Application Development using Java, J2EE, JSP, Servlets, RDBMS, Tag Libraries, JDBC, Hibernate, XML and Linux shell scripting.
- Worked with software version control, bug tracking, and code review systems such as CVS, ClearCase, and Jira.
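A minimal sketch of the kind of Hive UDF referenced above, assuming a hypothetical string-normalization use case (the class name, function name, and registration statement are illustrative, not taken from a specific project):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: a simple Hive UDF that trims and lower-cases a string column.
// Registered in Hive with: ADD JAR <path>; CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // preserve NULLs so Hive semantics are unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```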
TECHNICAL SKILLS
Big data/Hadoop Ecosystem: HDFS, Map Reduce, HIVE, PIG, HBase, Sqoop, Flume, Oozie, Storm and Avro
Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP, WSDL
Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts.
NoSQL Databases: MongoDB, Cassandra, HBase
Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, SOAP
Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.
Tools: Eclipse, IntelliJ, Git, PuTTY, WinSCP
Operating System: Ubuntu (Linux), Win 95/98/2000/XP, Mac OS, RedHat
ETL Tools: Informatica, Pentaho.
Testing: Hadoop Testing, Hive Testing, Quality Center (QC)
Monitoring and Reporting tools: Ganglia, Nagios, Custom Shell scripts.
PROFESSIONAL EXPERIENCE
Confidential, Richardson, TX
Hadoop Developer
Responsibilities:
- Involved in installing, configuring, supporting, and managing Hadoop clusters; cluster administration included commissioning and decommissioning DataNodes, capacity planning, slot configuration, performance tuning, cluster monitoring, and troubleshooting.
- Worked on Hadoop cluster scaling from 6 nodes in the development environment to 10 nodes in pre-production and up to 32 nodes in production.
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Participated in development and execution of system and disaster recovery processes.
- Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Extensively used Hive/HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which one best suited the current requirements.
- Supported tuple processing and writing data with Storm by providing Storm-Kafka connectors.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Created HBase tables to store data in various formats coming from different sources.
- Administered, installed, upgraded, and managed Hadoop distributions (CDH3, CDH4, Cloudera Manager), Hive, and HBase.
- Streamed real-time data using Spark with Kafka and stored the streamed data in HDFS using Scala.
- Responsible for building scalable distributed data solutions using the Cloudera Hadoop distribution.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingest as required.
- Prepared Oozie workflows to run multiple Hive and Pig jobs, triggered independently based on time and data availability.
- Enabled Kerberos for authorization and authentication.
- Testing Hadoop components on sample datasets in local pseudo distribution mode.
- Implemented unit tests for MapReduce code using the MRUnit framework; see the sketch after this role's environment line.
- Experience in configuring Java components using Spark.
- Used File System check (FSCK) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Enabled HA for the NameNode, ResourceManager, and Hive Metastore.
Environment: Hadoop 1.x, HDFS, MapReduce, Hive 0.10, Pig 0.11, Sqoop, HBase, Shell Scripting, Apache Solr, Java.
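A minimal sketch of the MRUnit-based unit testing mentioned above, assuming a hypothetical word-count style mapper (WordCountMapper and the test inputs are illustrative, not from an actual project):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

// Hypothetical example: unit-testing a word-count mapper with MRUnit.
public class WordCountMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        // WordCountMapper is an assumed mapper class that tokenizes a line and emits (token, 1).
        mapDriver = MapDriver.newMapDriver(new WordCountMapper());
    }

    @Test
    public void emitsOneCountPerToken() throws Exception {
        mapDriver.withInput(new LongWritable(0), new Text("hadoop hive"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .runTest();  // fails the test if the mapper output differs from the expected pairs
    }
}
```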
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS (Hadoop Distributed File System), process within Hadoop and emit the summary results from Hadoop to downstream systems.
- Developed a wrapper script around the Teradata connector for Hadoop (TCD) to support optional parameters.
- Used Sqoop extensively to ingest data from various source systems into HDFS.
- Hive was used to produce results quickly based on the report that was requested.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Integrated data from multiple sources (SQL Server, DB2, TD) into the Hadoop cluster and analyzed it using Hive-HBase integration.
- Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader; a minimal UDF sketch follows this role's environment line.
- Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
- Worked on different file formats such as text files, SequenceFiles, Avro, and Record Columnar (RC) files.
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
- Kerberos security was implemented to safeguard the cluster.
- Worked on a stand-alone as well as a distributed Hadoop application.
- Tested the performance of the data sets on various NoSQL databases.
- Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
Environment: Hadoop, HDFS, Pig 0.10, Hive, MapReduce, Sqoop, Java Eclipse, SQL Server, Shell Scripting.
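A minimal sketch of a Java Pig UDF of the kind mentioned above, assuming a hypothetical timestamp-normalization function; the actual timestamp loader would extend Pig's LoadFunc, which is more involved, so an EvalFunc is shown here for brevity (class name, format, and usage are illustrative):

```java
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical example: a Pig UDF that converts "MM/dd/yyyy HH:mm:ss" strings to epoch milliseconds.
// Used in Pig Latin after REGISTER, e.g.: DEFINE ToEpoch ToEpochMillis(); B = FOREACH A GENERATE ToEpoch(ts);
public class ToEpochMillis extends EvalFunc<Long> {
    private final SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss");

    @Override
    public Long exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;  // propagate nulls rather than failing the job
        }
        try {
            return fmt.parse(input.get(0).toString()).getTime();
        } catch (ParseException e) {
            return null;  // treat unparseable timestamps as nulls
        }
    }
}
```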
Confidential, Penfield, NY
Hadoop Developer
Responsibilities:
- Worked on Hadoop cluster (CDH 5) with 30 nodes.
- Worked with 90 TB of structured and semi-structured data with a replication factor of 3.
- Extracted the data from Oracle, MySQL, and SQL server databases into HDFS using Sqoop.
- Extracted data from weblogs and social media using Flume and loaded it into HDFS.
- Created jobs in Sqoop with incremental load and populated Hive tables.
- Developed software to process, cleanse, and report on vehicle data using various analytics and REST APIs built with Java, Scala, and Akka (an asynchronous programming framework).
- Involved in developing an Asset Tracking project, collecting real-time vehicle location data from a JMS queue using IBM Streams and processing it for vehicle tracking using ESRI GIS mapping software, Scala, and the Akka actor model.
- Involved in developing web services using REST, the HBase native API, and the BigSQL client to query data from HBase; a minimal HBase client sketch follows this role's environment line.
- Experienced in developing Hive queries in the BigSQL client for various use cases.
- Involved in developing several shell scripts and automated them using the cron job scheduler.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible to manage data coming from different sources.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hadoop 1.x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL, Java
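A minimal sketch of querying HBase through the native Java client as referenced above, assuming a hypothetical vehicle_locations table, loc column family, and row key; the API shown is the HBase 1.x+ client, which may differ from the client versions used on this project:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example: look up the latest location for one vehicle by row key.
public class VehicleLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("vehicle_locations"))) {
            Get get = new Get(Bytes.toBytes("VIN123"));     // row key is illustrative
            Result result = table.get(get);
            byte[] lat = result.getValue(Bytes.toBytes("loc"), Bytes.toBytes("lat"));
            byte[] lon = result.getValue(Bytes.toBytes("loc"), Bytes.toBytes("lon"));
            System.out.println(Bytes.toString(lat) + "," + Bytes.toString(lon));
        }
    }
}
```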
Confidential, Memphis, TN
Hadoop Developer
Responsibilities:
- Worked with approximately 100 TB of structured and semi-structured data with a replication factor of 3.
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Hive/HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins; see the distributed-cache sketch after this role's environment line.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Created HBase tables to store data in various formats coming from different portfolios.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Experienced with SOLR for indexing and search.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which one best suited the current requirements.
- Used File System check (FSCK) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, Kafka, HBase, Cassandra, Cloudera Distribution, Oozie, Ambari, Ganglia, Yarn, Shell scripting
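A minimal sketch of the distributed-cache / map-side join optimization mentioned above, assuming a hypothetical small lookup file of portfolio codes and the Hadoop 2.x mapper API (file name, delimiter, and field positions are illustrative):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical example: replicate a small lookup file to every mapper and join in map().
// In the driver: job.addCacheFile(new URI("/lookup/portfolios.txt#portfolios.txt"));
// The "#portfolios.txt" fragment makes the file available as a local symlink in the task directory.
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> portfolioNames = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("portfolios.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);   // expected layout: code<TAB>name
                if (parts.length == 2) {
                    portfolioNames.put(parts[0], parts[1]);
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        String name = portfolioNames.getOrDefault(fields[0], "UNKNOWN");  // join on the portfolio code
        context.write(new Text(fields[0]), new Text(name + "\t" + value));
    }
}
```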
Confidential - Plano, TX
Java/J2EE/Hadoop Developer
Responsibilities:
- Participated in requirement gathering and converting the requirements into technical specifications.
- Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
- Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
- Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
- Created business logic using Servlets and POJOs and deployed them on the WebLogic server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Web services for the data maintenance and structures.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Responsible to manage data coming from different sources.
- Developed MapReduce algorithms; a minimal mapper/reducer sketch follows this role's environment line.
- Gained good experience with NoSQL databases.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Integrated Hadoop with Solr and implemented search algorithms.
- Worked with cloud services such as Amazon Web Services (AWS).
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 10g database.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in creating templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using SOAP.
Environment: Hive 0.7.1, Apache Solr 3.x, HBase 0.90.x/0.20.x, JDK 1.5, Struts 1.3, WebSphere 6.1, HTML, XML, JavaScript, JUnit 3.8, Oracle 10g, Amazon Web Services.
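A minimal MapReduce sketch in the spirit of the algorithms mentioned above, assuming a hypothetical count-by-key use case over tab-delimited records (class names and field positions are illustrative, not from an actual project):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical example: count applications per state from tab-delimited records.
public class ApplicationsByState {

    public static class StateMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 2) {
                context.write(new Text(fields[2]), ONE);  // field 2 is assumed to hold the state code
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```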
Confidential - McLean, VA
Java/J2EE Developer
Responsibilities:
- Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications
- Gathered and analyzed information for developing, supporting, and modifying existing web applications based on prioritized business needs
- Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA)
- Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages
- Played a key role in designing the presentation tier components by customizing the Spring framework components, which includes configuring web modules, request processors, error handling components, etc.
- Implemented the Web Services functionality in the application to allow external applications to access data
- Used Apache Axis as the Web Service framework for creating and deploying Web Service Clients using SOAP and WSDL
- Worked on Spring to develop different modules to assist the product in handling different requirements
- Developed validation using Spring's Validation interface and used Spring Core and MVC to develop the applications and access data
- Implemented Spring beans using IoC and transaction management features to handle the transactions and business logic
- Designed and developed various PL/SQL blocks and stored procedures in the DB2 database
- Involved in writing DAO layer using Hibernate to access the database
- Involved in deploying and testing the application using WebSphere Application Server
- Developed and implemented several test cases using the JUnit framework
- Involved in troubleshooting technical issues, conducting code reviews, and enforcing best practices
Environment: Java SE 6, J2EE 6, JSP 2.1, Servlets 2.5, JavaScript, IBM WebSphere 7, DB2, HTML, XML, Spring 3, Hibernate 3, JUnit, Windows 7, Eclipse 3.5
Confidential - Seattle, WA
Java/J2EE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed UI layer logic of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- CSS and JavaScript were used to build rich internet pages.
- The Agile Scrum methodology was followed for the development process.
- Designed different design specifications for application development that includes front-end, back-end using design patterns.
- Developed proto-type test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Developed the application by using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
- Used Spring IoC to inject values for dynamic parameters.
- Developed JUnit tests for unit-level testing.
- Actively involved in code review and bug fixing for improving the performance.
- Documented application for its functionality and its enhanced features.
- Created connections through JDBC and used JDBC callable statements to call stored procedures; a minimal sketch follows this role's environment line.
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.
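A minimal sketch of calling a stored procedure through JDBC as described above, assuming a hypothetical procedure get_order_count and illustrative connection details:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

// Hypothetical example: call a stored procedure that returns an order count for a customer.
public class OrderCountClient {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL";  // connection details are illustrative
        try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
             CallableStatement stmt = conn.prepareCall("{call get_order_count(?, ?)}")) {
            stmt.setLong(1, 12345L);                      // IN parameter: customer id
            stmt.registerOutParameter(2, Types.INTEGER);  // OUT parameter: order count
            stmt.execute();
            System.out.println("Orders: " + stmt.getInt(2));
        }
    }
}
```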
Confidential
Application Developer
Responsibilities:
- Developed the application under a JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
- Deployed and maintained JSP and Servlet components on WebLogic 8.0.
- Developed the application server persistence layer using JDBC and SQL.
- Used JDBC to connect the web applications to Databases.
- Implemented a test-first unit testing approach using JUnit.
- Developed and utilized J2EE Services and JMS components for messaging communication in Web Logic.
- Configured the development environment using the WebLogic application server for developers' integration testing.
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JMS, log4j, JUnit, Servlets, MVC