
Hadoop Developer Resume


Dallas, TX

SUMMARY

  • 8+ years of professional experience in IT, including 3+ years of work experience in Big Data, Hadoop development, and ecosystem analytics in Banking, Food & Beverage, Healthcare, and Insurance.
  • Well versed in installing, configuring, supporting, and managing Big Data workloads and the underlying Hadoop cluster infrastructure.
  • Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Storm, Kafka, Spark, Cassandra, Flume, and Avro.
  • Good knowledge of building Apache Spark applications using Scala.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase and Cassandra.
  • Experienced in deploying Hadoop clusters using Puppet.
  • Experience in managing and reviewing Hadoop Log files.
  • Responsible for migrating the code base from the Hortonworks platform to Amazon EMR and evaluating AWS ecosystem components such as Redshift and DynamoDB.
  • Involved in migrating MapReduce jobs to Spark jobs and used Spark SQL and the DataFrame API to load structured and semi-structured data into Spark clusters.
  • Involved in the requirements and design phases of implementing a streaming Lambda architecture for real-time processing with Spark and Kafka.
  • Have experience in Apache Spark, Spark Streaming, and Spark SQL.
  • Used the DataFrame API in Apache Spark to load CSV, JSON, and Parquet files (see the sketch after this list).
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Experience in importing and exporting data with Sqoop between HDFS and relational database/mainframe systems.
  • Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
  • Work experience with cloud infrastructure like Amazon Web Services (AWS).
  • Experienced in integrating various data sources such as RDBMS tables, spreadsheets, and text files using Java and shell scripting.
  • Experience in Web Services using XML, HTML and SOAP.
  • Involved in developing distributed enterprise and web applications using UML, Java/J2EE, and web technologies including EJB, JSP, Servlets, Struts 2, JMS, JDBC, JAX-WS, JPA, HTML, XML, XSL/XSLT, JavaScript, Spring, and Hibernate.
  • Experience with middleware architectures using Sun Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.
  • Experience in web application development using Java, Servlets, JSP, JSTL, JavaBeans, EJB, JNDI, JDBC, Struts, HTML, DHTML, CSS, PHP, XML, XSL/XSLT, and AJAX.
  • Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, combining results-oriented problem solving with leadership.
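The DataFrame loading mentioned above can be illustrated with a minimal Java sketch against the Spark 2.x SparkSession API (older engagements used the SQLContext equivalent); the HDFS paths and reader options are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DataFrameLoadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DataFrameLoadSketch")
                .getOrCreate();

        // CSV with a header row; schema inference is enabled here for brevity.
        Dataset<Row> csvDf = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/input/customers.csv");      // hypothetical path

        // JSON and Parquet go through the same DataFrameReader interface.
        Dataset<Row> jsonDf = spark.read().json("hdfs:///data/input/events.json");
        Dataset<Row> parquetDf = spark.read().parquet("hdfs:///data/input/transactions.parquet");

        csvDf.printSchema();
        System.out.println("CSV rows: " + csvDf.count()
                + ", JSON rows: " + jsonDf.count()
                + ", Parquet rows: " + parquetDf.count());

        spark.stop();
    }
}
```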

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume, Spark, Storm, Kafka, Avro

Programming Languages: Python, Java, J2EE, R, SQL, PL/SQL, Scala

Web Technologies: Core Java, J2EE, Servlets, JSP, JDBC, XML, AJAX, SOAP, WSDL

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2

Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, MS-Access

Application Servers: WebLogic, WebSphere, Apache Tomcat

Monitoring & Reporting tools: Ganglia, Nagios, Talend, Custom Shell scripts

PROFESSIONAL EXPERIENCE

Confidential - Dallas, TX

Hadoop Developer

Responsibilities:

  • Involved in requirements gathering and business analysis, and translated business requirements into technical designs on Hadoop and Big Data.
  • Involved in capacity planning and configuration of a Cassandra cluster on the DataStax platform.
  • Involved in data modeling of the tables in Cassandra.
  • Used Sqoop to import data into Cassandra tables from different relational databases.
  • Configured different topologies for the Storm cluster and deployed them on a regular basis.
  • Consumed data from Kafka using Storm.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Storm cluster.
  • Built a Spark Streaming flow that collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in Cassandra (see the sketch at the end of this section).
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Implemented various machine learning techniques in Scala using Spark's machine learning library (MLlib).
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Generated SSTables from CSV files using the SSTableSimpleUnsortedWriter class in Java.
  • Imported and exported data between HDFS and relational databases using Sqoop.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Created partitions and buckets based on state to support further processing with bucketed Hive joins.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, and Pig.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Responsible for exporting analyzed data to relational databases using Sqoop.
  • Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
  • Responsible for tuning Hive and Pig scripts to improve performance.
  • Implemented unit tests with MRUnit and PigUnit.
  • Documented the technical details of Hadoop cluster management and the daily batch pipeline, which includes several MapReduce, Pig, Hive, Sqoop, and Oozie jobs along with other scripts.

Environment: DataStax Cassandra, AWS, Spark, Spark Streaming, Spark SQL, Storm, Kafka, MapReduce, Hive, Pig, Oozie, and Sqoop
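A minimal Java sketch of the Kafka-to-Spark-Streaming flow referenced in this section, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, payload layout, and aggregation are placeholders, and the real pipeline persisted results to Cassandra through the DataStax Spark-Cassandra connector rather than just logging them.

```java
import java.util.*;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;

public class LearnerModelStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("LearnerModelStreamSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");        // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-model");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Collections.singletonList("learner-events"); // hypothetical topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count events per learner id in each micro-batch (assumes a "learnerId,..." CSV payload).
        JavaPairDStream<String, Long> countsPerLearner = stream
                .mapToPair(record -> new scala.Tuple2<>(record.value().split(",")[0], 1L))
                .reduceByKey(Long::sum);

        // In the real pipeline this step wrote to Cassandra via the Spark-Cassandra connector;
        // here the aggregates are only printed.
        countsPerLearner.foreachRDD(rdd -> rdd.take(10)
                .forEach(t -> System.out.println(t._1() + " -> " + t._2())));

        jssc.start();
        jssc.awaitTermination();
    }
}
```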

Confidential - Sunnyvale, CA

Hadoop Developer

Responsibilities:

  • Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Involved in analyzing system failures, identifying root causes, and recommending corrective courses of action.
  • Documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Monitored multiple Hadoop clusters environments using Ganglia.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Wrote MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website (see the sketch at the end of this section).
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Responsible for managing data coming from different sources.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.

Environment: Hadoop 1.x, HDFS, MapReduce, Hive 0.10, Pig, Sqoop, Ganglia, HBase, Shell Scripting, Ubuntu 13.04
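A sketch of the kind of HiveQL aggregation described in this section, wrapped in Java via the HiveServer2 JDBC driver (an assumption; the same queries could equally be run from the Hive CLI). The connection URL, table, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WeblogHiveReportSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 endpoint and database are placeholders.
        String url = "jdbc:hive2://hive-host:10000/weblogs";

        // Daily unique visitors and page views over a hypothetical access_logs table.
        String dailyUniqueVisitors =
                "SELECT to_date(event_time) AS dt, "
              + "       COUNT(DISTINCT visitor_id) AS unique_visitors, "
              + "       COUNT(*) AS page_views "
              + "FROM   access_logs "
              + "GROUP  BY to_date(event_time)";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(dailyUniqueVisitors)) {
            while (rs.next()) {
                System.out.printf("%s  visitors=%d  views=%d%n",
                        rs.getString("dt"),
                        rs.getLong("unique_visitors"),
                        rs.getLong("page_views"));
            }
        }
    }
}
```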

Confidential - Decatur, IL

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch at the end of this section).
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Wrote MapReduce jobs using Pig Latin.
  • Solid understanding of the REST architectural style and its application to well-performing web sites for global usage.
  • Involved in ETL, data integration, and migration; used Sqoop to load data from Oracle into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL; imported and exported data between HDFS and the Oracle database using Sqoop.
  • Implemented test scripts to support test driven development and continuous integration.
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Pig as an ETL tool for transformations, event joins, bot-traffic filtering, and pre-aggregations before storing the data in HDFS.
  • Involved in writing Hive scripts to extract, transform, and load the data into the database.
  • Used JIRA for bug tracking.
  • Used CVS for version control.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Shell Scripting, Java (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Linux, JIRA 5.1/5.2, CVS.
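A minimal sketch of a data-cleaning MapReduce job of the kind described in this section, written against the Hadoop 2.x mapreduce API; the pipe delimiter and expected field count are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/** Map-only job that drops malformed records and trims whitespace from each field. */
public class RecordCleanerJob {

    public static class CleanMapper extends Mapper<Object, Text, Text, NullWritable> {
        private static final int EXPECTED_FIELDS = 12;   // assumption for illustration
        private final Text out = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleaning", "malformed").increment(1);
                return;                                   // skip bad rows
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) sb.append('|');
                sb.append(fields[i].trim());
            }
            out.set(sb.toString());
            context.write(out, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleaner");
        job.setJarByClass(RecordCleanerJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                         // map-only cleaning pass
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```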

Confidential - Stamford, CT

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
  • Experienced in defining and coordinating job flows.
  • Gained experience in reviewing and managing Hadoop log files.
  • Extracted data from NoSQL databases such as CouchDB and HBase through Sqoop and placed it in HDFS for processing.
  • Involved in writing data-refinement Pig scripts and Hive queries.
  • Good knowledge of running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Coordinated cluster services using Zookeeper.
  • Designed applications using Struts with Tiles and Validator, implementing the MVC design pattern and writing custom tag libraries, JSPs, JavaBeans, and Struts Controller, Action, and ActionForm classes using Struts tag libraries.
  • Used XML technologies such as DOM for transferring data.
  • Implemented object-relational mapping and persistence using Hibernate ORM.
  • Developed custom validators in Struts and implemented server-side validations using annotations.
  • Created the struts-config.xml file for the ActionServlet to extract data from the specified ActionForm and route it to the specified Action class instance.
  • Used Oracle for the database and WebLogic as the application server.
  • Involved in coding DAO objects using JDBC (DAO pattern).
  • Used Flume to transport logs to HDFS.
  • Experienced in moving data from Hive tables into Cassandra for real-time analytics.
  • Organized documents into more usable clusters using Mahout.
  • Configured the connection between HDFS and Tableau using Impala for the Tableau developer team.
  • Responsible for managing data coming from different sources.
  • Gained good experience with various NoSQL databases.
  • Experienced in handling administration activities using Cloudera Manager.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs (see the sketch at the end of this section).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.

Environment: Apache Hadoop, Java (JDK 1.6), J2EE, JDBC, Servlets, JSP, Struts 2.0, Spring 2.0, Hibernate 3.0, Linux, XML, WebLogic, SOAP, WSDL, HBase, Hive, Pig, Sqoop, ZooKeeper, NoSQL, R, Mahout, MapReduce, Cloudera, HDFS, Flume, Impala, Tableau, MySQL.
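A minimal sketch of a Hive UDF of the kind referenced in this section, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name, its package (omitted here), and the column it is applied to are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Normalizes a free-text column: trims surrounding whitespace and lower-cases it.
 *
 * Registered in Hive with, for example:
 *   ADD JAR /path/to/udfs.jar;
 *   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeTextUDF';
 *   SELECT normalize_text(product_name) FROM products;
 */
public final class NormalizeTextUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through unchanged
        }
        String normalized = input.toString().trim().toLowerCase();
        return new Text(normalized);
    }
}
```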

Confidential - Columbus, OH

Java Developer

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
  • Developed front-end components (JSPs) and server-side Java components (container-managed entity beans, stateless session beans, and Action classes); wrote unit test cases and performed unit testing.
  • Used agile methodology and participated in Scrum meetings.
  • Involved in developing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Developed web services for sending and receiving data between different applications using SOAP 1.1 messages, such as loan applications transferred from branch servers to the head office server, and used SAX and DOM XML 1.1 parsers for data retrieval.
  • Integrated with webMethods via web services.
  • Used Oracle 10g as the backend database on UNIX. Involved in designing the database schema and developing stored procedures.
  • Consumed web services from different applications within the network.
  • Developed custom tags to simplify the JSP 2.0 code. Designed UI screens using JSP 2.0, CSS, XML 1.1, and HTML. Used JavaScript for client-side validation.
  • Used GWT to send AJAX requests to the server and update data in the UI dynamically.
  • Used Hibernate 3.0 in the data access layer to access and update information in the database.
  • Used the Spring 2.5 framework for dependency injection and integrated it with the Hibernate and Struts frameworks.
  • Configured Hibernate's second-level cache using EhCache to reduce the number of hits to the configuration table data (see the sketch at the end of this section).
  • Used Spring Web Flow to manage complex page flows.
  • Used the Mule ESB framework to exchange important information such as loan status reports.
  • Designed and developed a utility class that consumed messages from the Java message queue and generated emails to be sent to customers, using the JavaMail API for sending emails.
  • Coded Maven build scripts to build and deploy the application on WebSphere.
  • Used the JUnit framework for unit testing of the application and Log4j 1.2 to capture logs, including runtime exceptions.
  • Used CVS for version control and used IBM RAD 6.0 as the IDE for implementing the application.
  • Supported Testing Teams and involved in defect meetings.

Environment: WebLogic Portal Server 10.2, JSR 168 Portlets, Polaris Intellect J2EE framework, Java/J2EE, Spring, EJB 2.1, Struts 1.2, JMS, Windows XP, UNIX, Oracle 10g, jQuery 1.7.1, Ext JS 3.1, BIRT Chart Library 3.0, WebLogic Workspace Studio 10.2, Eclipse 3.3, Axis Web Services 1.4, Hibernate 3.3.2
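A sketch of the second-level cache configuration described in this section, shown as a Hibernate 3.x-era annotated entity; the entity, table, and the provider settings quoted in the comment are illustrative assumptions, not the project's actual mapping.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

/**
 * Hypothetical rarely-changing lookup entity cached in the second level.
 *
 * In Hibernate 3.x the EhCache provider is enabled in hibernate.cfg.xml or
 * hibernate.properties, for example:
 *   hibernate.cache.use_second_level_cache = true
 *   hibernate.cache.provider_class = org.hibernate.cache.EhCacheProvider
 */
@Entity
@Table(name = "LOAN_CONFIG")
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)   // read-mostly configuration data
public class LoanConfig {

    @Id
    private Long id;

    private String paramName;
    private String paramValue;

    public Long getId() { return id; }
    public String getParamName() { return paramName; }
    public String getParamValue() { return paramValue; }
}
```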

Confidential 

Associate Developer

Responsibilities:

  • Developed the application under the JEE architecture; designed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
  • Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
  • Developed the application server's persistence layer using JDBC, SQL, and Hibernate.
  • Used JDBC to connect the web applications to databases (see the sketch at the end of this section).
  • Implemented test-first unit testing using JUnit.
  • Developed and utilized J2EE services and JMS components for messaging in WebLogic.
  • Configured the development environment using the WebLogic application server for developers' integration testing.

Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JMS, Log4j, JUnit, Servlets, MVC, MyEclipse
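A minimal JDBC DAO sketch of the persistence-layer access described in this section; the JNDI data source name, table, and columns are hypothetical, and the explicit cleanup style matches the pre-JDK 7 era of the project.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

/** Looks up a WebLogic-managed DataSource via JNDI and reads a single value. */
public class CustomerDao {

    private final DataSource dataSource;

    public CustomerDao() throws NamingException {
        // "jdbc/appDataSource" is a placeholder JNDI name for the connection pool.
        this.dataSource = (DataSource) new InitialContext().lookup("jdbc/appDataSource");
    }

    /** Returns the customer's name, or null if no row matches. */
    public String findCustomerName(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE id = ?";
        Connection conn = null;
        PreparedStatement ps = null;
        ResultSet rs = null;
        try {
            conn = dataSource.getConnection();
            ps = conn.prepareStatement(sql);
            ps.setLong(1, customerId);
            rs = ps.executeQuery();
            return rs.next() ? rs.getString("name") : null;
        } finally {
            if (rs != null) try { rs.close(); } catch (SQLException ignored) { }
            if (ps != null) try { ps.close(); } catch (SQLException ignored) { }
            if (conn != null) try { conn.close(); } catch (SQLException ignored) { }
        }
    }
}
```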
