Hadoop Developer Resume

Seattle, WA

SUMMARY

  • 7+ years of professional experience which includes Analysis, Design, Development, Integration, Deployment and Maintenance of quality software applications using Java/J2EE Technologies and Big data Hadoop technologies
  • Over 4 Years of experience in Big Data Hadoop Ecosystems with ingestion, storage, querying, processing and analysis of big data
  • Excellent understanding of Hadoop architecture and its components such as Job Tracker, Task Tracker, Name Node, Secondary Name Node, Data Node and MapReduce programming paradigm
  • Hands-on experience in installing, configuring, monitoring and integrating Hadoop ecosystem components like MapReduce, HDFS, HBase, Pig, Hive, Oozie, Sqoop, Flume, Spark and ZooKeeper
  • Experience in Data Load Management, importing and exporting data between HDFS and Relational Database Systems using Sqoop and Flume
  • Exported the analyzed data to various databases like Teradata (Sales Data Warehouse) and SQL Server using Sqoop
  • Expertise in creating Hive internal/external tables and views using a shared metastore, writing scripts in HiveQL, and data transformation and file processing using Pig Latin scripts
  • Expertise in writing custom UDFs and UDAFs to extend Pig and Hive core functionality.
  • Developed, deployed and supported several Map Reduce applications in Java to handle different types of data.
  • Extensively used Apache Kafka to load the log data from multiple sources directly into HDFS.
  • Experience in writing Map Reduce programs and using Apache Hadoop API for analyzing the data
  • Strong experience in developing, debugging and tuning Map Reduce jobs in Hadoop environment.
  • Experienced in installing, configuring, and administrating Hadoop cluster of major Hadoop distributions like Cloudera and Hortonworks
  • Experienced in working with Apache Ambari
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, with code written in Scala
  • Cluster co-ordination services through ZooKeeper
  • Experience in scheduling and monitoring jobs using Oozie and ZooKeeper
  • Extensively used Informatica PowerCenter for end-to-end data warehousing ETL routines, including writing custom scripts, data mining and data quality processes.
  • Maintenance and implementation of commercial software. Extensive work experience with Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JSF, Struts, Spring, SOA, AJAX, XML/XSL, Web Services (REST, SOAP), UML, Design Patterns and XML Schemas
  • Strong experience in design and development of relational database concepts with multiple RDBMS databases including Oracle 10g, MySQL, MS SQL Server & PL/SQL
  • Experience in JAVA, J2EE, WEB SERVICES, SOAP, HTML and XML related technologies.
  • Strong analytical and problem solving skills and ability to follow through with projects from inception to completion.
  • Ability to work effectively in cross-functional team environments, excellent communication and interpersonal skills.

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, Map Reduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Ambari, Storm, Spark and Kafka

No SQL Database: HBase, Cassandra

Monitoring and Reporting: Tableau, Custom Shell Scripts

Hadoop Distribution: Hortonworks, Cloudera, MapR

Build Tools: Maven, SQL Developer

Programming and Scripting: Java, C++, JavaScript, Shell Scripting, Python

Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services

Databases: Oracle, MySQL, MS SQL Server, Teradata

ETL Tools: Talend

Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript

IDE Dev. Tools: Eclipse 3.5, NetBeans, MyEclipse, Oracle JDeveloper 10.1.3, SoapUI, Ant, Maven, RAD

Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003

PROFESSIONAL EXPERIENCE

Confidential, Seattle, WA

Hadoop Developer

Responsibilities:

  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop and Spark
  • Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake.
  • Worked with NoSQL databases like HBase to create tables and store the data
  • Collected and aggregated large amounts of log data using Apache Flume and staged data in HDFS for further analysis.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Wrote Pig scripts to store the data into HBase
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL
  • Stored the data in tabular formats using Hive tables and Hive SerDe.
  • Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team. Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
  • Coordinated with end users for designing and implementation of analytics solutions for User Based Recommendations using R as per project proposals.
  • Involved in creating calculated fields and dashboards in Tableau for visualization of the analyzed data.
  • Used compression techniques such as Snappy to reduce storage and optimize data transfer over the network
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processed the data with Pig
  • Computed indexed views for data exploration using Apache Solr
  • Monitored and managed the Hadoop cluster using Apache Ambari
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.

Environment: Hadoop, HDFS, Pig, Sqoop, Data Lake, Spark, R, MapReduce, Ambari, Hortonworks, Tableau, Snappy, Zookeeper, HBase, NoSQL, Shell Scripting, Ubuntu, Teradata, Solr

Confidential, Portland, OR

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data pipelines using Hadoop.
  • Used Apache Kafka for tracking data ingestion to Hadoop cluster.
  • Wrote Pig scripts to dedup Kafka hourly data and perform daily roll ups.
  • Migrated data from existing Teradata systems to HDFS and built datasets on top of it.
  • Built a framework using shell scripts to automate Hive registration, including dynamic table creation and automated addition of new partitions to the table.
  • Designed Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
  • Set up and benchmarked Hadoop/HBase clusters for internal use. Developed simple to complex MapReduce programs.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Oozie workflows that chain Hive/MapReduce modules for ingesting periodic/hourly input data.
  • Wrote Pig & Hive scripts to analyze the data and detect user patterns.
  • Implemented Device based business logic using Hive UDFs (User Defined Function) to perform ad-hoc queries on structured data.
  • Prepared Avro schema files for generating Hive tables and shell scripts for executing Hadoop commands for single execution.
  • Continuously monitored and managed the Hadoop cluster by using Cloudera Manager.
  • Worked with administration team to install operating system, Hadoop updates, patches, version upgrades as required.
  • Developed ETL pipelines to source data to Business intelligence teams to build visualizations.
  • Involved in unit testing, interface testing, system testing and user acceptance testing of the workflow Tool.

Environment: Cloudera Manager, Map Reduce, HDFS, Pig, Hive, Sqoop, Apache Kafka, Oozie, Teradata, Avro, Java (JDK 1.6), Eclipse

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

  • Loaded and transformed large sets of structured data from Oracle and SQL Server into HDFS using Talend Big Data Studio.
  • Created Impala tables for faster access of data
  • Designed ETL jobs to identify and remove duplicate records using the Sort and Remove Duplicates stages, and generated keys for the unique records using the Surrogate Key Generator stage.
  • Integrated Hive with HBase for effective usage and performed MRUnit testing of the MapReduce jobs.
  • Involved in transforming data from Mainframe tables to HDFS, and HBase tables using Sqoop
  • Implemented business logic by writing Pig and Hive UDFs for some aggregative operations and to get the results from them.
  • Working knowledge of NoSQL databases like HBase and Cassandra; synced Solr with HBase to compute indexed views for data exploration
  • Exported the results into relational databases using Sqoop for visualization and to generate reports for the BI team
  • Worked closely with the business analysts to convert the Business Requirements into Technical Requirements and to make sure that the correct source table attributes are identified as per Dimensional Data Modeling (Fact Table Attributes and Dimensional Table Attributes)
  • Responsible for complete SDLC management using Agile.
  • Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Used Cloudera Manager to monitor and manage the Hadoop cluster

Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, HBase, Solr, Talend, Sqoop, Flume, Oozie, UNIX shell scripting, SQL, Java, Avro, Eclipse

Confidential, Naples, FL

Hadoop Developer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive and Map Reduce
  • Loaded data into HDFS and extracted the data from RDBMS into HDFS using Sqoop.
  • Managed and scheduled jobs on a Hadoop cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing
  • Analyzed the data by running Hive queries and Pig scripts to understand customer behavior
  • Implemented partitioning and bucketing in Hive.
  • Mentored the analyst and test teams in writing Hive queries.
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Extensively used Pig for data cleansing
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Developed the Pig UDFs to pre-process the data for analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS
  • Monitored System health, logs and responded accordingly to any warning or failure conditions
  • Managed and reviewed Hadoop log files

Environment: Cloudera Hadoop, MapReduce, HDFS, Hive, Java, Pig, Linux, XML, Sqoop, Tableau

Confidential, Newark, NY

Java/J2EE Developer

Responsibilities:

  • As part of the development lifecycle, prepared class models, sequence models and flow diagrams by analyzing use cases using Rational tools
  • Involved in developing Database access components using Spring DAO integrated with Hibernate for accessing the data
  • Extensive use of Struts Framework for Controller components and view components
  • Involved in writing the exception and validation classes using Struts validation rules
  • Involved in writing validation rule classes for general server-side validation as part of a J2EE design pattern
  • Used Spring AOP and dependency injection in various modules of the project.
  • Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services
  • Spring framework was used for dependency injection and was integrated with different frameworks like Struts, Hibernate
  • Developed various java objects (POJO) as part of persistence classes for OR mapping.
  • Developed web services using SOAP and WSDL with Axis and parsed JSON responses to provide data using the JSON.org libraries
  • Implemented EJB (Message Driven Beans) in the Service Layer
  • Developed, implemented, and maintained an asynchronous, AJAX based rich client for improved customer experience using XML data and XSLT templates.
  • Developed SQL stored procedures and prepared statements for updating and accessing data from database.
  • Used JBoss for deploying various components of the application and Maven as the build tool for compiling the code and creating WAR files
  • Performed Unit testing and rigorous integration testing of the whole application.

Environment: Java, J2EE, EJB, JMS, Struts, JBoss, Hibernate, JSP, JSTL, AJAX, JavaScript, HTML, XML, Maven, SQL, Oracle, SOA, Web Services (SOAP, WSDL), Spring, Windows.

Confidential

Java Developer

Responsibilities:

  • Worked as software developer for Confidential on developing a supply chain management system.
  • The application involved tracking invoices, raw materials and finished products.
  • Gathered user requirements and specifications.
  • Developed the entire application on Eclipse IDE.
  • Developed and programmed the required classes in Java to support the User account module.
  • Used HTML, JSP and jQuery (a JavaScript library) to design the front-end user interface.
  • Implemented error checking/validation on the Java Server Pages using JavaScript.
  • Developed Servlets to handle the requests, perform server side validation and generate result for user.
  • Used JDBC interface to connect to database.
  • Used SQL to access data from Microsoft SQL Server database.
  • Performed User Acceptance Test.
  • Deployed and tested the web application on WebLogic application server.

Environment: JDK 1.4, Servlet 2.3, JSP 1.2, JavaScript, HTML, JDBC 2.1, SQL, MS SQL Server, UNIX and BEA WebLogic Application Server
