
Sr. Hadoop Developer Resume


NYC, New York

PROFESSIONAL SUMMARY:

  • Around 8 years of professional IT experience, including 5+ years in Big Data ecosystem technologies. Expertise in Big Data technologies as a consultant, with proven capability in project-based teamwork as well as individual development, and good communication skills.
  • Hands-on experience with major components in the Hadoop ecosystem such as MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala and Flume.
  • Knowledge in installing, configuring and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, ZooKeeper and Flume
  • Experience with the new Hadoop 2.0 architecture, YARN, and with developing YARN applications on it
  • Experience with Apache Spark Core, Spark SQL and Spark Streaming
  • Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
  • Firm grip on data modeling, database performance tuning and NoSQL map-reduce systems
  • Experience in managing and reviewing Hadoop log files
  • Real-time experience in Hadoop/Big Data related technologies covering storage, querying, processing and analysis of data
  • Worked on multi-clustered environments and setting up the Cloudera Hadoop ecosystem.
  • Worked on data visualization tools like Tableau.
  • Hands on experience in Agile and scrum methodologies
  • Performed importing and exporting of data into HDFS and Hive using Sqoop
  • Experience in processing semi-structured and unstructured datasets
  • Responsible for setting up processes for Hadoop based application design and implementation
  • Experience in managing HBase database and using it to update/modify the data
  • Experience in running MapReduce and Spark jobs over YARN
  • Experience with Cloudera distributions (CDH3/CDH4)
  • Extended Hive and Pig core functionality by writing UDFs.
  • Used the Oozie engine for creating workflow and coordinator jobs that schedule and execute various Hadoop jobs such as MapReduce, Hive, Pig and Sqoop operations.
  • Used Hive to create tables in both delimited text storage format and binary storage format.
  • Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications
  • Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology.
  • Solid experience in writing complex SQL queries. Also, experienced in working with NOSQL databases like Cassandra 2.1.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Hive, Pig, Sqoop, Job Tracker, Task Tracker, Name Node and Data Node.

TECHNICAL SKILLS:

Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark Core, YARN, Spark SQL and DataFrames, Scala

Utilities: Sqoop, Flume, Kafka, Oozie.

NoSQL Databases: HBase, Cassandra

Languages: C, C++, Java, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala

Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux, SUSE Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, HTML5, CSS

Databases/Data Warehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL

Tools and IDEs: Maven, Toad, Eclipse, NetBeans, Sonar, JDeveloper, DB Visualizer, Tableau

Methodologies: Agile Software Development, Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, NYC, New York

Sr. Hadoop Developer

  • Worked on a live 90-node Hadoop cluster running CDH4.4
  • Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with replication)
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Worked with Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
  • Extensive experience in writing Pig (version 0.10) scripts to transform raw data from several data sources into baseline data.
  • Developed Hive (version 0.10) scripts for end user/analyst requirements to perform ad hoc analysis
  • Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Developed UDFs in Java as and when necessary for use in Pig and Hive queries (a UDF sketch follows this list)
  • Experience in using Sequence files, RCFile, AVRO and HAR file formats.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process
  • Worked with the admin team in designing and upgrading CDH 3 to CDH 4 Environment Cluster
  • Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups
  • Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Experience with professional software engineering practices for the full software development life cycle, including coding standards, source code management and build processes.
  • Implemented best income logic using Pig scripts and UDFs.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Experience in reviewing Hadoop log files to detect failures.
  • Worked on Hive for exposing data for further analysis and for generating transforming files from different analytical formats to text files.
  • Imported data from MySQL server and other relational databases to Apache Hadoop with the help of Apache Sqoop .
  • Creating Hive tables and working on them for data analysis in order to meet the business requirements.
  • Implemented a script to transmit sysprint information from Oracle to HBase using Sqoop.
  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Responsible for building scalable distributed data solutions using Hadoop .
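
The Java UDFs mentioned above were project-specific, but a minimal sketch of their general shape is shown below; the class name and the cleansing rule are illustrative assumptions, and the only dependency assumed is hive-exec on the classpath.

    package com.example.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example: normalizes free-text codes before they are joined
    // against reference data. It could be registered in Hive with:
    //   ADD JAR clean-udf.jar;
    //   CREATE TEMPORARY FUNCTION clean_code AS 'com.example.udf.CleanCodeUDF';
    public class CleanCodeUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String cleaned = input.toString().trim().toUpperCase();
            return cleaned.isEmpty() ? null : new Text(cleaned);
        }
    }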

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, Eclipse, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, SOLR.

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Worked on a live 90-node Hadoop cluster running CDH4.1
  • Worked with highly unstructured and semi-structured data of 120 TB in size (360 TB with replication)
  • Developed Hive queries on data logs to perform trend analysis of user behavior on various online modules.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Involved in the setup and deployment of Hadoop cluster .
  • Developed Map Reduce programs for some refined queries on big data.
  • Involved in loading data from UNIX file system to HDFS .
  • Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop .
  • Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
  • Managing and scheduling jobs on a Hadoop cluster using Oozie .
  • Along with the infrastructure team, involved in designing and developing a Kafka and Storm based data pipeline
  • Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used Kafka producer 0.6.3 APIs to produce messages (see the producer sketch after this list).
  • Installed, Configured Talend ETL on single and multi-server environments.
  • Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment.
  • Developed Merge jobs in Python to extract and load data into MySQL database.
  • Created and modified several UNIX shell Scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to standard framework.
  • Developed simple to complex MapReduce jobs using Hive.
  • Implemented partitioning and bucketing in Hive.
  • Mentored analyst and test teams in writing Hive queries.
  • Involved in setting up HBase to use HDFS.
  • Extensively used Pig for data cleansing.
  • Setup SOLR and configured multi-Core.
  • Loaded streaming log data from various webservers into HDFS using Flume .
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Knowledgeable of Spark and Scala, mainly in framework exploration for the transition from Hadoop/MapReduce to Spark.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
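
As a companion to the Kafka work above, here is a minimal producer sketch. It uses the current org.apache.kafka.clients producer API rather than the legacy 0.6.3 client mentioned in the bullet, and the broker addresses and topic name are placeholders.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Hypothetical producer pushing log events onto a topic feeding the
    // Storm pipeline; broker list and topic name are placeholders.
    public class LogEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");
            props.put("acks", "1");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 10; i++) {
                    producer.send(new ProducerRecord<>("web-logs",
                            Integer.toString(i), "sample log event " + i));
                }
            }
        }
    }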

Environment: Unix Shell Scripting, Python, Oracle 11g, DB2, HDFS, Kafka, Storm, Spark, ETL, Java (JDK 1.7), Pig, Linux, Cassandra, MapReduce, MS Access, Toad, SQL, Scala, MySQL Workbench, XML, NoSQL, SOLR, HBase, Hive, Sqoop, Flume, Talend, Oozie

Confidential, New Jersey

Hadoop Developer

Responsibilities:

  • Worked on a live 110-node Hadoop cluster running CDH4.0
  • Processed data into HDFS by developing solutions, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems
  • Involved in development of MapReduce jobs using HiveQL statements
  • Work closely with various levels of individuals to coordinate and prioritize multiple projects, estimate scope, schedule and track projects throughout SDLC
  • Worked on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing (a cleaning-job sketch follows this list).
  • Involved in MapReduce jobs using HiveQL queries for data stored in HDFS
  • Wrote a Storm topology to accept events from the Kafka producer and emit them into Cassandra DB.
  • Experienced in managing and reviewing Hadoop Log files .
  • Designed a data warehouse using Hive
  • Handling structured, semi structured and unstructured data
  • Developed simple to complex MapReduce jobs using Hive and Pig .
  • Extensively used pig for data cleansing
  • Created partitioned tables in Hive.
  • Managed and reviewed Hadoop log files.
  • Cluster coordinating services through ZooKeeper.
  • Mentored analyst and test team for writing Hive Queries.
  • Designed and configured Kafka cluster to accommodate heavy throughput of 1 million messages per second. Used Kafka producer 0.6.3 API's to produce messages.
  • Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment.
  • Implemented Partitioning and bucketing in Hive.
  • Involved in setting up of HBase to use HDFS .
  • Supported and assisted QA engineers in understanding, testing and troubleshooting.
  • Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
  • Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig.
  • Involved in database migrations to transfer data from one database to another and in complete virtualization of many client applications.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Created HBase tables to store data in variable formats coming from different portfolios.
  • Used Sqoop widely to import data from various systems/sources (such as MySQL) into HDFS
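
A compact sketch of the kind of data-cleaning MapReduce job referenced above; the expected column count, the tab delimiter and the argument-driven paths are assumptions for illustration.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Map-only cleaning job: drops malformed records and trims whitespace.
    // The expected column count and the input/output paths are illustrative.
    public class CleanRecordsJob {

        public static class CleanMapper
                extends Mapper<Object, Text, NullWritable, Text> {
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\t", -1);
                if (fields.length != 5) {
                    return; // skip malformed record
                }
                StringBuilder out = new StringBuilder();
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) out.append('\t');
                    out.append(fields[i].trim());
                }
                context.write(NullWritable.get(), new Text(out.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }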

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Teradata, MySQL, Shell Scripting, Kafka, Cassandra.

Confidential, Houston, Texas.

Hadoop Developer

Responsibilities:

  • Designed and implemented partitioning and bucketing in Hive.
  • Developed Pig scripts to convert data from Avro to text file format.
  • Designed and developed read lock capabilities in HDFS
  • Involved in end-to-end implementation of ETL logic
  • Used Hive to process data and for batch data filtering; used Spark for other value-centric data filtering (see the Spark sketch after this list).
  • Worked extensively with Flume for importing data from various webservers to HDFS.
  • Worked on Large-scale Hadoop YARN cluster for distributed data processing and analysis using Sqoop, Pig, Hive, Impala and NoSQL databases.
  • Develop Hadoop data processes using Hive and/or Impala .
  • Designed and implemented Spark test bench application to evaluate quality of recommendations made by the engine.
  • Monitored and identified performance bottlenecks in ETL code. Worked on data utilizing Hadoop and ZooKeeper, aiding in the development of specialized indexes for performant queries on big data implementations.
  • Worked on deploying a Hadoop cluster with multiple nodes and different big data analytic tools, including Pig, the HBase database and Sqoop.
  • Gained good experience with NoSQL databases.
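
The Spark-based filtering mentioned above can be sketched roughly as follows using the Spark Java API; the comma delimiter, the value column index, the threshold and the path arguments are placeholders.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    // Hypothetical Spark job: keeps only records whose numeric value column
    // exceeds a threshold. Field index, delimiter and paths are placeholders.
    public class ValueFilterJob {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("value-filter");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaRDD<String> highValue = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) {
                    String[] fields = line.split(",");
                    if (fields.length < 3) {
                        return false; // skip malformed records
                    }
                    try {
                        return Double.parseDouble(fields[2]) > 1000.0;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                }
            });

            highValue.saveAsTextFile(args[1]);
            sc.stop();
        }
    }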

Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Python, Big Data, Oozie, Sqoop, Scala, Kafka, Flume, Impala, Zookeeper, MongoDB, MapReduce, Cassandra, Linux, SOLR, XML, Maven, NoSQL, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*PLUS.

Confidential, Cincinnati, OH

Java Developer

Responsibilities:

  • Involved in gathering business requirements, analyzing the project and created UML diagrams such as Use Cases, Class Diagrams, Sequence Diagrams and flowcharts for the optimization Module using Microsoft Visio.
  • Designed and developed Optimization UI screens for Rate Structure, Operating Cost, Temperature and Predicted Loads using JSF (MyFaces), JSP, JavaScript and HTML.
  • Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module (a backing-bean sketch follows this list).
  • Developed JSP web pages for Rate Structure and Operating Cost using the JSF HTML and JSF Core tag libraries.
  • Designed and developed the framework for the IMAT application implementing all the six phases of JSF life cycle and wrote Ant build, deployment scripts to package and deploy on JBoss application server.
  • Designed and developed Simulated annealing algorithm to generate random Optimization schedules and developed neural networks for the CHP system using Session Beans.
  • Integrated EJB 3.0 with JSF and managed application state management, business process management (BPM) using JBoss Seam.
  • Used Test driven approach for developing the application and Implemented the unit tests using Python Unit test framework.
  • Wrote AngularJS controllers, views, and services for new website features.
  • Developed Cost function to calculate the total cost for each CHP Optimization schedule generated by the Simulated Annealing algorithm using EJBs.
  • Implemented Spring Web Flow for the Diagnostics module to define page flows with actions and views, and created POJOs and used annotations to map them to the SQL Server database using EJB.
  • Developed entire frontend and backend modules using Python on Django, including the Tastypie web framework, using Git.
  • Wrote DAO classes, EJB 3.0 QL queries for Optimization schedule and CHP data retrievals from SQL Server database .
  • Used Eclipse as IDE tool to develop the application and JIRA for bug and issue tracking
  • Created combined deployment descriptors using XML for all the session and entity beans.
  • Wrote JSF and JavaScript validations to validate data on the UI for Optimization and Diagnostics and Developed Web Services to have access to the external system (WCC) for the diagnostics module.
  • Developed entire frontend and backend modules using Python on Django Web Framework on MySQL.
  • Wrote Message Driven Bean to implement the Diagnostic Engine and configured the JMS queue details and involved in performance tuning of the application using JProbe and JProfiler .
  • Designed and coded application components in an Agile environment utilizing a test driven development approach.
  • Skilled in test driven development and Agile development .
  • Created technical design document for the Diagnostics Module and Optimization module covering Cost function and Simulated Annealing approach.
  • Involved in code reviews and enforced versioning guidelines.
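
A minimal sketch of a backing bean of the kind described above for the Optimization screens; the bean, its properties and the navigation outcome are hypothetical, with the JSF 1.x registration shown as a comment.

    package com.example.optimization;

    // Hypothetical backing bean for the Rate Structure screen. With JSF 1.x it
    // would be registered in faces-config.xml, for example:
    //   <managed-bean>
    //     <managed-bean-name>rateStructureBean</managed-bean-name>
    //     <managed-bean-class>com.example.optimization.RateStructureBean</managed-bean-class>
    //     <managed-bean-scope>request</managed-bean-scope>
    //   </managed-bean>
    public class RateStructureBean {

        private double operatingCost;
        private double predictedLoad;

        public double getOperatingCost() { return operatingCost; }
        public void setOperatingCost(double operatingCost) { this.operatingCost = operatingCost; }

        public double getPredictedLoad() { return predictedLoad; }
        public void setPredictedLoad(double predictedLoad) { this.predictedLoad = predictedLoad; }

        // Action method whose return value maps to a navigation rule
        // defined in faces-config.xml.
        public String optimize() {
            // the real implementation would call the optimization service here
            return "optimizationResult";
        }
    }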

Environment: Java 1.5, J2EE, Microsoft Visio, Python, EJB 3.0, JSP, JSF, JBoss Seam, JIRA, Web Services, JMS, JavaScript, AngularJS, HTML, ANT, Agile, JUnit, JBoss 4.2.2, MS SQL Server 2005, MyEclipse 6.0.1.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in various client implementations.
  • Development of Spring Services
  • Development of persistence classes using Hibernate framework .
  • Development of SOA services using the Apache Axis web service framework.
  • Development of the user interface using Apache Struts 2.0, JSPs, Servlets, jQuery and JavaScript.
  • Developed client functionality using ExtJS.
  • Development of JUnit test cases to test business components (a test sketch follows this list).
  • Extensively used the Java Collections API to improve application quality and performance.
  • Made heavy use of Java 5 features such as generics, the enhanced for loop and type safety.
  • Providing production support and enhancement designs for the existing product.
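
A small sketch of the JUnit 4 test style referenced above; the business component and its discount rule are placeholders, included only so the example is self-contained.

    import static org.junit.Assert.assertEquals;

    import org.junit.Before;
    import org.junit.Test;

    // Hypothetical business component and its JUnit 4 test; the discount
    // rules are placeholders shown only to illustrate the test structure.
    public class DiscountServiceTest {

        /** Minimal stand-in for a business component under test. */
        static class DiscountService {
            double applyDiscount(double price, String tier) {
                return "PREMIUM".equals(tier) ? price * 0.9 : price;
            }
        }

        private DiscountService service;

        @Before
        public void setUp() {
            service = new DiscountService();
        }

        @Test
        public void premiumCustomersGetTenPercentOff() {
            assertEquals(90.0, service.applyDiscount(100.0, "PREMIUM"), 0.001);
        }

        @Test
        public void unknownTiersPayFullPrice() {
            assertEquals(100.0, service.applyDiscount(100.0, "UNKNOWN"), 0.001);
        }
    }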

Environment: Java 1.5, SOA, Spring, ExtJS, Struts 2.0, Servlets, JSP, GWT, jQuery, JavaScript, CSS, Web Services, XML, Oracle, WebLogic Application Server, Eclipse, UML, Microsoft Visio.
