Hadoop Developer Resume

Plano, TX

SUMMARY:

  • 7+ years of professional experience in IT, spanning analysis, testing, documentation, deployment, integration, and maintenance of web-based and client/server applications.
  • Qualified Hadoop developer with experience in Hadoop, database management system architecture, core Java, and testing and implementing Big Data solutions.
  • Good experience in developing and implementing big data solutions and data mining applications on Hadoop using HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Kafka, Storm, Spark, Oozie, and Zookeeper.
  • Strong experience in analyzing data using HiveQL, Pig scripts, and custom MapReduce programs in Java.
  • Experience in developing MapReduce programs on Apache Hadoop for analyzing big data as per requirements.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop, and in ingesting log data into HDFS using Flume.
  • Good understanding/knowledge of Hadoop architecture and its various components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Proficient in processing data using Tez jobs.
  • Expertise in real-time data ingestion into HBase and Hive using Storm.
  • Expertise in setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
  • Good experience in loading unstructured data into HDFS using Flume/Kafka.
  • Excellent experience in dealing with compression codecs such as Snappy and Gzip.
  • Expertise in managing and reviewing Hadoop Log files.
  • Hands on experience in in-memory data processing with Apache Spark.
  • Hands on experience in data cleaning, transformation and pushing data as delimited files into HDFS using Informatica Developer.
  • Worked on ETL tools like Talend to extract, transform and load data according to the requirement.
  • Extensively used ETL methodology to support data extraction, transformation, and loading, using Hadoop.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Storm cluster.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data (a minimal sketch follows this summary).
  • Excellent communication, interpersonal, and problem-solving skills; a very good team player with an extremely strong positive attitude.
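
The Spark work above was written in Scala with Spark SQL; purely as an illustration of the pattern (not the original project code), the sketch below uses Spark's Java API, and the application name, input path, column names, and output table are hypothetical:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class ClickstreamSummary {
    public static void main(String[] args) {
        // Hive-enabled session so Spark SQL can read and write warehouse tables
        SparkSession spark = SparkSession.builder()
                .appName("ClickstreamSummary")
                .enableHiveSupport()
                .getOrCreate();

        // Read a raw HDFS dataset (path and schema are illustrative)
        Dataset<Row> events = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/raw/clickstream/");

        // Filter bad rows, aggregate per user, and persist for downstream jobs
        events.filter(col("user_id").isNotNull())
              .groupBy("user_id")
              .count()
              .write()
              .mode("overwrite")
              .saveAsTable("analytics.user_click_counts");

        spark.stop();
    }
}
```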

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Spark, Hive, Zookeeper, Oozie, Tez, Pig, Sqoop, Flume, Kafka, Storm, Ganglia, Nagios.

Development Tools: Eclipse, IBM DB2 Command Editor, SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access).

Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, Python.

Databases: Oracle 11g/10g/9i, MySQL, PL/SQL, SQL Server 2005/2008, DB2

NoSQL Databases: HBase, Cassandra, MongoDB

ETL: Informatica, Talend

Web Tools: HTML, JavaScript, XML, XSL, DOM

Methodologies: Agile/ Scrum, Waterfall

Operating Systems: Windows 98/2000/XP/Vista/7/8/10, Macintosh, Unix, Linux, and Solaris.

Monitoring & Reporting Tools: Ganglia, Nagios, Custom shell reports

PROFESSIONAL EXPERIENCE:

Confidential, Plano, TX

Hadoop Developer

Responsibilities:

  • Involved in the full life cycle of the project, from design and analysis through logical and physical architecture modeling, development, implementation, and testing.
  • Developing MapReduce programs to parse raw data and store the refined data in tables.
  • Ingesting, analyzing, and processing data, and storing the results into HDFS and Hive/HBase using Sqoop.
  • Responsible for managing data from various sources and their metadata using Hive.
  • Working with Hive partitioning and bucketing of data to improve query performance across different kinds of data sources (see the sketch after this section).
  • Involved in extracting data from various data sources into HDFS. Used Sqoop to efficiently transfer data between RDBMS and HDFS, and Flume to stream log data from servers.
  • Worked extensively in creating MapReduce jobs to power data for search and aggregation.
  • Altered existing Scala programs to enhance performance and obtain partitioned results.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Developing Spark code in Scala and Spark SQL for faster testing and processing of data.
  • Exporting analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in loading data into the Cassandra NoSQL database.
  • Working with Oozie to automate the flow and coordination of jobs in the cluster.

Environment: Hadoop 0.20.2, Hive, HBase, Apache Sqoop, Scala, Pig, Spark, Oozie, Cassandra, Cloudera Manager.
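
A minimal sketch of the Hive partitioning and bucketing pattern referenced above, issued here through the Hive JDBC driver (which must be on the classpath); the HiveServer2 URL, credentials, and table layout are placeholders, not the project's actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitioningExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint; host, port, database, and user are placeholders
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Partition by ingest date and bucket by customer id to speed up
            // date-range scans and join/sampling queries
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS orders_part (" +
                "  order_id BIGINT, customer_id BIGINT, amount DOUBLE) " +
                "PARTITIONED BY (ingest_date STRING) " +
                "CLUSTERED BY (customer_id) INTO 32 BUCKETS " +
                "STORED AS ORC");

            // Dynamic-partition load from a raw staging table
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute(
                "INSERT INTO TABLE orders_part PARTITION (ingest_date) " +
                "SELECT order_id, customer_id, amount, ingest_date FROM orders_raw");
        }
    }
}
```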

Confidential, Jersey City, NJ

Hadoop Developer

Responsibilities:

  • Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig and Hive programs.
  • Worked on a large-scale Hadoop cluster for distributed data processing and analysis using Sqoop, Hive, Pig and MapReduce.
  • Imported data to HDFS from different databases and exported the processed data to Hive, HBase and RDBMS using Sqoop.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Optimized MapReduce algorithms using Combiners and custom Partitioners to reduce shuffle volume and balance load across reducers (sketched after this section).
  • Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Loaded data into the NoSQL database HBase using Pig.
  • Developed a robust data pipeline to cleanse, filter, aggregate, normalize, and de-normalize the data using Apache Pig and Spark.
  • Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Coordinated the cluster services using ZooKeeper.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS.
  • Actively participated in collection, analysis and design of the requirements to meet the client's criteria.
  • Maintained system integrity of all subcomponents (primarily HDFS, MapReduce, HBase, and Flume).
  • Documented all requirements, code and implementation methodologies for review and analysis purposes.

Environment: HDFS, Hive, Pig, Sqoop, Spark, ZooKeeper, Oozie
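
A condensed sketch of the combiner/partitioner optimization described above, using the standard Hadoop MapReduce Java API; the record layout (pipe-delimited with an event type in the third field) and class names are assumptions for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountDriver {

    /** Emits (eventType, 1) for each pipe-delimited input record. */
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length > 2) {               // skip malformed rows
                eventType.set(fields[2]);
                context.write(eventType, ONE);
            }
        }
    }

    /** Sums counts; also reused as the combiner for local pre-aggregation. */
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    /** Spreads keys evenly while keeping each key on a single reducer. */
    public static class EventPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count");
        job.setJarByClass(EventCountDriver.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);      // cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(EventPartitioner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```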

Confidential, Phoenix, AZ

Hadoop Developer

Responsibilities:

  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed MapReduce programs that filter bad and unnecessary claim records and find unique records based on account type (a sketch follows this section).
  • Analyzed the data using Hive queries (HiveQL), Pig scripts, Spark SQL, Splunk, and Spark Streaming.
  • Used Sqoop to import data into HDFS from the MySQL database and vice versa.
  • Handled importing of data from various data sources and performed transformations using Hive, MapReduce, and HBase.
  • Wrote extensive Pig scripts to transform raw data from several big data sources into a baseline data set.
  • Configured Flume to extract the data from the web server output files and load it into HDFS.
  • Worked on the RDBMS system using PL/SQL to create packages, procedures, functions, and triggers as per the business requirements.
  • Involved in creating Hive tables, loading the data, and writing Hive queries that run internally as MapReduce jobs.
  • Responsible for importing and exporting data into HDFS from the Oracle database, and vice versa, using Sqoop.
  • Extensively worked with partitioned and bucketed tables in Hive and designed both managed and external tables.
  • Created and worked with Sqoop jobs with full refresh and incremental load to populate Hive external tables.
  • Designed and created Oozie workflows to schedule and manage Hadoop, Hive, Pig and Sqoop jobs.

Environment: Hadoop, MapReduce, Pig, Hive, Spark, Splunk, HBase, HDFS, MySQL, Sqoop, Flume, Oozie.
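
A sketch of the claim-record filtering and deduplication described above; the comma-delimited layout (claim id, account type, amount, ...) is an assumption, and the driver wiring would mirror the combiner/partitioner example shown for the previous role:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Drops malformed claim rows and keys the rest by account type + claim id. */
public class ClaimFilterMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] f = value.toString().split(",");
        // Assumed layout: claimId, accountType, amount, ... ; skip short or bad rows
        if (f.length < 3 || f[0].isEmpty() || f[1].isEmpty()) {
            return;
        }
        context.write(new Text(f[1] + "|" + f[0]), value);
    }
}

/** Emits one record per (accountType, claimId) key, discarding duplicates. */
class ClaimDedupReducer extends Reducer<Text, Text, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // All duplicates share the same key; keep the first occurrence only
        for (Text record : values) {
            context.write(record, NullWritable.get());
            break;
        }
    }
}
```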

Confidential, Nashville, TN

ETL Developer

Responsibilities:

  • Understood the design requirements.
  • Analyzed business process workflows and assisted in the development of ETL procedures for moving data from source to target systems.
  • Extensively used ETL to transfer data from different sources such as flat files, CSV, XML, and VSAM, and to load the data into the target staging database.
  • Designed and implemented appropriate ETL mappings to extract and transform data from various sources to meet requirements.
  • Extensively used Informatica transformations such as Source Qualifier, Rank, SQL, Router, Filter, Lookup, Joiner, Aggregator, Normalizer, and Sorter, along with their transformation properties.
  • Created sessions, workflows, and post-session email tasks, and performed various workflow monitoring and scheduling tasks.
  • Used Informatica Designer to create reusable transformations to be used in Informatica mappings and mapplets.
  • Developed slowly changing dimensions according to the data mart schemas (a concept sketch follows this section).
  • Involved in identifying the sources for various dimensions and facts for different data marts according to star schema design pattern.
  • Involved in Fine-tuning of sources, targets, mappings and sessions for Performance Optimization.
  • Monitored scheduled, running, completed, and failed sessions using the Workflow Monitor, and debugged mappings for failed sessions.

Environment: Informatica Power Center 8.5/8.6.1, Oracle 10g, Windows.
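
The slowly changing dimension logic above was implemented as Informatica mappings; purely as a concept sketch of a Type 2 change (expire the current row, insert the new version), the JDBC snippet below shows the equivalent SQL, with a hypothetical dim_customer table and hard-coded sample values:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ScdType2Sketch {
    public static void main(String[] args) throws Exception {
        // Oracle connection details are placeholders
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@db-host:1521:orcl", "etl_user", "secret")) {
            conn.setAutoCommit(false);

            // 1. Close out the currently active version of the changed customer row
            try (PreparedStatement expire = conn.prepareStatement(
                    "UPDATE dim_customer SET current_flag = 'N', end_date = SYSDATE " +
                    "WHERE customer_id = ? AND current_flag = 'Y'")) {
                expire.setLong(1, 1001L);
                expire.executeUpdate();
            }

            // 2. Insert the new version as the active row
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO dim_customer " +
                    "(customer_id, customer_name, city, start_date, end_date, current_flag) " +
                    "VALUES (?, ?, ?, SYSDATE, NULL, 'Y')")) {
                insert.setLong(1, 1001L);
                insert.setString(2, "Jane Doe");
                insert.setString(3, "Nashville");
                insert.executeUpdate();
            }

            conn.commit();
        }
    }
}
```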

Confidential, Atlanta, GA

Java Developer

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
  • Designed and developed the application based on J2EE architecture, with the Spring MVC Framework on the server side (a minimal controller sketch follows this section).
  • Involved in analysis, design and developing front end/UI using JSP, HTML, DHTML and JavaScript.
  • Prepared workflow diagrams using MS Visio and modeled the methods based on OOP methodology.
  • Migrated data from flat files, CSV, MS Access, Excel, and OLE DB sources to SQL databases.
  • Accountable for guiding projects on the design and execution of data quality initiatives and other data performance measures, following data quality programs.
  • Developed the host modules using C++, DB2, and SQL.
  • Responsible for creating the front-end code and Java code to suit the business requirements.
  • Installed, configured, and administered WebLogic Application Server and deployed JSP, Servlet, and EJB applications.
  • Wrote Maven scripts for builds, unit testing, deployment, Checkstyle checks, etc.

Environment: Java, J2EE, JDK, JSP, Eclipse, Maven, HTML, Servlets, SQL, DB2.
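
A minimal Spring MVC controller in the style of the server-side work above; the request mapping, view name, and model attributes are illustrative, not taken from the actual application:

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
@RequestMapping("/accounts")
public class AccountController {

    // Handles GET /accounts?owner=... and renders the accountList JSP view
    @RequestMapping(method = RequestMethod.GET)
    public String listAccounts(@RequestParam("owner") String owner, Model model) {
        model.addAttribute("owner", owner);
        model.addAttribute("accounts", java.util.Collections.emptyList()); // placeholder data
        return "accountList";  // resolved by the configured ViewResolver to a JSP
    }
}
```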

Confidential

Java Developer

Responsibilities:

  • Developed front-end screens using jQuery, JavaScript, Java, and CSS.
  • Responsible for developing platform-related logic and resource classes, and controller classes to access the domain and service classes.
  • Designed and developed the application based on J2EE architecture, with the Spring MVC Framework on the server side.
  • Involved in the development and enhancement of the web client, and in enhancements and optimization of the business logic.
  • Developed web-based user interfaces using the Struts framework.
  • Designed the GUI screens using Struts and configured Log4j to debug the application.
  • Involved in the development of test cases for the testing phase.
  • Performed end-to-end integration testing of online scenarios and unit testing using the JUnit testing framework (a small example follows this section).

Environment: Java, J2EE, JavaScript, JSP, JSF, Oracle, Eclipse, Log4j
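
A small JUnit 4 test in the style of the unit testing noted above; the class under test is a hypothetical stand-in, not code from the project:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Before;
import org.junit.Test;

public class OrderCalculatorTest {

    /** Hypothetical class under test: computes an order total with tax. */
    static class OrderCalculator {
        double totalWithTax(double subtotal, double taxRate) {
            return subtotal + subtotal * taxRate;
        }
    }

    private OrderCalculator calculator;

    @Before
    public void setUp() {
        calculator = new OrderCalculator();
    }

    @Test
    public void addsTaxToSubtotal() {
        assertEquals(108.0, calculator.totalWithTax(100.0, 0.08), 0.0001);
    }

    @Test
    public void zeroTaxLeavesSubtotalUnchanged() {
        assertEquals(50.0, calculator.totalWithTax(50.0, 0.0), 0.0001);
    }
}
```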
