Hadoop Developer Resume

Baltimore, MD

SUMMARY

  • 6 years of extensive experience in Information Technology, including 3+ years in Big Data and Hadoop ecosystem technologies.
  • Proficiency in Hadoop ecosystem components such as MapReduce, Spark, HDFS, Apache HBase, Oozie, Hive, Impala, Sqoop, Pig, Kafka, Flume, Cassandra, Hue and ZooKeeper.
  • Experience in writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and RDBMS.
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and NodeManager, as well as MapReduce concepts.
  • Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality written in Python/Java.
  • Very good knowledge of Hadoop and YARN architecture, the HDFS file system and the Streaming API, along with data warehousing concepts.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation across multiple file formats, including XML, JSON and CSV.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase, and in monitoring them with Cloudera Manager.
  • Hands-on experience in designing and creating Hive tables on a shared metastore with partitioning and bucketing.
  • Experience in developing applications in Python for multiple platforms and in designing and maintaining databases using Python.
  • Good understanding of and hands-on experience in setting up and maintaining NoSQL databases such as Cassandra, MongoDB and HBase.
  • Developed Spark applications using Python on different data formats (a minimal sketch follows this summary).
  • Experience with Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Expertise in UNIX shell scripting using ksh and bash.
  • Experience in database design, creating and tuning complex SQL queries, and writing PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers and packages in Oracle and MongoDB on Unix/Linux.
  • Experience in client-side technologies and validations such as HTML, CSS, JavaScript, AJAX, jQuery and JSON.
  • Experience with the Oozie workflow engine in running workflow jobs with actions that execute Hadoop MapReduce and Pig jobs.
  • Monitored and managed the Hadoop cluster using Cloudera Manager and the web UI.
  • Experienced in all facets of Software Development Life Cycle using Waterfall and Agile methodologies.
  • Ability to blend technical expertise with strong conceptual, business and analytical skills to provide quality solutions, together with result-oriented problem solving and leadership skills.
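
The Spark-on-Python work noted above can be illustrated with a minimal, hedged sketch that reads a CSV file and a JSON file and aggregates event counts. The paths, column names and the SparkSession API (Spark 2.x or later) are assumptions made only for illustration; on older CDH clusters the same idea applies through the SQLContext/HiveContext API.

# Minimal PySpark sketch: read CSV and JSON inputs and aggregate event counts.
# Paths and column names below are hypothetical, not an actual production schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sample-aggregation")
         .getOrCreate())

# Read a CSV file with a header row and a JSON file (hypothetical HDFS paths).
csv_df = spark.read.option("header", "true").csv("hdfs:///data/events.csv")
json_df = spark.read.json("hdfs:///data/events.json")

# Union the two sources on a shared column and count events per type.
events = csv_df.select("event_type").union(json_df.select("event_type"))
counts = events.groupBy("event_type").agg(F.count("*").alias("cnt"))

counts.write.mode("overwrite").parquet("hdfs:///output/event_counts")
spark.stop()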

TECHNICAL SKILLS

Hadoop Ecosystem: MapReduce, HDFS, Spark, Flume, Sqoop, Hive, Pig, Oozie, Impala, HBase, Hue, Cassandra, Kafka, Zookeeper.

Big data Frameworks: HDFS, YARN, Spark.

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks.

Virtual machines: VMware, VirtualBox.

NoSQL Databases: HBase, Cassandra, MongoDB.

Programming Languages: C, C++, C#, Java, PHP, PL/SQL, Python, Unix shell scripting.

Tools: Eclipse, NetBeans, JUnit framework, Clear DDTS, MH Web, Hansoft.

Operating Systems: Linux, UNIX and Windows.

Database: Oracle 10g/11g, T-SQL, MySQL, MS SQL Server.

Web Technologies: HTML, CSS, AngularJS, jQuery, DHTML, XML, XSLT, AJAX, JavaScript.

Messaging Services: ActiveMQ, Kafka.

SDLC methodologies: Agile, Waterfall.

PROFESSIONAL EXPERIENCE

Confidential, Baltimore, MD

Hadoop Developer

Responsibilities:

  • Extracted data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and imported data from MySQL into HDFS using Sqoop.
  • Involved in running batch processes using Pig scripts and developed Pig UDFs for data pre-processing and manipulation.
  • Developed data pipeline using Flume, Sqoop, Pig, MapReduce and Spark to ingest behavioral data into HDFS for analysis.
  • Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Used Pig to process logs and semi-structured content and imported the processed data into the Hive warehouse, enabling business analysts to perform analytics.
  • Handled Hive queries using Spark SQL integrated with the Spark environment.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS (see the streaming sketch after this list).
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed Spark applications using Python on different data formats such as text and CSV files.
  • Developed Hive queries to pre-process the data for analysis by imposing read only structure on the stream data.
  • Integrated Hive and HBase through shared tables for efficient data access and wrote multiple Hive UDFs for complex queries.
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
  • Loading Data into HBase using Bulk Load and Non-bulk load.
  • Designed and implemented Incremental Imports into Hive tables. Created tables in Hive to store the captured data and analyzed it using HQL.
  • Extracted the data from different sources into HDFS and bulk loaded the cleaned data into HBase.
  • Exported the web log data analyzed with HQL to the RDBMS using Sqoop for visualization and to generate reports for the BI team.
  • Created Hive tables and implemented partitioning, dynamic partitioning and bucketing for efficient data access (see the Hive DDL sketch after this list).
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Delivered a POC of Flume to handle real-time log processing for attribution reports.
  • Ran queries using Impala and used BI tools to run ad-hoc queries directly on HDFS.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Developed automated scripts using Unix Shell for scheduling and automation of tasks.
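
As referenced in the list above, here is a minimal, hedged sketch of the Kafka-to-HDFS streaming step, written against the DStream/KafkaUtils API available in Spark 1.x/2.x (this Python API was removed in Spark 3). The broker address, topic name, batch interval and output path are assumptions, not the production configuration.

# Hedged Spark Streaming sketch: consume a Kafka topic and persist each batch to HDFS.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs")
ssc = StreamingContext(sc, batchDuration=30)   # 30-second micro-batches (assumed)

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["weblogs"],                                    # hypothetical topic
    kafkaParams={"metadata.broker.list": "broker1:9092"})  # hypothetical broker

# Each record is a (key, value) pair; keep the value and write every batch to HDFS.
stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///streaming/weblogs/batch")

ssc.start()
ssc.awaitTermination()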
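
The partitioned, bucketed Hive tables referenced above can be sketched as HiveQL run from Python. The table name, columns, bucket count and the use of the PyHive client are assumptions made only to keep the examples in one language; the same statements can equally be run from the Hive CLI, beeline or Hue.

# Hedged sketch: create a partitioned, bucketed Hive table and load it with a
# dynamic-partition insert. Schema and table names are hypothetical.
from pyhive import hive

conn = hive.connect(host="hiveserver2-host", port=10000)   # hypothetical HiveServer2
cursor = conn.cursor()

# Partition by load date and bucket by user id for partition pruning and efficient joins.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS weblogs (
        user_id STRING,
        url     STRING,
        status  INT
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert from a staging table (hypothetical name).
cursor.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
cursor.execute("SET hive.enforce.bucketing=true")
cursor.execute("""
    INSERT OVERWRITE TABLE weblogs PARTITION (load_date)
    SELECT user_id, url, status, load_date FROM weblogs_staging
""")

cursor.close()
conn.close()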

Environment: Hadoop, Hive, Spark, HBase, MapReduce, HDFS, Kafka, Sqoop, Python, Java (JDK 1.6), Hadoop distributions from Hortonworks and Cloudera, DataStax, IBM DataStage 8.1, Oracle 11g/10g, PL/SQL, SQL*Plus and Linux.

Confidential, Bethlehem, PA.

Hadoop Developer

Responsibilities:

  • Installed, monitored, managed, troubleshot and patched various Hadoop ecosystem components and Hadoop daemons across development, test and production clusters.
  • Developed MapReduce jobs on multi-petabyte YARN/Hadoop clusters that process billions of events every day to generate daily and monthly reports.
  • Developed data pipelines using Pig and Hive from various data sources; these pipelines had customized UDFs to extend the ETL functionality.
  • Developed Java and Pig scripts to arrange incoming data into suitable, structured format before piping it out for analysis.
  • Extracted the data from RDBMS into HDFS using Sqoop.
  • Developed automated scripts using Unix Shell for running Balancer, file system health check and User/Group creation on HDFS.
  • Worked on different file formats such as Sequence files, XML files and Map files using MapReduce programs.
  • Created workflow and coordinator using Oozie for regular jobs and to automate the tasks of loading the data into HDFS.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed Spark application using Python on different data formats for processing and analysis.
  • Wrote extensive MapReduce jobs in Java to train the cluster and developed Java MapReduce programs for the analysis of sample log files stored in the cluster (a Hadoop Streaming sketch of this counting logic follows the list).
  • Developed Java code components for connecting to and querying the Cassandra database and designed the control table schema that drives the whole application (a Python sketch of the Cassandra access also follows the list).
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
  • Created Reports and Dashboards using structured and unstructured data.
  • Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
  • Delivered tuned, efficient and error-free code for new Big Data requirements using technical knowledge of Hadoop and its ecosystem.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
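
The MapReduce reporting logic referenced above was written against the Java API; purely as a language-consistent illustration, here is the same per-event-type counting expressed as a Hadoop Streaming job in Python. The tab-delimited log layout, field positions and HDFS paths are assumptions.

# mapper.py -- emits (event_type, 1) for each input line; field layout is hypothetical.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 2:
        print("%s\t1" % fields[2])   # assumed column holding the event type

# reducer.py -- sums the counts per key (Hadoop delivers reducer input sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key and current_key is not None:
        print("%s\t%d" % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))

# Submitted through the Hadoop Streaming jar (jar path varies by distribution), e.g.:
#   hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
#     -mapper "python mapper.py" -reducer "python reducer.py" \
#     -input /logs/raw -output /logs/event_counts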
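
The Cassandra access layer described above was developed in Java; to keep these sketches in one language, here is the equivalent connect-and-query step using the DataStax Python driver. The contact point, keyspace and control-table schema are assumptions.

# Hedged sketch: connect to Cassandra and read the control table that drives the application.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-node1"])     # hypothetical contact point
session = cluster.connect("pipeline_ks")   # hypothetical keyspace

rows = session.execute("SELECT job_name, last_run_ts, status FROM control_table")
for row in rows:
    print(row.job_name, row.last_run_ts, row.status)

cluster.shutdown()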

Environment: Apache Cassandra, HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark, Hadoop, YARN, Cloudera Manager, Red Hat Linux, CentOS, Java, NoSQL, Kafka, Python, Perl, Cloudera Navigator, Java 1.6, IBM AIX 6.1, UNIX, Shell Scripting, XML, XSLT.

Confidential - Mountain View, CA.

Hadoop Developer

Responsibilities:

  • Involved in all phases of development activities from requirements collection to production support.
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Used Sqoop to dump data from RDBMS into HDFS for processing and analysis.
  • Built a data flow pipeline using Flume, Java MapReduce and Pig.
  • Used Java MapReduce and Pig scripts to process the data and store it on HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Configured a MySQL database to store the Hive metadata.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Assisted in designing, development and architecture of Hadoop and HBase systems.
  • Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Written multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
  • Involved in loading data from the UNIX file system to HDFS.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Developed automated test tools using JUnit.
  • Created HBase tables to store variable data formats of data coming from different portfolios (see the HBase sketch after this list).
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Used Impala to query the Hadoop data stored in HDFS (an Impala query sketch also follows this list).
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
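
As referenced above, a hedged sketch of an HBase table holding variable-format portfolio data, shown here with the happybase Python client (which goes through the HBase Thrift gateway). The table name, column families and row-key layout are assumptions rather than the actual schema.

# Hedged sketch: create an HBase table, write a record and scan by row-key prefix.
import happybase

connection = happybase.Connection("hbase-thrift-host")   # hypothetical Thrift server

# One wide column family for payloads and one for metadata keeps variable layouts flexible.
connection.create_table("portfolio_events", {"data": dict(), "meta": dict()})

table = connection.table("portfolio_events")
table.put(b"portfolioA#2015-06-01#0001",
          {b"data:payload": b'{"symbol": "ABC", "qty": "100"}',
           b"meta:source": b"portfolioA"})

for key, row in table.scan(row_prefix=b"portfolioA#"):
    print(key, row)

connection.close()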
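
And a minimal sketch of an ad-hoc Impala query issued from Python with the impyla client; the coordinator host, port and table name are assumptions, and as noted above such queries were also run from BI tools.

# Hedged sketch: run an ad-hoc aggregation against Impala over data stored in HDFS.
from impala.dbapi import connect

conn = connect(host="impala-daemon-host", port=21050)   # hypothetical Impala daemon
cursor = conn.cursor()
cursor.execute("SELECT event_type, COUNT(*) AS cnt FROM weblogs GROUP BY event_type")
for event_type, cnt in cursor.fetchall():
    print(event_type, cnt)
cursor.close()
conn.close()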

Environment: Apache Hadoop, Java (JDK 1.6), Bash, Kafka, Impala, Hortonworks, deployment tools, Python, Oracle 11g/10g, MySQL, Toad, JUnit, Windows NT, UNIX, Sqoop, Hive, Oozie, Pig.

Confidential

Java Developer

Responsibilities:

  • Involved in Software Development Life Cycle (SDLC) of the application: Requirement gathering, Design Analysis and Code development.
  • Developed the application using JSF 2.0, which leverages the classical Model View Controller (MVC) architecture.
  • Implemented the JSF MVC module to develop the controllers, views and cleaner front-end code, and created the physical application design.
  • Involved in designing front-end screens using JavaScript, jQuery, JSP, AJAX, HTML and DHTML, and developed the Device Direct module for communication with the cell relays.
  • Extensively used Stored Procedures for performance and involved in source code management using IBM ClearCase.
  • Used PuTTY for telnet connections to the cell relay and WinSCP for deploying the server code.
  • Used Apache Ant for the build process and used ClearCase for version control and ClearQuest for bug tracking.
  • Used Log4j for logging, was intensively involved in defect fixing, and implemented best practices for defect fixing.

Environment: J2SE, JSF, JSP, Oracle 10g, IBM RAD IDE, JDBC, ClearCase, ANT, WebSphere, Log4j, Windows.

Confidential

Jr. Java Developer

Responsibilities:

  • Understood functional specifications and designed and developed the front end using HTML, CSS and JavaScript with JSF AJAX and tag libraries.
  • Designed and developed Hibernate configuration and session-per-request design pattern for making database connectivity and accessing the session for database transactions respectively.
  • Designing and developing front-end, middleware and back-end applications.
  • Created tables and stored procedures in SQL for data manipulation and retrieval, and performed database modifications using SQL, PL/SQL, stored procedures, triggers and views in Oracle.
  • Used ClearCase for version control and ClearQuest for bug tracking.
  • Used SQL for fetching and storing data in databases and designed and developed the Validators, Controller Classes and Java bean components.
  • Involved in the configuration management using ClearCase.

Environment: Java, J2EE, Struts, iBATIS, XML, JSP, CSS, HTML, JavaScript, jQuery, Oracle 10g, DB2, Unix, RAD, WebSphere, Servlets, Hibernate, Eclipse, ClearCase, ClearQuest.
