
Hadoop Developer Resume


Foster City, CA

PROFESSIONAL SUMMARY:

  • 7+ years of extensive experience across the full life cycle of Hadoop and Java application development, including requirements analysis and design, development, implementation, support, maintenance, and enhancements.
  • 5+ years of comprehensive experience as a Hadoop Developer.
  • Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, YARN, HDFS, HBase, ZooKeeper, Oozie, Hive, Cassandra, Sqoop, Pig, and Flume.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, NodeManager, ResourceManager, and MapReduce.
  • Experience developing MapReduce programs on Apache Hadoop for working with Big Data.
  • Experience in working with stream processing systems like Apache Spark.
  • Organizing data into tables, performing transformations, and simplifying complex queries with Hive.
  • Performing real-time interactive analyses on massive data sets stored in HDFS or HBase using SQL with Impala.
  • Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Worked on messaging systems like Kafka along with Spark stream processing to perform real-time analysis.
  • Worked on integrating Apache Ignite and Kafka for highly scalable and reliable data processing.
  • Very good understanding of partitioning and bucketing concepts in Hive, and designed both managed and external Hive tables to optimize performance.
  • Developed Hive scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
  • Developed UDFs in Java as needed for use in Pig and Hive queries (a Hive UDF sketch follows this summary).
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Experience in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experience with Jakarta Struts Framework, MVC and J2EE Framework.
  • Experience in implementing J2EE Design Patterns.
  • Experienced in processing, validating, parsing, and extracting data from XML files using DOM and SAX parsers.
  • Hands-on experience in IDE tools like Eclipse and Visual Studio.
  • Worked extensively in Java, XML, XSL, EJB, JSP, JDBC, MVC, JSTL, Design Patterns and UML.
  • Strong object-oriented design experience.
  • Hands-on experience in web application development using client-side technologies like AngularJS and jQuery, as well as HTML, CSS, XML, and JavaScript.
  • Experienced in developing web services on Tomcat and JBoss.
  • Experience with build and deployment tools like Ant and Maven 2, as well as bug fixing and maintenance.
  • Experience in database design using stored procedures, functions, and triggers, and strong experience in writing complex queries for DB2 and SQL Server.
  • Worked with business users to extract clear requirements to create business value.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
  • Excellent problem-solving skills, strong analytical skills, and good communication and interpersonal skills.
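
As a reference for the UDF work noted above, below is a minimal sketch of a Hive UDF in Java. It uses the standard org.apache.hadoop.hive.ql.exec.UDF base class; the class name NormalizeField and the trim/upper-case logic are illustrative placeholders, not project code.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative Hive UDF: trims and upper-cases a string column.
    public final class NormalizeField extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a JAR, a UDF like this is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.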

TECHNICAL SKILLS:

Hadoop/Big Data Ecosystem: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Oozie, Spark, Scala, Ignite, Kafka, ZooKeeper, Impala, Cassandra, Flume.

Java Technologies: Core Java, Servlets, JSP, JDBC, Collections

Web Technologies: JSP, JavaScript, AJAX, XML, DHTML, HTML, CSS, SOAP, WSDL, Web Services

Frameworks: Struts 2.x, Hibernate, Spring

Databases: MySQL, Oracle, SQL Server, MS Access, Elasticsearch

IDE: Eclipse, Visual Studio

Logging Tool: Log4j

Build Tools: Ant, Maven, Gradle

Web Application Servers: Oracle Application Server, Apache Tomcat, WebLogic

Version Control Systems: GitHub, ClearCase, Confidential

Operating Systems: Windows XP/7/8/10, Unix, Red Hat Linux

Concepts: OOAD, UML, Design Patterns, Waterfall & Agile Methodology

PROFESSIONAL EXPERIENCE:

Confidential, Foster City, CA

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop stack using different big data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Worked on MapReduce programs for extraction, transformation, and aggregation of data from more than 20 sources in multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Imported data using Sqoop from various sources like CyberSource, Informatica, Salesforce, Teradata, Authorize.net and Genesys Info Mart to load data into HDFS on a daily basis.
  • Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
  • Created Hive external tables, loaded data into the tables, and queried the data using HQL.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
  • Developed simple to complex MapReduce jobs using the Java programming language, as well as equivalent logic in Hive and Pig.
  • Worked as a support team member to improve the fraud detection system built using Kafka and Spark (a Spark Streaming sketch follows this list).
  • Provided support to MapReduce programs through cluster monitoring, maintenance, and troubleshooting.
  • Worked on the backend using Scala and Spark to implement several aggregation routines.
  • Worked on real-time analytics and transactional processing using Ignite integrated with Kafka streams.
  • Worked hands-on with the ETL process and was involved in the development of Hive/Impala queries.
  • Experience in using SequenceFile, RCFile, Avro, and Parquet file formats.
  • Worked on implementing cluster coordination services through Zookeeper.
  • Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend, including checking and fixing delimiters in ASCII files.
  • Performed data conversion and data integration and created source-to-target mappings using Talend.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Used Flume to collect and aggregate web log data from different sources, such as web servers, and push it to HDFS.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Efficiently handled collection of data from various sources and profiled it using analysis tools such as Informatica Data Quality.
  • Involved with the design and implementation of the near real-time indexing pipeline, including index-management, cluster maintenance and interacting with Elasticsearch.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used GitHub as the code repository and Gradle as the build tool.
  • Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
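
As context for the Kafka and Spark streaming support work above, here is a minimal Java sketch of a Spark Streaming job consuming a Kafka topic through the spark-streaming-kafka-0-10 direct stream API. The broker address, topic name, and group id are placeholders, and the per-batch count stands in for the actual fraud-scoring logic, which is not shown here.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class TransactionStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("TransactionStream");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // Placeholder Kafka connection settings.
            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "txn-consumers");
            kafkaParams.put("auto.offset.reset", "latest");

            // Direct stream over the "transactions" topic.
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("transactions"), kafkaParams));

            // Print the record count of each micro-batch (stand-in for real scoring logic).
            stream.count().print();

            jssc.start();
            jssc.awaitTermination();
        }
    }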

Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Java, Oozie, Sqoop, Scala, Flume, Impala, Zookeeper, Ignite, Kafka, MapReduce, Cloudera Manager, Cassandra, Elasticsearch, Talend Big Data Studio, Avro, Parquet, Eclipse, MySQL, Gradle, Teradata, CyberSource, Informatica (IDQ), Salesforce, Authorize.net and Genesys Info Mart.

Confidential, RI

Hadoop Developer

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Involved in installing Hadoop Ecosystem components.
  • Managed and reviewed the Hadoop log files.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in HDFS maintenance and accessed it through the Hadoop web UI and Java API.
  • Implemented MapReduce jobs and wrote UDFs using the Java API and Pig Latin (a Pig UDF sketch follows this list).
  • Worked on pushing data as delimited files into HDFS using Talend Big Data Studio.
  • Worked on different Talend Hadoop components such as Hive and Pig.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed ETL jobs in Talend to load data from ASCII and flat files.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Worked on filtering raw data using tools like Tableau.
  • Responsible for running Hadoop streaming jobs to process CSV data.
  • Experience with the Gradle build tool and understanding of the Artifactory and repository structure.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
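
A minimal Java sketch of the kind of Pig UDF referenced above, using the standard org.apache.pig.EvalFunc base class. The class name TrimField and the trimming logic are illustrative placeholders only.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Illustrative Pig UDF: returns the trimmed string value of the first field.
    public class TrimField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().trim();
        }
    }

In a Pig script such a UDF is registered with REGISTER and invoked by its fully qualified class name inside a FOREACH ... GENERATE statement.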

Environment: Hadoop, MapReduce, HDFS, Hive, Java, HBase, Pig, Linux, Sqoop, Oozie, Flume, Talend, Tableau, Maven, GitHub, Gradle, XML, MySQL, MySQL Workbench.

Confidential, TN

Hadoop Developer

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Created Hive tables that were populated from the relevant EDW tables.
  • Responsible for managing data coming from different sources.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Supported MapReduce programs running on the cluster.
  • Worked extensively on Hive and Pig.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Involved in creating UDFs where custom functionality was required.
  • Wrote MapReduce jobs using the Java API (a MapReduce sketch follows this list).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
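
An illustrative Java MapReduce job of the kind referenced above. The CategoryCount class, the assumed CSV layout (category in the third column), and the input/output paths are placeholders for the sketch, not the actual business logic.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CategoryCount {

        // Emits (category, 1) per CSV row; assumes the category is the third column.
        public static class CategoryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text category = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    category.set(fields[2].trim());
                    context.write(category, ONE);
                }
            }
        }

        // Sums the counts for each category.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "category-count");
            job.setJarByClass(CategoryCount.class);
            job.setMapperClass(CategoryMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }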

Environment: Hadoop, MapReduce, HDFS, Eclipse, Omniture, Hive, Pig, HBase, Sqoop, Oozie and SQL.

Confidential, CA

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop stack using different big data analytic tools, including Pig, Hive, HBase, and Sqoop (an HBase client sketch follows this list).
  • Worked on MapReduce programs for extraction, transformation, and aggregation of data from more than 20 sources in multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Imported data using Sqoop from various sources like CyberSource, Informatica, Salesforce, Teradata, Authorize.net and Genesys Info Mart to load data into HDFS on a daily basis.
  • Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
  • Created Hive external tables, loaded data into the tables, and queried the data using HQL.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
  • Developed simple to complex MapReduce jobs using the Java programming language, as well as equivalent logic in Hive and Pig.
  • Worked as a support team member to improve the fraud detection system built using Kafka and Spark.
  • Provided support to MapReduce programs through cluster monitoring, maintenance, and troubleshooting.
  • Worked on the backend using Scala and Spark to implement several aggregation routines.
  • Worked on real-time analytics and transactional processing using Ignite integrated with Kafka streams.
  • Worked hands-on with the ETL process and was involved in the development of Hive/Impala queries.
  • Experience in using SequenceFile, RCFile, Avro, and Parquet file formats.
  • Worked on implementing cluster coordination services through Zookeeper.
  • Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend, including checking and fixing delimiters in ASCII files.
  • Performed data conversion and data integration and created source-to-target mappings using Talend.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Used Flume to collect and aggregate web log data from different sources, such as web servers, and push it to HDFS.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Efficiently handled collection of data from various sources and profiled it using analysis tools such as Informatica Data Quality.
  • Involved with the design and implementation of the near real-time indexing pipeline, including index-management, cluster maintenance and interacting with Elasticsearch.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used GitHub as the code repository and Gradle as the build tool.
  • Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
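
To illustrate the HBase work referenced above, a minimal Java sketch using the HBase 1.x client API to write and read a single cell. The table name orders, column family d, qualifier status, and row key are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class OrderStore {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("orders"))) {

                // Write one cell: row key = order id, column family "d", qualifier "status".
                Put put = new Put(Bytes.toBytes("order-1001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("SETTLED"));
                table.put(put);

                // Read the same cell back.
                Result result = table.get(new Get(Bytes.toBytes("order-1001")));
                byte[] status = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
                System.out.println("status=" + Bytes.toString(status));
            }
        }
    }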

Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Java, Oozie, Sqoop, Scala, Flume, Impala, Zookeeper, Ignite, Kafka, MapReduce, Cloudera Manager, Cassandra, Elasticsearch, Talend Big Data Studio, Avro, Parquet, Eclipse, MySQL, Gradle, Teradata, CyberSource, Informatica (IDQ), Salesforce, Authorize.net and Genesys Info Mart.

Confidential, CT

Software Developer

Responsibilities:

  • Worked on analyzing the Hadoop stack using different big data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Worked on MapReduce programs for extraction, transformation, and aggregation of data from more than 20 sources in multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Imported data using Sqoop from various sources like CyberSource, Informatica, Salesforce, Teradata, Authorize.net and Genesys Info Mart to load data into HDFS on a daily basis.
  • Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
  • Created Hive external tables, loaded data into the tables, and queried the data using HQL.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
  • Developed simple to complex MapReduce jobs using the Java programming language, as well as equivalent logic in Hive and Pig.
  • Worked as a support team member to improve the fraud detection system built using Kafka and Spark (a Kafka producer sketch follows this list).
  • Provided support to MapReduce programs through cluster monitoring, maintenance, and troubleshooting.
  • Worked on the backend using Scala and Spark to implement several aggregation routines.
  • Worked on real-time analytics and transactional processing using Ignite integrated with Kafka streams.
  • Worked hands-on with the ETL process and was involved in the development of Hive/Impala queries.
  • Experience in using SequenceFile, RCFile, Avro, and Parquet file formats.
  • Worked on implementing cluster coordination services through Zookeeper.
  • Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend, including checking and fixing delimiters in ASCII files.
  • Performed data conversion and data integration and created source-to-target mappings using Talend.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Used Flume to collect and aggregate web log data from different sources, such as web servers, and push it to HDFS.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Efficiently handled collection of data from various sources and profiled it using analysis tools such as Informatica Data Quality.
  • Involved with the design and implementation of the near real-time indexing pipeline, including index-management, cluster maintenance and interacting with Elasticsearch.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Used GitHub as the code repository and Gradle as the build tool.
  • Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
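
As a companion to the Kafka work listed above, a minimal Java sketch of a Kafka producer publishing one event. The broker address, the topic transactions, and the sample payload are placeholders, not production values.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventPublisher {
        public static void main(String[] args) {
            // Placeholder broker and serializer configuration.
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("acks", "all");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Each record carries a payment event keyed by transaction id.
                producer.send(new ProducerRecord<>("transactions", "txn-42", "{\"amount\": 19.99}"));
                producer.flush();
            }
        }
    }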

Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Java, Oozie, Sqoop, Scala, Flume, Impala, Zookeeper, Ignite, Kafka, MapReduce, Cloudera Manager, Cassandra, Elasticsearch, Talend Big Data Studio, Avro, Parquet, Eclipse, MySQL, Gradle, Teradata, CyberSource, Informatica (IDQ), Salesforce, Authorize.net and Genesys Info Mart.
