Hadoop Developer Resume
Foster City, CA
PROFESSIONAL SUMMARY:
- 7+ years of experience across the full life cycle of Hadoop and Java application development, including requirements analysis and design, development, implementation, support, maintenance, and enhancements.
- 5+ years of comprehensive experience as a Hadoop Developer.
- Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, YARN, HDFS, HBase, ZooKeeper, Oozie, Hive, Cassandra, Sqoop, Pig, and Flume.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, NodeManager, ResourceManager, and MapReduce.
- Experience writing MapReduce programs on Apache Hadoop to work with Big Data.
- Experience in working with stream processing systems like Apache Spark.
- Organizing data into tables, performing transformations, and simplifying complex queries with Hive.
- Performing real-time interactive analyses on massive data sets stored in HDFS or HBase using SQL with Impala.
- Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Worked on messaging systems such as Kafka along with Spark stream processing to perform real-time analysis.
- Integrated Apache Ignite with Kafka for highly scalable and reliable data processing.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Developed Hive scripts to meet end-user/analyst requirements for ad hoc analysis.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
- Developed UDFs in Java as needed for use in Pig and Hive queries (a sample Hive UDF sketch follows this summary).
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience with Jakarta Struts Framework, MVC and J2EE Framework.
- Experience in implementing J2EE Design Patterns.
- Experienced in processing, validating, parsing, and extracting data from XML files using DOM and SAX parsers.
- Hands on experience in IDE tools like Eclipse, Visual Studio.
- Worked extensively in Java, XML, XSL, EJB, JSP, JDBC, MVC, JSTL, Design Patterns and UML.
- Strong object-oriented design experience.
- Hands-on experience in web application development using client-side technologies such as AngularJS and jQuery, as well as HTML, CSS, XML, and JavaScript.
- Experienced in developing web services on Tomcat and JBoss.
- Experience with build and deployment tools such as Ant and Maven 2, along with bug fixing and maintenance.
- Experience in database design using stored procedures, functions, and triggers, and strong experience writing complex queries for DB2 and SQL Server.
- Worked with business users to extract clear requirements to create business value.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Excellent problem-solving, analytical, communication, and interpersonal skills.
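Sample code (illustrative Hive UDF): a minimal sketch of the kind of Java UDF referenced above, using the simple Hive UDF API; the class name and normalization rule are assumptions for illustration, not taken from any project below.
```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: normalizes free-text codes to a trimmed, upper-case form
// so that joins and GROUP BY clauses see a consistent key.
public final class NormalizeCode extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;                // preserve SQL NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```
Packaged into a JAR, a UDF like this would typically be added with ADD JAR and registered via CREATE TEMPORARY FUNCTION before being called from HiveQL.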
TECHNICAL SKILLS:
Hadoop/Big Data Ecosystem: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Oozie, Spark, Scala, Ignite, Kafka, ZooKeeper, Impala, Cassandra, Flume
Java Technologies: Core Java, Servlets, JSP, JDBC, Collections
Web Technologies: JSP, JavaScript, AJAX, XML, DHTML, HTML, CSS, SOAP, WSDL, Web Services
Frameworks: Struts 2.x, Hibernate, Spring
Databases: MySQL, Oracle, SQL Server, MS Access, Elasticsearch
IDE: Eclipse, Visual Studio
Logging Tool: Log4j
Build & SCM Tools: Ant, Maven, Gradle, GitHub
Web Application Servers: Oracle Application Server, Apache Tomcat, WebLogic
Version Control Systems: ClearCase, Confidential
Operating Systems: Windows XP/7/8/10, UNIX, Red Hat Linux
Concepts: OOAD, UML, Design Patterns, Waterfall & Agile methodologies
PROFESSIONAL EXPERIENCE:
Confidential, Foster City, CA
Hadoop Developer
Responsibilities:
- Analyzed data on the Hadoop stack using big data tools including Pig, Hive, HBase, and Sqoop.
- Developed MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Imported data daily into HDFS using Sqoop from sources such as CyberSource, Informatica, Salesforce, Teradata, Authorize.net, and Genesys Info Mart.
- Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java, along with equivalent processing implemented in Hive and Pig.
- Worked as a support team member to improve the fraud-detection system built on Kafka and Spark (a streaming sketch follows this job entry).
- Supported MapReduce programs through cluster monitoring, maintenance, and troubleshooting.
- Worked on the backend using Scala and Spark to implement several aggregation routines.
- Worked on real-time analytics and transactional processing using Ignite integrated with Kafka streams.
- Worked hands-on with the ETL process and was involved in developing the Hive/Impala queries.
- Experience using SequenceFile, RCFile, Avro, and Parquet file formats.
- Worked on implementing cluster coordination services through Zookeeper.
- Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend; checked and fixed delimiters in ASCII files.
- Performed data conversion and data integration, and created source-to-target mappings using Talend.
- Worked with Flume to load log data from multiple sources directly into HDFS.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, and pushed it to HDFS.
- Efficiently handled periodic exporting of SQL data into Elasticsearch.
- Efficiently handled collection of data from various sources and profiled it using analyst tools such as Informatica Data Quality.
- Involved in the design and implementation of the near real-time indexing pipeline, including index management, cluster maintenance, and interaction with Elasticsearch.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Used GitHub as the code repository and Gradle as the build tool.
- Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Java, Oozie, Sqoop, Scala, Flume, Impala, ZooKeeper, Ignite, Kafka, MapReduce, Cloudera Manager, Cassandra, Elasticsearch, Talend Big Data Studio, Avro, Parquet, Eclipse, MySQL, Gradle, Teradata, CyberSource, Informatica (IDQ), Salesforce, Authorize.net and Genesys Info Mart.
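Sample code (illustrative Kafka/Spark Streaming job): a condensed sketch of how the Kafka-to-Spark streaming path described above might look in Java with the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and per-account count are placeholders rather than details of the actual fraud-detection system.
```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import scala.Tuple2;

public final class TransactionStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("TransactionStreamJob");
        // 10-second micro-batches; the interval is arbitrary for this sketch.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "txn-stream");                  // hypothetical consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("transactions");  // hypothetical topic name

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Assume comma-separated records with an account id in the first field,
        // and count events per account within each micro-batch.
        stream.mapToPair(record -> new Tuple2<>(record.value().split(",")[0], 1L))
              .reduceByKey((a, b) -> a + b)
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```
In practice the per-batch counts would feed the scoring or alerting step of the fraud-detection pipeline; the print() call here simply stands in for that downstream logic.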
Confidential, RI
Hadoop Developer
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components.
- Managed and reviewed Hadoop log files.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and monitored it through the web UI and the Hadoop Java API.
- Implemented MapReduce jobs and wrote UDFs using the Java API and Pig Latin (a sample Pig UDF follows this job entry).
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Worked with different Talend Hadoop components such as Hive and Pig.
- Imported data from MySQL into HDFS using Sqoop on a regular basis.
- Developed ETL jobs in Talend to load data from ASCII and flat files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked on them using HiveQL.
- Worked on filtering raw data using tools like Tableau.
- Responsible for running Hadoop streaming jobs to process CSV data.
- Experience with the Gradle build tool and an understanding of Artifactory and the repository structure.
- Utilized Agile Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, HBase, Pig, Linux, Sqoop, Oozie, Flume, Talend, Tableau, Maven, GitHub, Gradle, XML, MySQL, MySQL Workbench.
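Sample code (illustrative Pig UDF): a small sketch of the kind of Java UDF for Pig mentioned above; the class name and cleansing rule are hypothetical.
```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: trims and upper-cases a chararray field so that
// downstream joins and group-bys see a consistent key.
public class CleanKey extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                 // pass nulls through unchanged
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```
In a Pig script the packaged JAR would be added with REGISTER and the function invoked like any built-in on the field being cleansed.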
Confidential, TN
Hadoop Developer
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Created Hive tables populated from the relevant EDW tables.
- Responsible for managing data coming from different sources.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Supported MapReduce programs running on the cluster.
- Worked extensively on Hive and Pig.
- Wrote Hive queries for data analysis to meet business requirements.
- Involved in creating UDFs where custom functionality was required.
- Wrote MapReduce jobs using the Java API (a minimal example follows this job entry).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Imported data from MySQL into HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
Environment: Hadoop, MapReduce, HDFS, Eclipse, Omniture, Hive, PIG, HBase, Sqoop, Oozie and SQL.
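Sample code (illustrative MapReduce job): a minimal job in the spirit of those listed above, written against the org.apache.hadoop.mapreduce API; the job name and input assumptions (one record per line, key in the first tab-separated field) are illustrative.
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCount {

    // Emits (first tab-separated field, 1) for every input line.
    public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            outKey.set(fields[0]);
            context.write(outKey, ONE);
        }
    }

    // Sums the counts emitted for each key.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable count : counts) {
                total += count.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-count");
        job.setJarByClass(RecordCount.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
The job would be packaged and submitted with hadoop jar, with the input and output HDFS paths passed as arguments.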
Confidential, CA
Hadoop Developer
Responsibilities:
- Analyzed data on the Hadoop stack using big data tools including Pig, Hive, HBase, and Sqoop.
- Developed MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Imported data daily into HDFS using Sqoop from sources such as CyberSource, Informatica, Salesforce, Teradata, Authorize.net, and Genesys Info Mart.
- Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
- Created Hive external tables, loaded data into them, and queried the data using HQL (a Hive JDBC sketch follows this job entry).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java, along with equivalent processing implemented in Hive and Pig.
- Worked as a support team member to improve the fraud-detection system built on Kafka and Spark.
- Supported MapReduce programs through cluster monitoring, maintenance, and troubleshooting.
- Worked on the backend using Scala and Spark to implement several aggregation routines.
- Worked on real-time analytics and transactional processing using Ignite integrated with Kafka streams.
- Worked hands-on with the ETL process and was involved in developing the Hive/Impala queries.
- Experience using SequenceFile, RCFile, Avro, and Parquet file formats.
- Worked on implementing cluster coordination services through Zookeeper.
- Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend; checked and fixed delimiters in ASCII files.
- Performed data conversion and data integration, and created source-to-target mappings using Talend.
- Worked with Flume to load log data from multiple sources directly into HDFS.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, and pushed it to HDFS.
- Efficiently handled periodic exporting of SQL data into Elasticsearch.
- Efficiently handled collection of data from various sources and profiled it using analyst tools such as Informatica Data Quality.
- Involved in the design and implementation of the near real-time indexing pipeline, including index management, cluster maintenance, and interaction with Elasticsearch.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Used GitHub as the code repository and Gradle as the build tool.
- Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Java, Oozie, Sqoop, Scala, Flume, Impala, ZooKeeper, Ignite, Kafka, MapReduce, Cloudera Manager, Cassandra, Elasticsearch, Talend Big Data Studio, Avro, Parquet, Eclipse, MySQL, Gradle, Teradata, CyberSource, Informatica (IDQ), Salesforce, Authorize.net and Genesys Info Mart.
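Sample code (illustrative Hive JDBC query): a small sketch of querying Hive external tables over JDBC from Java, as the HQL bullets above describe; the connection URL, credentials, table, and column names are placeholders.
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public final class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; requires hive-jdbc on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, database, and user are placeholders for this sketch.
        String url = "jdbc:hive2://hiveserver2-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hadoop", "");
             Statement stmt = conn.createStatement()) {

            // Hypothetical partitioned external table: daily transaction counts per source system.
            ResultSet rs = stmt.executeQuery(
                    "SELECT source_system, COUNT(*) AS txn_count "
                  + "FROM ext_transactions "
                  + "WHERE load_date = '2016-05-01' "
                  + "GROUP BY source_system");

            while (rs.next()) {
                System.out.println(rs.getString("source_system") + "\t" + rs.getLong("txn_count"));
            }
        }
    }
}
```
This assumes HiveServer2 is running and reachable; the partition filter on load_date mirrors the partitioned-table design described above.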
Confidential, CT
Software Developer
Responsibilities:
- Analyzed data on the Hadoop stack using big data tools including Pig, Hive, HBase, and Sqoop.
- Developed MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Imported data daily into HDFS using Sqoop from sources such as CyberSource, Informatica, Salesforce, Teradata, Authorize.net, and Genesys Info Mart.
- Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java, along with equivalent processing implemented in Hive and Pig.
- Worked as a support team member to improve the fraud-detection system built on Kafka and Spark.
- Supported MapReduce programs through cluster monitoring, maintenance, and troubleshooting.
- Worked on the backend using Scala and Spark to implement several aggregation routines (a Spark aggregation sketch follows this job entry).
- Worked on real-time analytics and transactional processing using Ignite integrated with Kafka streams.
- Worked hands-on with the ETL process and was involved in developing the Hive/Impala queries.
- Experience using SequenceFile, RCFile, Avro, and Parquet file formats.
- Worked on implementing cluster coordination services through Zookeeper.
- Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend; checked and fixed delimiters in ASCII files.
- Performed data conversion and data integration, and created source-to-target mappings using Talend.
- Worked with Flume to load log data from multiple sources directly into HDFS.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, and pushed it to HDFS.
- Efficiently handled periodic exporting of SQL data into Elasticsearch.
- Efficiently handled collection of data from various sources and profiled it using analyst tools such as Informatica Data Quality.
- Involved in the design and implementation of the near real-time indexing pipeline, including index management, cluster maintenance, and interaction with Elasticsearch.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Used GitHub as the code repository and Gradle as the build tool.
- Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Java, Oozie, Sqoop, Scala, Flume, Impala, ZooKeeper, Ignite, Kafka, MapReduce, Cloudera Manager, Cassandra, Elasticsearch, Talend Big Data Studio, Avro, Parquet, Eclipse, MySQL, Gradle, Teradata, CyberSource, Informatica (IDQ), Salesforce, Authorize.net and Genesys Info Mart.
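Sample code (illustrative Spark aggregation): a brief sketch of the kind of Spark aggregation mentioned above, written against the Spark SQL Java API; the HDFS paths and column names are assumptions.
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public final class OrderAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("OrderAggregation")
                .getOrCreate();

        // Hypothetical input: Parquet files in HDFS with customer_id and amount columns.
        Dataset<Row> orders = spark.read().parquet("hdfs:///data/orders");

        // Filter out non-positive amounts and total the order value per customer.
        Dataset<Row> totals = orders
                .filter(col("amount").gt(0))
                .groupBy("customer_id")
                .agg(sum("amount").alias("total_amount"));

        totals.write().mode("overwrite").parquet("hdfs:///data/order_totals");

        spark.stop();
    }
}
```
Launched with spark-submit, a job of this shape covers the value-centric filtering and aggregation work described in the bullets above; the actual column names and paths would come from the real data layout.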