Hadoop Developer Resume

Phoenix, AZ

SUMMARY:

  • 5+ years of experience in Information Technology, including analysis, design, development, and testing of complex applications.
  • Strong working experience with Big Data and the Hadoop ecosystem, including HDFS, Pig, Hive, HBase, YARN, Sqoop, Flume, Oozie, Hue, MapReduce, and Spark.
  • Extensive experience in analyzing data using HiveQL, Pig Latin, and MapReduce programs in Java.
  • Implemented POCs on migrating to Spark Streaming to process live data.
  • Experienced in Apache Spark for advanced processing such as text analytics, using its in-memory computing capabilities with Scala.
  • Hands-on with real-time data processing using the distributed technologies Storm and Kafka.
  • Used different Spark modules such as Spark Core, Spark RDDs, Spark DataFrames, and Spark SQL.
  • Converted various Hive queries into the required Spark transformations and actions.
  • Experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including the HBase database and Sqoop.
  • Worked with data serialization formats, converting complex objects into serialized byte sequences using Avro, Parquet, JSON, and CSV.
  • Good knowledge of Oracle databases and strong skills in writing SQL queries and scripts.
  • Experience in implementing the Kerberos authentication protocol in Hadoop for data security (a minimal sketch follows this list).
  • Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration, and migration.
  • Experience in setting up clusters on Amazon EC2 and S3, including automating cluster provisioning and scaling in the AWS cloud.
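
A minimal Java sketch of the Kerberos-authenticated Hadoop access referenced above; the principal, keytab path, and class name are illustrative assumptions rather than details from any particular engagement.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHdfsClient {
    public static void main(String[] args) throws Exception {
        // Switch the Hadoop client to Kerberos authentication.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Principal and keytab path are placeholders for illustration only.
        UserGroupInformation.loginUserFromKeytab(
                "etl_user@EXAMPLE.COM", "/etc/security/keytabs/etl_user.keytab");

        // Once the login succeeds, HDFS is accessed as usual.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Home directory: " + fs.getHomeDirectory());
        fs.close();
    }
}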

TECHNICAL SKILLS:

Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Oozie, Flume, Kafka, YARN and Spark

Scripting Languages: Shell, Python

Programming Languages: Java, Scala, Python, SQL, C

Hadoop Distributions: Cloudera, Hortonworks, MapR

NoSQL databases: HBase, Cassandra

Tools: SVN, GitHub, Jenkins, Tableau

Operating systems: UNIX, Linux, macOS and Windows

Databases: Oracle, SQL Server, MySQL

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix AZ

Hadoop Developer

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) to develop the application.
  • Developed a Kafka consumer component in Java for near real-time data processing (a minimal sketch follows this list).
  • Worked on loading and transforming large sets of structured data using Spark Streaming.
  • Loaded data into Spark RDDs and DataFrames and performed in-memory computation for faster response times.
  • Took part in designing and developing a custom Java application to pull data from source systems and publish the results to a specific Kafka topic.
  • Developed Spark jobs using Java and Spark SQL to migrate SQL procedures.
  • Created Kafka connectors to pull data from the database and publish it to Kafka topics.
  • Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Worked on migrating historical ingestion data from the data warehouse to Hive on the big data platform.
  • Involved in loading data in different formats from the Unix file system into HDFS.
  • Worked with different data sources such as XML files, JSON files, SQL Server, and DB2 to load data into Hive tables and HDFS.
  • Created external and staging tables, joined them as per requirements, and built multiple data pipelines.
  • Worked on performance tuning of Hive and Spark jobs.
  • Used HiveQL for data analysis and for loading structured data into the specified reporting tables.
  • Monitored and tracked issues within the team using JIRA.
  • Worked on the application following the Agile methodology.
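
As noted above, a minimal sketch of the kind of Java Kafka consumer used for near real-time processing; the broker address, consumer group, topic name, and downstream handling are illustrative placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "order-event-consumers");   // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("order-events")); // placeholder topic

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Downstream handling (e.g. writing to HDFS/Hive) would go here.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // commit offsets only after successful processing
            }
        }
    }
}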

Environment: Hadoop, Java, Linux, MapR, SQL, Kafka, Hive, Spark, Oracle, DB2, Netezza, Oozie, Informatica, Jira, Rally.

Confidential, New Jersey

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Loaded data into Spark RDDs and performed in-memory computation for faster response times.
  • Developed Spark jobs and Hive jobs to transform data.
  • Developed Spark scripts in Python with custom RDD transformations and performed actions on the RDDs.
  • Worked on the Oozie workflow engine for job scheduling; imported and exported data into HDFS and Hive using Sqoop.
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loads based on date.
  • Developed a Kafka consumer component in Java and Scala for real-time data processing.
  • Used Impala to query Hive tables for faster query response times.
  • Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Created partitioned and bucketed Hive tables in Parquet and Avro file formats with Snappy compression and then loaded data into them.
  • Wrote Hive queries using Spark SQL integrated with the Spark environment.
  • Developed MapReduce programs to parse raw JSON data and store the refined data in tables.
  • Used Kafka to load data into HDFS and move data into HBase.
  • Captured data logs from the web server into HDFS using Flume for analysis.
  • Worked on moving data pipelines from the CDH cluster to run on AWS EMR.
  • Moved data from HDFS to AWS Simple Storage Service (S3) and worked extensively with S3 buckets in AWS.
  • Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, using Spark to infer the schema of the JSON files (a minimal sketch follows this list).
  • Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
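
A minimal sketch of the JSON-filtering Spark job described above, shown in Java for consistency with the other examples (the project work itself used Python and Scala); the S3 and HDFS paths and the column names are placeholders.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class JsonS3ToHdfs {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-s3-to-hdfs")
                .getOrCreate();

        // Source and target paths are placeholders.
        Dataset<Row> raw = spark.read().json("s3a://example-bucket/raw/events/");
        raw.printSchema(); // Spark infers the JSON schema automatically

        // Keep only well-formed records; the filter column is illustrative.
        Dataset<Row> refined = raw.filter(col("event_type").isNotNull());

        refined.write()
                .mode(SaveMode.Overwrite)
                .partitionBy("event_date")   // illustrative partition column
                .parquet("hdfs:///data/refined/events/");

        spark.stop();
    }
}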

Environment: Linux, Hadoop, Python, Scala, CDH, SQL, Sqoop, HBase, Hive, Spark, Oozie, Cloudera Manager, Oracle, Windows, YARN, Spring, Sentry, AWS, S3.

Confidential, Richardson, TX

Hadoop Developer

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) to develop the application.
  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
  • Extracted data from MySQL and loaded it into HDFS using Sqoop.
  • Developed Java APIs for retrieval and analysis on the NoSQL Cassandra database.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Developed Hive queries to process the data and generate the results in a tabular format.
  • Handled importing of data from multiple data sources using Sqoop, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Worked on extracting data from CSV and JSON files and storing it in Avro and Parquet formats.
  • Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive.
  • Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications to implement in the project.
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Worked on creating Kafka topics and partitions and writing custom partitioner classes.
  • Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (a minimal sketch follows this list).
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the data.
  • Monitored and controlled local file system disk space usage and log files, cleaning log files with automated scripts.
  • Involved in writing Oozie jobs for workflow automation.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
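
As referenced above, a minimal Java sketch of converting a HiveQL aggregation into equivalent Spark DataFrame transformations and an action; the database, table, and column names are illustrative only.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;

public class HiveToSparkExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-spark")
                .enableHiveSupport()
                .getOrCreate();

        // Original HiveQL (table and column names are illustrative):
        //   SELECT region, COUNT(*) AS order_cnt
        //   FROM sales.orders
        //   WHERE order_status = 'COMPLETE'
        //   GROUP BY region;

        // Equivalent DataFrame transformations with a final action.
        Dataset<Row> orders = spark.table("sales.orders");
        Dataset<Row> byRegion = orders
                .filter(col("order_status").equalTo("COMPLETE"))
                .groupBy("region")
                .agg(count("*").alias("order_cnt"));

        byRegion.show(20, false); // the action triggers the computation
        spark.stop();
    }
}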

Environment: Unix, Linux, Hortonworks, Scala, HDFS, MapReduce, Hive, Flume, Sqoop, Ganglia, Ambari, Oracle, Ranger, Python, Apache Hadoop, Cassandra.

Confidential

JAVA Developer

Responsibilities:

  • Performed Requirement Gathering & Analysis by actively soliciting, analyzing and negotiating customer requirements and prepared the requirements specification document for the application using Microsoft Word.
  • Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
  • Developed presentation layer using Java Server Faces (JSF) MVC framework.
  • Used JSP, HTML, CSS, and jQuery as view components in MVC.
  • Developed custom Spring MVC controllers for handling requests.
  • Used JDBC for database connectivity to SQL and to invoke stored procedures.
  • Deployed the applications on WebLogic Application Server.
  • Developed RESTful web services using JSON.
  • Created and managed microservices using Spring Boot that create, update, delete, and get data (a minimal sketch follows this list).
  • Used an Oracle database for table creation and was involved in writing SQL queries using joins and stored procedures.
  • Developed JUnit test cases for unit testing the code.
  • Worked with configuration management groups to set up various deployment environments, including system integration testing and quality control testing.
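
A minimal sketch of a Spring Boot microservice exposing create, read, update, and delete endpoints, as described above; the resource name and the in-memory store are illustrative stand-ins for the actual Oracle-backed persistence layer.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.*;

@SpringBootApplication
@RestController
@RequestMapping("/customers")
public class CustomerServiceApplication {

    // In-memory store used only to keep the sketch self-contained;
    // the real service would use JDBC/Hibernate against Oracle.
    private final Map<Long, String> store = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    @PostMapping
    public Long create(@RequestBody String name) {
        long id = ids.incrementAndGet();
        store.put(id, name);
        return id;
    }

    @GetMapping("/{id}")
    public String get(@PathVariable Long id) {
        return store.get(id);
    }

    @PutMapping("/{id}")
    public void update(@PathVariable Long id, @RequestBody String name) {
        store.put(id, name);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable Long id) {
        store.remove(id);
    }

    public static void main(String[] args) {
        SpringApplication.run(CustomerServiceApplication.class, args);
    }
}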

Environment: Java/J2EE, SQL, Oracle, JSP, JSON, JavaScript, WebLogic, HTML, JDBC, Spring, Hibernate, XML, JMS, log4j, JUnit, Servlets, MVC, Eclipse.
