Hadoop Developer Resume

Tampa, FL

SUMMARY:

  • 5+ years of professional experience in IT, which includes three years of experience in Hadoop development and Big Data.
  • Strong experience in Hadoop Distributed File System and Ecosystem: Hive, Pig, HBase, Zookeeper, Sqoop, Impala and Flume.
  • Worked with Big Data distributions Cloudera CDH5, CDH4, CDH3 and Hortonworks 2.5.
  • Used Ambari to configure the initial development environment on the Hortonworks standalone sandbox and to monitor the Hadoop ecosystem.
  • Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce (MR1), YARN (MR2), Hive, Pig, Sqoop, Spark, Flume and Oozie.
  • Good knowledge of setting up, maintaining and configuring Hadoop clusters.
  • Clear understanding of Hadoop architecture and its components, such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
  • Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera, Hortonworks distributions and AWS.
  • Hands-on experience using Sqoop to import data from relational databases into HDFS and to export it back.
  • Imported different kinds of data, such as JSON and log data, into Pig using the available loaders.
  • Experience in ingesting streaming data into Hadoop using Spark, Storm Framework and Scala.
  • Skilled in the extraction, manipulation, analysis and validation of large volumes of data.
  • Good experience with Hadoop distributions such as Cloudera, Hortonworks and MapR.
  • Developed applications using Java, RDBMS, and Linux shell scripting.
  • Developed custom UDFs for Pig and Hive in Java to process and analyze data.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
  • Worked on various databases such as MySQL, Oracle and MS SQL Server.
  • Worked on NoSQL databases including HBase, Cassandra.
  • Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
  • Performed different transformations and actions using PySpark to join, aggregate and calculate different statistical values of the data.
  • Familiar with Docker.
  • Experience with cloud technologies such as Amazon Web Services (AWS).
  • Good knowledge of Apache Kafka, including configuring producers and consumers.
  • Experience in converting Hive/SQL queries into Spark transformations using Java.
  • Good knowledge of creating workflows to run Pig and Hive jobs with the Oozie workflow engine.
  • Excellent Java development skills using J2EE, Servlets, JUnit, JSP and JDBC.
  • Expertise in Java multithreading, exception handling, JSP, Servlets, JavaScript, jQuery, AJAX, CSS, HTML, Spring, Hibernate, Enterprise Java Beans, JDBC, RMI and XML-related technologies.
  • Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
  • Excellent communication skills; a team player who is analytical, research-minded, technically competent, approach-oriented and positive-minded, and who enjoys facing challenges.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services, EMR, MRUnit, Spark, Storm, R, RStudio.

Java & J2EE Technologies: Core Java, JDBC, Servlets, JSP, JNDI, Struts, Spring, Hibernate and Web Services (SOAP and RESTful)

IDEs: Eclipse, MyEclipse, IntelliJ

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: C, C++, Java, Python, Linux shell scripts, R

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, MongoDB, Graph DB

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, RESTful WS

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

ETL Tools: Informatica, QlikView and Cognos

PROFESSIONAL EXPERIENCE:

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analysis algorithms.
  • Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
  • Imported data from structured data sources into HDFS using Sqoop incremental imports.
  • Implemented custom Kafka partitioners to send data to different categorized topics.
  • Implemented Kafka spouts for streaming data and various bolts to consume it.
  • Created Hive tables and partitions, and implemented incremental imports to support ad-hoc queries on structured data.
  • Implemented a Storm topology with stream groupings to perform real-time analytical operations.
  • Created Hive generic UDFs to implement business logic in HiveQL.
  • Optimized Hive queries and improved performance by tuning Hive query parameters.
  • Extensively used Pig for data cleansing.
  • Identified risky segments with clustering algorithms, using Python and scikit-learn, to protect assets in reinsurance.
  • Created several new analytical approaches from scratch across data mining, instance modeling, and predictive analytics using SQL, Python, Pandas, NumPy, SciPy, R, Apache Spark, HDFS, S3 and scikit-learn.
  • Developed and carried forward a coherent research strategy in predictive modeling, machine learning, big data analysis and environmental information systems with Python, SAS, and R programming.
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data.
  • Developed Oozie workflows to orchestrate and schedule the ETL process.
  • Involved in implementing the Avro, ORC and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Wrote Unix shell scripts, combined with Talend data maps, to process source files and load them into the staging database.
  • Involved in creation of virtual machines and infrastructure in the Azure Cloud environment.
  • Implemented Kafka consumers and producers by extending the Kafka high-level API in Java, ingesting data into HDFS or HBase depending on the context.
  • Involved in developing Azure web roles and worker roles.
  • Created workflows to run multiple Hive and Pig jobs that are triggered independently based on time and data availability.
  • Developed SQL scripts using Spark to handle different data sets and compared their performance against MapReduce jobs.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
  • Implemented test cases for MapReduce programs using MRUnit and other mocking frameworks.
  • Developed Spark scripts using Python shell commands as required.
  • Used Hadoop benchmarks to monitor and test the Hadoop cluster.
  • Implemented machine learning techniques in Spark using Spark MLlib.
  • Involved in cluster maintenance which includes adding, removing cluster nodes, cluster monitoring and troubleshooting, reviewing and managing data backups and Hadoop log files.
  • Retrieved transaction data from an RDBMS into HDFS, computed the total transacted amount per user with MapReduce, and saved the output in a Hive table.
  • Implemented Maven build scripts for Maven projects and integrated them with Jenkins.

Environment: Hadoop, Map Reduce, Hive, Pig, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Python, Scala, Cassandra, Git, XML, Java, Maven, Eclipse, Oracle.

Confidential, New Jersey

Hadoop Developer

Responsibilities:

  • Imported and exported data into and out of HDFS and Hive using Sqoop.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
  • Designed and implemented real-time applications using Apache Storm, Storm Trident, Kafka, the Apache Ignite in-memory data grid and Accumulo.
  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Streamed data into Apache Ignite, setting up caches for efficient data analysis.
  • Created Hive external tables, loaded data into them and queried the data using HQL.
  • Good knowledge in cloud integration with Amazon Elastic Map Reduce (EMR).
  • Installed and maintained a Hadoop-Spark cluster from scratch in a plain Linux environment and defined the code outputs as PMML.
  • Implemented data loading using Spark, Storm, Kafka and Elasticsearch.
  • Experience in integrating Cassandra with Elasticsearch and Hadoop.
  • Stored data in AWS S3, used similarly to HDFS, and ran EMR programs on the data stored in S3.
  • Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java and Python scripts to transform raw data from several sources into baseline data.
  • Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
  • Implemented Spark batch jobs on AWS instances through Amazon Simple Storage Service (Amazon S3).
  • Tuned Spark Streaming performance, e.g. by setting the right batch interval, the correct level of parallelism, an appropriate serialization format, and memory settings.
  • Created HBase tables to store variable data formats of input data coming from different portfolios.
  • Involved in loading large volumes of data into HBase rows and columns.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code using Scala and Spark SQL for batch processing of data.
  • Integrated Cassandra with Talend and automated jobs.
  • Involved in the requirements and design phases of a streaming Lambda architecture for real-time streaming with Spark and Kafka.
  • Designed and developed database operations in PostgreSQL.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Hands-on experience with the NoSQL database HBase and the relational database PostgreSQL.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Used the Spark DataFrame API to process structured and semi-structured files and load them back into an S3 bucket.
  • Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.

Environment: Hadoop, Map Reduce, Spark, Spark SQL, Kafka, Storm, HDFS, Hive, Pig, Sqoop, Oozie, Java, SQL, Python, Scala, Impala, AWS, Shell script, Talend

Confidential

Hadoop Developer

Responsibilities:

  • Managed data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
  • Involved in writing HiveQL to extract, transform and load the data into the database.
  • Performed Hive partitioning and bucketing, executed different types of joins on Hive tables, and implemented Hive SerDes such as JSON and Avro.
  • Fine-tuned Hive jobs for optimized performance.
  • Developed Pig Latin scripts for transformations, event joins and filters.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Experienced in working with Apache Storm.
  • Involved in designing, reviewing, optimizing data transformation processes using Apache Storm.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Imported data from Kafka consumers into HBase using Spark Streaming.
  • Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
  • Automated all jobs that ingest data from relational databases into Hive tables using Oozie workflows.
  • Performance tuning of queries in Impala for faster retrieval.
  • Created Hive, HBase tables and HBase integrated Hive tables as per the design using ORC file format and Snappy compression.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Experience in HBase database manipulation with structured, unstructured and semi-structured datasets.
  • Involved in ETL, Data Integration, Ingestion and Migration.
  • Implemented ETL pipelines to ingest data from the traditional EDW into Hadoop.
  • Generated various marketing reports using Tableau with Hadoop as a source for data.
  • Implemented Continuous Integration using Jenkins.
  • Participated in daily Scrum meetings to discuss development progress and actively helped make the meetings more productive.

Environment: Apache Hadoop, Apache Hive, HDFS, Map Reduce, Pig, Spark, Storm, Impala, Oozie, Flume, Kafka, Tableau, Jenkins, Java, Python, Linux, Agile.

Confidential

Java Developer

Responsibilities:

  • Involved in the design, development, testing and implementation of process systems, working on business requirements in iterative life cycles and creating the Detailed Design Document.
  • Developed JAX-RS RESTful web services with Jersey that consume and produce both XML and JSON content to retrieve specific details for Case Management System products.
  • Configured the Java Persistence API (JPA), with Hibernate as the provider, to interact with the Oracle 11g database, and created POJO classes as JPA entities.
  • Converted XML into Java objects using the JAXB API.
  • Involved in development of the application using Spring Web MVC and other components of the Spring Framework.
  • Used Hibernate as the Object-Relational Mapping (ORM) tool to store persistent data and communicate with the database.
  • Used UNIX shell scripts to deploy the application on Amazon web server.
  • Developed the user interface using HTML, CSS, JavaScript, Bootstrap, AJAX and JSON.
  • Used jQuery to perform the AJAX calls and to load the surveys.
  • Extensively used Alpaca forms for various form fields to fetch the inputs from the user/customer.
  • Wrote Embedded JS (EJS) code that combines data and a template to produce HTML.
  • Applied AngularJS to define routes to the REST services and render the EJS templates.
  • Developed new REST APIs using JAX-RS on WebSphere.
  • Utilized the WebLogic application server to build and deploy the enterprise application.
  • Utilized Alpaca forms to create interactive HTML5 forms with jQuery.
  • Used GitHub repositories to check in, check out and merge code, and for issue tracking and wikis.
  • Used Maven to build and deploy the application.

Environment: Java, J2EE, JSP, Struts, Hibernate, AngularJS, JUnit, MVC, Eclipse, AJAX, Apache Tomcat, Log4J, SVN, MySQL, HTML, CSS, JavaScript.
