
Hadoop Developer Resume


New York, NY

SUMMARY:

  • Over 6 years of professional IT experience, including 4 years of recent experience in the Big Data/Hadoop ecosystem.
  • 4 years of experience in development with Big Data Hadoop ecosystem technologies such as Map Reduce, HDFS, YARN, Flume, SQOOP, Pig, Spark, HBase, Zookeeper, Hue, Kafka, Hive and Impala.
  • Highly proficient in Hadoop architecture, with an in-depth understanding of components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and YARN, as well as the AWS cloud.
  • Experience in efficiently analyzing large data sets using Big Data tools.
  • Experience in transferring terabytes of data between HDFS and relational database systems using SQOOP.
  • Proficiency in Spark with Scala for loading data from HDFS, relational and NoSQL databases using Spark SQL.
  • Proficiency with ingesting data from a range of sources using Spark Streaming.
  • Hands-on experience in using HIVE as the store for ingested data and in optimizing query performance.
  • Experience in collecting and aggregating large amounts of log data using Apache Kafka and Flume and storing it in HDFS for further analysis.
  • Hands-on experience with the traditional ETL tool DataStage, with a deep understanding of ETL concepts, loading strategies, data reconciliation and error-handling standards.
  • Expertise in the design, development, implementation and maintenance of Data Integration and Data Migration projects.
  • Involved in performance tuning of DataStage at the stage and job levels.
  • Expert in using SQOOP to import and export data between RDBMS and Hadoop.
  • Good knowledge of data transformations using Map-Reduce, HIVE and Pig scripts for different file formats.
  • Hands on experience in dealing with Compression Codecs like Snappy, GZIP.
  • Experience in successful implementation of ETL solutions between OLTP and OLAP databases in support of Decision Support Systems/Business Intelligence, with expertise in all phases of the SDLC.
  • Good understanding of NoSQL databases and valuable experience in writing applications on NoSQL databases like HBase.
  • Valuable experience working on the Cloudera CDH4 and CDH5, MapR and HDP distributions.
  • Hands on experience in application development using core JAVA, RDBMS and Linux shell scripting.
  • Expertise in understanding and implementing Java technologies.
  • Worthwhile experience writing software in a continuous build and automated deployment environment.
  • Goal-oriented, organized team player with good interpersonal skills who thrives in group environments as well as individually.
  • Strong business and application analysis skills with excellent communication and professional abilities.

TECHNICAL SKILLS:

Big Data Framework: HDFS, MapReduce, YARN, Hive, Impala, Hue, Pig, SQOOP, Flume, Spark, Zookeeper, Oozie, Kafka, HBase, Storm.

Hadoop Distributions: Apache, Cloudera CDH5, Hortonworks, MapR.

Fast Data Technologies: Kafka, Flume, Spark Streaming, AWS EMR.

RDBMS: MySQL, AWS cloud, Oracle, DB2, SQL, PostgreSQL, Teradata.

NoSQL Databases: HBase, MongoDB.

IDEs: NetBeans, Eclipse.

Languages/Scripting: Core and Advanced Java, Python, Scala, Pig Latin, HQL, SQL, PL/SQL, Linux shell scripts, JavaScript.

Programming Languages: Scala, Python, SQL, Java.

Virtual Machines: VMWare, Virtual Box

OS: CentOS 5.5, UNIX, Linux, Windows XP/NT/7/8, Mac.

File Formats: XML, Text, Sequence, RC, JSON, ORC, AVRO, and Parquet.

WORK EXPERIENCE:

Hadoop Developer

Confidential, New York, NY

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Used SQOOP to transfer data between Teradata and HDFS and used Flume to stream the log data from servers.
  • Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Extensively worked on creating combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs (see the MapReduce sketch after this list).
  • Implemented different analytical algorithms as MapReduce programs to run on top of HDFS data.
  • Used Pig to perform data transformations, event joins, filtering and pre-aggregations before storing the data in HDFS.
  • Implemented Partitions, Buckets in HIVE for optimization.
  • Implemented Hive optimized joins to gather data from various sources and run ad-hoc queries on top of them.
  • Wrote Hive generic UDFs to perform business-logic operations at the record and table level (see the UDF sketch after this list).
  • Worked with various file formats (Text, Avro, Parquet) and compression codecs (Snappy, GZIP).
  • Developed workflow in OOZIE to automate the tasks of loading the data into HDFS and pre-processing with Pig, Hive, SQOOP.
  • Implemented test scripts to support test driven development and continuous integration.
  • Loaded the analyzed Hive data into NoSQL databases like HBase.
  • Used Apache Kafka as a messaging system to load log data and application data into HDFS.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
  • Worked with the offshore team on a daily and bi-weekly sprint basis.
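
The combiner and partitioner work described above can be illustrated with a short Java sketch. This is a minimal, hedged outline rather than the actual project code: the class names, the tab-delimited input layout and the event-type field are hypothetical placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountJob {

    // Emits (eventType, 1) for every well-formed input record.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 1 && !fields[1].isEmpty()) {   // skip malformed rows
                eventType.set(fields[1]);
                context.write(eventType, ONE);
            }
        }
    }

    // Used both as the combiner (map-side pre-aggregation) and as the reducer.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    // Custom partitioner: controls how keys are routed across reducers.
    public static class EventPartitioner extends Partitioner<Text, LongWritable> {
        @Override
        public int getPartition(Text key, LongWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);      // combiner cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(EventPartitioner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Reusing the reducer as the combiner pre-aggregates counts on the map side, which is what reduces shuffle volume; the partitioner decides which reducer each key is sent to.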
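
The generic UDF bullet can likewise be sketched. This is an illustrative skeleton only, not the UDF written for this project: the function name normalize_code and the trim/upper-case rule stand in for the real record-level business logic.

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

public class NormalizeCodeUDF extends GenericUDF {

    private transient StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("normalize_code() takes exactly one argument");
        }
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("normalize_code() expects a string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        String raw = inputOI.getPrimitiveJavaObject(arguments[0].get());
        if (raw == null) {
            return null;                 // pass NULLs through unchanged
        }
        return raw.trim().toUpperCase(); // record-level business rule goes here
    }

    @Override
    public String getDisplayString(String[] children) {
        return "normalize_code(" + children[0] + ")";
    }
}

Once packaged into a JAR, such a UDF would typically be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCodeUDF', and then called like a built-in function in a query.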

Environment: Apache Hadoop, Map Reduce, HDFS, Pig, Hive, Spark, YARN, SQOOP, Flume, Kafka, Zookeeper, Cloudera, Oozie, UNIX Shell Scripting, Teradata.

Hadoop Developer

Confidential, New York, NY

Responsibilities:

  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in the process of data acquisition, data pre-processing and data exploration.
  • As part of data acquisition, SQOOP and Flume were used for incremental imports to ingest data from various sources into the Hadoop file system.
  • In the pre-processing phase, records with missing data were removed and relevant transformations were applied.
  • In the data exploration stage, used Hive and Impala to gain insights into the customer data.
  • Used Flume, SQOOP, Hadoop and Oozie for building data pipeline.
  • Imported and exported data between relational databases and Hive/HDFS using SQOOP.
  • Implemented job flows and monitored Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from various sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch after this list).
  • Extracted tables from MySQL through SQOOP and placed them in HDFS.
  • Assisted in exporting analyzed data to relational databases using SQOOP and Impala.
  • Created and maintained Technical documentation for launching HADOOP Clusters, executing Hive queries and Pig Scripts.
  • Used Oozie for scheduling workflows.
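
A hedged sketch of the kind of Java data-cleaning MapReduce job mentioned above: the delimiter, field positions and rejection rule are hypothetical placeholders, and the real jobs applied project-specific transformations.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanseRecordsJob {

    // Map-only cleansing step: drops malformed rows and rows with a missing id field.
    public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length < 5 || fields[0].trim().isEmpty()) {
                context.getCounter("cleanse", "dropped").increment(1);   // track rejects
                return;
            }
            context.write(NullWritable.get(), value);                    // keep the record
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cleanse-records");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                  // no reduce phase needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The cleansed output would then be loaded into the Hive tables described above for analysis.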

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, SQOOP, Impala, HBase, Oozie, Flume, MySQL, Windows, AWS S3, UNIX Shell Scripting, HDP.

Hadoop Developer

Confidential, Norwalk, CT

Responsibilities:

  • Responsible for managing data coming from various sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats.
  • Created a data pipeline of MapReduce programs using chained mappers.
  • Visualized the HDFS data for customers using a BI tool with the help of the HIVE ODBC driver.
  • Implemented optimized joins over different data sets to get the top claims by state using MapReduce.
  • Worked on big data processing of clinical and non-clinical data using MapR.
  • Implemented complex MapReduce programs to perform joins on the map side using the distributed cache in Java (see the sketch after this list).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Created a customized BI tool for the manager team that performs query analytics using HQL.
  • Used Hive and Pig to generate BI reports.
  • Used SQOOP to load data from MySQL into HDFS on a regular basis.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Created Hive generic UDFs to process business logic that varies by policy.
  • Moved relational database data into Hive dynamic-partition tables via staging tables using SQOOP.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Experienced with different kinds of compression techniques like LZO, GZIP and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and SQOOP.
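
The map-side join with the distributed cache described above can be outlined roughly as follows. This is an illustrative sketch, not the project code: the paths, tab-delimited layouts and the state-code join key are hypothetical, and it uses the newer Job.addCacheFile API in place of the deprecated DistributedCache class.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ClaimsMapSideJoin {

    public static class JoinMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private final Map<String, String> stateLookup = new HashMap<>();
        private final Text out = new Text();

        @Override
        protected void setup(Context context) throws IOException {
            // The cached file is symlinked into the task working directory
            // under the name given after '#'.
            try (BufferedReader reader = new BufferedReader(new FileReader("state_lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t");
                    if (parts.length == 2) {
                        stateLookup.put(parts[0], parts[1]);   // code -> state name
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] claim = value.toString().split("\t");
            String stateName = stateLookup.get(claim[0]);      // join on state code
            if (stateName != null) {
                out.set(stateName + "\t" + value);
                context.write(NullWritable.get(), out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "claims-map-side-join");
        job.setJarByClass(ClaimsMapSideJoin.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0);                               // join happens map-side
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        // Small dimension file shipped to every mapper via the distributed cache.
        job.addCacheFile(new URI("/data/dim/state_lookup.txt#state_lookup"));
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the small lookup table is loaded into memory in setup(), the join completes entirely on the map side with no shuffle, which is the point of the distributed-cache approach.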

Environment: Hadoop, HDFS, Map Reduce, SQOOP, Oozie, Pig, Hive, Flume, LINUX, MySQL, Java, Eclipse, MapR, Windows, and UNIX Shell Scripting.

Web Developer

Confidential, Meriden, CT

Responsibilities:

  • Developed front-end screens using JSP, HTML and CSS.
  • Developed modules for exceptions, utility classes, business delegates and test cases using core Java.
  • Developed SQL queries using MySQL.
  • Worked with Eclipse using the Maven plugin for the Eclipse IDE.
  • Wrote client-side validations using JavaScript.
  • Extensively used jQuery for developing interactive web pages.
  • Developed the application in the Eclipse IDE and deployed it on a Tomcat server.

Environment: Java/J2EE, Oracle, SQL, PL/SQL, JSP, Tomcat, HTML, AJAX, Java Script, JDBC, XML, UML, JUnit, Eclipse.
