Hadoop Developer Resume

Raleigh, NC

SUMMARY:

  • Over 6 years of data analytics and visualization experience, including 2+ years with big data/Hadoop technologies spanning full project development, implementation, and deployment on Linux, Windows, and UNIX.
  • 2+ years of experience implementing big data applications using HDFS, MapReduce, Pig, and Hive.
  • Proficient with data visualization tools including Tableau, QlikView, Plotly, Raw, Palladio, and MS Excel.
  • Experience in building data models with PowerPivot.
  • Hands-on experience with HDFS, Hive, Pig, the Hadoop MapReduce framework, and Sqoop.
  • Worked extensively with Hive DDL and the Hive Query Language (HiveQL).
  • Developed UDF, UDAF, and UDTF functions and implemented them in Hive queries.
  • Developed Pig Latin scripts for handling business transformations.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
  • Worked with join patterns and implemented map-side and reduce-side joins using MapReduce (a minimal join sketch follows this summary).
  • Worked on ETL reports using Tableau and created statistics dashboards for analytics.
  • Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement.
  • Interacted directly with the Hortonworks team on Hadoop cluster issues and resolved them.
  • Experience in setting up Hadoop in a pseudo-distributed environment.
  • Experience in setting up Hive, Pig, and Sqoop on the Ubuntu operating system.
  • Familiarity with common computing environments (e.g., Linux, shell scripting).
  • Good team player with the ability to solve problems and organize and prioritize multiple tasks.
  • Excellent communication and interpersonal skills.
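The following is a minimal, illustrative sketch of a reduce-side join as it might be written for Hadoop Streaming in Python; the tags, field layout, and table roles ("A" for the smaller side, "B" for the larger) are assumptions, not a record of the actual jobs.

    #!/usr/bin/env python
    # Reduce-side join reducer for Hadoop Streaming (illustrative).
    # Mappers are assumed to tag each record with its source, emitting
    # lines of the form "join_key<TAB>A<TAB>payload" for one input and
    # "join_key<TAB>B<TAB>payload" for the other; the framework sorts
    # lines by key before they reach this reducer.
    import sys

    current_key = None
    a_records = []                      # buffered records from the "A" side

    for line in sys.stdin:
        key, tag, payload = line.rstrip("\n").split("\t", 2)
        if key != current_key:          # a new key group begins
            current_key, a_records = key, []
        if tag == "A":
            a_records.append(payload)   # buffer the smaller side in memory
        else:
            for a in a_records:         # pair each "B" record with the "A" side
                print("\t".join([key, a, payload]))

A production job would also force "A" records to sort ahead of "B" records within each key (for example, with a composite sort key); a map-side join instead loads the small table into memory in every mapper and skips the shuffle entirely.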

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Tez, Impala, Mahout, Ambari, Hadoop Streaming

RDBMS: Oracle, DB2, SQL Server

Scripting/Query: Shell, SQL, HiveQL

NoSQL: HBase, Cassandra

Visualization: Tableau Desktop 8.3, Plotly, Raw, Palladio

Web Servers: WebLogic, WebSphere, Apache Tomcat

IDEs: RStudio, PyCharm, Eclipse

Platforms: Windows, UNIX, LINUX

Currently Learning: Spark, Scala, R and Python

PROFESSIONAL EXPERIENCE:

Confidential, Raleigh, NC

Hadoop Developer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using the Hadoop ecosystem.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Developed simple-to-complex MapReduce streaming jobs in Python, integrated with Hive and Pig (a streaming sketch follows this list).
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs
  • Used Mahout to explore machine learning algorithms for efficient data processing.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats (a Python streaming analogue is sketched after this list).
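Below is a minimal sketch of the kind of Python streaming job described above, using a word-count style aggregation; the job itself is illustrative, not one of the actual jobs.

    #!/usr/bin/env python
    # mapper.py -- Hadoop Streaming mapper: emit "<token>\t1" per token.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

    #!/usr/bin/env python
    # reducer.py -- Hadoop Streaming reducer: sum counts per token.
    # Input arrives sorted by key, so a running total per key suffices.
    import sys

    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(current + "\t" + str(total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(current + "\t" + str(total))

On a Hadoop 0.20-era cluster such a job would be submitted with the streaming jar, with output compression (one of the optimizations mentioned above) enabled by flags like -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec.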
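The multi-format extraction jobs above were written in Java; as a rough Python streaming analogue, a mapper can normalize JSON and CSV lines into one tab-separated record. The field names ("id", "amount") are hypothetical.

    #!/usr/bin/env python
    # Streaming mapper that normalizes JSON and CSV input lines to a
    # common tab-separated (id, amount) record -- an illustrative Python
    # analogue of the Java MapReduce extraction jobs described above.
    import csv
    import json
    import sys

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        if line.startswith("{"):              # JSON record
            rec = json.loads(line)
            print(str(rec["id"]) + "\t" + str(rec["amount"]))
        else:                                 # CSV record
            row = next(csv.reader([line]))
            print(row[0] + "\t" + row[1])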

Environment: Hadoop 0.20.2 - Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster with Linux (Ubuntu)

Confidential, Durham, NC

Big Data Developer

Responsibilities:

  • Developed machine learning, statistical analysis, and data visualization applications for challenging data processing problems in the clinical and biomedical domains.
  • Read data from local files, XML files, Excel files, and JSON files in Python using the pandas module.
  • Read from SQL databases and from the web through APIs, and processed the data for further use in Python with pandas.
  • Performed subset, sort, reshape, merge, slice, and edit operations on collected data using the NumPy and pandas modules of Python (a pandas sketch follows this list).
  • Developed histogram, scatter, 3-D, and other plots with various color combinations using Python's Matplotlib library (a plotting sketch also follows).
  • Worked in large-scale database environments like Hadoop and MapReduce, with working knowledge of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
  • Interfaced with a large-scale database system through an ETL server for data extraction and preparation.
  • Migrated data from Oracle and MySQL into HDFS using Sqoop, and imported flat files in various formats into HDFS.
  • Proposed an automated system using shell scripts to run Sqoop jobs.
  • Worked in Agile development approach.
  • Created the estimates and defined the sprint stages.
  • Developed a strategy for full and incremental loads using Sqoop.
  • Mainly worked on Hive queries to categorize data from different claims.
  • Integrated the Hive warehouse with HBase.
  • Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables with the Hive ODBC connector.
  • Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Presented data and data flows using Talend for reusability.
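A brief, illustrative pandas workflow of the kind described above; the file names, connection string, and column names are hypothetical.

    import pandas as pd
    from sqlalchemy import create_engine

    # Read from local files in several formats (paths are hypothetical).
    claims = pd.read_csv("claims.csv")
    sites = pd.read_excel("sites.xlsx")
    meta = pd.read_json("metadata.json")

    # Read from a SQL database (connection string is hypothetical).
    engine = create_engine("mysql+pymysql://user:pwd@host/clinical")
    labs = pd.read_sql("SELECT patient_id, test, value FROM labs", engine)

    # Subset, sort, merge, and reshape the collected data.
    recent = claims[claims["year"] >= 2014].sort_values("claim_date")
    joined = recent.merge(labs, on="patient_id", how="left")
    pivot = joined.pivot_table(index="patient_id", columns="test",
                               values="value", aggfunc="mean")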
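And a small Matplotlib sketch of the plot types mentioned above, drawn on synthetic data.

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3-D projection

    x, y, z = np.random.rand(3, 100)         # synthetic data

    fig = plt.figure(figsize=(12, 4))

    ax1 = fig.add_subplot(1, 3, 1)           # histogram
    ax1.hist(x, bins=20, color="steelblue")

    ax2 = fig.add_subplot(1, 3, 2)           # scatter, colored by z
    ax2.scatter(x, y, c=z, cmap="viridis")

    ax3 = fig.add_subplot(1, 3, 3, projection="3d")  # 3-D scatter
    ax3.scatter(x, y, z, c=z, cmap="plasma")

    plt.tight_layout()
    plt.savefig("plots.png")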

Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Cloudera CDH4, Oracle, MySQL, Tableau, Talend, Elasticsearch
