Hadoop Developer Resume
Raleigh, NC
SUMMARY:
- Over 6 years of data analytics and visualization experience, including 2+ years with big data/Hadoop technologies, covering full project development, implementation, and deployment on Linux/Windows/Unix.
- 2+ years of experience implementing big data applications using HDFS, MapReduce, Pig, and Hive.
- Proficient with data visualization tools including Tableau, QlikView, Plotly, Raw, Palladio, and MS Excel.
- Experience in building data models with PowerPivot.
- Hands-on experience with HDFS, Hive, Pig, the Hadoop MapReduce framework, and Sqoop.
- Worked extensively with Hive DDL and the Hive Query Language (HiveQL).
- Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
- Developed Pig Latin scripts to handle business transformations.
- Used Sqoop to transfer large datasets between Hadoop and relational databases (RDBMSs).
- Worked with join patterns and implemented map-side and reduce-side joins using MapReduce (an illustrative sketch follows this summary).
- Built ETL reports using Tableau and created statistical dashboards for analytics.
- Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement.
- Interacted directly with the Hortonworks team on Hadoop cluster issues and resolved them.
- Experience setting up Hadoop in a pseudo-distributed environment.
- Experience setting up Hive, Pig, and Sqoop on the Ubuntu operating system.
- Familiarity with common computing environments (e.g., Linux, shell scripting).
- Good team player with the ability to solve problems and to organize and prioritize multiple tasks.
- Excellent communication and interpersonal skills.
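The map-side join bullet above is illustrated by the following minimal Hadoop Streaming sketch in Python. It is a sketch under stated assumptions, not code from an actual engagement: the lookup file customers.tsv, the tab-delimited layout, and the column positions are all hypothetical.

```python
#!/usr/bin/env python
# Illustrative Hadoop Streaming mapper for a map-side join:
# the small "customers" table is shipped to every mapper (e.g. via
# -files customers.tsv), loaded into memory, and joined against each
# transaction record arriving on stdin. Run map-only (-numReduceTasks 0).
import sys


def load_lookup(path="customers.tsv"):
    """Load the small side of the join: customer_id -> region."""
    lookup = {}
    with open(path) as handle:
        for line in handle:
            customer_id, region = line.rstrip("\n").split("\t")[:2]
            lookup[customer_id] = region
    return lookup


def main():
    lookup = load_lookup()
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        customer_id = fields[0]
        region = lookup.get(customer_id)
        # Emit the transaction enriched with the region; skip unmatched keys.
        if region is not None:
            print("\t".join(fields + [region]))


if __name__ == "__main__":
    main()
```

A reduce-side join follows the same streaming model, except the mapper tags each record with its source table and the join itself happens in the reducer, at the cost of shuffling both datasets.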
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Tez, Impala, Mahout, Ambari, Hadoop Streaming
RDBMS: Oracle, DB2, SQL Server
Scripting/Query: Shell, SQL, HiveQL
NoSQL: HBase, Cassandra
Visualization: Tableau Desktop 8.3, Plotly, Raw, Palladio
Web Servers: WebLogic, WebSphere, Apache Tomcat
IDEs: RStudio, PyCharm, Eclipse
Platforms: Windows, UNIX, LINUX
Currently Learning: Spark, Scala, R and Python
PROFESSIONAL EXPERIENCE:
Confidential, Raleigh, NC
Hadoop Developer
Responsibilities:
- Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Developed simple to complex MapReduce streaming jobs in Python, with equivalent logic also implemented in Hive and Pig (a minimal streaming sketch follows this section).
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Used Impala to read, write, and query Hadoop data in HDFS and HBase
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs
- Used Mahout to explore machine learning algorithms for efficient data processing
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and various compressed file formats
Environment: Hadoop 0.20.2, Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster running Ubuntu Linux.
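As referenced in the streaming bullet above, the following is a minimal sketch of a Python MapReduce streaming job of the kind described: a mapper/reducer pair that counts records per key. The file name, CSV layout, and key column are illustrative assumptions.

```python
#!/usr/bin/env python
# Illustrative Hadoop Streaming job: count records per key.
# Run with something like:
#   hadoop jar hadoop-streaming.jar \
#     -input /data/events -output /data/event_counts \
#     -mapper "python count_job.py map" -reducer "python count_job.py reduce" \
#     -file count_job.py
import sys


def mapper():
    """Emit <key, 1> for every CSV record; the key is assumed to be column 0."""
    for line in sys.stdin:
        key = line.rstrip("\n").split(",")[0]
        print("%s\t1" % key)


def reducer():
    """Sum counts per key; streaming guarantees the input arrives sorted by key."""
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            print("%s\t%d" % (current_key, count))
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```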
Confidential, Durham, NC
Big Data Developer
Responsibilities:
- Developed machine learning, statistical analysis, and data visualization applications for challenging data processing problems in the clinical and biomedical domains.
- Read data from local files, XML files, Excel files, and JSON files in Python using the pandas module.
- Read data from SQL databases and from the web through APIs, and processed it for further use in Python with pandas.
- Performed subsetting, sorting, reshaping, merging, slicing, and editing on the collected data using the NumPy and pandas modules (see the pandas sketch at the end of this section).
- Developed histograms, scatter plots, 3-D plots, and other visualizations in various color combinations using Python's Matplotlib library.
- Worked in large-scale data environments such as Hadoop and MapReduce, with a working knowledge of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Interfaced with a large-scale database system through an ETL server for data extraction and preparation.
- Migrated data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Proposed an automated system using shell scripts to run the Sqoop jobs.
- Worked within an Agile development approach.
- Created the estimates and defined the sprint stages.
- Developed a strategy for full and incremental loads using Sqoop.
- Worked mainly on Hive queries to categorize data from different claims.
- Integrated the Hive warehouse with HBase
- Wrote customized Hive UDFs in Java where the required functionality was too complex.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
- Monitored system health and logs and responded to any warning or failure conditions.
- Presented data and data flows using Talend for reusability.
Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Cloudera CDH4, Oracle, MySQL, Tableau, Talend, Elasticsearch
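The pandas sketch referenced in the bullets above: a minimal example of reading a JSON extract and an Excel lookup, merging and aggregating them with pandas, and producing a quick Matplotlib chart. All file names, column names, and the date cutoff are hypothetical assumptions, not artifacts from the project.

```python
# Illustrative pandas/Matplotlib preprocessing: read a JSON extract and an
# Excel lookup, merge them, aggregate, and plot a quick summary chart.
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical inputs: claim records (JSON) and a member lookup table (Excel).
claims = pd.read_json("claims_extract.json")      # claim_id, member_id, amount, claim_date
members = pd.read_excel("member_lookup.xlsx")     # member_id, region

# Merge, subset, and aggregate with pandas.
merged = claims.merge(members, on="member_id", how="left")
merged["claim_date"] = pd.to_datetime(merged["claim_date"])
recent = merged[merged["claim_date"] >= "2015-01-01"]
by_region = (recent.groupby("region")["amount"]
                   .sum()
                   .sort_values(ascending=False))

# Quick visual check before handing curated data to reporting.
by_region.plot(kind="bar", title="Claim amount by region")
plt.tight_layout()
plt.savefig("claims_by_region.png")
```

The same curated output would typically be pushed to the Hive tables consumed by the Tableau reporting described above; the plot here is only a quick sanity check.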