Hadoop developer Resume Lawrenceville, NJ - Hire IT People

SUMMARY:

Over 5 years of total IT experience, including over 3 years in Big Data Hadoop and 2 years in Java and ETL projects, with extensive knowledge of the pharma and finance domains.
Experience in HDP (Hortonworks Data Platform) distributed model.
A former Java programmer with newly acquired skills, an insatiable intellectual curiosity, and the ability to mine hidden gems located within large sets of structured, semi-structured and unstructured data.
Worked on a recommendation platform based on content-based and collaborative filtering models.
Worked with the Data Science team to gather requirements for various data mining projects.
Developed multiple MapReduce jobs in Java and Pig for data cleaning and preprocessing.
Hands-on experience with Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Pig, Hive and Oozie.
Working experience ingesting data onto clusters using Sqoop, including incremental imports.
Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
Hands-on experience setting up workflows with the Apache Oozie workflow engine to manage and schedule Hadoop jobs.
Good knowledge of Scala APIs
Working knowledge of R and Python
Experience in using HCatalog with Hive, Pig and HBase and in integrating them.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Experience in administration of Hadoop ecosystems.
Used Microsoft Query with the Hive ODBC driver to access Hive data, and analyzed the data using Excel Power View and Tableau.
Experience in developing stored procedures and triggers using SQL and PL/SQL in relational databases such as MS SQL Server 2005/2008.
Proficiency in SDLC methodologies and development processes such as requirement gathering, analysis and definition, proof of concept, designing and implementation.
A proactive planner with a flair for adopting emerging trends and addressing industry requirements to achieve organizational objectives.
An effective communicator with exceptional analytical, technical, negotiation and management skills with the ability to relate to people at any level of business and management

SKILL SET:

Object Oriented Languages: Java, Python

Statistical languages: R, Python, Matlab, SAS

Query languages: SQL, PL/SQL

Hadoop ecosystem: MapReduce, Hive, Pig, HDFS, Sqoop, Flume, Oozie, HBase

Technologies: Distributed systems, Machine learning, Data mining

Distributions: Hortonworks (HDP 1.2), Cloudera distribution

Markup languages: HTML, XML, JSON

Servers: WebSphere, WebLogic, Tomcat

Databases: Oracle 12c/11g/10g, MySQL, HBase, NoSQL

Revision control systems: CVS, GitHub, SVN

Data modeling tools: RStudio, SPSS

ETL tools: Informatica, DataStage

File Systems: HDFS, Linux, Windows

Java and J2EE technologies: Servlets, JSP, JDBC

Query Performance Tuning: Oracle hints, query execution analysis, indexes, partitions

Data visualization tool: Tableau

Statistical analysis: A/B testing, hypothesis testing, ANOVA, t-tests, F-tests, central limit theorem

Distribution analysis: Histograms, Scatter plots, Scatter matrices, Heat maps

Scripting languages: Shell scripting, Perl

IDEs: Eclipse, NetBeans, Wing, Spyder

Agile platform: Rally

MS Office tools: Excel, PowerPoint, Word

WORK EXPERIENCE:

Confidential, Lawrenceville, NJ

Hadoop developer

Responsibilities:

Imported data from RDBMS systems to HDFS cluster using Sqoop
Created Hive staging tables to store imported data
Developed HQL scripts to preprocess staging data
Developed custom UDFs in Java and used them in Hive queries
Developed Informatica mappings, workflows and applications to transform these data sets
Developed Pig scripts to process clinical study data
Developed shell scripts to call these HQL scripts and Informatica workflows
Optimized Hive queries using hints, map-side joins, predicate pushdown, ORC tables and cost-based optimization
Performed data quality checks using QuerySurge
Created data models using Erwin data modeler
Developed Java utilities, built on Apache dependencies, to parse and transform spreadsheets, generate name-value pairs and combine results
Used Informatica Analyst to check the health of data

Environment: HDFS, MapReduce, YARN, Hive, Sqoop, Pig, Java, shell scripts, Informatica BDE, Erwin, QuerySurge, Spotfire, Ambari, Toad, Oracle 12c

Confidential, Boston, MA

Hadoop developer

Responsibilities:

Loaded data from RDBMS server to HDFS cluster
Developed ETL scripts to load data into warehouse
Created Hive tables to store processed results in tabular format
Developed Sqoop scripts to transfer data between Hive and the Oracle database
Developed counters to debug complex MapReduce programs
Developed complex reduce-side joins
Optimized MapReduce jobs using combiners
Used the ORC format to improve the performance of Hive queries
Created managed tables and external tables in Hive
Ran complex HQL queries on Hive tables
Optimized Hive tables using partitioning and bucketing to improve HQL query performance
Implemented dynamic partitions
Created custom user defined functions in Hive
Scheduled jobs in production environment using Oozie scheduler
Debugged jobs using counters and Hadoop logs
Part of the team that developed Pig scripts

Environment: Hadoop, Hive, Sqoop, Pig, Java, shell scripts, SQL Developer Plus, SQL Server

Confidential

Data Science Intern

Responsibilities:

Plotted histograms to examine the distributions of variables
Used scatter plots, heat maps and correlation coefficients to identify and remove correlated features
Identified correlations and distributions using Tableau
Used principal component analysis to include only the top few eigenvectors in the model
Performed scaling and transformation of variables to improve the performance of gradient descent approaches
Implemented logistic regression, decision tree and naive Bayes models to predict whether a loan would default
Compared the performance of the models using ROC curves
Built random forests to further improve model performance while avoiding overfitting

Environment: R, RStudio, Python

Confidential

Software Engineer Intern

Responsibilities:

An interactive Android application that analyzes a user's aptitude by introducing levels based on the complexity and percentage of correct answers for a given set of questions
Client back-end is implemented using Java
Server back-end is implemented using PHP
Front-end is designed using XML
MySQL is used for creating and maintaining the database

Environment: Java 1.6, PHP, HTML, CSS, JavaScript
