Hadoop Developer Resume
New York, NY
SUMMARY:
- Over 6 years of professional IT experience, including 4 years of recent experience in the Big Data/Hadoop ecosystem.
- 4 years of experience developing with Big Data Hadoop ecosystem technologies such as MapReduce, HDFS, YARN, Flume, SQOOP, Pig, Spark, HBase, Zookeeper, Hue, Kafka, Hive, and Impala.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and YARN, as well as AWS cloud concepts.
- Experience in efficiently analyzing large data sets using Big Data tools.
- Experience in ingesting terabytes of data between HDFS and relational database systems using SQOOP.
- Proficiency in Spark using Scala for loading data from HDFS, relational, and NoSQL databases using Spark SQL.
- Proficiency with ingesting data from a range of sources using Spark Streaming.
- Hands-on experience using Hive as storage for ingested data, including optimizations for query performance.
- Experience in collecting and aggregating large amounts of log data using Apache Kafka and Flume, and storing the data in HDFS for further analysis.
- Hands-on experience with the traditional ETL tool DataStage, with a deep understanding of ETL concepts, ETL loading strategies, data reconciliation, and error-handling standards.
- Expertise in the design, development, implementation and maintenance of Data Integration and Data Migration projects.
- Involved in performance tuning of DataStage at the stage and job levels.
- Expert in using SQOOP to import and export data between RDBMS and Hadoop.
- Good knowledge of data transformations using MapReduce, Hive, and Pig scripts for different file formats.
- Hands-on experience with compression codecs like Snappy and GZIP.
- Experience in successful implementation of ETL solutions between OLTP and OLAP databases in support of Decision Support Systems/Business Intelligence, with expertise in all phases of the SDLC.
- Good understanding of NoSQL databases and valuable experience in writing applications on NoSQL databases like HBase.
- Valuable experience working with Cloudera (CDH4 and CDH5), MapR, and HDP distributions.
- Hands-on experience in application development using core Java, RDBMS, and Linux shell scripting.
- Expertise in understanding and implementing Java technologies.
- Worthwhile experience writing software in continuous-build and automated-deployment environments.
- Goal-oriented, organized team player with good interpersonal skills; works well within a group environment as well as individually.
- Strong business and application analysis skills with excellent communication and professional abilities.
TECHNICAL SKILLS:
Big Data Framework: HDFS, MapReduce, YARN, Hive, Impala, Hue, Pig, SQOOP, Flume, Spark, Zookeeper, Oozie, Kafka, HBase, Storm.
Hadoop Distributions: Apache, Cloudera CDH5, Hortonworks, MapR.
Fast Data Technologies: Kafka, Flume, Spark Streaming, AWS EMR.
RDBMS: MySQL, AWS cloud, Oracle, DB2, SQL, PostgreSQL, Teradata.
NoSQL Databases: HBase, MongoDB.
IDEs: NetBeans, Eclipse
Languages/Scripting: Core and Advanced Java, Python, Scala, Pig Latin, HQL, SQL, PL/SQL, Linux shell scripts, JavaScript.
Programming languages: Scala, Python, SQL, Java
Virtual Machines: VMware, VirtualBox
OS: CentOS 5.5, UNIX, Linux, Windows XP/NT/7/8, Mac
File Formats: XML, Text, Sequence, RC, JSON, ORC, AVRO, and Parquet.
WORK EXPERIENCE:
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Used SQOOP to transfer data between Teradata and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.
- Used Pig to perform data transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Implemented partitions and buckets in Hive for optimization.
- Implemented Hive optimized joins to gather data from various sources and run ad-hoc queries on top of them.
- Wrote Hive generic UDFs to perform business-logic operations at the record and table levels (see the sketch at the end of this project entry).
- Worked on various file formats (Text, Avro, Parquet) and compression codecs (Snappy, GZIP).
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig, Hive, and SQOOP.
- Implemented test scripts to support test driven development and continuous integration.
- Loaded analyzed Hive data into NoSQL databases like HBase.
- Used Apache Kafka as a messaging system to load log data and application data into HDFS.
- Involved in converting Hive queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Worked with the offshore team on a daily basis and in bi-weekly sprints.
Environment: Apache Hadoop, Map Reduce, HDFS, Pig, Hive, Spark, YARN, SQOOP, Flume, Kafka, Zookeeper, Cloudera, Oozie, UNIX Shell Scripting, Teradata.
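Below is a minimal sketch of a Hive generic UDF of the kind referenced in this project. The class name, function name, and masking rule are hypothetical illustrations, not the project's actual business logic.

```java
// Hypothetical Hive generic UDF: masks all but the last four characters of an ID column.
// Names and logic are illustrative only.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MaskAccountIdUDF extends GenericUDF {

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("mask_account_id expects exactly one argument");
        }
        // The UDF returns a plain string.
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        if (value == null) {
            return null; // propagate NULLs unchanged
        }
        String id = value.toString();
        int keep = Math.min(4, id.length());
        // Mask everything except the trailing characters.
        return id.substring(0, id.length() - keep).replaceAll(".", "*")
                + id.substring(id.length() - keep);
    }

    @Override
    public String getDisplayString(String[] children) {
        return "mask_account_id(" + children[0] + ")";
    }
}
```

In practice such a UDF is packaged as a JAR, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being called in HiveQL.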
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in the process of data acquisition, data pre-processing and data exploration.
- As part of data acquisition, SQOOP and Flume were used for incremental imports to ingest data from various sources into the Hadoop file system.
- In the pre-processing phase, removed records with missing data and applied relevant transformations.
- In the data exploration stage, used Hive and Impala to gain insights into the customer data.
- Used Flume, SQOOP, Hadoop and Oozie for building data pipeline.
- Imported and exported data between HDFS and Hive using SQOOP.
- Implemented job flows and monitored Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from various sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch at the end of this project entry).
- Involved in loading data from LINUX file system to HDFS.
- Extracted tables from MySQL using SQOOP and placed them in HDFS.
- Assisted in exporting analyzed data to relational databases using SQOOP and Impala.
- Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts.
- Used Oozie for scheduling workflows.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, SQOOP, Impala, HBase, Oozie, Flume, MySQL, Windows, AWS S3, UNIX Shell Scripting, HDP.
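A minimal sketch of a map-only cleaning job of the kind described in the data-cleaning bullet above is shown below. The field layout, delimiter, and class name are hypothetical.

```java
// Hypothetical map-only cleaning job: drops records with missing mandatory fields
// and normalizes whitespace before re-emitting the record.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanRecordsMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split a pipe-delimited text record; -1 keeps trailing empty fields.
        String[] fields = value.toString().split("\\|", -1);

        // Drop rows that are too short or missing the first two mandatory fields.
        if (fields.length < 5 || fields[0].isEmpty() || fields[1].isEmpty()) {
            context.getCounter("cleaning", "dropped_records").increment(1);
            return;
        }

        // Trim each field and rebuild the record (intended as a map-only job, no reducer).
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('|');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```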
Hadoop Developer
Confidential, Norwalk, CT
Responsibilities:
- Responsible for managing data coming from various sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Processed input from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Created a data pipeline of MapReduce programs using chained mappers.
- Visualized HDFS data for customers using a BI tool via the Hive ODBC driver.
- Implemented optimized joins by joining different data sets to get top claims by state using MapReduce.
- Worked on big data processing of clinical and non-clinical data using MapR.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java (see the sketch at the end of this project entry).
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team that performs query analytics using HQL.
- Used Hive and Pig to generate BI reports.
- Imported data from MySQL into HDFS using SQOOP on a regular basis.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using SQOOP and staging tables.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Experienced with different kinds of compression techniques like LZO, GZIP, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and SQOOP.
Environment: Hadoop, HDFS, MapReduce, SQOOP, Oozie, Pig, Hive, Flume, Linux, MySQL, Java, Eclipse, MapR, Windows, UNIX Shell Scripting.
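Below is a minimal sketch of a map-side join using the distributed cache, as referenced above. The lookup file name, record layouts, and class name are hypothetical.

```java
// Hypothetical map-side join: a small state lookup table is shipped to every mapper
// via the distributed cache, so the join requires no reduce phase.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClaimsMapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> stateLookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The driver is assumed to have called something like:
        //   job.addCacheFile(new URI("hdfs:///lookup/states.txt#states.txt"));
        // which localizes the file and symlinks it as "states.txt" in the task directory.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            try (BufferedReader reader = new BufferedReader(new FileReader("states.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2); // stateCode,stateName
                    if (parts.length == 2) {
                        stateLookup.put(parts[0], parts[1]);
                    }
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Join each claim record (claimId,stateCode,...) with the cached state table.
        String[] fields = value.toString().split(",", -1);
        if (fields.length < 2) {
            return;
        }
        String stateName = stateLookup.getOrDefault(fields[1], "UNKNOWN");
        context.write(new Text(stateName), new Text(fields[0]));
    }
}
```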
Web Developer
Confidential, Meriden, CT
Responsibilities:
- Developing front-end screens using JSP, HTML and CSS.
- Developing modules for exceptions, utility classes, business delegates, and test cases using core Java (see the sketch at the end of this project entry).
- Developing SQL queries using MySQL.
- Working with Eclipse using Maven plugin for Eclipse IDE.
- Writing client-side validations using JavaScript.
- Extensively used jQuery for developing interactive web pages.
- Application was developed in Eclipse IDE and was deployed on Tomcat server.
Environment: Java/J2EE, Oracle, SQL, PL/SQL, JSP, Tomcat, HTML, AJAX, JavaScript, JDBC, XML, UML, JUnit, Eclipse.
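A minimal sketch of the core Java utility-class-plus-JUnit-test pattern mentioned above is shown below. The class names and validation rule are hypothetical, not taken from the original application.

```java
// Hypothetical utility class and JUnit 4 test; names and rules are illustrative only.
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class EmailValidatorTest {

    /** Simple utility under test: trims input and checks a basic e-mail shape. */
    static final class EmailValidator {
        static boolean isValid(String input) {
            if (input == null) {
                return false;
            }
            String email = input.trim();
            return email.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
        }
    }

    @Test
    public void acceptsWellFormedAddress() {
        assertTrue(EmailValidator.isValid("  user@example.com "));
    }

    @Test
    public void rejectsMalformedAddress() {
        assertFalse(EmailValidator.isValid("user@"));
        assertFalse(EmailValidator.isValid(null));
    }
}
```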