Hadoop Developer Resume
New York, NY
SUMMARY:
- Over 6 years of professional IT experience, including 4 years of recent experience in the Big Data/Hadoop ecosystem.
- 4 years of experience developing with Big Data Hadoop ecosystem technologies such as MapReduce, HDFS, YARN, Flume, SQOOP, Pig, Spark, HBase, Zookeeper, Hue, Kafka, Hive, and Impala.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and YARN, as well as AWS cloud concepts.
- Experience in efficiently analyzing large data sets using Big Data tools.
- Experience in ingesting terabytes of data between HDFS and relational database systems using SQOOP.
- Proficiency in Spark using Scala for loading data from HDFS, relational, and NoSQL databases using Spark SQL.
- Proficiency with ingesting data from a range of sources using Spark Streaming.
- Hands-on experience using Hive as storage for ingested data, including optimizations for query performance.
- Experience in collecting and aggregating large amounts of log data using Apache Kafka and Flume, and storing the data in HDFS for further analysis.
- Hands-on experience with the traditional ETL tool DataStage, with a deep understanding of ETL concepts, ETL loading strategies, data reconciliation, and error-handling standards.
- Expertise in the design, development, implementation and maintenance of Data Integration and Data Migration projects.
- Involved in performance tuning of DataStage at the stage and job levels.
- Expert in using SQOOP to import and export data between RDBMS and Hadoop.
- Good knowledge of data transformations using MapReduce, Hive, and Pig scripts for different file formats.
- Hands-on experience with compression codecs like Snappy and GZIP.
- Experience in successful implementation of ETL solutions between OLTP and OLAP databases in support of Decision Support Systems/Business Intelligence, with expertise in all phases of the SDLC.
- Good understanding of NoSQL databases and valuable experience in writing applications on NoSQL databases like HBase.
- Valuable experience working with Cloudera (CDH4 and CDH5), MapR, and HDP distributions.
- Hands-on experience in application development using core Java, RDBMS, and Linux shell scripting.
- Expertise in understanding and implementing Java technologies.
- Worthwhile experience writing software in continuous-build and automated-deployment environments.
- Goal-oriented, organized team player with good interpersonal skills; works well within a group environment as well as individually.
- Strong business and application analysis skills with excellent communication and professional abilities.
TECHNICAL SKILLS:
Big Data Framework: HDFS, MapReduce, YARN, Hive, Impala, Hue, Pig, SQOOP, Flume, Spark, Zookeeper, Oozie, Kafka, HBase, Storm.
Hadoop Distributions: Apache, Cloudera CDH5, Hortonworks, MapR.
Fast Data Technologies: Kafka, Flume, Spark Streaming, AWS EMR.
RDBMS: MySQL, AWS cloud, Oracle, DB2, SQL, PostgreSQL, Teradata.
NoSQL Databases: HBase, MongoDB.
IDEs: NetBeans, Eclipse
Languages/Scripting: Core and Advanced Java, Python, Scala, Pig Latin, HQL, SQL, PL/SQL, Linux shell scripts, JavaScript.
Programming languages: Scala, Python, SQL, Java
Virtual Machines: VMware, VirtualBox
OS: CentOS 5.5, UNIX, Linux, Windows XP/NT/7/8, Mac
File Formats: XML, Text, Sequence, RC, JSON, ORC, AVRO, and Parquet.
WORK EXPERIENCE:
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Used SQOOP to transfer data between Teradata and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.
- Used Pig to perform data transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Implemented partitions and buckets in Hive for optimization.
- Implemented Hive optimized joins to gather data from various sources and run ad-hoc queries on top of them.
- Wrote Hive generic UDFs to perform business-logic operations at the record and table levels (see the sketch at the end of this project entry).
- Worked on various file formats (Text, Avro, Parquet) and compression codecs (Snappy, GZIP).
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig, Hive, and SQOOP.
- Implemented test scripts to support test driven development and continuous integration.
- Loaded analyzed Hive data into NoSQL databases like HBase.
- Used Apache Kafka as a messaging system to load log data and application data into HDFS.
- Involved in converting Hive queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed a POC using Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Worked with the offshore team on a daily basis and in bi-weekly sprints.
Environment: Apache Hadoop, Map Reduce, HDFS, Pig, Hive, Spark, YARN, SQOOP, Flume, Kafka, Zookeeper, Cloudera, Oozie, UNIX Shell Scripting, Teradata.
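Below is a minimal sketch of a Hive generic UDF of the kind referenced in this project. The class name, function name, and masking rule are hypothetical illustrations, not the project's actual business logic.

```java
// Hypothetical Hive generic UDF: masks all but the last four characters of an ID column.
// Names and logic are illustrative only.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MaskAccountIdUDF extends GenericUDF {

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("mask_account_id expects exactly one argument");
        }
        // The UDF returns a plain string.
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object value = arguments[0].get();
        if (value == null) {
            return null; // propagate NULLs unchanged
        }
        String id = value.toString();
        int keep = Math.min(4, id.length());
        // Mask everything except the trailing characters.
        return id.substring(0, id.length() - keep).replaceAll(".", "*")
                + id.substring(id.length() - keep);
    }

    @Override
    public String getDisplayString(String[] children) {
        return "mask_account_id(" + children[0] + ")";
    }
}
```

In practice such a UDF is packaged as a JAR, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being called in HiveQL.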
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in the process of data acquisition, data pre-processing and data exploration.
- As part of data acquisition, SQOOP and Flume were used for incremental imports to ingest data from various sources into the Hadoop file system.
- In the pre-processing phase, removed records with missing data and applied relevant transformations.
- In the data exploration stage, used Hive and Impala to gain insights into the customer data.
- Used Flume, SQOOP, Hadoop and Oozie for building data pipeline.
- Imported and exported data between HDFS and Hive using SQOOP.
- Implemented job flows and monitored Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from various sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (see the sketch at the end of this project entry).
- Involved in loading data from LINUX file system to HDFS.
- Extracted tables from MySQL using SQOOP and placed them in HDFS.
- Assisted in exporting analyzed data to relational databases using SQOOP and Impala.
- Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts.
- Used Oozie for scheduling workflows.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, SQOOP, Impala, HBase, Oozie, Flume, MySQL, Windows, AWS S3, UNIX Shell Scripting, HDP.
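A minimal sketch of a map-only cleaning job of the kind described in the data-cleaning bullet above is shown below. The field layout, delimiter, and class name are hypothetical.

```java
// Hypothetical map-only cleaning job: drops records with missing mandatory fields
// and normalizes whitespace before re-emitting the record.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanRecordsMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split a pipe-delimited text record; -1 keeps trailing empty fields.
        String[] fields = value.toString().split("\\|", -1);

        // Drop rows that are too short or missing the first two mandatory fields.
        if (fields.length < 5 || fields[0].isEmpty() || fields[1].isEmpty()) {
            context.getCounter("cleaning", "dropped_records").increment(1);
            return;
        }

        // Trim each field and rebuild the record (intended as a map-only job, no reducer).
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('|');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```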
Hadoop Developer
Confidential, Norwalk, CT
Responsibilities:
- Responsible for managing data coming from various sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Processed input from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Created a data pipeline of MapReduce programs using chained mappers.
- Visualized HDFS data for customers using a BI tool via the Hive ODBC driver.
- Implemented optimized joins by joining different data sets to get top claims by state using MapReduce.
- Worked on big data processing of clinical and non-clinical data using MapR.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java (see the sketch at the end of this project entry).
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team that performs query analytics using HQL.
- Used Hive and Pig to generate BI reports.
- Imported data from MySQL into HDFS using SQOOP on a regular basis.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using SQOOP and staging tables.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Experienced with different kinds of compression techniques like LZO, GZIP, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and SQOOP.
Environment: Hadoop, HDFS, MapReduce, SQOOP, Oozie, Pig, Hive, Flume, Linux, MySQL, Java, Eclipse, MapR, Windows, UNIX Shell Scripting.
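Below is a minimal sketch of a map-side join using the distributed cache, as referenced above. The lookup file name, record layouts, and class name are hypothetical.

```java
// Hypothetical map-side join: a small state lookup table is shipped to every mapper
// via the distributed cache, so the join requires no reduce phase.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClaimsMapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> stateLookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The driver is assumed to have called something like:
        //   job.addCacheFile(new URI("hdfs:///lookup/states.txt#states.txt"));
        // which localizes the file and symlinks it as "states.txt" in the task directory.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            try (BufferedReader reader = new BufferedReader(new FileReader("states.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2); // stateCode,stateName
                    if (parts.length == 2) {
                        stateLookup.put(parts[0], parts[1]);
                    }
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Join each claim record (claimId,stateCode,...) with the cached state table.
        String[] fields = value.toString().split(",", -1);
        if (fields.length < 2) {
            return;
        }
        String stateName = stateLookup.getOrDefault(fields[1], "UNKNOWN");
        context.write(new Text(stateName), new Text(fields[0]));
    }
}
```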
Web Developer
Confidential, Meriden, CT
Responsibilities:
- Developing front-end screens using JSP, HTML and CSS.
- Developing modules for exceptions, utility classes, business delegates, and test cases using core Java (see the sketch at the end of this project entry).
- Developing SQL queries using MySQL.
- Working with Eclipse using Maven plugin for Eclipse IDE.
- Writing client-side validations using JavaScript.
- Extensively used jQuery for developing interactive web pages.
- Application was developed in Eclipse IDE and was deployed on Tomcat server.
Environment: Java/J2EE, Oracle, SQL, PL/SQL, JSP, Tomcat, HTML, AJAX, JavaScript, JDBC, XML, UML, JUnit, Eclipse.
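A minimal sketch of the core Java utility-class-plus-JUnit-test pattern mentioned above is shown below. The class names and validation rule are hypothetical, not taken from the original application.

```java
// Hypothetical utility class and JUnit 4 test; names and rules are illustrative only.
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class EmailValidatorTest {

    /** Simple utility under test: trims input and checks a basic e-mail shape. */
    static final class EmailValidator {
        static boolean isValid(String input) {
            if (input == null) {
                return false;
            }
            String email = input.trim();
            return email.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
        }
    }

    @Test
    public void acceptsWellFormedAddress() {
        assertTrue(EmailValidator.isValid("  user@example.com "));
    }

    @Test
    public void rejectsMalformedAddress() {
        assertFalse(EmailValidator.isValid("user@"));
        assertFalse(EmailValidator.isValid(null));
    }
}
```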