
Hadoop Developer Resume


Hoffman Estates, IL

SUMMARY

  • 3+ years of hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
  • Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
  • Experience analyzing data using HiveQL and Pig Latin.
  • Knowledge of workflow scheduling and monitoring tools such as Oozie.
  • Experience with Hadoop distributions including Cloudera 5.3 (CDH 4, CDH 5) and Hortonworks Data Platform (HDP).
  • Strong end-to-end experience in Hadoop development.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience transforming data in different file formats using MapReduce, Hive, and Pig scripts.
  • Expertise in analyzing data with Hive and writing custom UDFs in Java to extend core Hive and Pig functionality.
  • Hands-on experience configuring and administering Hadoop clusters.
  • Good understanding of HDFS design, daemons, and HDFS high availability (HA).
  • Experience with scripting languages such as Linux/Unix shell and Python.
  • Experience analyzing and managing Hadoop log files.
  • Experience managing Hadoop infrastructure with Cloudera Manager.
  • Expert knowledge of data warehousing concepts and ETL processes, with hands-on experience developing ETL applications in dimensional data mart and data warehouse environments.
  • Involved in building MVC-based applications using Java and the Struts framework, including validation files.
  • Participated in reviews of test plans, test cases, and test scripts prepared by the system integration testing team.
  • Monitored performance and identified bottlenecks in ETL code.
  • Strong experience in client interaction and in understanding business applications, business data flows, and data relationships.
  • Quick to adapt to new software applications and products; self-starter with excellent communication skills and a good understanding of business workflows.

TECHNICAL SKILLS

Scripting Languages: Unix/Linux shell scripting, Python

Big Data Technologies: HDFS, MapReduce, Hive, HiveQL, Pig, Sqoop, Flume, Spark, ZooKeeper, Oozie, Kafka

RDBMS: MySQL, Oracle, Teradata, Microsoft SQL Server

Programming Languages: Python, SQL, Java

IDEs: NetBeans, Eclipse

Tools: Maven

Virtual Machines: VMware, VirtualBox

Operating Systems: CentOS 5.5, Unix, Red Hat Linux, Windows 7, Debian, Kali Linux

WORK EXPERIENCE

Hadoop Developer

Confidential - Hoffman Estates, IL

Responsibilities:

  • Expert in implementing advanced procedures, such as text analytics and processing, using the in-memory computing capabilities of Apache Spark with Python.
  • Developed and executed shell scripts to automate the jobs.
  • Wrote complex Hive queries and UDFs.
  • Worked on reading multiple data formats on HDFS using PySpark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Python.
  • Developed multiple proofs of concept (POCs) in PySpark, deployed them on the YARN cluster, and compared Spark's performance with Hive and SQL/Teradata.
  • Analyzed existing SQL scripts and designed solutions to implement them in PySpark.
  • Involved in loading data from the UNIX file system into HDFS.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Handled importing data from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS.
  • Managed and reviewed Hadoop log files.
  • Involved in the analysis, design, and testing phases, and responsible for documenting technical specifications.
  • Used Kafka to ingest data into Hadoop.
  • Very good understanding of Hive partitioning and bucketing; designed both managed and external Hive tables to optimize performance.
  • Worked extensively with Spark's HiveContext and SQLContext.
  • Ran Hadoop Streaming jobs to process terabytes of data.
  • Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.

Environment: Hadoop, HDFS, Hive, Python, Spark, SQL, Teradata, YARN, Sqoop, Kafka, UNIX shell scripting.

Hadoop Developer

Confidential - Columbus, OH

Responsibilities:

  • Worked with Spark SQL and Spark Streaming, reading and writing data as JSON, text, and Parquet files and as SchemaRDDs.
  • Worked extensively with Hive DDL and Hive Query Language (HiveQL).
  • Developed Pig Latin scripts to handle business transformations.
  • Responsible for writing Pig scripts and Hive queries for data processing.
  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION.
  • Implemented job workflows and scheduling for end-to-end application processing.
  • Wrote Spark programs in Python and ran Spark jobs on YARN.
  • Worked with HBase for non-relational data storage and retrieval in enterprise use cases.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Loaded data from Teradata into HDFS using Teradata Hadoop connectors.
  • Issued SQL queries via Impala to process the data stored in HDFS and HBase.
  • Developed Impala scripts for extracting, transforming, and loading data into the data warehouse.
  • Used Flume to collect, aggregate, and store web log data in HDFS.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS.
  • Used Hive to analyze the data and identify correlations.
  • Wrote numerous Pig UDFs to process complex data.
  • Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Configured MySQL Database to store Hive metadata.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Automated all jobs that pull data from an FTP server and load it into Hive tables, using Oozie workflows.
  • Created Hive tables and queried them using HiveQL.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Used Agile Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
  • Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, YARN, Sqoop, Flume, ZooKeeper, CDH 5.4, Oozie, ETL, MySQL, Agile, Windows, UNIX shell scripting, Teradata.

DataStage Developer

Confidential - Columbus, OH

Responsibilities:

  • Designed jobs involving various cross-reference lookups and joins, and shared containers that can be reused across multiple jobs.
  • Created job-level sequencers that include multiple jobs, and a layer-level sequence that includes all job-level sequences.
  • Extensively used DataStage Director to validate, run, schedule, and monitor jobs, and reviewed job logs carefully to debug them.
  • Monitored performance statistics and fine-tuned jobs to improve processing time.
  • Developed UNIX scripts to invoke DataStage jobs.
  • Involved in fine-tuning, troubleshooting, bug fixing, defect analysis, and enhancement of DataStage jobs for multiple admin systems.
  • Involved in designing data marts and dimension and fact tables.

Environment: DataStage 7.5, Teradata, mainframe systems.

Java Developer

Confidential - Kalamazoo, MI

Responsibilities:

  • The application was developed in J2EE using an MVC based architecture.
  • Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries, and various in-house custom tag libraries for the presentation layer.
  • Created Tiles definitions, struts-config files, validation files, and resource bundles for all modules using the Struts framework.
  • Wrote prepared statements and called MySQL stored procedures using callable statements.
  • Executed SQL queries to verify that customer records were updated correctly.
  • Used Apache Tomcat as the application server for deployment.
  • Used web services to transmit large blocks of XML data over HTTP.

Environment: Java/J2EE, JSP, MySQL, Struts 1.3, Apache Tomcat, Eclipse, XML.
