Hadoop Developer Resume
Hoffman Estates, IL
SUMMARY
- 3+ years of hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
- Experience in analyzing data using HiveQL and Pig Latin.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie.
- Experience with different Hadoop distributions such as Cloudera 5.3 (CDH 4, CDH 5) and Hortonworks (HDP).
- Strong end-to-end experience in Hadoop Development.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in data transformations using MapReduce, Hive, and Pig scripts for different file formats.
- Expertise in analyzing data using Hive and writing custom UDFs in Java to extend core Hive and Pig functionality.
- Hands-on experience in configuring and administering Hadoop clusters.
- Good understanding of HDFS design, daemons, and HDFS high availability (HA).
- Experience with scripting languages such as Linux/Unix shell and Python.
- Experience in reviewing and managing Hadoop log files.
- Experience in managing the Hadoop infrastructure with Cloudera Manager.
- Experienced with data warehousing and ETL processes.
- Expert knowledge of data warehousing concepts, with hands-on experience developing ETL applications in a dimensional data mart/data warehouse environment.
- Involved in building an MVC architecture using Java, validation files, and the Struts framework.
- Participated in reviews of test plans, test cases, and test scripts prepared by the system integration testing team.
- Monitored the performance and identified performance bottlenecks in ETL code.
- Strong experience in client interaction and in understanding business applications, business data flows, and data relationships.
- Self-starter who adapts quickly to new software applications and products, with excellent communication skills and a good understanding of business workflows.
TECHNICAL SKILLS
Scripting Languages: Unix shell scripting, Python
Big Data Technologies: HDFS, MapReduce, Hive, HiveQL, Pig, Sqoop, Flume, Spark, ZooKeeper, Oozie, Kafka
RDBMS: MySQL, Oracle, Teradata, MSSQL
Programming Languages: Python, SQL, Java
IDEs: NetBeans, Eclipse
Tools: Maven
Virtual Machines: VMware, VirtualBox
OS: CentOS 5.5, Unix, Red Hat Linux, Windows 7, Debian, Kali
WORK EXPERIENCE
Hadoop Developer
Confidential - Hoffman Estates, IL
Responsibilities:
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Python.
- Developed and executed shell scripts to automate the jobs.
- Wrote complex Hive queries and UDFs.
- Worked on reading multiple data formats on HDFS using PySpark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Python.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed SQL scripts and designed solutions to implement them in PySpark.
- Involved in loading data from the UNIX file system to HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Imported data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS.
- Managed and reviewed Hadoop log files.
- Involved in the analysis, design, and testing phases; responsible for documenting technical specifications.
- Used Kafka to ingest data into Hadoop.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance.
- Worked extensively with Spark's HiveContext and SQLContext.
- Ran Hadoop streaming jobs to process terabytes of data.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
Environment: Hadoop, HDFS, Hive, Python, Spark, SQL, Teradata, YARN, Sqoop, Kafka, UNIX Shell Scripting.
Hadoop Developer
Confidential - Columbus, OH
Responsibilities:
- Worked with Spark SQL and Spark Streaming, reading and writing data in JSON, text, and Parquet files, and using SchemaRDD.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
- Developed Pig Latin scripts to handle business transformations.
- Responsible for writing Pig scripts and Hive queries for data processing.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION.
- Involved in implementing job workflows and scheduling for end-to-end application processing.
- Wrote Spark programs in Python and ran Spark jobs on YARN.
- Worked with HBase databases for non-relational data storage and retrieval on enterprise use cases.
- Wrote MapReduce jobs using the Java API and Pig Latin.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Involved in developing Impala scripts for extraction, transformation, and loading of data into the data warehouse.
- Used Flume to collect, aggregate, and store web log data in HDFS.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Used Hive to analyze the data and identify correlations.
- Wrote numerous Pig UDFs to process complex data.
- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Configured a MySQL database to store Hive metadata.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Used Oozie workflows to automate all jobs that pull data from the FTP server and load it into Hive tables.
- Involved in creating Hive tables and working with them using HiveQL.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Used Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, YARN, Sqoop, Flume, ZooKeeper, CDH 5.4, Oozie, ETL, MySQL, Agile, Windows, UNIX Shell Scripting, Teradata.
DataStage Developer
Confidential - Columbus, OH
Responsibilities:
- Designed jobs involving various cross-reference lookups and joins, and shared containers that can be reused across multiple jobs.
- Created job-level sequencers that include multiple jobs, and a layer-level sequence that includes all job-level sequences.
- Used DataStage Director extensively to validate, run, schedule, and monitor jobs, and reviewed job logs carefully to debug them.
- Monitored performance statistics and fine-tuned jobs to improve processing time.
- Involved in developing UNIX scripts to call DataStage jobs.
- Involved in fine-tuning, troubleshooting, bug fixing, defect analysis, and enhancement of DataStage jobs for multiple admin systems.
- Involved in designing data marts and dimension and fact tables.
Environment: DataStage 7.5, Teradata, Mainframe system.
Java Developer
Confidential - Kalamazoo, MI
Responsibilities:
- Developed the application in J2EE using an MVC-based architecture.
- Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries, and various in-house custom tag libraries for the presentation layer.
- Created Tiles definitions, struts-config files, validation files, and resource bundles for all modules using the Struts framework.
- Wrote prepared statements and called stored procedures using callable statements against MySQL.
- Executed SQL queries to verify that customer records were updated correctly.
- Used Apache Tomcat as the application server for deployment.
- Used Web services for transmission of large blocks of XML data over HTTP.
Environment: Java/J2EE, JSP, MySQL, Struts 1.3, Apache Tomcat, Eclipse, XML.
