Hadoop Developer Resume

Hoffman Estates, IL

SUMMARY

  • Over 8 years of experience in the IT industry, including recent work with the Big Data Hadoop ecosystem (Hive, Pig, Spark Streaming) and data warehousing tools.
  • Experience with mapping, design, analysis, implementation, and support of application software based on client requirements, including n-tier architecture solutions with distributed components for internet/intranet applications.
  • Over 4.5 years of hands-on experience with Apache Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Spark, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
  • Experience in analyzing and cleansing raw data using HiveQL and Pig Latin.
  • Knowledge of job workflow and sub-workflow scheduling and monitoring tools such as Oozie.
  • Experience with different Hadoop distributions such as Cloudera 5.3 (CDH4, CDH5) and Hortonworks (HDP).
  • Experience in optimizing MapReduce jobs using Combiners and Partitioners to deliver efficient results.
  • Hands-on experience with Spark SQL for complex data transformations using the Scala programming language.
  • Experienced in using Kafka as a distributed publisher-subscriber messaging system.
  • Experience in importing and exporting data between HDFS/Hive/HBase and relational database systems using Sqoop.
  • Experience in data transformations using MapReduce, Hive, Pig scripts, and Spark for different file formats.
  • Expertise in analyzing data using Hive and writing custom UDFs in Java to extend Hive and Pig core functionality.
  • Hands-on experience with BI tools such as Splunk/Hunk and Tableau.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Solid understanding of HDFS design, daemons, and HDFS high availability (HA).
  • Experience with scripting languages such as Linux/Unix shell and Python.
  • Experience in understanding and managing Hadoop log files.
  • Experience in managing the Hadoop infrastructure with Cloudera Manager.
  • Experienced with data warehousing and ETL processes.
  • Expert knowledge of data warehousing concepts, with hands-on experience developing ETL applications in a dimensional data mart/data warehouse environment.
  • Involved in building MVC architectures using Java, validation files, and the Struts framework.
  • Participated in reviews of test plans, test cases, and test scripts prepared by the system integration testing team.
  • Monitored the performance and identified performance bottlenecks in ETL code.
  • Strong experience in client interaction and in understanding business applications, business data flows, and data relationships.
  • Self-starter who adapts quickly to new software applications and products, with excellent communication skills and a good understanding of business workflows.

TECHNICAL SKILLS

Scripting Languages: Shell Scripting, Unix Scripting, Pig Latin, HiveQL

Big Data Technologies: HDFS, MapReduce, Hive, Hue, Pig, Sqoop, Flume, Spark, ZooKeeper, Oozie, Kafka, Impala, HBase

RDBMS: MySQL, Oracle, Teradata, MSSQL

NoSQL Databases: HBase, MarkLogic, Cassandra

Programming Languages: Python, Scala, SQL, Java

IDEs: NetBeans, Eclipse

Tools: Maven, Tableau

Virtual Machines: VMware, VirtualBox

Operating Systems: CentOS 5.5, Unix, Red Hat Linux, Windows 7, Debian, Kali

Software Methodologies: SDLC (Waterfall), Agile, Scrum

PROFESSIONAL EXPERIENCE

Confidential - Hoffman Estates, IL

Hadoop Developer

Responsibilities:

  • Worked with the source team to understand the format & delimiters of the data files.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Developed and implemented API services using Python in Spark.
  • Troubleshot and resolved data quality issues and maintained a high level of accuracy in the reported data.
  • Implemented POCs for migrating to Spark Streaming to process live data.
  • Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per business requirements.
  • Rewrote existing MapReduce jobs to use new features and improvements for faster results.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Performance-tuned slow-running, resource-intensive jobs.
  • Worked with data serialization formats such as Avro, Parquet, JSON, and CSV to convert complex objects into sequences of bits.
  • Gained hands-on experience with Apache Spark, an in-memory processing engine, for ETL transformations.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
  • Set up Oozie workflow and sub-workflow jobs for Hive/Sqoop/HDFS actions.
  • Accessed the Kafka cluster to consume data into Hadoop.
  • Imported real-time data to Hadoop using Kafka and implemented an Oozie job for daily imports.
  • Participated in gathering product business and functional requirements with the requirements team, updated user comments in JIRA, and maintained documentation in Confluence.
  • Worked with the team's project manager on a daily basis to complete project scheduling and admin tasks.
  • Handled tasks such as maintaining an accurate roadmap for a project or product.
  • Monitored sprints and burndown charts and completed monthly reports.

Environment: • Hive • SQL • Pig • Flume • Kafka • MapReduce • Sqoop • Scala • Python • Java • Shell Scripting • Unix Scripting • Spark • Teradata • Oracle • Oozie • Cassandra

Confidential - Columbus, OH

Hadoop Developer

Responsibilities:

  • Worked with Spark SQL, reading and writing data from JSON, text, and Parquet files and SchemaRDDs.
  • Identified data sources and created appropriate data ingestion procedures.
  • Transformed data using Spark, Hive, and Pig so the BI team could perform visual analytics according to client requirements.
  • Populated big data customer marketing data structures.
  • Performed complex joins on Hive tables using various optimization techniques.
  • Implemented lateral views in conjunction with UDFs in Hive according to client requirements.
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
  • Performed ETL procedures on data in HDFS.
  • Connected Hive and Impala to the Tableau reporting tool and generated graphical reports.
  • Worked extensively with Hive DDL and Hive Query Language (HQL).
  • Developed Pig Latin scripts for business transformations and wrote Pig scripts and Hive queries for data processing.
  • Extended Hive and Pig core functionality by writing custom UDFs in Java.
  • Developed HBase tables to store variable-format input data coming from different portfolios.
  • Loaded large volumes of data into HBase rows and columns.
  • Created and maintained technical documentation for launching Hadoop cluster and for executing Hive queries and Pig scripts.
  • Moved transaction files generated from various sources to HDFS through Flume for further processing.
  • Developed Sqoop jobs to extract data from Teradata/Oracle into HDFS.
  • Developed Impala scripts for extraction, transformation, and loading of data into the data warehouse.
  • Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.

Environment: • Hive • SQL • Pig • Flume • Kafka • MapReduce • Sqoop • Python • Tableau • Java • Shell Scripting • Unix Scripting • Spark • Teradata • Oracle • Oozie • Cassandra

Confidential - Columbus, OH

Data Warehouse Developer

Responsibilities:

  • Designed jobs involving various cross-reference lookups and joins, along with shared containers that can be reused across multiple jobs.
  • Created job-level sequencers to include multiple jobs, and a layer-level sequence that includes all job-level sequences.
  • Loaded data into dimension tables.
  • Extensively used DataStage Director to validate, run, schedule, and monitor jobs, and followed job logs carefully to debug them.
  • Monitored performance statistics and fine-tuned jobs for improved processing time.
  • Developed UNIX scripts to call DataStage jobs.
  • Integrated DataStage jobs with the ETL control framework and Tivoli scheduling.
  • Involved in fine-tuning, troubleshooting, bug fixing, defect analysis, and enhancement of DataStage jobs for multiple admin systems.
  • Involved in designing data marts and dimension and fact tables.

Environment: • DataStage 7.5 • Teradata • Mainframes • SQL • ETL

Confidential - Kalamazoo, MI

Jr. Java Developer

Responsibilities:

  • Developed the application in J2EE using an MVC-based architecture.
  • Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries, and various in-house custom tag libraries for the presentation layer.
  • Created Tiles definitions, struts-config files, validation files, and resource bundles for all modules using the Struts framework.
  • Wrote prepared statements and called stored procedures using callable statements in MySQL.
  • Executed SQL queries to perform CRUD operations on customer records.
  • Used Apache Tomcat as the application server for deployment.
  • Used Web services for transmission of large blocks of XML data over HTTP.

Environment: • Java • JSP • MySQL • Struts • Tomcat Web Server • HTML • XML • Eclipse • CSS

Confidential - Chicago, IL

Support Engineer

Responsibilities:

  • Developed and executed shell scripts to automate the jobs.
  • Wrote complex Hive queries and UDFs.
  • Read multiple data formats on HDFS with Hive using SerDes.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS.
  • Used Hive to do analysis on the data and identify different correlations.
  • Wrote numerous Pig UDFs to process complex data.
  • Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.

Environment: • Hive • SQL • Pig • Flume • MySQL
