
Hadoop Developer Resume


IL

SUMMARY

  • 7+ years of Industry experience in Data Analytics and Big Data Technologies.
  • 2.5+ years of Hadoop experience in the design, development and deployment of Big Data applications involving the Apache Hadoop ecosystem: MapReduce, HDFS, Hive, Cassandra, HBase, Pig, Sqoop, Kafka, Spark, YARN, ZooKeeper, Oozie, Hortonworks and Talend
  • Hands-on experience in the Hadoop framework, developing Hadoop jobs using Hive, Pig, MapReduce, Sqoop, Kafka, HBase and Cassandra.
  • Excellent understanding and knowledge of Hadoop architecture and its various components: HDFS, Name Node, Node Manager, Resource Manager, Application Master, Job History Server, Data Node and MapReduce.
  • Hands-on experience importing and exporting data between RDBMS and HDFS using Sqoop
  • Sound knowledge of Spark and its components: Spark Core, Spark SQL and Spark Streaming.
  • Experience designing Scala applications to work with Spark and optimizing Hive query performance (a minimal sketch follows this summary)
  • Excellent hands-on experience analyzing data using Pig Latin, HQL, HBase and MapReduce programs written in Scala
  • Good working knowledge of Flume and Kafka for ingesting data from various streaming sources
  • Sound knowledge of SQL, JDBC, stored procedures and packages; exposure to relational databases (Oracle, MySQL, DB2) and NoSQL databases (Cassandra and HBase)
  • Experience working with Agile as well as Waterfall Software Development Life Cycle (SDLC) methodologies
  • Expertise in creating databases, users, tables, views, stored procedures, functions, joins and indexes in Oracle DB.
  • Experience importing and exporting data in different formats between RDBMS databases and HDFS/HBase.
  • Implemented Oozie for writing workflows and scheduling jobs; wrote Hive queries for data analysis and to process data for visualization
  • Experienced in Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities with Scala
  • Exposure to Apache Kafka for developing data pipelines of logs as streams of messages using producers and consumers.
  • Excellent ability to quickly master new concepts, along with the capability to work in a group as well as independently
  • Exceptional skills in communication, time management, organization and resource management
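
A minimal sketch of the Spark/Scala and Hive work referenced in the summary above (the database, table and application names are hypothetical, Spark 2.x assumed):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: summarize page views from a hypothetical Hive table
// and persist the result as a partitioned Hive table.
object ProductViewSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ProductViewSummary")
      .enableHiveSupport()               // read/write warehouse tables with HQL
      .getOrCreate()

    // Aggregate daily views per product with Spark SQL
    val summary = spark.sql(
      """SELECT event_date, product_id, COUNT(*) AS views
        |FROM web_logs.page_events
        |WHERE event_type = 'view'
        |GROUP BY event_date, product_id""".stripMargin)

    // Partition by date so downstream Hive queries can prune partitions
    summary.write
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.product_view_counts")

    spark.stop()
  }
}
```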

TECHNICAL SKILLS

Hadoop core services: HDFS, MapReduce, YARN

Database: Oracle 10g, DB2, MySQL, Netezza, Teradata

NoSQL Databases: HBase, Cassandra

ETL Tools: Apache NiFi, Talend

Monitoring Tools: Cloudera Manager, Hue, Ambari

Hadoop Distribution Services: Cloudera

Hadoop Data Services: Hive, Pig, SQOOP, Spark

Hadoop Operational Services: Zookeeper, Oozie, Tidal

Operating Systems: Microsoft Windows, UNIX, Linux

Languages: Scala, Python, Spark SQL, HQL, Pig Latin, SQL scripting, Java, Linux Shell Scripting

Build Tools: Maven

Development Tools: Eclipse, IntelliJ

Project Tools/Agile: Jira, Confluence, Bitbucket, Jenkins, SVN

Development Methodologies: Agile, Scrum, Waterfall

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Experience creating Hive tables, loading tables with data and aggregating data by writing Hive queries.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
  • Designed and developed numerous supporting SQL queries for data analytics in Hive.
  • Worked with Spark to improve performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Parsed JSON and XML files with Pig loader functions and extracted insightful information from Pig relations by providing a regex using the built-in functions in Pig.
  • Experience writing Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Developed projects with Spark/Scala, Hive and Oozie on the Cloudera Hadoop distribution
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Ingested data from different data sources (RDBMS, NoSQL, streaming data, HDFS); cleansed, transformed and validated the collected data records using ingestion tools such as Sqoop and Talend, storing them back to the intended destination while automating jobs using Oozie
  • Created Hive tables, partitions and buckets for analyzing large volumes of data.
  • Created Hive queries per business requirements; scheduled the Hive, Sqoop and Pig jobs using Oozie
  • Performed schema design for Hive and optimized Hive performance and configuration
  • Used Pig to do transformations, joins and aggregations before storing the data into HDFS
  • Migrated existing processes (multiple Hive queries run manually) to an automated platform (Oozie) in the client's environment by translating Hive queries to Pig queries and eventually setting file dependencies and time dependencies
  • Developed and built Hive tables with partitions and tuned Hive queries for the best performance; implemented Hive UDFs
  • Developed Spark applications using Scala/Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources (a minimal sketch follows this list).
  • Processed structured, semi-structured and unstructured data sets
  • Analyzed the data to extract information about customers, i.e. positive/negative reviews, page views, visit duration and the most popular products on the website, using Spark in Scala
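
A minimal sketch of one such RDBMS-to-Hive flow in Spark/Scala; the connection details, credentials, schema and table names below are placeholders, not the actual client configuration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch of an RDBMS-to-Hive ingestion job (all names are placeholders).
object RdbmsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RdbmsToHive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Pull a source table over JDBC (placeholder MySQL connection)
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Basic cleansing/transformation with the DataFrame API
    val cleaned = orders
      .filter($"order_status".isNotNull)
      .withColumn("order_date", to_date($"order_ts"))

    // Land the result in a partitioned Hive table for downstream HQL analysis
    cleaned.write
      .mode("append")
      .partitionBy("order_date")
      .saveAsTable("warehouse.orders_clean")

    spark.stop()
  }
}
```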

Environment: Hive, Hue, Sqoop, Pig, HBase, Kafka, Oracle, MySQL, Unix, SQL Server, Oozie, Spark, Talend, Cassandra, Scala, Python

Confidential, IL

ETL Developer

Responsibilities:

  • Developed data transformation processes using Talend for Big Data (running on MapReduce or Tez engines) and HiveQL (HQL) to build and integrate the bank's credit risk data
  • Worked with Talend components such as tMap, tJava, tJoin, tPostjob, tReplicate, tLogCatcher, tAggregateRow, tParallelize, tSendMail, tDie, tUniqRow, tFlowToIterate, tIterateToFlow, tFileInputJSON, tMongoDBInput, tMongoDBOutput, tWarn, tStatsCatcher, etc.
  • Created complex mappings to pull data from sources, apply transformations and load data into Hive and HBase
  • Troubleshooting, debugging and fixing Talend-specific issues, while maintaining the health and performance of the ETL environment
  • Processed data from the ingestion point into the enterprise data lake (EDL) through to the star schema (dimension and fact tables), as well as data delivery to the outbound processes
  • Worked on sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement
  • Implemented daily jobs that automate parallel tasks of loading data into HDFS and pre-processing with Pig, using Oozie coordinator jobs
  • Responsible for performing extensive data validation using Hive
  • Worked with Sqoop import and export functionality to handle large data set transfers between the Oracle database and HDFS
  • Responsible for tuning ETL mappings, workflows and the underlying data model to optimize load and query performance.
  • Worked closely with the business team and generated reports based on the requirements.
  • Used Pig as an ETL tool to do transformations, event joins, filtering and some pre-aggregations
  • Implemented business logic by writing Pig UDFs and Hive generic UDFs in Scala (illustrated after this list), and used various UDFs from Piggybank and other sources
  • Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit
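
The Hive generic UDF work mentioned above is illustrated by the sketch below; the function name and its behaviour (trimming and upper-casing a string column) are hypothetical, shown only as an example of the GenericUDF pattern in Scala:

```scala
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, PrimitiveObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

// Hypothetical generic UDF that trims and upper-cases a string column.
class NormalizeString extends GenericUDF {
  private var inputOI: PrimitiveObjectInspector = _

  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1)
      throw new UDFArgumentLengthException("normalize_string expects exactly one argument")
    inputOI = arguments(0).asInstanceOf[PrimitiveObjectInspector]
    // The UDF returns a plain Java string
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val value = arguments(0).get()
    if (value == null) null
    else inputOI.getPrimitiveJavaObject(value).toString.trim.toUpperCase
  }

  override def getDisplayString(children: Array[String]): String =
    s"normalize_string(${children.mkString(", ")})"
}
```

After packaging the class into a jar, a function like this would typically be registered from Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_string AS 'NormalizeString'.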

Confidential

Data Analyst

Responsibilities:

  • Analyzed data from in-place procedures to find ways to improve operations
  • Wrote ad-hoc queries and stored procedures using SQL Server Management Studio
  • Wrote complex SQL queries using complex joins, grouping, aggregation, nested subqueries, cursors, etc.
  • Identified deficiencies and areas for improvement and redesign
  • Converted general data into strategic business plans and continuous improvement suggestions for senior management
  • Streamlined the reporting process by tuning SQL scripts and automated the daily reporting process
  • Involved in data profiling to validate data quality issues for the critical data elements
  • Involved in user acceptance testing to ensure the code in place satisfies all requirements before it goes to production
  • Created reports in Excel and translated analyzed results into PowerPoint presentations.
  • Conducted in-depth data analysis on the reports/dashboards to identify gaps
  • Prepared detailed reports of workflow research and improvements
  • Compared processes with industry standards and best practices
  • Created and set company standards and reporting structures
