Hadoop Developer Resume

Chicago, IL

SUMMARY

  • 7+ years of Industry experience in Data Analytics and Big Data Technologies.
  • 2.5+ years of Hadoop experience in the design, development and deployment of Big Data applications involving the Apache Hadoop ecosystem: MapReduce, HDFS, Hive, Cassandra, HBase, Pig, Sqoop, Kafka, Spark, YARN, ZooKeeper, Oozie, Hortonworks and Talend.
  • Hands-on experience with the Hadoop framework, developing Hadoop jobs using Hive, Pig, MapReduce, Sqoop, Kafka, HBase and Cassandra.
  • Excellent understanding and knowledge of Hadoop architecture and its various components: HDFS, Name Node, Node Manager, Resource Manager, Application Master, Job History Server, Data Node and MapReduce.
  • Hands-on experience importing and exporting data between RDBMS and HDFS using Sqoop.
  • Sound knowledge of Spark and its components: Spark Core, Spark SQL and Spark Streaming.
  • Experience designing Scala applications to work with Spark and optimizing Hive query performance.
  • Excellent hands-on experience analyzing data using Pig Latin, HQL, HBase and MapReduce programs in Scala.
  • Good working knowledge of Flume and Kafka for ingesting data from various streaming sources.
  • Sound knowledge of SQL, JDBC, stored procedures and packages. Exposure to relational databases (Oracle, MySQL, DB2) and NoSQL databases (Cassandra and HBase).
  • Experience working with Agile as well as Waterfall Software Development Life Cycle (SDLC) methodologies
  • Expertise in creating databases, users, tables, views, stored procedures, functions, joins and indexes in Oracle DB.
  • Experience importing and exporting data in different formats between RDBMS databases and HDFS/HBase.
  • Implemented Oozie for writing workflows and scheduling jobs. Wrote Hive queries for data analysis and to process data for visualization.
  • Experienced with Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities in Scala.
  • Exposure to Apache Kafka for developing data pipelines of logs as streams of messages using producers and consumers (a minimal streaming sketch follows this list).
  • Excellent ability to quickly master new concepts along with capability of working in group as well as independently
  • Exceptional skills in communication, time management, organization and resource management
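
The Kafka-based log ingestion mentioned above can be illustrated with a minimal Spark Structured Streaming sketch in Scala. This is a sketch only, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic name and HDFS paths are placeholders rather than details from any specific engagement.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: consume application logs from a Kafka topic and land them
    // in HDFS as Parquet for downstream Hive analysis.
    object LogStreamIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("log-stream-ingest")
          .getOrCreate()

        // Each Kafka record's value is treated as a raw log line
        val logs = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
          .option("subscribe", "app-logs")                   // placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS line", "timestamp")

        // Write the stream to HDFS; the checkpoint directory tracks Kafka offsets
        logs.writeStream
          .format("parquet")
          .option("path", "/data/raw/app_logs")              // placeholder path
          .option("checkpointLocation", "/checkpoints/app_logs")
          .start()
          .awaitTermination()
      }
    }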

TECHNICAL SKILLS

Hadoop core services: HDFS, MapReduce, YARN

Databases: Oracle 10g, DB2, MySQL, Netezza, Teradata

NoSQL Databases: HBase, Cassandra

ETL Tools: Apache NiFi, Talend

Monitoring Tools: Cloudera Manager, Hue, Ambari

Hadoop Distribution Services: Cloudera

Hadoop Data Services: Hive, Pig, Sqoop, Spark

Hadoop Operational Services: Zookeeper, Oozie, Tidal

Operating Systems: Microsoft Windows, UNIX, LINUX

Languages: Scala, Python, Spark SQL, HQL, Pig Latin, SQL scripting, Java, Linux Shell Scripting

Build Tools: Maven

Development Tools: Eclipse, IntelliJ

Project Tools/Agile: Jira, Confluence, Bitbucket, Jenkins, SVN

Development Methodologies: Agile, Scrum, Waterfall

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Experience creating Hive tables, loading tables with data and aggregating data by writing Hive queries.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
  • Designed and developed numerous supporting SQL queries for data analytics in Hive.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Parsed JSON and XML files with Pig Loader functions and extracted insightful information from Pig Relations by providing a regex using the built-in functions in Pig.
  • Experience writing Pig Latin scripts for Data Cleansing, ETL operations and query optimization of existing scripts
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Developed projects with Spark/Scala, Hive, Oozie on Cloudera Hadoop Distribution
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Ingested data from different sources (RDBMS, NoSQL, streaming data, HDFS); cleansed, transformed and validated the collected records using ingestion tools such as Sqoop and Talend; stored the results back to the intended destination and automated the jobs using Oozie.
  • Created Hive tables, partitions and buckets for analyzing large volumes of data.
  • Created Hive queries per business requirements and scheduled the Hive, Sqoop and Pig jobs using Oozie.
  • Performed Schema design for Hive and optimized the Hive performance and configuration
  • Used Pig to do transformations, joins and aggregations before storing the data into HDFS
  • Migrated existing processes (multiple Hive queries run manually) to an automated platform (Oozie) in the client's environment by translating Hive queries to Pig queries and eventually setting file dependencies and time dependencies
  • Developed and built Hive tables with partitions and tuned Hive queries for best performance; implemented Hive UDFs.
  • Developed Spark applications using Scala/Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Processed structured, semi-structured and unstructured data sets.
  • Analyzed the data with Spark in Scala to extract information about customers, such as positive/negative reviews, page views, visit duration and the most popular products on the website (a sketch of this kind of aggregation follows this list).
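
A minimal sketch, in Scala, of the kind of Spark SQL aggregation described in the last bullet. The table and column names (web_analytics.page_views, product_id, visit_duration_secs) are hypothetical placeholders, not the actual client schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ProductPopularity {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("product-popularity")
          .enableHiveSupport() // read tables straight from the Hive warehouse
          .getOrCreate()

        // Hypothetical Hive table of page-view events
        val views = spark.table("web_analytics.page_views")

        // Page views and average visit duration per product, most popular first
        val popular = views
          .groupBy(col("product_id"))
          .agg(
            count(lit(1)).as("page_views"),
            avg(col("visit_duration_secs")).as("avg_visit_duration_secs")
          )
          .orderBy(desc("page_views"))

        popular.show(20, truncate = false)
      }
    }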

Environment: Hive, Hue, Sqoop, Pig, HBase, Kafka, Oracle, MySQL, Unix, SQL Server, Oozie, Spark, Talend, Cassandra, Scala, Python

Confidential, Chicago, IL

ETL Developer

Responsibilities:

  • Developed data transformation processes using Talend for Big Data (running on the MapReduce or Tez engines) and HiveQL (HQL) to build and integrate the bank's credit risk data
  • Worked with Talend components such as tMap, tJava, tJavaRow, tJoin, tPostjob, tReplicate, tLogCatcher, tAggregateRow, tParallelize, tSendMail, tDie, tUniqRow, tFlowToIterate, tIterateToFlow, tFileInputJSON, tMongoDBInput, tMongoDBOutput, tWarn, tStatsCatcher, etc.
  • Created complex Mapping to pull data from Source, apply transformations, and load data into Hive and HBase
  • Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment
  • Processed data from the ingestion point into the enterprise data lake (EDL) through to the star schema (dimension and fact tables), as well as data delivery to the outbound processes
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for hive performance enhancement and storage improvement
  • Implemented daily jobs that automate parallel tasks of loading data into HDFS and pre-processing with Pig, using Oozie coordinator jobs
  • Responsible for performing extensive data validation using Hive
  • Worked with Sqoop import and export functionality to handle large data set transfers between the Oracle database and HDFS
  • Responsible for tuning ETL mappings, workflows and the underlying data model to optimize load and query performance.
  • Worked closely with the business team and generated reports per requirements.
  • Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations
  • Implemented business logic by writing Pig UDFs and Hive Generic UDFs in Scala, and used various UDFs from Piggybank and other sources (a minimal UDF sketch follows this list)
  • Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit
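
The bullets above mention writing Hive UDFs in Scala; the following is a minimal sketch using Hive's simple UDF API rather than the more involved ObjectInspector-based GenericUDF pattern the project actually called for. The class name and normalization logic are illustrative only.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative Hive UDF: trim and upper-case a free-text code column so it
    // joins cleanly against reference data. Registered in Hive with:
    //   ADD JAR hive-udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }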

Confidential

Data Analyst

Responsibilities:

  • Analyzed data from in-place procedures to find ways to improve operations
  • Wrote Ad-Hoc queries and Stored Procedures using SQL Server Management Studio
  • Wrote complex SQL queries using complex joins, grouping, aggregation, nested subqueries, cursors etc.
  • Identified deficiencies and areas for improvement and redesign
  • Converted general data into strategic business plans and continuous improvement suggestions for senior management
  • Streamlined the reporting process through tuning SQL scripts and automated the daily reporting process
  • Involved in data profiling to validate data quality issues for critical data elements
  • Involved in user acceptance testing to ensure the code in place satisfies all requirements before it goes to production
  • Created reports in Excel and translated analyzed results to PowerPoint presentation.
  • Conducted in-depth data analysis on reports/dashboards to identify gaps
  • Prepared detailed reports of workflow research and improvements
  • Compared processes against industry standards and best practices
  • Created and set company standards and reporting structures
