Sr. Hadoop Developer Resume

Pittsburgh, PA

SUMMARY

  • More than 8 years of professional IT experience, including 5+ years with Big Data ecosystem technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Flume, HCatalog, Sqoop, Zookeeper, and NoSQL. Expertise in Big Data technologies as a consultant, with proven capability in project-based teamwork and as an individual developer, and good communication skills.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience working with Hadoop clusters using Cloudera (CDH5) and Hortonworks distributions.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce (MR), HDFS, HBase, Oozie, Hive, Sqoop, Spark, Kafka, Cassandra, Scala, Pig, Knox, Spark Streaming, and Flume.
  • Hands-on development and implementation experience with a Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig, Oozie, Apache Kite, and other ecosystem components as data storage and retrieval systems.
  • Experience in developing web-based applications using Python-Django, PHP, XML, CSS, Bootstrap, HTML, JavaScript, jQuery, JSON, and AJAX.
  • Performed importing and exporting data into HDFS and Hive using Sqoop.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Extended Hive and Pig core functionality by writing UDFs.
  • Good experience installing, configuring, testing Hadoop ecosystem components.
  • Highly knowledgeable in the WritableComparable and Writable interfaces, the Mapper and Reducer classes, and Hadoop data objects such as IntWritable, ByteWritable, and Text.
  • Well experienced with the Mapper, Reducer, Combiner, and Partitioner, and with the shuffle and sort process, along with custom partitioning for efficient bucketing.
  • Good experience in writing Pig and Hive UDFs to serve as utility classes.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH4 and CDH5 clusters.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Expertise in working with databases like Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
  • Hands on experience in Agile and Scrum methodologies.
  • Extensive experience in working with the Customers to gather required information to analyze, provide data fix or code fix for technical problems, and providing Technical Solution documents for the users.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Worked on multiple stages of Software Development Life Cycle including Development, Component Integration, Performance Testing, Deployment and Support Maintenance.
  • Quick to adapt to new software applications and products; a self-starter with excellent communication skills and a good understanding of business workflow.
  • Expertise in object-oriented analysis and design (OOAD), including UML and the use of various design patterns.
  • Working knowledge of SQL, PL/SQL, stored procedures, functions, packages, DB triggers, and indexes.
  • Experience in designing jobs and transformations and in loading data sequentially and in parallel for initial and incremental loads.
  • Good experience in using various PDI / Kettle (Pentaho Data Integration) steps to cleanse and load data according to business needs.
  • Experience in configuring Zookeeper to provide Cluster coordination services.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating and managing the realm domain.

TECHNICAL SKILLS

Languages: C, C++, Java, Python, Perl, SQL and PL/SQL

Big Data Framework and Eco Systems: Hadoop, MapReduce, Hive, Pig, Kafka, Cassandra, HDFS, Zookeeper, Sqoop, Spark, Scala, Apache Crunch, Oozie and Flume

NoSQL: Cassandra, HBase and Membase

Web Technologies: JavaScript, CSS, HTML, XHTML, AJAX, XML, XSLT

Databases: Oracle 8i/9i/10g/11g, MySQL, PostgreSQL and MS Access

Operating Systems: Windows XP/2000/NT, Linux (Red Hat, CentOS), Macintosh, UNIX

Tools: Ant, Maven, TOAD, ArgoUML, WinSCP, PuTTY, Lucene

IDE Tools: Eclipse 4.x, Eclipse RCP, NetBeans 6, EditPlus

Version Control Tools: CVS, SVN

ETL Tools: PDI / Kettle (Pentaho Data Integration)

BI Tools: Pentaho, QlikView, Tableau, Informatica

Web Servers: Apache Tomcat and Apache HTTP Server

PROFESSIONAL EXPERIENCE

Confidential, Pittsburgh, PA

Sr. Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Imported various log files into HDFS using Apache Kafka and performed data analytics using Spark.
  • Involved in importing the data from various data sources into HDFS using Sqoop and applying various transformations using Hive, Spark and then loading data into Hive tables.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
  • Experience in developing Kafka consumers and producers by extending the low-level and high-level consumer and producer APIs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark).
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Experience in developing with various Spark Streaming APIs using Python (PySpark).
  • Developed Spark code using PySpark to apply various transformations and actions for faster data processing.
  • Working knowledge of the Spark Streaming API, which enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
  • Used Spark stream processing to load data into memory, implemented RDD transformations, and performed actions.
  • Developed a Python Django application dashboard to effectively track application errors of different sites from a single point.
  • Responsible for the design and maintenance of databases using Python. Developed Python-based APIs (RESTful web services) using Flask and Django.
  • Developed various Kafka Producers and consumers for importing various transaction logs.
  • Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group in Kafka.
  • Involved in integrating HBase with PySpark to import data into HBase, and performed CRUD operations on HBase.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Experience in working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
  • Loaded large sets of structured data into the Hadoop cluster and transformed them using Talend Big Data Studio.
  • Worked with different file formats such as text, Avro, and ORC for Hive querying and processing based on business logic.
  • Experience in pulling data from Amazon S3 buckets into the data lake, building Hive tables on top of it, and creating DataFrames in Spark for further analysis.
  • Involved in writing Custom Talend jobs to ingest, enrich and distribute data in Hadoop ecosystem.
  • Implemented Hive and Pig UDFs for business logic and performed extensive data validation using Hive.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS using Oozie coordinator jobs.
  • Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrame API.
  • Used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
  • Knowledge of machine learning algorithms such as clustering, classification, and regression.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Implemented various machine learning algorithms based on business logic using Spark MLlib.
  • Used Talend Open Studio for data integration and for data migration from various locations across the business.
  • Integrated data quality plans as a part of ETL processes using Talend.
  • Experience in writing build scripts using Maven and in continuous integration with Jenkins.
  • Used JIRA for bug tracking and Git for version control.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Cloudera, MapReduce, HDFS, Pig, Scala, Hive, Sqoop, Spark, Kafka, Oozie, Java, Linux, Maven, HBase, Zookeeper, Kerberos, Tableau, Python, Talend Open Studio, AWS.

Confidential, Dunedin, FL

Sr. Hadoop Developer

Responsibilities:

  • Responsible for loading the customer's data and event logs from MSMQ into HBase using Java API.
  • Created HBase tables to store variable data formats of input data coming from different portfolios.
  • Involved in loading huge volumes of data into HBase columns.
  • Used Sqoop for transferring data from HBase to HDFS and vice versa.
  • Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
  • Involved in initiating and successfully completing a proof of concept on Flume for pre-processing, increased reliability, and ease of scalability over traditional MSMQ.
  • Used Flume to collect log data from different sources and load it into Hive tables, using different SerDes to store it in JSON, XML, and SequenceFile formats.
  • Managing and scheduling Jobs to remove the duplicate log data files in HDFS using Oozie workflows.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Used Hive to find correlations between customers' browser logs on different sites and analyzed them to build risk profiles for those sites.
  • End-to-end performance tuning of Hadoop clusters and MapReduce routines against very large data sets.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Monitored Hadoop cluster job performance and performed capacity planning and managed nodes on Hadoop cluster.
  • Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations.

Environment: Hadoop (CDH4), Big Data, HDFS, Pig, Hive, Python, MapReduce, Sqoop, Cloudera Manager, Linux, Flume, HBase
