
Hadoop Developer Resume


Los Angeles, CA

PROFESSIONAL SUMMARY:

  • 5+ years of IT experience, including 4+ years in Hadoop development and 1+ year as a database analyst.
  • Worked in various domains, including luxury and telecommunications.
  • Excellent understanding of Hadoop architecture and various components such as HDFS, YARN, High Availability, and MapReduce programming paradigm.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop 2.x, MapReduce 2.x, HDFS, HBase, Oozie, Hive, Kafka, Zookeeper, Spark, Storm, Sqoop and Flume.
  • Experience in analyzing data using HiveQL, HBase and custom MapReduce programs in Java.
  • Extended Pig and Hive core functionality by writing custom UDFs (see the Hive UDF sketch after this list).
  • Wrote ad-hoc queries for analyzing data using HiveQL.
  • Strong knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Data Platform (HDP) 2.1 and 2.2, and CDH5 with Cloudera Manager, on Linux and Ubuntu.
  • Excellent knowledge of multiple platforms such as Cloudera, Hortonworks and MapR.
  • Development experience in Microsoft Azure, ASP.NET, ASP, C#.NET, Web Services, WCF, ASP.NET Web API, ADO.NET, JavaScript, jQuery, AngularJS, Bootstrap, PowerShell, CSS, HTML, UML and XML.
  • Experience in database design and development using SQL Azure, Microsoft SQL Server, Microsoft Access.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Good working knowledge of the MapReduce framework.
  • Strong knowledge of NoSQL column-oriented databases like HBase and their integration with Hadoop clusters.
  • Good knowledge of Kafka, ActiveMQ and Spark Streaming for handling streaming data.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Analyze data, interpret results and convey findings in a concise and professional manner.
  • Partner with the Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics.
  • Promote a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor.
  • Good exposure to data modeling, data profiling, data analysis, validation and metadata management.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6 and Ubuntu 13/14.
  • Sound knowledge of designing ETL applications using tools like Talend.
  • Developed real-time read/write access to very large datasets via HBase.
  • Experience in integration of various data sources in RDBMS like Oracle and SQL Server.
  • Used NoSQL databases including HBase, MongoDB and Cassandra.
  • Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive and RDBMS.
  • Extracted data from log files and pushed it into HDFS using Flume.
  • Scheduled workflow using Oozie workflow Engine.
  • Consolidated MapReduce jobs by implementing Spark, decreasing data processing time.
  • Experienced in Agile and Waterfall methodologies.
  • Fluent in data mining and machine learning techniques such as classification, clustering, regression and anomaly detection.
  • Knowledge of social network analysis and graph theory.
  • Work successfully in fast-paced environments, both independently and in collaborative team settings.
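
A minimal sketch of the kind of custom Hive UDF referenced above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are illustrative assumptions, not code from an actual engagement:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical Hive UDF: strips non-digit characters from a phone-number column.
    public final class NormalizePhone extends UDF {
        // Hive resolves evaluate() by reflection and calls it once per row.
        public Text evaluate(final Text input) {
            if (input == null) {
                return null; // preserve SQL NULL semantics
            }
            return new Text(input.toString().replaceAll("[^0-9]", ""));
        }
    }

Packaged into a JAR, such a function would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.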

TECHNICAL SKILLS:

Distributed File System & Distributed Programming: HDFS 2.6.0, MapReduce 2.6.x, Pig 0.12, Spark 1.3

Hadoop Libraries: Mahout, MLlib

NoSQL Databases: HBase 0.98, MongoDB, Cassandra

Relational Databases: Oracle 11g/10g/9i, MySQL 5.0, SQL Server

Hadoop Distributions: Cloudera Distribution (CDH4, Cloudera Manager)

SQL on Hadoop, Data Ingestion & ETL Tools: Hive 0.12, Cloudera Impala 2.0.x, Flume 1.3.x, Sqoop 1.4.4, Storm 0.9, Kafka 0.8

Languages: Java, Python, Scala, SQL, C, C++, UNIX Shell Scripting

Scheduling: Oozie 4.0.x, Falcon

Service Programming & Tools: Zookeeper 3.3.6, Eclipse, Git, Maven, Tableau

Operating Systems: Linux (CentOS, Ubuntu), Mac OS, Windows

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential, Los Angeles, CA

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Worked extensively on Hive, Sqoop, Pig and Python.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Scheduled shell scripts using cron jobs.
  • Used various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Involved in continuous monitoring and managing the Hadoop cluster using Hortonworks.
  • Installed Apache NiFi and MiNiFi to make data ingestion fast, easy and secure from the Internet of Anything with Hortonworks DataFlow; configured and managed user permissions in Hue.
  • Worked on moving VSAM files from mainframes to Hadoop.
  • Developed Pig functions to convert fixed-width files to delimited files.
  • Expertise in data modeling for data warehouse/data mart development, and data analysis for Online Transaction Processing (OLTP) and Data Warehousing (OLAP)/Business Intelligence (BI) applications.
  • Migrated complex MapReduce programs into in-memory Spark processing using transformations and actions (see the Spark sketch after this list).
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Involved in data migration from one cluster to another cluster.
  • Responsible for defining the naming standards for the data warehouse.
  • Developed the required data warehouse model using a star schema for the generalized model.
  • Worked on Talend for loading and extracting data from Oracle and SQL Server.
  • Experienced in data validation and gathering requirements from the business.
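
A rough sketch of the MapReduce-to-Spark migration pattern mentioned above, using the Spark Java RDD API; the paths, class name and keying logic are placeholders rather than project code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    // Hypothetical job: counts web-log records per client, replacing a multi-stage
    // MapReduce pipeline with in-memory transformations plus a single action.
    public class LogEventCounts {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("LogEventCounts");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile("hdfs:///data/weblogs/input"); // placeholder path

            JavaPairRDD<String, Integer> counts = lines
                    .mapToPair(line -> new Tuple2<>(line.split("\\s+")[0], 1)) // key by first field, e.g. client IP
                    .reduceByKey((a, b) -> a + b);                             // transformation, shuffled in memory

            counts.saveAsTextFile("hdfs:///data/weblogs/output");              // action: triggers execution
            sc.stop();
        }
    }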

Environment: Hadoop, MapReduce, Hortonworks, Talend, Hive, HDFS, Pig, Sqoop, Oozie, Flume, HBase, Zookeeper, AWS, Oracle, Python and UNIX.

Hadoop Developer

Confidential - Washington, DC

Responsibilities:

  • Installed and configured applications on the development environment and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster and integrated Hive with existing applications.
  • Managed Hadoop jobs and the logs of all scripts.
  • Created Hive tables based on the source field names and data types.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Hands-on experience with productionizing Hadoop applications, viz. administration, configuration management, monitoring, debugging and performance tuning.
  • Created HBase tables to store various data formats of PII data coming from different portfolios (see the HBase sketch after this list).
  • Provided cluster coordination services through Zookeeper.
  • Pushed data as delimited files into HDFS using Talend Big Data Studio.
  • Worked as Hadoop developer and admin on the Hortonworks (HDP 2242) distribution for 10 clusters ranging from POC to PROD.
  • Set up Hortonworks clusters and installed all ecosystem components through Ambari and manually from the command line.
  • Used Spark Streaming to collect this data from Kafka in near real time and perform the necessary processing.
  • Installed and configured Hive and wrote Hive UDFs in Java and Python.
  • Helped with the sizing and performance tuning of the Cassandra cluster.
  • Involved in the process of Cassandra data modeling and building efficient data structures.
  • Trained and mentored analysts and the test team on the Hadoop framework, HDFS, MapReduce concepts and the Hadoop ecosystem.
  • Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on the cloud.
  • Deep and thorough understanding of ETL tools and how they can be applied in a big data environment; supported and managed Hadoop clusters using Apache, Hortonworks, Cloudera and MapR.
  • Responsible for architecting Hadoop clusters.
  • Wrote shell scripts and Python scripts for job automation.
  • Assisted with the addition of Hadoop processing to the IT infrastructure.
  • Performed data analysis using Hive and Pig.
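
A minimal sketch of writing a row into an HBase table through the Java client API, along the lines of the PII tables described above; it uses the newer Connection/Table API, and the table, column family and row-key names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical example: store one customer record keyed by customer id.
    public class HBaseWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("customer_profiles"))) {
                Put put = new Put(Bytes.toBytes("cust#100045"));             // row key
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),  // family : qualifier
                              Bytes.toBytes("Jane Doe"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("segment"),
                              Bytes.toBytes("premium"));
                table.put(put);
            }
        }
    }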

Environment: Hadoop, MapReduce, Hortonworks, HDFS, Hive, Java, SQL, Cloudera Manager, Scala, Cassandra, Pig, Sqoop, Oozie, Zookeeper, Teradata, PL/SQL, MySQL, Windows, HBase.

Big Data/ Hadoop Developer

Confidential, Jersey city, New Jersey

Responsibilities:

  • Installed, configured, monitored and maintained Hadoop cluster on Big Data platform.
  • Configured Zookeeper and worked on Hadoop High Availability with the Zookeeper failover controller, adding support for a scalable, fault-tolerant data solution.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats (see the MapReduce sketch after this list).
  • Used Pig UDFs to do data manipulation, transformations, joins and some pre-aggregations.
  • Created multiple Hive tables, implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access.
  • Used Flume to collect, aggregate, and store dynamic web log data from different sources like web servers, mobile devices and pushed to HDFS.
  • Stored and rapidly updated data in HBase, providing key-based access to specific data.
  • Extracted files from Cassandra and MongoDB through Sqoop, placed them in HDFS and processed them.
  • Configured Spark to optimize data processing.
  • Applied MLlib to build statistical models for classification and prediction.
  • Worked on Oozie workflow engine for job scheduling.
  • Created HDFS Snapshots in order to do data backup, protection against user errors and disaster recovery.
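
An illustrative sketch of the kind of MapReduce aggregation described above, counting CSV records per value of the first column; the field positions, paths and class names are assumptions:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvKeyCount {

        public static class KeyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split(",");
                if (fields.length > 0 && !fields[0].isEmpty()) {
                    outKey.set(fields[0]); // e.g. an event type or customer id
                    context.write(outKey, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "csv key count");
            job.setJarByClass(CsvKeyCount.class);
            job.setMapperClass(KeyMapper.class);
            job.setCombinerClass(SumReducer.class); // safe: the aggregation is associative
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }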

Environment: Hadoop 2.4.x, HDFS, MapReduce 2.4.0, YARN 2.6.2, Pig 0.14.0, Hive 0.13.0, HBase 0.94.0, Sqoop 1.99.2, Flume 1.5.0, Oozie 4.0.0, Zookeeper 3.4.2, Cassandra, MongoDB, Spark 1.1.1, Kafka 0.8.1

Hadoop Developer

Confidential

Responsibilities:

  • Responsible for loading customers' data and event logs into HBase using the Java API.
  • Created HBase tables to store variable data formats of input data coming from different portfolios.
  • Involved in adding huge volumes of data in rows and columns to store data in HBase.
  • Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
  • Involved in initiating and successfully completing a proof of concept on Flume for pre-processing.
  • Used Flume to collect log data from different sources and transferred the data to Hive tables using different SerDes to store it in JSON, XML and Sequence file formats.
  • Used Hive to find correlations between customers' browser logs on different sites and analyzed them.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Created user accounts and gave users access to the Hadoop cluster.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Developed Pig UDFs to pre-process the data for analysis (see the Pig UDF sketch after this list).
  • Loaded files to Hive and HDFS from MongoDB and Solr.
  • Monitored Hadoop cluster job performance and performed capacity planning and managed nodes on Hadoop cluster.
  • Responsible for using Oozie to control workflow.
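
A small sketch of a Java Pig UDF of the sort described above; the cleanup behavior and class name are illustrative assumptions:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical Pig UDF: trims whitespace and lower-cases a field before analysis.
    public class CleanField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().trim().toLowerCase();
        }
    }

Registered with REGISTER in a Pig Latin script, such a function is then invoked like any built-in function.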

Environment: Hadoop 2.0, HDFS, Pig 0.11, Hive 0.12.0, MapReduce 2.5.2, Sqoop, LINUX, Flume 1.94, Kafka 0.8.1, HBase 0.94.6, CDH4, Oozie 3.3.0

Big Data Analyst

Confidential

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools such as HiveQL.
  • Imported and exported data in HDFS and Hive using Sqoop.
  • Extracted BSON files from MongoDB and placed in HDFS and processed.
  • Designed and developed MapReduce jobs to process data coming in BSON format.
  • Worked on the POC to bring data to HDFS and Hive.
  • Wrote Hive UDFs to extract data from staging tables.
  • Involved in creating Hive tables and loading them with data.
  • Hands-on experience writing MapReduce code to convert unstructured data into structured data and insert the results into target tables (see the map-only MapReduce sketch after this list).
  • Experienced in creating integration between Hive and HBase.
  • Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed amongst all jobs.
  • Used Oozie scheduler to submit workflows.
  • Reviewed QA test cases with the QA team.
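
A minimal, hypothetical sketch of the unstructured-to-structured MapReduce step mentioned above: a map-only mapper that parses raw log lines into tab-delimited records (the log layout and field names are assumptions):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical map-only step: splits space-delimited raw log lines into
    // a fixed set of tab-separated columns; malformed lines are skipped.
    public class LogStructuringMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private final Text structured = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\\s+");
            if (parts.length < 3) {
                return; // skip records that cannot be parsed
            }
            structured.set(parts[0] + "\t" + parts[1] + "\t" + parts[2]); // timestamp, user, action
            context.write(structured, NullWritable.get());
        }
    }

In the driver, setting job.setNumReduceTasks(0) makes this a map-only job so the structured records are written straight to HDFS.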

Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce 1.x, HBase 0.70, MySQL, MongoDB, Oozie 3.x

Database Developer/Admin

Confidential

Responsibilities:

  • Gathered requirements to be incorporated into the system.
  • Extensively worked on the analysis of tables in both the legacy data store and the new data store.
  • Extensively worked on the analysis of columns in mapping tables for both the legacy data store and the new data store.
  • Initiated the use of data warehouse ETL software during conversion of data to the Oracle database.
  • Developed complete documentation for the project based on the analysis of tables and columns.
  • Created DDL scripts to create, alter, drop tables, views, synonyms and sequences.
  • Worked on SQL tables, records and collections.
  • Wrote SQL procedures, functions and triggers for insert, update and delete transactions, and optimized them for maximum performance (see the JDBC sketch after this list).
  • Extensively worked on database triggers, stored procedures, functions and database constraints.
  • Developed SQL queries to fetch complex data from different tables in remote databases using database links.
  • Used ETL process to identify the new or the changed data in order to make better decisions in the project.
  • Participated in Performance Tuning of SQL queries using Explain Plan to improve the performance of the application.
  • Exported source data residing in Excel format to flat files and accessed it via Oracle external tables to load into the staging schema, where all source data could be efficiently transformed and migrated to the target schema.
  • Extracted data from Flat files using SQL*LOADER.
  • Developed UNIX shell scripts for loading data into the database using SQL*Loader.
  • Created partitions on the tables to improve the performance.
  • Participated in application planning and design activities by interacting with and collecting requirements from end users.
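
For illustration, a hedged sketch of how one of the stored procedures described above might be invoked from application code via JDBC; the connection URL, credentials and procedure signature are placeholders, not the actual project objects:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Types;

    // Hypothetical caller; requires the Oracle JDBC driver (ojdbc) on the classpath.
    public class UpdateCustomerStatus {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:oracle:thin:@//db-host:1521/ORCL"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
                 // Assumed signature: update_customer_status(p_customer_id IN NUMBER, p_rows_updated OUT NUMBER)
                 CallableStatement stmt = conn.prepareCall("{call update_customer_status(?, ?)}")) {
                stmt.setLong(1, 1001L);
                stmt.registerOutParameter(2, Types.NUMERIC);
                stmt.execute();
                System.out.println("Rows updated: " + stmt.getLong(2));
            }
        }
    }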

Environment: Oracle 11g, SQL Developer, SQL Tuning, SQL*Loader, UNIX Shell Scripting
