
Senior Hadoop Developer Resume

St Louis, MO


  • Over 9 years of professional IT experience in business analysis, design, data modeling, development and implementation of various client-server and decision support system environments, with a focus on Big Data, Data Warehousing, Business Intelligence and Database Applications.
  • Over 3 years of experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Oozie and Flume, and with Big Data analytics.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Name Node, Job Tracker, Data Node, Task Tracker and Map Reduce concepts.
  • Experience in installation, configuration, support and management of a Hadoop Cluster.
  • Experience in task automation using Oozie, cluster coordination through Pentaho and MapReduce job scheduling using Fair Scheduler.
  • Experience in analyzing data using HiveQL, Pig Latin and custom Map Reduce programs in Java.
  • Experience in installing Hadoop cluster using different distributions of Apache Hadoop, Cloudera, Hortonworks and MapR.
  • Experience in managing and reviewing Hadoop log files.
  • Worked with Sqoop to move (import/export) data from a relational database into Hadoop and used FLUME to collect data and populate Hadoop.
  • Worked with HBase to conduct quick look ups (updates, inserts and deletes) in Hadoop.
  • Experience in working with cloud infrastructure like Amazon Web Services (AWS) and Rackspace.
  • Good experience in understanding clients' Big Data business requirements and translating them into Hadoop-centric solutions.
  • Defining job flows in Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Experience in configuring Zookeeper to provide Cluster coordination services.
  • Good experience in performing minor and major upgrades.
  • Experience in benchmarking, performing backup and recovery of Namenode metadata and data residing in the cluster.
  • Familiar with commissioning and decommissioning nodes on a Hadoop cluster.
  • Adept at configuring NameNode High Availability.
  • Worked on Disaster Management with Hadoop Cluster.
  • Experienced in Linux Administration tasks like IP Management (IP Addressing, Subnetting, Ethernet Bonding and Static IP)
  • Experience in deploying and managing multi-node development, testing and production environments.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service and managing keytabs using keytab tools.
  • Worked on setting up Name Node high availability for major production cluster and designed Automatic failover control using zookeeper and quorum journal nodes.
  • Analyzing clients' existing Hadoop infrastructure to identify performance bottlenecks and tune performance accordingly.
  • Expert knowledge in using various Transformation components such as Join, Lookup, Update, Router, Normalize, De-normalize, Partitioning and De-partitioning components etc.
  • Experience in maintaining production databases.
  • Experience in installing and uninstalling SQL Server 2000, 2005 and 2008.
  • Experience in configuring and maintaining mirroring and log shipping.
  • SQL Server performance tuning and maintenance.
  • Used DMVs and DBCC commands to investigate blocking issues, fragmentation and user information.
  • Experienced in all phases of Software Development Life Cycle (SDLC).
  • Able to interact effectively with other members of the Business Engineering, Quality Assurance, Users and other teams involved with the System Development Life cycle.
  • Expertise in installation, configuration, administration, troubleshooting, tuning, security, backup, recovery and upgrades of Red Hat Linux 5/6 and Solaris 8/9/10.


Hadoop: HDFS, MapReduce, Cloudera, Hive, Pig, HBase, Sqoop, Oozie, Zookeeper, Spark, Kafka, Storm

Java Technologies: Java/J2EE - JSP, Servlets, JDBC, JSTL, EJB, JUnit, RMI, JMS

Web Technologies: Ajax, JavaScript, JQuery, HTML, CSS, XML, Python

Programming Languages: Java, Scala, Python

Databases: MySQL, MS-SQL Server, SQL, Oracle 11g, NoSQL (HBase, MongoDB, Cassandra, Solr)

Web Services: REST, AWS, SOAP, GCP

Tools: Ant, Maven, JUnit

Servers: Apache Tomcat, WebSphere, JBoss

IDEs: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans, WSAD

Web/UI: HTML, JavaScript, XML, SOAP, WSDL

ETL/BI Tools: Talend, Tableau


Confidential, St Louis, MO

Senior Hadoop Developer


  • Defined, designed and developed Java applications, especially using Hadoop MapReduce, by leveraging frameworks such as Cascading and Hive.
  • Developed workflow using Oozie for running Map Reduce jobs and Hive Queries.
  • Worked on loading log data directly into HDFS using Flume.
  • Responsible for managing data from multiple sources.
  • Worked on Cloudera to analyze data present on top of HDFS.
  • Loaded data from various data sources into HDFS using Flume.
  • Installed Yarn (Resource Manager, Node manager, Application master) and created volumes and CLDB in edge nodes.
  • Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Worked on analyzing the Hadoop cluster and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, database and Sqoop. Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked on AWS to create and manage EC2 instances and Hadoop clusters.
  • Involved in loading data from LINUX file system to AWS S3 and HDFS.
  • Monitored an already configured cluster of 54 nodes.
  • Installed and configured Hadoop components MapRFS, Hive, Impala, Pig and Hue.
  • Worked on installing the cluster, commissioning and decommissioning Data Nodes, Name Node recovery, capacity planning, and slots configuration in MapR Control System (MCS).
  • Installed Oozie workflow engine to run multiple Hive Jobs.
  • Developed data pipeline using Flume, Sqoop and Java map reduce to ingest customer behavioral data and financial histories into MapRFS for analysis.
  • Experience in Implementing High Availability of Name Node and Hadoop Cluster capacity planning to add and remove the nodes.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
  • Created custom new columns depending on the use case while ingesting data into the Hadoop lake using PySpark.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark.
  • Installed and configured Hive and HBase.
  • Configured Sqoop and exported/imported data into HDFS.
  • Configured NameNode high availability and NameNode federation.
  • Experienced in loading data from UNIX local file system to HDFS.
  • Used Sqoop to import and export data between MapRFS and relational databases.
  • Performed data analysis by running Hive queries.
  • Addressed and troubleshot issues on a daily basis.
  • Performed cluster maintenance as well as creation and removal of nodes.
  • Monitored Hadoop cluster connectivity and security; managed and reviewed MapR log files.
  • Teamed diligently with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.

Environment: HDFS, Scala, Python, CDH5, HBase, NoSQL, RHEL 4/5/6, Hive, Pig, Hadoop, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, MongoDB, AWS.

Confidential, Tallahassee, FL

Linux/Hadoop Engineer


  • Configuring, maintaining and monitoring the Hadoop cluster using Ambari distribution.
  • Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing quick solutions to reduce impact, documenting them and preventing future issues.
  • Responsible for cluster maintenance, commissioning and decommissioning Data Nodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Enabled High Availability for Name Node, Resource Manager, HBase and HiveServer2 with automatic failover infrastructure to overcome single points of failure.
  • Used NiFi to automate data movement between disparate data sources and systems, making data ingestion fast, easy and secure.
  • Installation of various Hadoop Ecosystems and Hadoop Daemons.
  • Installed MySQL database to store metadata.
  • Imported and exported data to Hive tables and HBase.
  • Commissioned and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
  • Good experience in troubleshooting production-level issues in the cluster and its functionality.
  • Debugged failed production jobs.
  • Creating queues on YARN queue manager to share the resources of the Cluster for the MapReduce jobs given by the users.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for creating Hive tables based on business requirements.
  • Experienced in adding, installing and removing components through Ambari.
  • Monitored workload, job performance and capacity planning using Ambari.
  • Participated in development and execution of system and disaster recovery processes.
  • Loaded data into the NoSQL database HBase.
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Provided input to development on the efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
  • Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
  • Held regular discussions with other technical teams regarding upgrades, process changes, any special processing and feedback.
  • Periodically reviewed Hadoop-related logs, fixed errors and prevented further errors by analyzing the warnings.

Environment: HDFS, HDP, Ambari, HBase, NoSQL, Python, Yarn, Hive, Kerberos, Pig, Hadoop, Sqoop, NiFi, Shell Scripting, Linux Red Hat.


Storage Systems Engineer


  • Provided online support for storage administration, SAN fabric and Linux servers in a data center production environment.
  • Configured System Imager for Linux servers and used it for cloning, software distribution and OS updates.
  • Installed (using PXE boot), maintained and troubleshot hundreds of Linux servers.
  • Worked on Solaris servers supporting all the OS administration, application and hardware issues.
  • Very good working experience with Red Hat Satellite Server upgrading and patching.
  • Implemented upgrades of servers using Red Hat Satellite Server.
  • Setup NFS file systems and shared them to clients.
  • Troubleshooting and configuring NAS and NFS mount points.
  • Troubleshooting network, application, and server related issues.
  • Scheduling of automatic repetitive Jobs, and Shell Scripts with Crontab.
  • Monitoring Virtual memory, Swap management, and Disk and CPU utilization by using various monitoring tools.
  • Implemented rapid provisioning and life cycle management for Red Hat Linux using Kickstart and Puppet.
  • Improved Linux OS deployment and management by creating customized kickstart scripts and installing puppet.
  • Used puppet for central management of Linux configuration files and software package management.
  • Created RPM packages using RPMBUILD, verifying the new build packages and distributing the package.
  • Installed Red Hat Linux on Intel machines and configured file systems and raw devices.
  • Installed various RPMs on Linux.
  • Involved in mass migration of data from old EMC storage subsystems to new subsystems.
  • Creating volumes, LUNs and implementing different RAID levels on the same
  • Configured and created snapshots of clones of LUNs using EMC SnapView.
  • Use HPSM (HP Service Manager) to manage changes and incidents.
  • Work closely with Storage and Network teams to ensure highest level of dependability across VMware infrastructure
  • Assisted in Migration Projects moving the servers from legacy environments to newly commissioned clusters in new virtual centers
  • Created templates from virtual machines, deployed virtual machines from templates, cloned virtual machines and managed Virtual Center permissions.
  • Interacts directly with executive level management for internal tasks and projects
  • Design and implement systems, network configurations, and network architecture, including hardware and software technology, site locations, and integration of technologies.
  • Test, maintain, and monitor computer programs and systems, including coordinating the installation of computer programs and systems.

Environment: EMC VMAX, Cisco MDS, EMC VPLEX, NetApp 7-mode and C-mode, IBM XIV, DS8K, DS4K, Flash Systems, ESX VMware, Linux, Solaris, AIX, Windows 7/8/10/Server 2008.


Storage Administrator


  • Storage Configuration, Storage (LUN) Provisioning, SRDF / STAR Replication monitoring.
  • Switch administration port allocations and Zoning.
  • Troubleshoot and resolve the Storage related issues generated in the production, development and testing environments.
  • Analyzing problems and managing changes as per the infrastructure.
  • Monitoring & Troubleshooting Data Centre updates for both firmware and hardware levels.
  • Troubleshooting SRDF link operational issues in the VMAX environment.
  • Maintenance of existing fiber channel switches, creating soft zones.
  • Handling hardware failures and replacements.
  • Performing partial and full reclamation operations on DMX, VMAX & IBM arrays.
  • Working proactively with vendors during Sev-1 and Sev-2 incidents as well as hardware failures.
  • Monitoring and clearing the port stats on switches during the weekends.
  • Experience in change management and incident management: creating changes for configuration changes and incidents for break-fix issues.
  • Participate in CAB calls to represent severity 1 & 2 emergency changes.

Environment: 3 active data centers, VMAX 40K Enterprise Storage fully populated, EMC XtremIO, IBM DS8K, EMC SRDF, BCV, OneFS 8.1.0, Cisco MDS, ESX, VMware, Linux, Solaris, AIX, Windows 7/8/10/Server 2008/Server 2012/File Server 2012.
