Hadoop Admin Resume

Atlanta, GA

SUMMARY

  • 8 years of overall IT experience in the design, implementation, troubleshooting and maintenance of complex enterprise infrastructure.
  • 4+ years of hands-on experience in installing, patching, upgrading and configuring Linux-based operating systems (RHEL and CentOS) across large clusters.
  • 4+ years of experience in configuring, installing, benchmarking and managing the Hortonworks and Cloudera distributions of Hadoop.
  • Experienced in programming languages such as Python, C and C++.
  • 3+ years of extensive hands-on experience in IP network design, network integration, deployment and troubleshooting.
  • Troubleshot, fixed and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
  • Expertise in using AWS API tools such as the Linux command line and Puppet-integrated AWS API tools.
  • Experience in deploying scalable Hadoop cluster on AWS using S3 as underlying file system for Hadoop.
  • Experience in installing and monitoring the Hadoop cluster resources using Ganglia and Nagios.
  • Experience in designing and implementing secure Hadoop clusters using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
  • Experience in managing Hadoop infrastructure tasks such as commissioning, decommissioning, log rotation and rack topology implementation.
  • Experience in managing Hadoop clusters using Cloudera Manager and Apache Ambari.
  • Experience in using Zookeeper for coordinating the distributed applications.
  • Experience in developing Pig and Hive scripts for data processing on HDFS.
  • Excellent knowledge of the Perl, Python and Ruby scripting languages.
  • Experience in scheduling jobs using Oozie workflows.
  • Experience in configuring, installing, managing and administrating HBase clusters.
  • Experience in managing Hadoop resources using Static and Dynamic Resource Pools.
  • Experience in setting up tools such as Ganglia and Nagios for monitoring the Hadoop cluster.
  • Experience importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop (a minimal sketch follows this list).
  • Experience in installing minor patches and upgrading the Hadoop cluster to major versions.
  • Experience in designing, installing and configuring VMware ESXi within a vSphere 5 environment with Virtual Center management, Consolidated Backup, DRS, HA, vMotion and VMware Data.
  • Experience in designing and building disaster recovery plan for Hadoop Cluster to provide business continuity.
  • Experience in configuring Site-to-site and remote access VPN using ASA firewalls.
  • Extensive experience with operating systems including Windows, Ubuntu, Red Hat, CentOS and Mac OS.
  • Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
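
A minimal sketch of the kind of Sqoop import mentioned above, assuming a hypothetical Oracle host, credentials and table names:

    #!/usr/bin/env bash
    # Hypothetical Sqoop import of an Oracle table into HDFS and a Hive table.
    # The JDBC URL, user, password file and table names are placeholders.
    sqoop import \
      --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
      --username etl_user \
      --password-file /user/etl_user/.oracle.password \
      --table CUSTOMER_ORDERS \
      --target-dir /data/raw/customer_orders \
      --num-mappers 4 \
      --hive-import \
      --hive-table analytics.customer_orders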

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop 2.2, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, YARN, Spark, Kafka, Storm, Flume and Mahout

Hadoop Management & Security: Hortonworks, Cloudera Manager, Ubuntu

Web Technologies: HTML, XHTML, XML, XSL, CSS, JavaScript

Server Side Scripting: Shell, Perl, Python

Databases: Oracle 10g, MySQL, DB2, SQL, RDBMS

Programming Languages: Java, SQL, PL/SQL, Python

Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1

NoSQL Databases: HBase, MongoDB

OS/Platforms: Mac OS X 10.9.5, Windows, Linux, Unix, CentOS

Virtualization: VMware, ESXi, vSphere, vCenter Server

SDLC Methodology: Agile (SCRUM), Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Hadoop Admin

Responsibilities:

  • Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
  • Commissioned and decommissioned data nodes, killed unresponsive task trackers and dealt with blacklisted task trackers.
  • Implemented the Fair Scheduler on the job tracker to allocate a fair share of resources to small jobs.
  • Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Implemented NFS, NAS and HTTP servers on Linux servers.
  • Experienced in scheduling cron jobs using Perl and Python.
  • Created a local YUM repository for installing and updating packages.
  • Copied data from one cluster to another using DistCp and automated the copy procedure with shell scripts (see the sketch after this list).
  • Designed shell/Python scripts in Linux to back up important metadata.
  • Implemented NameNode HA to avoid a single point of failure.
  • Implemented NameNode metadata backup to NFS for high availability.
  • Supported data analysts in running MapReduce programs.
  • Implemented AWS solutions using EC2, S3 and load balancers.
  • Installed application on AWS EC2/EMR instances and configured the storage on S3 buckets.
  • Worked on analyzing data with Hive and Pig.
  • Ran crontab entries to back up data.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them in real-time dynamic web pages to aid debugging and maintenance.
  • Configured Oozie for workflow automation and coordination.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • The client used the Hortonworks distribution of Hadoop to store and process the large volumes of data generated by different enterprises.
  • Maintained, audited and built new clusters for testing purposes using Ambari (Hortonworks).
  • Implemented security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move from a non-secured to a secured cluster.
  • Responsible for upgrading Hortonworks HDP 2.2 and MapReduce 2.0 with YARN in a multi-node cluster environment.
  • Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
  • Designed and allocated HDFS quotas for multiple groups.
  • Configured iptables rules to allow application servers to connect to the cluster, set up the NFS exports list and blocked unwanted ports.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources into HDFS.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Configured and monitored a test cluster on Amazon Web Services for further testing and gradual migration.
  • Responsible for managing data coming from different sources.
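
A minimal sketch of the cross-cluster copy automation referenced above, assuming hypothetical NameNode hostnames, paths and schedule:

    #!/usr/bin/env bash
    # Hypothetical nightly DistCp between two clusters; hostnames and paths are placeholders.
    SRC=hdfs://prod-nn1:8020/data/events
    DST=hdfs://dr-nn1:8020/data/events

    # -update copies only new or changed files; -m caps the number of map tasks.
    hadoop distcp -update -m 20 "$SRC" "$DST" >> /var/log/hadoop/distcp_events.log 2>&1

    # Example crontab entry to run the copy every night at 1 AM:
    # 0 1 * * * /opt/scripts/distcp_events.sh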

Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, UNIX Shell Scripting, Nagios, Kerberos.

Confidential, Chicago, IL

Hadoop admin

Responsibilities:

  • Installation, configuration, deployment, maintenance, monitoring and troubleshooting of Ambari-managed clusters in development and production environments.
  • Commissioning and Decommissioning of Nodes from time to time in cluster.
  • Solely responsible for maintaining, monitoring and keeping the clusters up, providing 24/7 support so the business ran without outages.
  • Involved in technical workflow discussions and documentation, mapping documents and data warehouse design.
  • Worked on NameNode recovery and balancing the Hadoop cluster.
  • Upgraded Cloudera Hadoop from CDH 4.3 to CDH 5.3 and applied patches.
  • Worked on data capacity planning, node forecasting and determining the correct hardware and infrastructure for the cluster.
  • Responsible for managing and scheduling jobs on a Hadoop Cluster.
  • Configured the Oozie workflow scheduler and tested sample jobs.
  • Monitored all daemons and the cluster health status on a daily basis, tuned performance-related configuration parameters and backed up configuration XML files (see the sketch after this list).
  • Implemented the Fair Scheduler to share cluster resources among the MapReduce jobs run by users.
  • Good experience with Hadoop ecosystem components such as Hive, HBase, Pig and Sqoop, and with optimizing job performance.
  • Imported data using Sqoop from Oracle Server/MySQL Server to HDFS on a regular basis.
  • Troubleshot, managed and reviewed data backups and Hadoop log files.
  • Assisted with automated deployment using Chef.
  • Implemented security for Hadoop Cluster with Kerberos.
  • Experience in LDAP integration with Hadoop and access provisioning for secured cluster.
  • Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues.
  • Worked with network and system engineers to define optimal network configurations, server hardware and operating systems.
  • Provided on-call support to fix production issues on the fly and keep jobs running smoothly during peak times.
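
A minimal sketch of the kind of daily health check and configuration backup mentioned above; the configuration path and backup location are assumptions:

    #!/usr/bin/env bash
    # Hypothetical daily cluster health check and configuration backup; paths are placeholders.
    DATE=$(date +%F)
    BACKUP_DIR=/backup/hadoop-conf/$DATE

    hdfs dfsadmin -report | head -n 20   # capacity and live/dead DataNode summary
    hdfs dfsadmin -safemode get          # confirm the NameNode is out of safe mode
    hdfs fsck / | tail -n 20             # block health summary

    # Back up the configuration XML files.
    mkdir -p "$BACKUP_DIR"
    cp /etc/hadoop/conf/*.xml "$BACKUP_DIR/"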

Environment/Tools: CDH 5.3 and CDH 4.3, Apache Hadoop 2.x, HDFS, YARN, MapReduce, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Fair Scheduler, LDAP, Kerberos, Oracle Server, MySQL Server, Elasticsearch, Chef automation, Core Java, Linux, Bash scripts

Confidential, NC

Hadoop Developer

Responsibilities:

  • Installed Hadoop 2/YARN, Spark, Scala IDE and the Java JRE on three machines, and configured them as a cluster with one NameNode and two DataNodes.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Implemented AWS solutions using EC2, S3 and load balancers.
  • Installed application on AWS EC2/EMR instances and configured the storage on S3 buckets.
  • Stored and loaded data between HDFS and Amazon S3 and backed up the namespace data.
  • Set up and monitored an Ambari-managed cluster running Hadoop 2/YARN to read data from the cluster.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals and testing HDFS and Hive access.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Used Ganglia to monitor the cluster and Nagios to send alerts around the clock.
  • Involved in creating Hadoop Streaming jobs using Python.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper implementation in the cluster.
  • Worked on performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
  • Developed multiple MapReduce jobs in Java for data cleaning.
  • Developed Hive UDF to parse the staged raw data to get the Hit Times of the claims from a specific branch for an insurance type code.
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Ran many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
  • Built wrapper shell scripts around Oozie workflow runs (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Provided ad-hoc queries and data metrics to business users using Hive and Pig.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Worked on MapReduce Joins in querying multiple semi-structured data as per analytic needs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Developed POC for Apache Kafka.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
  • Familiarity with NoSQL databases including HBase, MongoDB.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
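
A minimal sketch of a wrapper script around an Oozie workflow run, as referenced above; the Oozie URL and properties file path are placeholders:

    #!/usr/bin/env bash
    # Hypothetical wrapper that submits an Oozie workflow and reports its status.
    OOZIE_URL=http://oozie-host:11000/oozie
    JOB_PROPS=/opt/workflows/claims/job.properties

    # Submit and start the workflow; the CLI prints "job: <id>".
    JOB_ID=$(oozie job -oozie "$OOZIE_URL" -config "$JOB_PROPS" -run | awk '{print $2}')
    echo "Started Oozie workflow $JOB_ID"

    # Show the current status of the workflow.
    oozie job -oozie "$OOZIE_URL" -info "$JOB_ID"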

Environment: Hadoop, Hadoop 2, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Flume, HBase, Zookeeper, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux, Kafka, Amazon web services.

Confidential

Linux Admin with Hadoop

Responsibilities:

  • Commissioned and decommissioned data nodes, killed unresponsive task trackers and dealt with blacklisted task trackers.
  • Implemented Rack Awareness for data locality optimization.
  • Proactively monitored systems and services and implemented Hadoop deployment, configuration management, performance tuning and backup procedures.
  • Administered RHEL AS 3, 4, 5 and 6, including installation, testing, tuning, upgrading and loading patches, and troubleshooting both physical and virtual server issues.
  • Applied patches and installed patch bundles on HP-UX and Red Hat Linux.
  • Performed automated operating system installations for Linux using Kickstart.
  • Installed applications on AWS Redshift and RDS instances.
  • Performed RPM and YUM package installations, patching and other server management tasks.
  • Managed routine system backups, scheduled jobs (including enabling and disabling cron jobs), and enabled system and network logging of servers for maintenance, performance tuning and testing.
  • Set up user and group login IDs, printing parameters, network configuration, passwords and user and group quotas, and resolved permissions issues.
  • Added SAN storage using multipath and created physical volumes, volume groups and logical volumes (see the sketch after this list).
  • Planning and implementing system upgrades including hardware, operating system and periodical patch upgrades.
  • Monitored and troubleshot performance-related issues.
  • Managed system security by restricting access to servers and maintaining data integrity.
  • Maintained system performance via management of virtual memory, swap space and CPU utilization.
  • Administered system storage resources by implementing LVM.
  • Managed system processes and services using chkconfig and SMF utilities.
  • Monitored alert logs, backed up files as required using local archival utilities such as tar, scheduled processes with the cron utility and worked with management to define requests for stored information.
  • Troubleshot network configuration and connectivity issues.
  • Monitored the health of Hadoop daemon services and responded to warning or failure conditions.
  • Worked on Recovery of Node failure.
  • Performed Hadoop Upgrade activities.
  • Managed and scheduled jobs on a Hadoop cluster.
  • Installed and configured Kerberos for the authentication of users and Hadoop daemons.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked with support teams to resolve performance issues.
  • Worked on testing, implementation and documentation.
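
A minimal sketch of the SAN/LVM provisioning described above; the multipath device, volume group, size and mount point are placeholders:

    #!/usr/bin/env bash
    # Hypothetical LVM provisioning on a multipathed SAN LUN; names and sizes are placeholders.
    DEVICE=/dev/mapper/mpathb

    pvcreate "$DEVICE"                     # initialize the LUN as a physical volume
    vgcreate vg_data "$DEVICE"             # create a volume group on it
    lvcreate -L 200G -n lv_hdfs vg_data    # carve out a 200 GB logical volume
    mkfs.ext4 /dev/vg_data/lv_hdfs         # create a filesystem
    mkdir -p /data/hdfs
    mount /dev/vg_data/lv_hdfs /data/hdfs  # mount it (add to /etc/fstab to persist)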

Environment: RHEL 6.x/5.x, Logical Volume Manager, Hadoop 1.2.0, HDFS, MapReduce, Hive, Pig, Sqoop, LDAP, Bash Scripting.

Confidential

Linux/Unix Administrator

Responsibilities:

  • Installation, configuration and migration of UNIX and Linux operating systems.
  • Managed, maintained and fine-tuned clustered Apache Tomcat server configurations.
  • Installed packages and patches.
  • Installed and configured Ubuntu; troubleshot hardware, operating system, application and network problems and performance issues.
  • Worked closely with the development and operations organizations to implement the necessary tools and process to support the automation of builds, deployments, testing of infrastructure.
  • Installed and configured various flavors of Linux like Red hat, SUSE and Ubuntu.
  • Monitored trouble ticket queue to attend user and system calls, participated in team meetings, change control meetings to update installation progress, and for upcoming changes.
  • Diagnosed and resolved system-related issues in accordance with the priorities set in trouble tickets.
  • Deployed patches for Linux and application servers, Red Hat Linux Kernel Tuning.
  • Performed network troubleshooting using netstat, ifconfig, tcpdump, vmstat and iostat.
  • Managed cron jobs, batch processing and job scheduling.
  • Monitored servers and Linux scripts regularly, performed troubleshooting steps, and tested and installed the latest software on servers for end users.
  • Troubleshot application issues on Apache web servers and database servers running on Linux and Solaris.
  • Performed the manual backups of Database, software and OS using tar, cpio, mksysb.
  • Managed file system utilization using a script scheduled as a cron job (see the sketch after this list).
  • Performed automation with simple shell scripting.
  • Monitored backups using Backup Exec and regularly reviewed alert log files and trace files on a day-to-day basis.
  • Monitored system performance, server load and bandwidth issues.
  • Regularly managed the backup process for server and client data.
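
A minimal sketch of the kind of file system utilization check run from cron, as referenced above; the threshold, alert address and script path are assumptions:

    #!/usr/bin/env bash
    # Hypothetical disk utilization check; threshold and mail recipient are placeholders.
    THRESHOLD=85
    ALERT_TO=sysadmin@example.com

    # Report any filesystem above the usage threshold.
    df -hP | awk 'NR>1 {gsub("%","",$5); if ($5+0 > '"$THRESHOLD"') print $6, $5"%"}' |
    while read -r mount usage; do
      echo "Filesystem $mount is at $usage" | mail -s "Disk usage alert: $mount" "$ALERT_TO"
    done

    # Example crontab entry to run the check every hour:
    # 0 * * * * /opt/scripts/check_fs_usage.sh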

Environment: Linux/UNIX, Oracle, DB2, SQL Server.
