- A dedicated, results-driven professional seeking a challenging role as a Hadoop Administrator, where I can leverage my skills in interacting with business users, including conducting client meetings during the requirements analysis phase, and in working with large-scale data processing tools such as MapReduce, Pig, Sqoop, Oozie and Hive, as well as Microsoft Excel. I bring the ability to identify new business opportunities, strong critical and analytical thinking skills, and an unwavering commitment to the growth of the organization.
- 5+ years of IT experience in the design, implementation, troubleshooting and maintenance of complex enterprise infrastructure.
- 4+ years of hands-on experience installing, patching, upgrading and configuring Linux-based operating systems (RHEL and CentOS) across large clusters.
- 4+ years of experience configuring, installing, benchmarking and managing the Hortonworks and Cloudera distributions of Apache Hadoop.
- 3+ years of extensive hands-on experience in IP network design, network integration, deployment and troubleshooting.
- Expertise in using Amazon AWS API tools, such as the command-line tools on Linux and Puppet-integrated AWS API tools.
- Experience deploying scalable Hadoop clusters on AWS using S3 as the underlying file system for Hadoop.
- Experience installing Ganglia and Nagios and using them to monitor Hadoop cluster resources.
- Experience designing and implementing secure Hadoop clusters using MIT and Active Directory (AD) Kerberos, Apache Sentry, Apache Knox and Apache Ranger.
- Experience managing Hadoop infrastructure tasks such as commissioning and decommissioning nodes, log rotation and rack topology implementation.
- Experience managing Hadoop clusters using Cloudera Manager and Apache Ambari.
- Experience using ZooKeeper to coordinate distributed applications.
- Experience developing Pig and Hive scripts for data processing on HDFS.
- Experience scheduling jobs using Oozie workflows.
- Experience configuring, installing, managing and administering HBase clusters.
- Experience managing Hadoop resources using static and dynamic resource pools.
- Experience setting up tools such as Ganglia and Nagios for monitoring the Hadoop cluster.
- Experience importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Experience installing minor patches and upgrading Hadoop clusters to new major versions.
- Experience designing, installing and configuring VMware ESXi within a vSphere 5 environment with Virtual Center management, Consolidated Backup, DRS, HA, vMotion and VMware Data.
- Experience designing and building a disaster recovery plan for Hadoop clusters to provide business continuity.
- Experience configuring site-to-site and remote-access VPNs using ASA firewalls.
- Extensive experience with operating systems including Windows, Ubuntu, Red Hat, CentOS and Mac OS.
- Highly motivated, able to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
Hadoop Ecosystem: Hadoop 2.2, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, YARN, Spark, Kafka, Storm, Flume and Mahout.
Hadoop Management & Security: Hortonworks, Cloudera Manager, Ubuntu.
Server Side Scripting: Shell, Perl, Python
Database: Oracle 10g, MySQL, DB2, SQL, RDBMS.
Programming Languages: Java, SQL, PL/SQL
Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1
NoSQL Databases: HBase, MongoDB
OS/Platforms: Mac OS X 10.9.5, Windows, Linux, Unix, CentOS
Virtualization: VMware, ESXi, vSphere, vCenter Server.
SDLC Methodology: Agile (SCRUM), Waterfall.
Confidential, Atlanta, GA
- Performed both major and minor upgrades of the existing cluster, as well as rollbacks to the previous version.
- Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Transferred data from HDFS to MySQL databases and vice versa using Sqoop.
- Implemented MapReduce jobs through Hive queries on the available data.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Set up NFS, NAS and HTTP servers on Linux hosts.
- Created a local YUM repository for installing and updating packages.
- Copied data from one cluster to another using DistCp and automated the procedure with shell scripts.
- Designed a shell script to back up important metadata.
- Implemented NameNode High Availability (HA) to avoid a single point of failure.
- Implemented NameNode metadata backup to NFS in support of high availability.
- Supported data analysts in running MapReduce programs.
- Worked on analyzing data with Hive and Pig.
- Scheduled data backups with cron.
- Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics from across the distributed cluster and present them on real-time, dynamic web pages to aid debugging and maintenance.
- Configured Oozie for workflow automation and coordination.
- Implemented Kerberos to authenticate all services in the Hadoop cluster.
- Maintained, audited and built new clusters for testing purposes using Ambari on Hortonworks.
- Deployed a Sqoop server to perform imports from heterogeneous data sources into HDFS.
- Designed and allocated HDFS quotas for multiple groups.
- Configured iptables rules to allow application servers to connect to the cluster, set up the NFS exports list and blocked unwanted ports.
- Configured Flume to efficiently collect, aggregate and move large amounts of log data from many different sources into HDFS.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Configured and monitored a test cluster on Amazon Web Services for further testing and gradual migration.
- Responsible for managing data coming from different sources.
Environment: MapReduce, HDFS, Hive, Pig, Flume, Sqoop, UNIX Shell Scripting, Nagios, Kerberos.
- Installed, configured, deployed, maintained, monitored and troubleshot Ambari clusters in development and production environments.
- Commissioned and decommissioned cluster nodes as needed.
- Solely responsible for all aspects of the clusters, from maintenance and monitoring to keeping them up at all times, providing 24/7 support so the business ran without outages.
- Worked on NameNode recovery and Hadoop cluster balancing.
- Upgraded Cloudera Hadoop from CDH 4.3 to CDH 5.3 and applied patches.
- Worked on data capacity planning, node forecasting and determining the right hardware and infrastructure for the cluster.
- Responsible for managing and scheduling jobs on a Hadoop Cluster.
- Configured the Oozie workflow scheduler and tested sample jobs.
- Monitored all daemons and cluster health daily, tuned performance-related system configuration parameters and backed up configuration XML files.
- Implemented the Fair Scheduler to share cluster resources among the MapReduce jobs run by users.
- Worked with Hadoop ecosystem components such as Hive, HBase, Pig and Sqoop, and optimized job performance.
- Imported data from Oracle Server and MySQL Server into HDFS on a regular basis using Sqoop.
- Troubleshot issues, managed and reviewed data backups, and managed and reviewed Hadoop log files.
- Assisted with automated deployment using Chef.
- Implemented security for Hadoop Cluster with Kerberos.
- Integrated LDAP with Hadoop and provisioned access for the secured cluster.
- Worked with Hadoop administrators and designers to troubleshoot MapReduce job failures and issues.
- Worked with network and system engineers to define optimal network configurations, server hardware and operating systems.
- Provided on-call support to fix production issues quickly and keep jobs running smoothly during peak times.
Environment: CDH 5.3 and CDH 4.3, Apache Hadoop 2.x, HDFS, YARN, MapReduce, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Fair Scheduler, LDAP, Kerberos, Oracle Server, MySQL Server, Elasticsearch, Chef, Core Java, Linux, Bash scripts
Confidential, Lake Forest, IL
- Installed Hadoop 2/YARN, Spark, Scala IDE and Java JRE on three machines and configured them as a cluster with one NameNode and two DataNodes.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Implemented AWS solutions using EC2, S3 and load balancers.
- Installed applications on AWS EC2 instances and configured storage in S3 buckets.
- Stored and loaded data between HDFS and Amazon S3 and backed up the namespace data.
- Set up and monitored an Ambari cluster running Hadoop 2/YARN to read data from the cluster.
- Worked with data delivery teams to set up new Hadoop users, which included creating Linux users, setting up Kerberos principals and testing HDFS and Hive access.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Used Ganglia to monitor the cluster and Nagios to send alerts around the clock.
- Created Hadoop Streaming jobs using Python.
- Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports through the ZooKeeper deployment in the cluster.
- Worked on performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop for analysis, visualization and report generation.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Developed a Hive UDF to parse the staged raw data and extract the hit times of claims from a specific branch for a particular insurance type code.
- Implemented advanced procedures such as text analytics and processing in Scala, using the in-memory computing capabilities of Apache Spark.
- Ran performance tests with the cassandra-stress tool to measure and improve the cluster's read and write performance.
- Built wrapper shell scripts to launch Oozie workflows.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Provided ad hoc queries and data metrics to business users using Hive and Pig.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Pig as an ETL tool for transformations, event joins and pre-aggregations before storing the data in HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as analytics needs required.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created Hive tables, loaded data into them and wrote Hive UDFs.
- Developed a proof of concept (POC) for Apache Kafka.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Familiar with NoSQL databases including HBase and MongoDB.
- Wrote shell scripts to automate recurring day-to-day processes.
- Followed a story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop 2, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Flume, HBase, ZooKeeper, MongoDB, Cassandra, Oracle, NoSQL, Unix/Linux, Kafka, Amazon Web Services.