Overall 6+ years of professional experience, including 5+ years as a Hadoop Administrator and around 1 year in Linux administration roles. Strong working experience with Big Data and the Hadoop ecosystem, including HDFS, Pig, Hive, HBase, YARN, Sqoop, Flume, Oozie, Hue, MapReduce and Spark. Extensive experience with minor and major upgrades of Hadoop and Hadoop ecosystem components.
AREAS OF EXPERTISE INCLUDE:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Kafka, Cassandra, Oozie, Flume, Chukwa, Pentaho Kettle and Talend
Cloud: AWS (EC2, S3, ELB, EBS, VPC, Auto Scaling), Azure
Programming Languages: Java, C/C++, eVB, Assembly Language (8085/8086)
Databases: NoSQL, Oracle
UNIX Tools: Apache, Yum, RPM
Tools: Eclipse, JDeveloper, JProbe, CVS, Ant, MS Visual Studio
Platforms: Windows (2000/XP), Linux, Solaris, AIX, HP-UX
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Testing Tools: NetBeans, Eclipse, WSAD, RAD
Methodologies: Agile, UML, Design Patterns
Confidential, Austin, TX
- Worked as Hadoop Admin responsible for all aspects of clusters totaling 100 nodes, ranging from POC (proof-of-concept) to PROD.
- Installed, configured, upgraded and administered Windows, Sun Solaris and Red Hat Linux systems.
- Hands-on experience in installation, configuration, management and support of full-stack Hadoop clusters, both on premises and in the cloud, using Hortonworks and Cloudera bundles.
- Worked as admin on the Cloudera (CDH 5.5.2) distribution for clusters ranging from POC to PROD.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning of DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Set up Hortonworks infrastructure, from configuring clusters to node security using Kerberos.
- Worked extensively with AWS services such as Elastic MapReduce (EMR) and Athena, along with Airflow and Snowflake.
- Created and maintained Shell and Python scripts to automate various processes.
- Developed custom Shell scripts (bash, ksh) to automate jobs.
- Installed MySQL on Linux and customized the MySQL database parameters.
- Installed Kafka cluster with separate nodes for brokers.
- Installed and configured Kafka and monitored the cluster using Nagios and Ganglia.
- Responsible for installation, configuration and management of Linux servers and POC Clusters in the VMware environment.
- Configured Apache and supported it on Linux production servers.
- Moved files between HDFS and AWS S3 and worked extensively with S3 buckets.
- Set up Kafka clusters for publishing topics; familiar with the lambda architecture.
- Added, installed and removed components through Cloudera Manager.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.
- Served as Level 2/3 SME for the Big Data clusters at the client site and established standard troubleshooting techniques.
- Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Designed and built solutions for both real-time and batch data ingestion using Sqoop, Pig, Impala and Kafka.
- Analyzed system failures, identified root causes and recommended courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal and fixed them per the recommendations.
- Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to the AWS cloud (S3).
- Imported logs from web servers into HDFS using Flume.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Parsed, cleansed and mined meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase and Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Troubleshot, debugged and fixed Talend-specific issues while maintaining the health and performance of the ETL environment.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: HDFS, Docker, Puppet, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, AWS, Solr, Storm, Knox, Cloudera Manager, Red Hat, MySQL and Oracle.
Confidential, Cedar Rapids, IA
- Installed, configured, monitored and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig and Hive.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts and HBase ingest as required.
- Installed and configured Confluent Kafka in the R&D line and validated the installation with the HDFS and Hive connectors.
- Planned, scheduled and implemented OS patches on CentOS/RHEL boxes as part of proactive maintenance.
- Developed Shell and Python scripts to automate many day-to-day admin activities.
- Provided day-to-day operational support for Hortonworks Hadoop clusters in production at multi-petabyte scale.
- Managed user access to AWS resources using Identity and Access Management (IAM).
- Created S3 buckets, managed bucket policies and used S3 to store archive data and Hadoop log files.
- Used Storm and Kafka Services to push data to HBase and Hive tables.
- Expert in using Kafka as a publish-subscribe messaging system.
- Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
- Defined job flows and managed and reviewed Hadoop and HBase log files.
- Supported MapReduce programs running on the cluster.
- Migrated applications from internal data center to AWS.
- Installed and configured Hive and wrote HiveQL scripts.
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Imported and exported data into HDFS and Hive using Sqoop.
- Managed several Hadoop clusters in production, development and disaster recovery environments.
- Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
- Troubleshot many cloud-related issues such as DataNode outages, network failures and missing data blocks.
- Managed cluster coordination services through ZooKeeper.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Resolved user-submitted tickets by troubleshooting, documenting and fixing the reported errors.
Environment: Hadoop HDFS, AWS, MapReduce, Hive, Pig, Puppet, ZooKeeper, HBase, Flume, Ganglia, Sqoop, Linux, CentOS, Ambari.
- Involved in cluster monitoring, backup, restore and troubleshooting activities.
- Administered RHEL, including installation, testing, tuning, upgrading, patching and troubleshooting of both physical and virtual server issues.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Managed 100+ UNIX servers running RHEL and HP-UX on Oracle HP.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Wrote shell scripts to automate routine day-to-day processes.
- Configured Kerberos authentication in the cluster.
- Very good experience with the Hadoop ecosystem components in a UNIX environment.
- Experience with UNIX administration.
- Installed and configured Solr 5.2.1 in the Hadoop cluster.
- Indexed HBase tables, JSON data and nested data using Solr.
- Provided 24x7 system administration support for Red Hat Linux 3.x and 4.x servers and resolved trouble tickets on a shift rotation basis.
- Configured HP ProLiant, Dell PowerEdge R-series, Cisco UCS and IBM p-series machines for production, staging and test environments.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Configured Linux native device mapper multipathing (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, 5.7 and 6.
Environment: IBM AIX, Red Hat Linux, IBM Netezza, SUN Solaris, SUSE Linux, P-Series servers, VMware, vSphere, VMs, OpsView, Nagios, TAD4D, ITM, INFO, TEM, PureScale, PureFlex, PureApp.