- 8+ years of experience in the IT industry, including 5 years of experience in Hadoop administration and development using Apache, Cloudera (CDH), and Hortonworks (HDP) distributions.
- Experience in installing, configuring, supporting, and monitoring 100+ node Hadoop clusters on major distributions such as CDH 4 and CDH 5 using Cloudera Manager and Apache Ambari.
- In-depth understanding of the Hadoop framework (versions 1 and 2), including YARN, MapReduce, and HDFS, and their components: JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, and ApplicationMaster.
- Experience in installing and configuring Hadoop ecosystem components such as Hive, Pig, Spark, Sqoop, Flume, Kafka, Oozie, ZooKeeper, HBase, MongoDB, Cassandra, Impala, and R.
- Expertise in designing and implementing complete end-to-end Hadoop infrastructure.
- Experience in cluster capacity planning and optimizing clusters to meet SLAs.
- Well versed in managing and reviewing Hadoop and ecosystem service log files to determine root causes.
- Configured metadata backups and performed NameNode disaster recovery using backed-up edit logs and fsimage.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Experience configuring Rack Awareness in the Hadoop cluster.
- Experience importing and exporting data between RDBMS and Hadoop using Sqoop, and troubleshooting issues with Sqoop jobs (a minimal sketch follows this summary).
- Experience using Flume to stream data into HDFS from various sources.
- Experience using the DistCp command-line utility to copy files between clusters.
- Provided cluster coordination services through ZooKeeper.
- Set up Kerberos authentication for Hadoop.
- Hands-on experience with network monitoring daemons such as Ganglia and service monitoring tools such as Nagios.
- Experience configuring the Capacity Scheduler, Fair Scheduler, and HOD scheduler for job and user management.
- Defined job flows in the Hadoop environment using tools such as Oozie for data scrubbing and processing.
- Experience performing minor and major upgrades, HDFS balancing, and commissioning and decommissioning of DataNodes on Hadoop clusters.
- Experience using Puppet and Chef, and writing/modifying shell scripts, to automate configuration and monitor clusters.
- Good knowledge of setting up Hadoop clusters on AWS EC2 and S3, and of automating cluster setup and expansion in the AWS cloud.
- Hands-on experience writing ad-hoc queries to move data from HDFS into Hive and analyzing it using HiveQL.
- Very good knowledge of the ETL process: data sourcing, mapping, transformation, conversion, and loading.
- Experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Development experience with RDBMSs, including writing SQL queries, views, stored procedures, and triggers.
- Extensive experience in Linux administration activities on RHEL and CentOS distributions.
- Very good knowledge of Data warehouse tools.
- Good knowledge of and experience with Core Java, JSP, Servlets, multithreading, JDBC, and HTML.
- Good understanding of the Software Development Lifecycle (SDLC) and of Waterfall and Agile methodologies.
- Effective problem solving and interpersonal skills. Ability to learn and use new technologies quickly.
- Self-starter with ability to work independently as well as within a team environment.
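
A minimal sketch of the Sqoop import/export pattern referenced above; the JDBC URL, credentials, table names, HDFS paths, and mapper count are hypothetical placeholders.

```sh
# Import a table from MySQL into HDFS with four parallel mappers
# (connection details and names are placeholders).
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export aggregated results from HDFS back to the RDBMS.
sqoop export \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/curated/order_summary
```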
Big Data components: HDFS, MapReduce, YARN, HBase, Cassandra, MongoDB, Pig, Hive, Spark, Impala, Kafka, Sqoop, Flume, Oozie, ZooKeeper, & Kettle
Programming Languages: HiveQL, Pig Latin, Shell scripting, Java, J2EE, SQL, C/C++, & PL/SQL
UNIX Tools: Apache, Yum, RPM.
Operating Systems: Red Hat Linux, CentOS, Ubuntu, Windows, Mac OS
Protocols: TCP/IP, HTTP and HTTPS
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Apache Ambari
Methodologies: Agile, V-model, Waterfall model
Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, MS SQL server
Encryption Tools: VeraCrypt, AxCrypt, BitLocker, GNU Privacy Guard
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, Pentaho (ETL)
Confidential - Chicago, IL
- Installed, configured, and administered a CDH 5.2.3 Hadoop cluster and its components.
- Deployed hardware and software for Hadoop to expand memory and storage on nodes as required.
- Performed data exchange between HDFS and various web applications and databases using Sqoop and Flume.
- Monitored data streaming between web sources and HDFS.
- Configured YARN and optimized memory-related settings.
- Collaborated with infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Performed architecture design, data modeling, and implementation of SQL and big data platforms and analytic applications for consumer products.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Fine-tuned Hive jobs for better performance.
- Performed rolling upgrades of Hadoop cluster.
- Installed operating system and Hadoop updates, patches, version upgrades when required.
- Monitored job performance on the Hadoop cluster and performed capacity planning.
- Managed configuration changes based on volume of the data being processed.
- Monitored connectivity and security of Hadoop cluster.
- Implemented Kerberos to authenticate all services in the Hadoop cluster (a principal/keytab sketch follows this section's Environment line).
- Imported and exported data between RDBMS and HDFS using Sqoop.
- Performed data migration to Hadoop from existing data stores.
- Set up new Linux users and tested HDFS, Hive, Pig, and MapReduce access for them.
- Performed Linux systems administration on production and development servers (RHEL, CentOS and other UNIX utilities).
- Commissioned and decommissioned data nodes in the Cluster.
- Configured a 20-30 node Hadoop cluster on Amazon EC2 Spot Instances to transfer data from Amazon S3 to HDFS and from HDFS back to S3 (see the DistCp sketch after this list).
- Managed jobs and users using the Capacity Scheduler.
- Installed Patches and packages on Unix/Linux Servers.
- Installed and configured the vSphere client, created virtual servers, and allocated resources.
- Used various Utilities to do Performance Tuning, Client/Server Connectivity and Database Consistency Checks.
- Analyzed running statistics of map and reduce tasks and provided input to the development team for efficient utilization of cluster memory and CPU.
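
A minimal sketch of the S3/HDFS transfers noted in the bullet above, assuming the AWS access and secret keys are configured in core-site.xml; bucket and path names are hypothetical, and older CDH releases would use the s3n:// scheme in place of s3a://.

```sh
# Pull a dataset from S3 into HDFS (bucket and paths are placeholders).
hadoop distcp s3a://example-bucket/landing/events /data/raw/events

# Push curated output back to S3; -update copies only files that changed.
hadoop distcp -update /data/curated/events s3a://example-bucket/curated/events
```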
Environment: CDH 5.2.3, Cloudera Manager, Red Hat Linux/CentOS 4, 5, 6, AWS EC2, Logical Volume Manager, HDFS, Hive, Pig, Sqoop, Flume, ESX 5.1/5.5, Apache and Tomcat web servers, Oracle 11g/12c, Oracle RAC 12c, HPSM, HPSA, Kerberos security.
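
A sketch of one step in the Kerberos rollout above: creating a service principal and keytab on the KDC and verifying it. The realm, hostname, and keytab path are hypothetical; the per-service configuration is otherwise managed through Cloudera Manager.

```sh
# Create a service principal for a DataNode host and write its keytab
# (realm, hostname, and paths are placeholders).
kadmin.local -q "addprinc -randkey hdfs/dn01.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/dn01.example.com@EXAMPLE.COM"

# Verify the keytab by obtaining a ticket, then list it.
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/dn01.example.com@EXAMPLE.COM
klist
```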
Confidential - Cleveland, OH
- Worked on multiple projects architecting Hadoop clusters.
- Installed, configured, and managed the Hadoop cluster using Cloudera Manager and Puppet.
- Upgraded Hadoop from CDH 4.2 to CDH 4.6 in the development environment.
- Performed metadata backups and upgrades on Hadoop Development cluster.
- Set up and configured Zookeeper for cluster coordination services.
- Managed cluster configuration to meet the needs of both I/O-bound and CPU-bound analysis workloads.
- Managed and reviewed Hadoop Log files for troubleshooting issues.
- Performed benchmark tests on Hadoop clusters and tuned the setup based on the results.
- Commissioned and decommissioned DataNodes in the Hadoop cluster.
- Performed data validation using Hive dynamic partitioning (a sketch follows this list).
- Transformed large sets of structured and semi-structured data by applying ETL processes using Hive.
- Developed MapReduce programs for data analysis.
- Troubleshot, monitored, and tuned the performance of MapReduce jobs.
- Developed Pig scripts for transformation of raw data into intelligent data.
- Supported data analysts in running Pig scripts and Hive queries.
- Scheduled the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs (a submission sketch follows this section's Environment line).
- Configured the Fair Scheduler on the ResourceManager to manage cluster resources across jobs and users.
- Migrated data across clusters using DistCp.
- Collaborated with DevOps team to meet the business requirements of customers and proposed Hadoop solutions.
- Performed data analysis, data cleansing (scrubbing), data validation and verification, and data conversion.
- Supported data analysis projects using Elastic MapReduce (EMR) on AWS and the Rackspace cloud; performed export and import of data to and from S3.
- Prepared documentation of the cluster configuration for future reference.
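
A minimal sketch of Hive dynamic partitioning as referenced above; the table and column names are hypothetical.

```sh
# Rewrite raw events into a date-partitioned table; Hive derives the
# partition value from the last column selected (names are placeholders).
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE events_partitioned PARTITION (event_date)
SELECT user_id, event_type, event_ts, event_date
FROM events_raw;
"
```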
Environment: Cloudera Hadoop, Linux, HDFS, Hive, Pig, Sqoop, Flume, Zookeeper, HBase, YARN, RDBMS, Oozie, AWS.
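
A sketch of submitting one of the Oozie workflows mentioned above from the shell; the Oozie URL, HDFS application path, and property values are hypothetical.

```sh
# job.properties tells Oozie where the workflow definition lives in HDFS
# (all values are placeholders).
cat > job.properties <<'EOF'
nameNode=hdfs://nn01.example.com:8020
jobTracker=rm01.example.com:8032
oozie.wf.application.path=${nameNode}/user/etl/workflows/daily-ingest
EOF

# Submit and start the workflow, then check its status by job ID.
oozie job -oozie http://oozie01.example.com:11000/oozie -config job.properties -run
oozie job -oozie http://oozie01.example.com:11000/oozie -info <job-id>
```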
Confidential - Grand Rapids, MI
- Managed, administered, and monitored clusters in the Hadoop infrastructure.
- Diligently teamed with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborated with application teams to install Hadoop updates, patches, when required.
- Managed connectivity of nodes and security on Hadoop cluster.
- Commissioned and decommissioned DataNodes in the cluster (a decommissioning sketch follows this list).
- Implemented NameNode High Availability (a state-check sketch follows this section's Environment line).
- Worked with data delivery teams to setup new Hadoop users.
- Installed and configured Hadoop ecosystem components such as Hive, Pig, Flume, Sqoop, and HBase.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Configured the Metastore for the Hadoop ecosystem and management tools.
- Monitored the cluster hands-on with Nagios and Ganglia.
- Managed HDFS data storage and supported running MapReduce jobs.
- Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
- Loaded data into HDFS from dynamically generated files using Flume and from RDBMS using Sqoop.
- Used Sqoop to export the analyzed data from HDFS to the RDBMS for business use cases.
- Used DistCp to migrate data between and across clusters.
- Installed and configured ZooKeeper to coordinate Hadoop daemons.
- Coordinated root cause analysis efforts to minimize future system issues.
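
A minimal sketch of the decommissioning flow referenced above, assuming dfs.hosts.exclude in hdfs-site.xml points at the exclude file shown; the hostname and paths are hypothetical.

```sh
# Add the host to the exclude file named by dfs.hosts.exclude, then have
# the NameNode re-read it; the node drains its blocks before removal.
echo "dn07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Watch decommissioning progress, then rebalance the remaining nodes.
hdfs dfsadmin -report | grep -A 3 "dn07.example.com"
hdfs balancer -threshold 10
```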
Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Chef, RHEL, Mahout, Tableau, MySQL, Shell Scripting.
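
A quick sketch of verifying the NameNode HA state mentioned above; the service IDs nn1 and nn2 are hypothetical values from hdfs-site.xml.

```sh
# Check which NameNode is active and which is standby, and fail over
# manually if needed (nn1/nn2 are placeholder service IDs).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -failover nn1 nn2
```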
- Installed, configured, and administered Red Hat Linux servers, provided ongoing server support, and performed regular upgrades using Kickstart-based network installation.
- Provided 24x7 system administration support for Red Hat Linux 3.x, 4.x, and 5.x servers and resolved trouble tickets on a shift rotation basis.
- Configured HP ProLiant, Dell PowerEdge R-series, Cisco UCS, and IBM p-series machines for production, staging, and test environments.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Configured Linux native device mapper multipathing (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, and 5.7.
- Used performance monitoring utilities such as iostat, vmstat, top, netstat, and sar (representative invocations follow this list).
- Worked on support for AIX matrix subsystem device drivers.
- Worked with both physical and virtual computing, from the desktop to the data center, using SUSE Linux; built, installed, loaded, and configured boxes.
- Experienced in Installation, Configuration, and Troubleshooting of Tivoli Storage Manager.
- Remediated failed backups and took manual incremental backups of failing servers.
- Upgraded TSM from 5.1.x to 5.3.x; worked on HMC configuration and management of the HMC console, including upgrades and micro-partitioning.
- Installed and configured adapter cards and cables; worked on Integrated Virtual Ethernet and built VIO servers.
- Installed SSH keys for passwordless logins so SRM data could be collected from servers during daily backups of vital data such as processor and disk utilization.
- Provided redundancy with HBA cards, EtherChannel configuration, and network devices.
- Coordinated with application and database teams for troubleshooting the application.
- Coordinated with the SAN team on allocation of LUNs to increase file system space.
- Configured and administered fibre channel adapters and handled the AIX side of the SAN.
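
Representative invocations of the utilities listed above, as one might run them while investigating a slow server; the sampling intervals and counts are arbitrary examples.

```sh
# Snapshot CPU, memory, disk, and network health on a suspect node.
vmstat 5 3          # memory and CPU: three samples, five seconds apart
iostat -dx 5 3      # per-device I/O utilization and service times
sar -n DEV 5 3      # per-interface network throughput
netstat -tulpn      # listening sockets and the processes that own them
```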
Environment: Red Hat Linux (RHEL 3/4/5), Solaris 10, Logical Volume Manager, Sun & Veritas Cluster Server, VMware, Global File System, Red Hat Cluster Server.
- Administered RHEL 4.x and 5.x, including installation, testing, tuning, upgrading, and loading patches, and troubleshot both physical and virtual server issues.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
- Installed Red Hat Linux using Kickstart and applied security policies to harden servers per company policy.
- Installed and verified that all AIX/Linux patches and updates were applied to the servers.
- Installed RPM and YUM package patches and performed other server management tasks.
- Managed routine system backups and scheduled jobs (disabling and enabling cron jobs, enabling system logging and network logging on servers) for maintenance, performance tuning, and testing.
- Worked and performed data-center operations including rack mounting and cabling.
- Installed, configured, and maintained WebLogic 10.x and Oracle 10g on Solaris and Red Hat Linux.
- Set up user and group login IDs, printing parameters, network configuration, and passwords; resolved permission issues; managed user and group quotas.
- Configured multipathing, added SAN LUNs, and created physical volumes, volume groups, and logical volumes (see the sketch below).
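
A minimal sketch of the SAN-to-LVM workflow in the last bullet; the multipath device, volume group, and logical volume names are hypothetical.

```sh
# After the new SAN LUN shows up via multipath as /dev/mapper/mpathb
# (a placeholder name), fold it into LVM and grow the filesystem.
pvcreate /dev/mapper/mpathb
vgextend vg_data /dev/mapper/mpathb
lvextend -L +50G /dev/vg_data/lv_app
resize2fs /dev/vg_data/lv_app   # online resize for ext3/ext4
```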
Environment: RHEL, VMware 3.5, Solaris 2.6/2.7/8, Oracle 10g, Weblogic10.x, Veritas NetBackup, Veritas Volume Manager, Samba, NFS, NIS, LVM, Shell Scripting.