- Over 9 years of professional Information Technology experience in Hadoop and Java administration activities such as installation, configuration, and maintenance of systems/clusters.
- Hands-on experience with Hadoop clusters on the Hortonworks (HDP), Cloudera (CDH5), and YARN platforms.
- Skilled in Apache Hadoop, MapReduce, Pig, Impala, Hive, Platfora, HBase, Zookeeper, Sqoop, Flume, Oozie, and Kafka.
- Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, Catalog, HBase, Zookeeper) using Hortonworks Ambari.
- Good experience in creating various database objects like tables, stored procedures, functions, and triggers using SQL, PL/SQL, and DB2.
- Experience in configuring NameNode high availability and NameNode federation, and in-depth knowledge of Zookeeper for cluster coordination services.
- Experience in designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
- Extensive knowledge of Tableau in enterprise environments and Tableau administration experience, including technical support, troubleshooting, reporting, and monitoring of system usage.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
- Worked on NoSQL databases including HBase and Cassandra.
- Experience in designing and implementing security for Hadoop clusters with Kerberos authentication.
- Hands-on experience with the Nagios and Ganglia tools for cluster monitoring.
- Experience in scheduling Hadoop, Hive, Sqoop, and HBase jobs using Oozie.
- Knowledge of data warehousing concepts, the Cognos 8 BI Suite, and Business Objects.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Working knowledge of installing and maintaining Cassandra by configuring the cassandra.yaml file per business requirements, and of performing reads/writes using Java JDBC connectivity.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Hands-on experience with Zookeeper and ZKFC for managing and configuring NameNode failure scenarios.
- Team player with good communication and interpersonal skills and a goal-oriented approach to problem solving.
Big Data Technologies: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Flume, Oozie, Spark, Zookeeper.
Hadoop Platforms: Hortonworks and Cloudera, Apache Hadoop
Certifications: Hortonworks HDP certified administrator.
Networking Concepts: OSI Model, TCP/IP, UDP, IPv4, Subnetting, DHCP & DNS
Database/ETL: Oracle, Cassandra, DB2, MS-SQL Server, MySQL, MS-Access, HBase, MongoDB, Informatica, Teradata.
Operating Systems: Linux (CentOS, Ubuntu, Red Hat), Windows, UNIX and Mac OS-X
XML Languages: XML, DTD, XML Schema, XPath.
Monitoring and Alerting: Nagios, Ganglia, Cloudera Manager, Ambari.
Confidential, St. Louis, MO
Sr. Hadoop Administrator
- As a Hadoop administrator, maintained nodes in a large, high-availability cluster using Hortonworks Ambari and Cloudera Manager.
- Involved in Hadoop cluster installation, configuration, maintenance, monitoring, and troubleshooting; transferred data from RDBMS to HDFS and followed proper backup and recovery strategies.
- Analyzed data by running Hive queries and Pig scripts to understand user behavior, such as call frequency and top calling customers; designed and implemented a service layer over the HBase database.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Provided Business Intelligence support using Tableau to implement effective business dashboards and data visualizations.
- Configured, implemented, and supported a MongoDB cluster with high availability (replication) and load balancing (sharding) holding terabytes of data.
- Monitored the Hadoop cluster and troubleshot Hive, Datameer, Platfora, and Flume.
- Experience with securing Hadoop clusters, including Kerberos KDC installation, OpenLDAP installation, and data transport encryption with TLS.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Used Cassandra in multiple virtual and physical data centers to ensure the system was highly redundant and scalable.
- Exported the analyzed data from MySQL to HDFS using Sqoop for visualization and to generate reports for the BI team.
- Imported data from various sources such as Oracle and Comptel servers into HDFS using Sqoop and MapReduce.
- Designed and developed scalable and custom Hadoop solutions as per dynamic data needs and coordinated with technical team for production deployment of software applications for maintenance.
- Involved in loading data from the UNIX file system to HDFS.
- Streamed real-time data using Spark with Kafka.
- Worked with ETL team to load data into Data Warehouse/Data Marts using Informatica.
- Supported data analysts in running Pig scripts and Hive queries.
- Administered Linux systems to deploy the Hadoop cluster, monitored the cluster using Nagios and Ganglia, and reviewed log files to resolve errors.
- Involved in importing real-time data into Hadoop using Kafka. Expert in setting up SSH, SCP, and SFTP connectivity between UNIX hosts.
- Developed custom Process chains to support master data and transaction data loads from BI to BPC.
- Involved in various POC activities using technologies such as MapReduce, Hive, Pig, and Oozie.
Environment: Hadoop, HDFS, Hive, Sqoop, Flume, Hortonworks, Cassandra, Java, Impala, Talend, Tableau, Kafka, Storm, Zookeeper, HBase, YARN, Oracle 9i/10g/11 RAC with Solaris/Red Hat, MongoDB, Kerberos, SQL*Plus, PHP, Shell Scripting, ETL/BI architectures and SQL, Red Hat/SUSE Linux, EM Cloud Control.
Confidential, Warren, NJ
- Involved in the design and planning phases of the Hadoop cluster.
- Responsible for regular health checks of the Hadoop cluster using custom scripts.
- Installed and configured a multi-node, fully distributed Hadoop cluster with a large number of nodes.
- Provided Hadoop, OS, and Hardware optimizations.
- Installed and configured Cloudera Manager for easy management of existing Hadoop cluster.
- Performed monthly Linux server maintenance, including shutting down the essential Hadoop NameNode and DataNode.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Balanced the Hadoop cluster using balancer utilities to spread data evenly across the cluster.
- Implemented data ingestion techniques using Pig and Hive in the production environment.
- Performed routine cluster maintenance every weekend to make required configuration changes, installations, etc.
- Expertise in Linux Enterprise using HP ProLiant Servers and Virtual Connect Technology.
- Implemented the Kerberos authentication protocol for the existing cluster.
- Designed and created ETL jobs in Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Worked extensively with Sqoop for importing metadata from Oracle. Used Sqoop to import data from SQL Server to Cassandra.
- Implemented the Flume, Spark, and Spark Streaming frameworks for real-time data processing. Developed analytical components using Scala, Spark, and Spark Streaming. Implemented proofs of concept on the Hadoop and Spark stack and different big data analytics tools, using Spark SQL as an alternative to Impala.
- Implemented Apache Spark data processing project to handle data from RDBMS and streaming sources.
- Monitored and debugged Hadoop jobs and applications running in production.
- Provided user and application support on the Hadoop infrastructure.
- Created Kerberos keytabs for ETL application use cases before onboarding them to Hadoop.
- Responsible for adding users to the Hadoop cluster.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing.
- Installed a 20-node UAT Hadoop cluster.
Environment: Cloudera, Java, Red Hat Linux, HDFS, Mahout, MapReduce, Cassandra, Hive, Pig, Sqoop, Spark, Scala, Flume, Zookeeper, Oozie, DB2, HBase and Pentaho.