- 8+ years of IT experience, including 4+ years with the Hadoop ecosystem, installing and configuring Hadoop ecosystem components in existing clusters.
- Experienced in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Spark, Sqoop, Oozie, Flume, HBase, ZooKeeper) using Apache Ambari.
- Experienced with Hortonworks and Cloudera Manager; strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experienced in building new clusters from scratch and performing live data migration from the old cluster to the newly built one without affecting running production jobs.
- Implemented Cloudera Backup and Disaster Recovery (BDR) to enable data protection across data centers for disaster recovery scenarios.
- Excellent understanding of Hadoop cluster security; implemented secure Hadoop clusters using Kerberos and TLS.
- Installed Linux KVM and managed VMs with RHEL 7.2.
- Experienced in improving Hadoop cluster performance by setting appropriate configuration parameters across the OS kernel, storage, networking, HDFS, and MapReduce.
- Experienced in administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Ambari.
- Experienced in upgrading Hadoop clusters through both minor and major version upgrades.
- Experienced in customizing authentication to use Active Directory and LDAP.
- Experienced in using ZooKeeper for coordinating distributed applications.
- Experienced in managing Hadoop infrastructure: commissioning and decommissioning nodes, log rotation, and rack topology implementation.
- Experienced with Chef, Puppet, and related configuration management tools.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Expertise in designing and implementing disaster recovery plans for Hadoop clusters.
- Optimized performance of HBase, Hive, and Pig jobs.
- Experienced in managing cluster resources by implementing the Fair and Capacity Schedulers.
- Experienced in scheduling jobs using Oozie workflows.
- Scheduling jobs using crontab.
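A representative crontab fragment for this kind of job scheduling (the script paths and schedules below are purely illustrative, not from any actual environment):

```shell
# Illustrative crontab entries for a Hadoop edge node (paths are hypothetical).
# Run a nightly HDFS ingest script at 01:30, appending output to a log.
30 1 * * *  /opt/scripts/nightly_ingest.sh >> /var/log/ingest.log 2>&1
# Every Sunday at 03:00, clean up staging files older than 7 days.
0 3 * * 0   find /data/staging -type f -mtime +7 -delete
```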
- Experienced in benchmarking and in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- Strong knowledge of configuring NameNode High Availability.
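A minimal sketch of the NameNode metadata backup and HA checks this involves; the HA service IDs `nn1`/`nn2` and the backup path are hypothetical placeholders:

```shell
# Fetch the latest fsimage from the active NameNode for off-cluster backup
# (backup directory is illustrative).
hdfs dfsadmin -fetchImage /backup/namenode/$(date +%Y%m%d)

# Check which NameNode is active/standby in an HA pair
# (service IDs nn1/nn2 are placeholders for the cluster's configured IDs).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Force a manual failover during a maintenance window.
hdfs haadmin -failover nn1 nn2
```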
- Experienced in configuring Hadoop security (Ranger and Knox Gateway).
- Experienced in handling multiple relational databases: MySQL, SQL Server.
- Assisted Developers with problem resolution.
- Ability to play a key role in the team and communicate across teams.
- Global Service Delivery experience by bringing together resources to accomplish organizational goals using ITIL framework.
- Effective problem-solving skills and outstanding interpersonal skills.
- Ability to work independently as well as within a team environment.
- Worked on setting up NameNode High Availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Set up automated 24x7 monitoring and escalation infrastructure for the Hadoop cluster using Ambari.
- Experienced in Linux Administration and TSM Administration.
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.5
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Scripting Languages: Shell, Bash, CSH, Python, Puppet.
Programming Languages: C, Java, SQL, and PL/SQL.
Front End Technologies: HTML, XHTML, XML.
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.
NoSQL Databases: HBase, Cassandra, MongoDB
Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.
Security: Kerberos, Knox, Ranger.
Confidential, North Brunswick, NJ
Hadoop Security Administrator
- Working on 4 Hadoop clusters for different teams: supporting 50+ users of the Hadoop platform, resolving the tickets and issues they run into, providing training to make Hadoop usability simple, and updating users on best practices.
- Install and manage Hortonworks HDP DL and DW components.
- Worked on the Hortonworks Hadoop distribution (HDP 2.2.0), which managed services (HDFS, MapReduce2, Tez, Hive, Pig, HBase, Sqoop, Flume, Spark, Ambari Metrics, ZooKeeper, Falcon, Oozie, etc.) for 4 clusters ranging from LAB, DEV, and QA to PROD, containing nearly 350+ nodes with 7 PB of data.
- Monitor Hadoop cluster connectivity and security on Ambari monitoring system.
- Led the installation, configuration, and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Rendered L3/L4 support services for BI users, Developers and Tableau team through Jira ticketing system.
- One of the key engineers in Aetna's HDP web engineering team, Integrated Systems Engineering (ISE).
- Diligently teamed with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability. Worked on installing Kafka on a virtual machine.
- Configured Kafka to efficiently collect, aggregate, and move large amounts of clickstream data from many different sources to MapR-FS.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, and Puppet, and designed custom cloud-hosted solutions; specific AWS product suite experience.
- Configured and installed several Hadoop clusters on both physical machines and the AWS cloud for POCs.
- Hands-on experience coding MapReduce/YARN programs in Java and Python for analyzing Big Data.
- Worked closely with System Administrators, BI analysts, developers, and key business leaders to establish SLAs and acceptable performance metrics for the Hadoop as a service offering.
- Worked with NoSQL databases like HBase and MongoDB for POC purposes.
- Configured, installed, and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with AWS S3 as the default file system for the cluster.
- Extensively worked with Cloudera Distribution Hadoop, CDH 5.x and CDH 4.x.
- Performance tuning and ETL, Agile software deployment, team building & leadership, engineering management.
- Ran Hortonworks Ambari and Apache Hadoop on RedHat and CentOS as data storage, retrieval, and processing systems.
- Set up Kerberos principals in the KDC server, tested HDFS, Hive, Pig, and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
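The principal and keytab setup described above typically follows this pattern in an MIT KDC; the realm, principal name, and keytab path here are illustrative, not from the actual environment:

```shell
# Create a service principal with a random key, then export its keytab
# (realm EXAMPLE.COM and principal svc_etl are hypothetical).
kadmin.local -q "addprinc -randkey svc_etl@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/svc_etl.keytab svc_etl@EXAMPLE.COM"

# Verify the keytab, authenticate with it, and test HDFS access for the new ID.
klist -kt /etc/security/keytabs/svc_etl.keytab
kinit -kt /etc/security/keytabs/svc_etl.keytab svc_etl@EXAMPLE.COM
hdfs dfs -ls /user/svc_etl
```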
- Performed a major upgrade of the production environment from HDP 2.3 to HDP 2.6. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Installed the OLAP software AtScale on its designated edge node server.
- Implemented a dual data center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
- Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve desired performance metrics.
- Good knowledge of providing solutions to users who encountered Java exceptions and errors while running data models in SAS scripts and R scripts. Good understanding of forest data models.
- Worked on data ingestion systems to pull data from traditional RDBMS platforms such as Oracle, MySQL, and Teradata into the Hadoop cluster using automated ingestion scripts, and also stored data in NoSQL databases such as HBase and Cassandra.
- Provided security and authentication with Kerberos, which works by issuing Kerberos tickets to users.
- Good troubleshooting skills across all Hadoop stack components, ETL services, and Hue and RStudio, which provide GUIs for developers and business users for day-to-day activities.
- Created queues and allocated cluster resources to provide priority for jobs in Hive.
- Implemented SFTP and SCP for projects to transfer data from external servers. Experienced in managing and reviewing log files. Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop, and Pig jobs.
Environment: CDH 5.4.3 and 4.x, Cloudera Manager CM 5.1.1, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Apache Impala, Kafka, AWS, Flume, ZooKeeper, Chef, RedHat/CentOS 6.5, Control-M.
Confidential, Oak Brook, IL
Big Data Administrator
- Responsible for architecting Hadoop clusters; translated functional and technical requirements into detailed architecture and design.
- Installed and configured a multi-node, fully distributed Hadoop cluster with a large number of nodes.
- Addressing and Troubleshooting issues on a daily basis.
- File system management and monitoring.
- Provided Hadoop, OS, Hardware optimizations.
- Installed and configured Hadoop ecosystem components like Map Reduce, Hive, Pig, Sqoop, HBase, Zookeeper and Oozie.
- Accomplished system/e-mail authentication using an enterprise LDAP database.
- Involved in testing HDFS, Hive, Pig and Map Reduce access for the new users.
- Cluster maintenance, as well as creation and removal of nodes, using Apache Ambari.
- Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
- Monitored systems and services; handled architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures. Implemented the Capacity Scheduler to allocate a fair amount of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Configured Oozie for workflow automation and coordination.
- Implemented rack aware topology on the Hadoop cluster.
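Rack-aware topology is driven by a script referenced from `net.topology.script.file.name` in core-site.xml, which maps each host/IP to a rack path. A minimal sketch; the IP ranges and rack names below are purely illustrative:

```shell
#!/bin/sh
# topology.sh -- maps each host/IP argument to a rack path so HDFS can place
# block replicas across racks. Ranges and rack names are hypothetical.
for host in "$@"; do
    case "$host" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;   # fallback rack expected by Hadoop
    esac
done
```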
- Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
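Sqoop transfers of this kind generally take the following shape; the JDBC URL, credentials file, and table names are hypothetical placeholders:

```shell
# Import a table from MySQL into Hive (connection details are illustrative).
sqoop import \
  --connect jdbc:mysql://db01.example.com/sales \
  --username etl_user --password-file /user/etl/.db_pass \
  --table orders \
  --hive-import --hive-table default.orders \
  --num-mappers 4

# Export aggregated results from HDFS back to the relational database.
sqoop export \
  --connect jdbc:mysql://db01.example.com/sales \
  --username etl_user --password-file /user/etl/.db_pass \
  --table order_summary \
  --export-dir /user/hive/warehouse/order_summary
```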
- Configured Zookeeper to implement node coordination, in clustering support.
- Rebalancing the Hadoop Cluster.
- Allocated name and space quotas to users in case of space problems.
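HDFS name and space quota administration uses the `dfsadmin` quota commands; the user directory and limits below are illustrative:

```shell
# Cap a user directory at 1M names and 10 TB of space
# (path and limits are hypothetical examples).
hdfs dfsadmin -setQuota 1000000 /user/analyst1
hdfs dfsadmin -setSpaceQuota 10t /user/analyst1

# Inspect current usage against the quotas.
hdfs dfs -count -q -h /user/analyst1

# Clear the quotas once the space problem is resolved.
hdfs dfsadmin -clrQuota /user/analyst1
hdfs dfsadmin -clrSpaceQuota /user/analyst1
```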
- Installed and configured the Hadoop security tools Knox and Ranger, and enabled Kerberos.
- Managing cluster performance issues.
- Creating snapshots and restoring snapshots.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
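A remote-cluster backup with DistCp typically looks like the following; the NameNode hostnames and paths are hypothetical:

```shell
# Copy a day's data to the DR cluster, updating only changed files and
# preserving ownership/permissions (cluster names and paths are illustrative).
hadoop distcp -update -p \
  hdfs://prod-nn:8020/data/events/2017-06-01 \
  hdfs://dr-nn:8020/data/events/2017-06-01
```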
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Maintained the cluster to keep it healthy and in optimal working condition.
- Handled upgrades and patch updates.
Environment: Hortonworks, Ambari, HDFS, Java, Shell Scripting, Python, Hive, Spark, Sqoop, Linux, SQL, Cloudera, Zookeeper, HBase, Oozie, Kerberos, Ranger
Confidential, Piscataway, NJ
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
- Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name node recovery, Capacity planning, Cassandra and slots configuration.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Monitored multiple clusters environments using Metrics and Nagios.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Dumped the data from the MySQL database to HDFS and vice versa using Sqoop.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Dumped the data from one cluster to other cluster by using DISTCP, and automated the dumping procedure using shell scripts.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing data with Hive and Pig.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Configured ZooKeeper to implement node coordination in clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Cloudera, Flume, SQL Server, UNIX, RedHat, and CentOS.
Confidential, Sunnyvale, CA
Linux Hadoop Administrator
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed the cluster and handled commissioning & decommissioning of DataNodes, NameNode recovery, and capacity planning; configured Hadoop MapReduce and HDFS and developed multiple MapReduce jobs in Java for data cleaning.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
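A daemon health check of this kind often reduces to scanning `jps` output for the expected services; this is a cluster-dependent sketch, and the daemon list and alert action are illustrative:

```shell
#!/bin/sh
# daemon_check.sh -- sketch of a Hadoop daemon health check.
# The daemon names and notification command are hypothetical; a real version
# would page the on-call admin or attempt a restart instead of just printing.
for daemon in NameNode DataNode ResourceManager NodeManager; do
    if jps | grep -qw "$daemon"; then
        echo "OK: $daemon is running"
    else
        echo "CRITICAL: $daemon is down"
        # e.g. notify: mail -s "$daemon down on $(hostname)" admin@example.com
    fi
done
```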
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Created an Apache Directory Server for the local network and integrated RHEL 6.x instances with Active Directory in an AWS VPC.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installation planning and slots configuration.
- Implemented NameNode backup using NFS for high availability.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Use of Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: Hortonworks, Ambari, HDFS, Java, Shell Scripting, Python, Hive, Spark, Sqoop, Linux, SQL, Cloudera, Zookeeper, AWS, HBase, Oozie, Kerberos, Ranger
Confidential
Linux System Administrator
- Installation, configuration & migration of UNIX and Linux operating systems.
- Managed, maintained, and fine-tuned clustered Apache Tomcat server configurations.
- Installed packages and patches.
- Installed and configured Ubuntu; troubleshot hardware, operating system, application & network problems and performance issues.
- Worked closely with the development and operations organizations to implement the necessary tools and process to support the automation of builds, deployments, testing of infrastructure.
- Installed and configured various flavors of Linux like Red Hat, SUSE, and Ubuntu.
- Monitored trouble ticket queue to attend user and system calls, participated in team meetings, change control meetings to update installation progress, and for upcoming changes.
- Diagnosed and resolved system-related tasks in accordance with priorities set up for dealing with trouble tickets.
- Deployed patches for Linux and application servers, Red Hat Linux Kernel Tuning.
- Network troubleshooting using netstat, ifconfig, tcpdump, vmstat, and iostat.
- Managed cron jobs, batch processing and job scheduling.
- Monitored the servers and Linux scripts regularly, performed troubleshooting steps, and tested and installed the latest software on servers for end users.
- Troubleshot application issues on Apache web servers and database servers running on Linux and Solaris.
- Performed the manual backups of Database, software and OS using tar, cpio, mksysb.
- Managed file system utilization using a script scheduled as a cron job.
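A cron-scheduled utilization check of this kind can be sketched as follows; the threshold and the idea of printing warnings (rather than mailing an admin) are illustrative choices:

```shell
#!/bin/sh
# disk_check.sh -- warn when any mounted filesystem exceeds a usage threshold.
# Intended to run from cron; the default 80% threshold is a hypothetical value.
THRESHOLD=${1:-80}
df -P | awk -v limit="$THRESHOLD" 'NR > 1 {
    use = $5; sub(/%/, "", use)          # strip the % from the Capacity column
    if (use + 0 >= limit)
        printf "WARNING: %s at %s%% (mounted on %s)\n", $1, use, $6
}'
```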
- Performed automation with simple shell scripting
- Monitored backups using Backup Exec; regularly monitored alert-log files and trace files on a day-to-day basis.
- Monitoring system performance, Server load and bandwidth issues.
- Regularly manage backup process for Server and Client data.
- Environment: LINUX/UNIX, Oracle, DB2, SQL Server.
Environment: Windows 2008/2007 server, Unix Shell Scripting, SQL Manager Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, NoSQL, Linux/Unix, Putty Connection Manager, Putty, SSH.