- Professional with 7+ years of IT experience in industries including Banking, Healthcare and Education.
- Cloudera Certified Hadoop Administrator with 5+ years of experience installing and configuring clusters using Apache Hadoop, Cloudera (CDH), Hortonworks (HDP), and AWS Elastic MapReduce (EMR).
- Worked across the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies.
- Deployed Hadoop clusters in on-premises data centers and private cloud environments.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using Apache Hadoop and Cloudera distributions (CDH3, CDH4, and YARN-based CDH 5.x).
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Flume, Storm, Kafka, and Sentry.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experienced in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
- Experienced in assessing Hadoop security requirements and integrating clusters with Kerberos authentication infrastructure, including KDC server setup and realm/domain creation.
- Experienced in creating service principals and user principals and in establishing cross-realm trust.
- Strong knowledge of enforcing role-based access control (RBAC) on Hive data and metadata in a Hadoop cluster using Sentry.
- Experienced in setting up high-availability Hadoop clusters and BDR (Backup and Disaster Recovery) clusters.
- Experienced in Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Good experience configuring and managing backup and disaster recovery for Hadoop data.
- Hands-on experience managing, reviewing, and analyzing log files for Hadoop and ecosystem services to perform root cause analysis.
- Analyzed system failures, identified root causes, and recommended courses of action. Documented system processes and procedures for future reference.
- Experienced in HDFS data storage and in supporting MapReduce and Spark jobs.
- Experienced in importing and exporting data between HDFS and RDBMS using Sqoop, and in scheduling Hadoop, Hive, Sqoop, and HBase jobs using Oozie.
- Experienced in configuring rack awareness to improve data availability and processing.
- Balanced server loads and tuned servers for optimal cluster performance.
- Hands-on experience in Linux administration activities such as user management, scheduling cron jobs, setting up Linux environments, configuring passwordless SSH, creating file systems, disabling firewalls, tuning swappiness, configuring SELinux, and installing Java.
- Experienced with configuration management tools such as Puppet, Ansible, and Chef.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Experienced in hardware recommendations, performance tuning, and benchmarking.
- Experienced in IP management (IP addressing, subnetting, Ethernet bonding, static IP configuration).
- Comfortable working in Unix/Linux and Windows environments, including CentOS 5/6, Ubuntu 10/11, and Red Hat 6/7.
- Experienced in Linux storage management, including configuring RAID levels and logical volumes.
Scripting Languages: Shell, Bash
Hadoop Ecosystem: Hadoop 2.7.x, Spark 2.1.0, MapReduce, Hive 2.1.1, Sqoop 1.99.7, Flume 1.7.0, Kafka 2.1.0, Oozie 3.1.3, Yarn 0.21.3, Pig 0.14, ZooKeeper 3.4.6
Database: MySQL 5.x, Oracle 11g, HBase 1.3.0, Cassandra 3.10, PL/SQL 11g, MS SQL Server
Hadoop Distributions: Cloudera, Hortonworks, AWS EMR, Apache Hadoop
IDEs: Eclipse 4.6, NetBeans
Collaboration: Git 2.12.0, ScalaTest 3.0.1
Operating Systems: Windows 10, macOS, Ubuntu, CentOS, Red Hat
Data Analysis & Viz: Tableau
Cloud Environment: AWS EC2, S3, IAM, VPC, Route 53, CloudWatch, CloudFormation
Web Services: RESTful, SOAP
Confidential, San Ramon, CA
- Provided infrastructure support for multiple clusters, including Production (Prod), Pre-Production (Pre-prod), Quality Assurance (QA), and Disaster Recovery (DR).
- Installed and configured Hadoop clusters across various environments through Cloudera Manager.
- Installed and configured MySQL and enabled high availability.
- Installed and configured the Sentry server to enable schema-level security.
- Installed and configured Hadoop services: HDFS, YARN, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Flume, Kafka, and Sentry.
- Configured the Fair Scheduler, created resource pools, and dynamically allocated resources while regularly monitoring resource-intensive jobs.
- Involved in implementing high availability and automatic failover for the NameNode using ZooKeeper to eliminate it as a single point of failure.
- Day-to-day responsibilities included resolving Hadoop developer issues, providing prompt solutions to reduce impact, and documenting fixes to prevent future issues.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and applied fixes per their recommendations.
- Experienced in upgrades, patching, and rolling upgrades without data loss, backed by proper backup plans.
- Integrated external components such as TIBCO and Tableau with Hadoop using HiveServer2.
- Implemented the HDFS snapshot feature and migrated data across clusters using DistCp.
- Performed both major and minor upgrades to the existing Cloudera Hadoop cluster.
- Integrated Hadoop with Active Directory and enabled Kerberos for authentication.
- Built a new sandbox cluster for testing and moved data from the secure cluster to the non-secure sandbox cluster using DistCp (distributed copy).
- Installed a Kafka cluster with separate nodes for brokers.
- Performed Kafka operations on a regular basis.
- Tuned and optimized Hadoop clusters for high performance.
- Implemented schedulers on the ResourceManager to share cluster resources.
- Monitored Hadoop clusters using Cloudera Manager and provided 24x7 on-call support.
- Designed and implemented a disaster recovery plan for the Hadoop cluster.
- Extensive hands-on experience with Hadoop file system commands for file handling operations.
- Provided user and application support on the Hadoop infrastructure.
- Prepared the system design document covering all functional implementations.
- Worked with Sqoop import and export functionality to transfer large data sets between traditional databases and HDFS.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Cloudera Manager, Storm, AWS S3, EC2, IAM, ZooKeeper, Spark
Confidential, Valley Forge, PA
- Installed and configured Cloudera Manager for easier management of the existing Hadoop cluster.
- Conducted root cause analysis (RCA) to identify data issues and resolve production problems.
- Worked on Hive optimization techniques to improve the performance of long-running jobs.
- Experienced in managing and reviewing Hadoop log files.
- Used Sqoop to import and export data between various RDBMSs and HDFS/Hive.
- Set up HA for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes.
- Experienced with HBase high availability, validated manually through failover tests.
- Created queues and allocated cluster resources to prioritize jobs.
- Upgraded the cluster to CDH 5.8.2 and Cloudera Manager (CM) 5.9.1.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Falcon, SmartSense, Storm, Kafka, and Spark.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Created user accounts and granted users access to the Hadoop cluster.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Spark, Cloudera Manager
Confidential, Westbury, NY
- Involved in deploying a Hadoop cluster using Hortonworks Ambari (HDP 2.2), integrated with SiteScope for monitoring and alerting.
- Launched and set up a Hadoop cluster on physical servers, including configuring its various components.
- Created a local YUM repository for installing and updating packages.
- Developed data pipelines that ingest data from multiple sources and process it.
- Used Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move the pivoted data into Hive or HBase tables.
- Implemented Kerberos authentication: KDC server setup, realm/domain creation, principal management, generating a keytab file for each service, and managing keytabs using keytab tools.
- Configured Knox for perimeter security and Ranger for granular access in the cluster.
- Configured and installed several Hadoop clusters in both on-premises and AWS cloud for POCs.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Extensively used Sqoop to move data from relational databases to HDFS.
- Used Flume to move web log data into HDFS.
- Used Pig to apply transformations, validation, cleansing, and deduplication to data from source systems.
- Actively monitored the Hadoop cluster running the Hortonworks distribution (HDP 2.4).
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH login.
- Performed a minor upgrade from HDP 2.2.2 to HDP 2.2.4.
- Upgraded the Hadoop cluster from HDP 2.2 to HDP 2.4 and from HDP 2.4 to HDP 2.5.
- Integrated BI tool Tableau to run visualizations over the data.
- Resolved hardware-related issues and assessed tickets on a daily basis.
- Automated administration tasks using scripts and scheduled jobs using cron.
- Provided 24x7 on-call support as part of a scheduled rotation with other team members.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Ambari, Storm, AWS S3, EC2, Identity and Access Management (IAM), ZooKeeper, NiFi
Roles & Responsibilities:
- Maintained SQL Server databases; ensured accuracy of information and automated numerous functions.
- Fine-tuned database objects and the server to ensure efficient data retrieval.
- Monitored and optimized system performance using SQL Profiler and the Database Engine Tuning Advisor.
- Designed and implemented incremental and full backup policies and procedures.
- Created and implemented database design solutions in collaboration with the programming team.
- Developed user-defined functions and triggers to implement business requirements.
- Performed logical and physical database design, maintenance, tuning, archiving, backups, replication, recovery, software upgrades, capacity planning, and optimization for SQL Server databases.
- Performed database consistency checks using DBCC utilities, along with performance baselining and performance tuning.
- Provided production-level support for onsite and offshore clients.
- Used SQL Server Performance Dashboard Reports for monitoring.
- Managed database security.
Roles & Responsibilities:
- Developed stored procedures, functions and database triggers. Maintained referential integrity and implemented complex business logic.
- Involved in installing and configuring SQL Server 2005 with the latest service packs.
- Created and executed SSIS packages to populate data from various data sources.
- Created SSIS packages using SSIS Designer to export heterogeneous data from OLE DB sources (Oracle) and Excel spreadsheets to SQL Server 2005.
- Migrated DTS packages to SSIS packages and modified them.
- Designed ETL packages that handled different data sources and loaded the data into target data sources by performing various transformations in SSIS.
- Created multiple drill-down SSRS reports using tables, crosstabs, and charts. Designed, deployed, and maintained various SSRS reports in SQL Server 2005.
- Designed and implemented parameterized and cascading parameterized reports using SSRS.
- Managed server security by creating new logins and users and changing user roles.
- Involved in developing logical and physical database models using Erwin.