- Over 9 years of IT experience, including 4+ years in Big Data technologies.
- Well versed in Hadoop MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, Ranger, YARN, ZooKeeper, Spark and Oozie.
- Expertise in AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS and Glacier.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts. Experience in analyzing data using HiveQL and Pig Latin. Experience with Ansible and related tools for configuration management.
- Experience in task automation using Oozie, cluster coordination through Pentaho and MapReduce job scheduling using the Fair Scheduler. Worked on both Hadoop distributions, Cloudera and Hortonworks.
- Experience in performing minor and major upgrades and applying patches on Ambari and Cloudera clusters.
- Extensive experience in installation, configuration, maintenance, design, implementation and support on Linux. Experience in spinning up HiveServer2 and Impala daemons as required.
- Strong knowledge of Hadoop cluster capacity planning, performance tuning and cluster monitoring.
- Strong knowledge of YARN terminology and high-availability Hadoop clusters.
- Able to configure scalable infrastructure for high availability (HA) and disaster recovery.
- Experience in developing and scheduling ETL workflows, data scrubbing and processing data in Hadoop using Oozie.
- Implemented AWS architectures for web applications.
- Worked on Agile projects delivering end-to-end continuous integration/continuous delivery (CI/CD) pipelines by integrating tools such as Jenkins, Puppet and AWS for VM provisioning.
- Experienced in writing automated scripts for monitoring file systems and key MapR services.
- Experienced with machine learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression and k-means.
- Experience in setting up data ingestion tools such as Flume, Sqoop and NDM.
- Experience in balancing the cluster after adding/removing nodes or major data cleanup.
- General Linux system administration, including design, configuration, installation and automation.
- Strong knowledge of using NFS (Network File System) for backing up NameNode metadata.
- Experience in setting up NameNode high availability for a major production cluster.
- Experience in designing automatic failover control using ZooKeeper and Quorum Journal Nodes.
- Experience in creating, building and managing public and private cloud infrastructure.
- Experience in working with different file formats and compression techniques in Hadoop.
- Experience in analyzing existing Hadoop clusters, understanding performance bottlenecks and providing performance tuning solutions accordingly. Experience with Oracle, MongoDB, AWS Cloud and Greenplum.
- Experience in working in large environments and leading infrastructure support and operations.
- Benchmarked Hadoop clusters to validate the hardware before and after installation and tweaked configurations to obtain better performance. Experience in configuring ZooKeeper to coordinate the servers in clusters.
- Experience in administering Linux systems to deploy and monitor Hadoop clusters.
- Experienced in providing on-call support for production clusters and troubleshooting issues within the required window to avoid delays.
- Experience with storage/installation, LVM, Linux Kickstart, Solaris Volume Manager and Sun RAID Manager.
- Expertise in virtualization system administration of VMware ESX/ESXi, VMware Server, VMware Lab Manager, vCloud, and Amazon EC2 & S3 web services.
- Excellent knowledge of NoSQL databases such as HBase and Cassandra. Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Experience in commissioning, decommissioning, balancing and managing nodes, and in tuning servers for optimal cluster performance.
- Involved in 24x7 production support, build and migration assignments.
- Good working knowledge of Linux concepts and of building servers ready for Hadoop cluster setup.
- Extensive experience in monitoring Hadoop services and OS-level disk/memory/CPU utilization with tools such as Nagios and Ganglia.
- Worked closely with developers and analysts to address project requirements. Able to manage time effectively and prioritize multiple projects.
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, HDP 2.4, 2.6, CDH 5.x
DevOps Tools: Jenkins, Git
Monitoring Tools: Cloudera Manager, Ambari, Ganglia
Scripting Languages: Shell Scripting
Programming Languages: C, Java, Python, SQL and PL/SQL.
Front End Technologies: HTML, XHTML, XML.
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure and Google Cloud
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.
NoSQL Databases: HBase, Cassandra, MongoDB.
Operating Systems: Linux, UNIX, Mac OS X 10.9.5, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.
Security: Kerberos, Ganglia and Nagios
Confidential - CA
- Support 170+ servers and 30+ users on the Hadoop platform: resolve the tickets and issues they run into, provide training to make Hadoop simpler to use, and keep users updated on best practices.
- Install and manage Hortonworks HDP data lake (DL) and data warehouse (DW) components.
- Worked on the Hortonworks Hadoop distribution (HDP 2.6), managing services such as HDFS, MapReduce2, Tez, Hive, Pig, HBase, Cloudera, Sqoop, Nagios, Spark, Ambari Metrics, ZooKeeper, Falcon and Oozie.
- Led the installation, configuration and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Provided L3/L4 support for BI users, developers and the Tableau team through the Jira ticketing system.
- One of the key engineers in Aetna's HDP web engineering team, Integrated Systems Engineering (ISE).
- Worked with the infrastructure, network, database, application and business intelligence teams to ensure high data quality and availability.
- Installed Kafka on a virtual machine.
- Configured Kafka to efficiently collect, aggregate and move large amounts of clickstream data from many different sources to MapR-FS.
- Configured F5 load balancers for Ranger, NiFi and Oozie.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef or custom-built tooling, and designed cloud-hosted solutions using the AWS product suite.
- Worked closely with system administrators, BI analysts, developers and key business leaders to establish SLAs and acceptable performance metrics for the Hadoop-as-a-Service offering.
- Worked with NoSQL databases such as HBase and MongoDB for proof-of-concept (POC) purposes.
- Installed, configured and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with AWS S3 as the default file system for the cluster.
- Performance Tuning and ETL, Agile Software Deployment, Team Building & Leadership, Engineering Management.
- Worked with Hortonworks Ambari and Apache Hadoop on Red Hat and CentOS as data storage, retrieval and processing systems.
- Set up Kerberos principals on the KDC server, tested HDFS, Hive, Pig and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
- Created multiple groups and set permission policies for various groups in AWS.
- Performed a major upgrade of the production environment from HDP 2.4.2 to HDP 2.6. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Defined AWS Security Groups which acted as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
- Created data lineage using Talend TMM.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
- Installed the OLAP software AtScale on its designated edge node server.
- Implemented a dual data center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
- Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve desired performance metrics.
- Provided solutions to users who encountered Java exceptions and errors while running data models in SAS and R scripts. Good understanding of forest data models.
- Worked on data ingestion, using Sqoop to pull data from traditional RDBMS platforms such as Oracle, MySQL and Teradata into the Hadoop cluster through automated ingestion scripts, and stored data in NoSQL databases such as HBase and Cassandra.
- Troubleshot, debugged and fixed Talend-specific issues while maintaining the health and performance of the ETL environment.
- Involved in implementing security on HDP and HDF Hadoop clusters with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger, NiFi, Atlas, Grafana, Knox and Zeppelin.
- Involved in updating scripts and step actions to install Ranger plugins.
- Provided security and authentication with Kerberos, which works by issuing Kerberos tickets to users.
- Good troubleshooting skills across all Hadoop stack components, ETL services, and Hue and RStudio, which provide GUIs for developers and business users in their day-to-day activities.
- Created queues and allocated cluster resources to prioritize Hive jobs.
- Implemented SFTP and SCP for projects to transfer data from external servers.
- Experienced in managing and reviewing log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop and Pig jobs.
Environment: HDP 2.6, Jenkins, Git, Spark, MapReduce, Talend, Hive, Pig, ZooKeeper, NiFi, Kafka, HBase, VMware ESX Server, Flume, Sqoop, Oozie, Kerberos, Sentry, AWS, CentOS.
Hadoop Operations Administrator
Confidential - New York
- Currently working as a Hadoop administrator on the MapR Hadoop distribution for 5 clusters, ranging from POC to PROD, with more than 1000 nodes.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Day-to-day responsibilities include solving Hadoop developer issues, providing immediate solutions to reduce impact, documenting them and preventing future issues.
- Experience in MapR patching and upgrading clusters with proper strategies.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Hands-on experience with cluster upgrades and patch upgrades without any data loss, backed by proper backup plans.
- Changed configurations based on user requirements to improve job performance.
- Worked with configuration management tools such as StackIQ to centrally maintain and push configurations across the cluster for all Hadoop configuration files, such as mapred-site.xml, pools.xml and hdfs-site.xml.
- Experienced in setting up projects and volumes for new Hadoop projects.
- Used snapshots and mirroring to maintain backups of cluster data, including at remote sites.
- Implemented SFTP for projects to transfer data from external servers to Hadoop servers.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Experienced in managing and reviewing Hadoop log files.
- Experience in creating and maintaining MySQL databases, setting up users and maintaining database backups.
- Set up MySQL master-slave replication and helped business applications maintain their data on MySQL servers.
- Helped users throughout the production deployment process.
- Experienced in production support, resolving user incidents ranging from Sev1 to Sev5.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Ambari. Monitored workload, job performance and capacity planning using the MapR Control System.
Environment: MapR, MapReduce, Hive, Pig, ZooKeeper, NiFi, Kafka, HBase, VMware ESX Server, Flume, Sqoop, Oozie, Kerberos, Sentry, AWS, CentOS.
Hadoop Operations Administrator
Confidential - Bloomfield, CT
- Worked as a Hadoop admin responsible for everything related to the clusters, a total of 150 nodes ranging from POC (proof-of-concept) to PROD clusters.
- Involved in the requirements review meetings and collaborated with business analysts to clarify any specific scenario.
- Worked on the Hortonworks distribution; Hortonworks is a major contributor to Apache Hadoop.
- Experience in installation, configuration, deployment, maintenance, monitoring and troubleshooting of Hadoop clusters in different environments, such as development, test and production, using the Ambari front-end tool and scripts.
- Experience in implementing high availability for HDFS, YARN, Hive and HBase.
- Installed Apache NiFi with Hortonworks DataFlow to make data ingestion from the Internet of Anything fast, easy and secure.
- Created databases in MySQL for Hive, Ranger, Oozie, Dr. Elephant and Ambari.
- Hands on experience in installation, configuration, supporting and managing Hadoop Clusters.
- Used Sqoop to import data from databases such as MySQL and Oracle into HDFS and Hive, and to export data from HDFS and Hive back to those databases.
- Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
- Completed the end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all the actions mentioned above.
- Installed and configured Ambari Metrics, Grafana, Knox and Kafka brokers on admin nodes.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Commissioned and decommissioned nodes as needed.
- Performed component unit testing using the Azure Emulator.
- Implemented NameNode automatic failover using the ZooKeeper Failover Controller (ZKFC).
- As a Hadoop admin, monitored cluster health daily, tuned performance-related configuration parameters and backed up configuration XML files.
- Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
- Configured Kerberos and installed the MIT Kerberos ticketing system.
- Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
- Experienced in defining job flows. Enabled Ranger security on all clusters.
- Experienced in managing and reviewing Hadoop log files.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant through an ODBC driver.
- Installed Grafana as a metrics analytics and visualization suite.
- Monitored local file system disk space usage and CPU using Ambari.
- Installed various services like Hive, HBase, Pig, Oozie, and Kafka.
- Production support responsibilities include cluster maintenance.
- Collaborated with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
Environment: HDP, Ambari, HDFS, MapReduce, YARN, Hive, NiFi, Flume, Pig, ZooKeeper, Tez, Oozie, MySQL, Puppet and RHEL
Linux / Hadoop Administrator
- Worked on multiple projects spanning the architecture, installation, configuration and management of Hadoop clusters.
- Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
- Developed custom Writable MapReduce Java programs to load web server logs into HBase using Flume.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
- Experience in MapR patching and upgrading clusters with proper strategies.
- Developed entire data transfer model using Sqoop framework.
- Configured a Flume agent with a Flume syslog source to receive data from syslog servers.
- Implemented Hadoop NameNode HA services to make Hadoop services highly available.
- Transferred data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
- Installed and managed multiple Hadoop clusters: production, staging and development.
- Installed and managed a 150-node production cluster with 4+ PB of storage.
- Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
- Involved in analyzing system failures, identifying root causes and recommending courses of action, including on lab clusters.
- Designed cluster tests to run before and after upgrades to validate cluster status.
- Set up clusters and installed all ecosystem components through MapR, and manually through the command line in the lab cluster.
- Performed AWS EC2 instance mirroring, WebLogic domain creations and several proprietary middleware Installations.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using Cloudera Manager.
- Documented and prepared run books of system processes and procedures for future reference.
- Performed Benchmarking and performance tuning on the Hadoop infrastructure.
- Automated data loading between production and disaster recovery cluster.
- Migrated the Hive schema from the production cluster to the DR cluster.
- Worked on migrating applications from relational database systems by running POCs.
- Helping users and teams with incidents related to administration and development.
- Onboarded new users migrating to our clusters and trained them on best practices.
- Guided users in development and worked closely with developers to prepare a data lake.
- Tested and performed enterprise-wide installation, configuration and support for Hadoop using the MapR distribution.
- Migrated data from SQL Server to HBase using Sqoop.
- Scheduled data pipelines for automation of data ingestion in AWS.
- Utilized the AWS framework for content storage and Elasticsearch for document search.
- Monitored multiple Hadoop cluster environments using Nagios. Monitored workload, job performance and capacity planning using the MapR Control System.
- Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HQL queries.
Environment: Red Hat Linux (RHEL 3/4/5), Solaris, Logical Volume Manager, Sun & Veritas Cluster Server, Global File System, Red Hat Cluster Servers.
Linux System Administrator
- Installation and configuration of Solaris 9/10 and Red Hat Enterprise Linux 5/6 systems.
- Involved in building servers using JumpStart and Kickstart on Solaris and RHEL, respectively.
- Installation and configuration of Red Hat virtual servers using ESXi 4/5, and of Solaris servers (LDoms) using scripts and Ops Center.
- Performed package and patches management, firmware upgrades and debugging.
- Addition and configuration of SAN disks for LVM on Linux, and Veritas Volume Manager and ZFS on Solaris LDOMs.
- Configuration and troubleshooting of NAS mounts on Solaris and Linux Servers.
- Configuration and administration of ASM disks for Oracle RAC servers.
- Analyzed and reviewed system performance tuning and network configurations.
- Managed logical volumes and volume groups using Logical Volume Manager.
- Troubleshooting and analysis of hardware issues and failures on various Solaris servers (core dump and log file analysis).
- Performed configuration and troubleshooting of services like NFS, FTP, LDAP and Web servers.
- Installation and configuration of VxVM, Veritas file system (VxFS).
- Management of Veritas Volume Manager (VxVM), Zettabyte File System (ZFS) and Logical Volume Manager.
- Involved in patching Solaris and RedHat servers.
- Worked with NAS and SAN concepts and technology.
- Configured and maintained Network Multipathing in Solaris and Linux.
- Configuration of multipathing and EMC PowerPath on Linux and Solaris servers.
- Provided production support and 24/7 support on a rotation basis.
- Performed a POC on Tableau, including running load tests and measuring system performance with large amounts of data.
Environment: Solaris 9/10/11, Red Hat Linux 4/5/6, AIX, Sun Enterprise Servers E5500/E4500, Sun Fire V1280/480/440, Sun SPARC 1000, HP 9000 K-, L- and N-class servers, HP & Dell blade servers, IBM RS/6000.
Linux System Administrator
- Installing and configuring Windows Server 2003/2008 and Windows XP/7.
- Installing and configuring a firewall server.
- Creating database backups and restoring database data.
- Diagnosing hardware/software problems and providing solutions.
- Installing and updating various antivirus software.
- Installing and maintaining various printers.
- Planned and scheduled server and system backups.
- Using security software to block pen drive access and to encrypt and decrypt data.
- Installing and maintaining LAN switches and end-user networks.
- Supervising system AMC (annual maintenance contract) activities.
- Troubleshooting NFS servers and NFS clients in an automount environment.
- Scripting for job automation using Shell and Perl scripting.
- Installed and configured Sun Cluster 3.0 & 3.1.
- Solaris Volume Manager/Solstice DiskSuite for disk device management.
- Handling CPU panics, memory problems and other hardware failures in coordination with vendors.
- Configured FTP, Telnet, SSH, iptables and sudo upgrades for the servers.
- Configured EMC/SAN disks on Solaris and HP servers.
Environment: Windows Server 2003 & 2008, Windows XP & 7, CentOS and Ubuntu.