- 8+ years of IT experience specializing in Hadoop administration, AWS infrastructure setup, DW/BI consulting, DevOps, and software testing.
- Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on-premises and in the cloud, using Hortonworks and Cloudera distributions.
- Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL Server.
- Good understanding of cluster capacity planning and configuring cluster components based on requirements.
- Exposure to configuring high-availability clusters.
- Collected and aggregated large volumes of log data using Apache Flume and stored it in HDFS for further analysis.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (a sample invocation follows this summary).
- Experience in data processing using Hive and Impala.
- Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
- Knowledge of data processing using Apache Spark.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Working knowledge of Kafka for real-time data streaming and event-driven architecture.
- Experience in setting up automatic and manual failover control using ZooKeeper and quorum journal nodes (see the failover commands after this summary).
- Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster monitoring, and troubleshooting.
- Configured backups and performed recovery from NameNode failures.
- Experience in performance tuning of Hadoop cluster using various JVM metrics.
- Experience in creating small Hadoop sandbox clusters for POCs performed by development.
- Knowledge of minor and major upgrades of Hadoop and ecosystem components.
- Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
- Performed ETL, data modeling, and data analytics for risk and compliance domains.
- Managed users, groups, roles, and policies using IAM.
- Analyzed and tuned performance of Spark jobs in EMR, sizing specific instance types to the type and volume of the input being processed.
- Created an internal HIPAA website for centralized access to policies, procedures, forms, and other HIPAA materials.
- Hands-on experience in deploying AWS services using Terraform.
- Coordinated with UNIX system administrators to add and provision new servers for the Hadoop cluster.
- Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experienced in DNS, NIS, NIS+, NFS, FTP, Samba, LDAP/AD, remote access, security management, and system troubleshooting.
- Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
- Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling in MapR.
- Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Experience in writing shell scripts in ksh, bash, and Perl to automate database, application, backup, and scheduling tasks.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands-on experience in Linux administration on RHEL and Ubuntu.
- Provided security and authorization with Apache Ranger, where Ranger Admin provides policy administration and Usersync adds new users to the cluster.
- Handled security permissions for users on Linux, HDFS, and Hue.
- Experience in Black Box Testing.
- Expertise in writing, reviewing, and executing test cases.
- Conceptual knowledge of load, performance, and stress testing.
- Strong experience in system administration, installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
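For illustration, a minimal sketch of a typical Sqoop import/export of this kind; the host, database, and table names are hypothetical:

    # Pull a table from MySQL into HDFS (hypothetical connection details)
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Push processed results back to the relational side
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/out/order_summary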
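The failover controls mentioned above generally come down to commands like these, assuming NameNodes registered as nn1 and nn2:

    # Check which NameNode is active, then fail over manually
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -failover nn1 nn2

    # One-time ZooKeeper initialization for automatic failover (ZKFC)
    hdfs zkfc -formatZK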
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, NavEncrypt, SSL/TLS, Cloudera Navigator, Hortonworks
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle … DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8/10
Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow
Other Tools: GitHub, Informatica 8.6, DataStage, Maven, Puppet, HIPAA, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
Sr. Cloudera/Hadoop Administrator
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop log files as part of administration and troubleshooting.
- Monitored and supported the clusters through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Experience with Cloudera Navigator and Unravel for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
- Created Hive tables and loaded data from the local file system into HDFS.
- Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 250+ servers and developing Puppet manifests.
- Developed Chef Cookbooks to manage systems configuration.
- Created EC2 instances and built large multi-node Hadoop clusters in the AWS cloud from scratch using automated Terraform scripts.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Configured AWS IAM and Security Groups.
- Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for intensive monitoring.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Used Docker CE to launch YARN containers inside Docker containers and deployed Hadoop and Spark clusters using Docker.
- Responsible for copying an 800 TB HDFS snapshot from the production cluster to the DR cluster (see the distcp sketch after this section).
- Responsible for copying a 210 TB HBase table from production to the DR cluster.
- Created Solr collections and replicas for data indexing.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Upgraded Ambari from 2.2.0, and upgraded SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
- Implemented cluster security using Kerberos and HDFS ACLs (see the ACL commands after this section).
- Involved in cluster-level security: perimeter (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience in setting up Test, QA, and Prod environments; wrote Pig Latin scripts to analyze and process data.
- Involved in loading data from the UNIX file system to HDFS; drove root cause analysis (RCA) efforts for high-severity incidents.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Worked hands-on with the ETL process, importing data from various sources and performing transformations.
- Requirement Analysis, effort estimation and formalization of change requests.
- Responsible for executing test cases and preparing test logs.
- Investigated root causes of critical and P1/P2 tickets.
Environment: Cloudera CDH 5.12.0/5.4.5, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, Python, Java, Puppet/Chef, Red Hat/SUSE Linux, LDAP/AD, Windows 2003/2008, UNIX, NoSQL, Oracle 9i/10g/11g RAC on Solaris/Red Hat, Oracle Enterprise Manager (OEM), shell scripting, GoldenGate, EM Cloud Control, Exadata X2/X3, Toad, MySQL, PostgreSQL, Teradata.
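The production-to-DR snapshot copy mentioned above is sketched below; the NameNode hosts (prod-nn, dr-nn) and the path /data are hypothetical:

    # Allow snapshots on the path, then take a read-only snapshot on production
    hdfs dfsadmin -allowSnapshot /data
    hdfs dfs -createSnapshot /data snap-dr

    # Copy the frozen snapshot contents to the DR cluster, preserving attributes
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/.snapshot/snap-dr \
      hdfs://dr-nn:8020/data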
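The HDFS ACL work follows this general pattern (requires dfs.namenode.acls.enabled=true); the user and path below are hypothetical:

    # Grant one user read/execute access beyond the POSIX owner/group bits
    hdfs dfs -setfacl -m user:analyst:r-x /data/secure

    # Verify the effective ACL entries
    hdfs dfs -getfacl /data/secure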
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Worked on a live 52-node Hadoop cluster running Hortonworks Data Platform (HDP 2.2).
- Upgraded HDP (Hortonworks Data Platform) multiple times, starting from HDP 1.7 through HDP 2.2.9, HDP 2.3, and HDP 2.4.3.
- Played a key role in deciding the hardware configurations for the cluster, together with other teams in the company.
- Resolved submitted tickets and P1 issues, troubleshooting errors and documenting resolutions.
- Added new DataNodes when needed and ran the balancer (see the commands after this section).
- As Hadoop admin, monitored cluster health daily, tuned performance-related configuration parameters, and backed up configuration XML files.
- Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
- Configured Kerberos and installed the MIT KDC (ticketing system); the principal-creation commands appear after this section.
- Secured the Hadoop cluster from unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Monitored the Hadoop cluster and proactively optimized and tuned it for performance.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via the ODBC driver.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on NameNode recovery, capacity planning, and slots configuration.
- Responsible for setting log retention policies and configuring the trash interval.
- Performed cluster maintenance, including commissioning and decommissioning of nodes.
- Performed major and minor upgrades to the Hadoop cluster.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Involved in improving the performance and optimization of the existing algorithms using Spark.
Environment: Hortonworks, Ambari 2.2, HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Tez, MongoDB, MySQL, Jenkins, and RHEL.
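A sketch of the DataNode-addition and rebalance flow referenced above, assuming the new host is already in the NameNode's include file:

    # Make the NameNode re-read its include/exclude host lists
    hdfs dfsadmin -refreshNodes

    # Move blocks until every DataNode is within 10% of average utilization
    hdfs balancer -threshold 10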
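Principal and keytab creation on an MIT KDC generally looks like the following; the realm, host, and keytab path are hypothetical:

    # Create a service principal with a random key
    kadmin.local -q "addprinc -randkey hdfs/node01.example.com@EXAMPLE.COM"

    # Export the key to a keytab so the service can log in without a password
    kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/node01.example.com@EXAMPLE.COM"

    # Verify that the keytab works
    kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/node01.example.com@EXAMPLE.COM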
- Installed and configured Cloudera CDH 4.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Used the Cloudera distribution for the Hadoop ecosystem; converted MapReduce jobs into Spark transformations and actions.
- Configured Spark Streaming to receive real-time data from Kafka, store the stream in HDFS, and persist it in databases such as HBase.
- Loaded data from various sources into HDFS using Flume.
- Experience in installing, supporting, and administering CentOS 5.7/6.2 64-bit operating systems.
- Worked on Cloudera to analyze data stored in HDFS.
- Worked extensively with Hive and Pig.
- Worked on large sets of structured, semi-structured and unstructured data.
- Experience in installing NameNode high availability when deploying Hadoop; understands how queries run in Hadoop.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a beeline sketch follows this section).
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Coordinated with technical team for production deployment of software applications for maintenance.
- Able to write Python scripts for monitoring purposes.
- Good knowledge of reading data from and writing data to Cassandra.
- Provided operational support services relating to Hadoop infrastructure and application installation.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
- Handled imports and exports of data into and out of HDFS using Flume and Sqoop.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Participated in development and execution of system and disaster recovery processes.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Set up automated processes to send alerts in case of predefined system and application level issues.
- Set up automated processes to send notifications on any deviation from predefined resource utilization thresholds.
Environment: Linux, shell scripting, Java (JDK 1.7), Tableau, MapReduce, Teradata, SQL Server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, ZooKeeper, and HBase.
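The Hive table creation and loading above typically runs through beeline; the server host, schema, and HDFS path here are hypothetical:

    # Create a Hive table and load raw files from HDFS into it
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
      CREATE TABLE IF NOT EXISTS web_logs (
        ts STRING, host STRING, url STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
      LOAD DATA INPATH '/data/raw/web_logs' INTO TABLE web_logs;"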