Sr. Cloudera/Hadoop Administrator Resume
SUMMARY
- 8+ years of experience in IT specializing in Hadoop administration, AWS infrastructure setup, DW/BI consulting, DevOps, and software testing.
- Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on premise and in the cloud, using Hortonworks and Cloudera distributions.
- Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL.
- Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
- Exposure to configuring high-availability clusters.
- Collected and aggregated large volumes of log data using Apache Flume and stored it in HDFS for further analysis.
- Upgraded CDH versions from 4.5 to 5.3, 5.3 to 5.7 and 5.7 to 5.9.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (see the Sqoop sketch after this summary).
- Experience in data processing using Hive and Impala.
- Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
- Knowledge of data processing using Apache Spark.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Working knowledge of Kafka for real-time data streaming and event-based architectures.
- Experience in setting up automatic and manual NameNode failover using ZooKeeper and Quorum Journal Nodes.
- Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari.
- Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
- Configured backups and performed recovery from NameNode failures.
- Experience in performance tuning of Hadoop cluster using various JVM metrics.
- Experience in creating small sandbox Hadoop clusters for POCs performed by development teams.
- Knowledge of minor and major upgrades of Hadoop and eco-system components.
- Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
- ETL, data modeling, and data analytics for risk & compliance domains.
- Managed users, groups, roles and policies using IAM.
- Analyzed and tuned performance of Spark jobs on EMR, understanding the type and size of the input processed and matching it to specific instance types.
- Created an internal HIPAA website for centralized access to policies, procedures, forms, and other HIPAA materials.
- Hands-on experience in deploying AWS services using Terraform.
- Coordinated with UNIX system administrators to provision new servers for the Hadoop cluster.
- Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experienced in DNS, NIS, NFS, FTP, NIS+, Samba Server, LDAP/AD remote access, security management, and system troubleshooting.
- Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
- Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling in MapR.
- Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Provided security and authorization with Apache Ranger, where Ranger Admin provides policy administration and Usersync adds new users to the cluster.
- Handled security permissions for users on Linux, HDFS, and Hue.
- Experience in writing shell scripts in ksh, bash, and Perl for automating database, application, backup, and scheduling processes.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands on experience in Linux admin activities on RHEL & Ubuntu.
- Conceptual knowledge of load testing, performance testing, and stress testing.
- Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
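A minimal sketch of the kind of Sqoop import/export referenced in the summary above; the JDBC connection string, credentials, tables, and HDFS paths are hypothetical placeholders, not taken from any specific project.

```bash
# Hypothetical import: pull a MySQL table into HDFS as delimited text files
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Hypothetical export: push processed HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/processed/orders_summary \
  --num-mappers 4
```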
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navigator Encrypt, SSL/TLS, Cloudera Manager, Hortonworks
No SQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle … DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8/10
Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow
Other Tools: GitHub, Informatica 8.6, DataStage, Maven, Puppet, HIPAA, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
PROFESSIONAL EXPERIENCE
Sr. Cloudera/ Hadoop Administrator
Confidential
Responsibilities:
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Monitoring and support through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created MapR DB tables and involved in loading data into those tables.
- Maintained the operations, installation, and configuration of 100+ node clusters with the MapR distribution.
- Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7 and 6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Cloudera Navigator installation and configuration using Cloudera Manager.
- Cloudera rack awareness configuration and JDK upgrade using Cloudera Manager.
- Sentry installation and configuration for Hive authorization using Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in the setup, configuration, and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an enterprise level.
- Used Hive, created Hive tables, and loaded data from the local file system into HDFS.
- Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 250+ servers, and involved in developing manifests.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated scripts such as Terraform.
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDD's.
- Configured AWS IAM and Security Groups.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Set up a test cluster with new services such as Grafana and integrated it with Kafka and HBase for intensive monitoring.
- Worked with Spark on improving performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Responsible for copying a 400 TB HDFS snapshot from the production cluster to the DR cluster (see the DistCp sketch after this list).
- Responsible for copying 210 TB of HBase tables from production to the DR cluster.
- Created SOLR collection and replicas for data indexing.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Upgraded Ambari from 2.2.0 to 2.4.2.0 and migrated SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process the data.
- Involved in loading data from the UNIX file system into HDFS. Drove root cause analysis (RCA) efforts for high-severity incidents.
- Investigated the root cause of critical and P1/P2 tickets.
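A minimal sketch of the snapshot-based DistCp copy to the DR cluster mentioned above; the NameNode hostnames, snapshot name, and paths are hypothetical placeholders.

```bash
# Hypothetical: enable and take a snapshot on the production cluster
hdfs dfsadmin -allowSnapshot /data/warehouse
hdfs dfs -createSnapshot /data/warehouse snap-prod-dr

# Copy the snapshot contents to the DR cluster, preserving block size,
# ownership, and permissions; -update only copies changed files on reruns
hadoop distcp -update -pbugp \
  hdfs://prod-nn:8020/data/warehouse/.snapshot/snap-prod-dr \
  hdfs://dr-nn:8020/data/warehouse
```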
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, Golden Gate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.
Hadoop Administrator
Confidential
Responsibilities:
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Worked on a live 110 node Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
- Played a key role, along with other teams in the company, in deciding the hardware configurations for the cluster.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
- Implemented MapR token-based security.
- Resolved submitted tickets and P1 issues, troubleshot errors, and documented the resolutions.
- Added new DataNodes when needed and ran the HDFS balancer (see the sketch after this list).
- Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
- Configured Kerberos and installed the MIT Kerberos ticketing system.
- Secured the Hadoop cluster against unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among the cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via an ODBC driver.
- Responsible for building scalable distributed data solutions using Hadoop.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
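A minimal sketch of adding a DataNode and rebalancing, as referenced above; the hostname and include-file path are hypothetical, and on an Ambari-managed HDP cluster the DataNode service itself would normally be started from Ambari.

```bash
# Hypothetical: register the new host in the HDFS include file
echo "datanode15.example.com" >> /etc/hadoop/conf/dfs.include

# Tell the NameNode to re-read its include/exclude files
sudo -u hdfs hdfs dfsadmin -refreshNodes

# Rebalance block placement across DataNodes; the threshold is the allowed
# percentage deviation in disk usage between nodes
sudo -u hdfs hdfs balancer -threshold 10
```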
Environment: Over 110 nodes, approximately 5 PB of data, Hortonworks, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Storm, Cobbler.
Hadoop Administrator
Confidential
Responsibilities:
- Used the Cloudera distribution for the Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and in databases such as HBase.
- Load data from various data sources into HDFS using Flume.
- Worked on Cloudera to analyze data present on top of HDFS.
- Restored and migrated Cloudera clusters using Cloudera Manager tools.
- Worked on large sets of structured, semi-structured and unstructured data.
- Experience in installing NameNode high availability when deploying Hadoop; understands how queries run in Hadoop.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the Beeline sketch after this list).
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
- Handled imports and exports of data into HDFS using Flume and Sqoop.
- Migrated the NameNode from one server to another.
- Performed Hive backup and disaster recovery using Cloudera backup tools.
- HDFS data backup and Disaster recovery using Cloudera BDR.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
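A minimal sketch of creating and loading a Hive table through Beeline, as referenced above; the HiveServer2 URL, table definition, and paths are hypothetical placeholders.

```bash
# Hypothetical Hive DDL/DML run through Beeline against HiveServer2
beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" -e "
CREATE TABLE IF NOT EXISTS web_logs (
  ip     STRING,
  ts     STRING,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA INPATH '/data/raw/web_logs' INTO TABLE web_logs;

-- Simple aggregation; on this cluster Hive compiles the query into MapReduce jobs
SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status;
"
```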
Environment: HDFS, Cloudera, MapReduce, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, Yarn, Eclipse, Unix Shell Scripting.