
Sr. Cloudera/Hadoop Administrator Resume

California

SUMMARY

  • 8+ years of experience in IT specializing in Hadoop administration, AWS infrastructure setup, DWBI consulting, DevOps, and software testing.
  • Hands on experience in installation, configuration, management and support of full stack Hadoop Cluster both on premise and cloud using Hortonworks and Cloudera bundles.
  • Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL.
  • Good understanding on cluster capacity planning and configuring the cluster components based on requirements.
  • Exposure to configuration of high availability cluster.
  • Collecting and aggregating a large amount of log data using Apache Flume and storing data in HDFS for further analysis.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Experience in data processing using Hive and Impala.
  • Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
  • Knowledge of data processing using Apache Spark.
  • Good knowledge of Spark components like Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Working knowledge of Kafka for real-time data streaming and event-based architectures.
  • Experience in setting up automatic failover control and manual failover control using ZooKeeper and quorum journal nodes.
  • Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
  • Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
  • Configured backups and performed recovery from NameNode failures.
  • Experience in performance tuning of Hadoop cluster using various JVM metrics.
  • Experience in creating small Hadoop cluster (Sandbox) for POCs performed by development.
  • Knowledge of minor and major upgrades of Hadoop and eco-system components.
  • Hands-on experience in AWS cloud landscape including EC2, Identity and Access management (IAM), S3, Cloud Watch, VPC, RDS.
  • ETL, Data Modeling, Data Analytics for Risk & Compliance Domains.
  • Managed users, groups, roles and policies using IAM.
  • Analyzed and tuned performance for spark jobs in EMR, understanding the type and size of the input processed using specific instance types.
  • Created an internal HIPAA website for centralized access to policies, procedures, forms, and other HIPAA materials.
  • Hands on experience in deploying AWS services using Terraform.
  • Coordinated with UNIX system admins to add and provision new servers for the Hadoop cluster.
  • Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
  • Experienced in DNS, NIS, NFS, FTP, NIS+, Samba Server, LDAP/AD remote access, security management, and system troubleshooting skills.
  • Set up disks for MapR, handled disk failures, configured storage pools, and worked with a Logical Volume Manager.
  • Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
  • Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, process, and deployment strategy.
  • Comprehensive Knowledge of Linux kernel tuning, patching and extensive knowledge of Linux system imaging/mirroring using System Imager.
  • Experience in writing shell scripts in ksh, bash, and Perl for process automation of databases, applications, backups, and scheduling.
  • Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Hands on experience in Linux admin activities on RHEL & Ubuntu.
  • Provided security and authentication with Apache Ranger, where Ranger Admin provides administration and UserSync adds new users to the cluster.
  • Handled security permissions for users on Linux, HDFS, and Hue.
  • Experience in Black Box Testing.
  • Expertise in Writing, Reviewing and Executing Test Cases.
  • Conceptual Knowledge in Load testing, Performance testing, and Stress Testing.
  • Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
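A minimal sketch of the kind of backup-automation shell script described above; all paths, names, and values are illustrative assumptions, not settings from any actual engagement.

```shell
#!/usr/bin/env bash
# Sketch: archive a log directory into a date-stamped tarball.
# In practice a job like this would run from cron against real log paths.
set -euo pipefail

backup_logs() {
  local src="$1" dest="$2"
  local stamp
  stamp=$(date +%Y%m%d)
  mkdir -p "$dest"
  # Archive everything under src into a date-stamped tarball.
  tar -czf "$dest/logs-$stamp.tar.gz" -C "$src" .
  echo "created $dest/logs-$stamp.tar.gz"
}

# Demo against throwaway directories.
src=$(mktemp -d)
dest=$(mktemp -d)
echo "sample log line" > "$src/app.log"
backup_logs "$src" "$dest"
```

A production version would typically add retention (deleting archives older than N days) and error notification.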

TECHNICAL SKILLS

BIG Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi

BIG Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navigator Encrypt, SSL/TLS, Cloudera Navigator, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin

Frameworks: MVC, Struts, Spring, Hibernate

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Business Intelligence Tools: Talend, Informatica, Tableau

Databases: Oracle … DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT

Cloud Technologies: Amazon Web Services (Amazon RedShift, S3), Microsoft Azure Insight

Operating Systems: Sun Solaris, HP-UX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8/10

Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow.

Other Tools: GitHub, Informatica 8.6, DataStage, Maven, Puppet, HIPAA, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic

PROFESSIONAL EXPERIENCE

Confidential, California

Sr. Cloudera/Hadoop Administrator

Responsibilities:

  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Monitoring and support through Nagios and Ganglia.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
  • Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Used Hive and created Hive tables, loaded data from Local file system to HDFS.
  • Production experience in large environments using configuration management tools like Chef and Puppet, supporting a Chef environment with 250+ servers and developing manifests.
  • Developed Chef Cookbooks to manage systems configuration.
  • Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated Terraform scripts.
  • Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDD's.
  • Configured AWS IAM and Security Groups.
  • Set up a test cluster with new services like Grafana, integrating with Kafka and HBase for intensive monitoring.
  • Worked with Spark for improving performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, Pair RDD's.
  • Used Docker (DCE) to launch YARN containers inside Docker containers and deployed Hadoop and Spark clusters using Docker containers.
  • Responsible for copying 800 TB of HDFS snapshot from Production cluster to DR cluster.
  • Responsible for copying 210 TB of Hbase table from Production to DR cluster.
  • Created SOLR collection and replicas for data indexing.
  • Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
  • Upgraded Ambari from 2.2.0 to 2.4.2.0 and updated SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
  • Implemented Cluster Security using Kerberos and HDFS ACLs.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Experience in setting up Test, QA, and Prod environment. Written Pig Latin Scripts to analyze and process the data.
  • Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Worked hands on with ETL process. Handled importing data from various data sources, performed transformations.
  • Requirement Analysis, effort estimation and formalization of change requests.
  • Responsible for the execution of Test Cases and prepare test logs.
  • Investigate the root cause of Critical and P1/P2 tickets.
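The log-review triage described above usually starts by summarizing error frequency per component. A minimal sketch, using an invented log excerpt in the standard Hadoop "timestamp LEVEL class: message" shape:

```shell
# Hypothetical log excerpt -- the lines and classes here are invented for illustration.
cat > /tmp/hadoop-sample.log <<'EOF'
2018-04-02 10:01:12,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: heartbeat ok
2018-04-02 10:01:13,001 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure on /data/3
2018-04-02 10:01:14,210 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: container killed
2018-04-02 10:01:15,002 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure on /data/3
EOF

# Count ERROR lines per emitting class -- a first triage step when reviewing logs.
awk '$3 == "ERROR" { counts[$4]++ } END { for (c in counts) print counts[c], c }' \
    /tmp/hadoop-sample.log | sort -rn
```

The class with the highest error count is usually where troubleshooting starts; the same one-liner works on real DataNode or NodeManager logs.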

Environment: Cloudera CDH 5.12.0/5.4.5, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, Zookeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, Python, Java, Puppet/Chef, RedHat/SUSE Linux, LDAP/AD, Windows 2003/2008, Unix, NoSQL, Oracle 9i/10g/11g RAC with Solaris/RedHat, Oracle Enterprise Manager (OEM), Shell Scripting, Golden Gate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.

Hadoop Administrator

Confidential

Responsibilities:

  • Analyzed the business requirements of the project by studying the Business Requirement Specification document.
  • Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
  • Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
  • Worked on a live 52 node Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
  • Upgraded HDP (Hortonworks Data Platform) multiple times, from HDP 1.7 through HDP 2.2.9, 2.3, and 2.4.3.
  • Played a key role in deciding the hardware configuration for the cluster along with other teams in the company.
  • Resolved submitted tickets and P1 issues, troubleshot errors, and documented resolutions.
  • Adding new Data Nodes when needed and running balancer.
  • As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
  • Introduced SmartSense to get optimization recommendations from the vendor and to help troubleshoot issues.
  • Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
  • Configured Kerberos and installed the MIT ticketing system.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installed and configured CDAP, an ETL tool, in the development and production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
  • Connected to HDFS using third-party tools like Teradata SQL Assistant via an ODBC driver.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on name node recovery, capacity planning, and slots configuration.
  • Responsible for setting log retention policies and the trash interval time.
  • Cluster maintenance as well as commission and decommission of nodes.
  • Performed major and minor upgrades to the Hadoop cluster.
  • Collaborate with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Involved in improving the performance and optimization of the existing algorithms using Spark.
  • Resolved tickets submitted by users and P1 issues; troubleshot, documented, and resolved errors.
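Daily health monitoring of the kind described above often reduces to comparing live and dead node counts against the expected cluster size. A sketch against a simulated report in the shape of `hdfs dfsadmin -report` output (abridged; all numbers invented):

```shell
# Simulated, abridged dfsadmin-style report -- values are invented for illustration.
cat > /tmp/dfsadmin-report.txt <<'EOF'
Configured Capacity: 1099511627776 (1 TB)
DFS Remaining: 879609302220 (819.2 GB)
Live datanodes (50):
Dead datanodes (2):
EOF

expected=52   # nodes the cluster is supposed to have (assumed)
live=$(sed -n 's/^Live datanodes (\([0-9]*\)).*/\1/p' /tmp/dfsadmin-report.txt)
dead=$(sed -n 's/^Dead datanodes (\([0-9]*\)).*/\1/p' /tmp/dfsadmin-report.txt)

# Alert when any node is dead or the total does not match expectations.
if [ "$((live + dead))" -ne "$expected" ] || [ "$dead" -gt 0 ]; then
  echo "ALERT: $dead dead datanodes, $live live (expected $expected total)"
else
  echo "OK: all $live datanodes live"
fi
```

On a real cluster the heredoc would be replaced by `hdfs dfsadmin -report`, and the script would feed a cron-driven alert rather than stdout.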

Environment: Hortonworks, Ambari 2.2, HDFS, MapReduce, Yarn, Hive, PIG, Zookeeper, TEZ, MongoDB, MYSQL, Jenkins and RHEL.

Hadoop Administrator

Confidential

Responsibilities:

  • Installed and configured Cloudera CDH 4.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
  • Used Cloudera distribution for Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions using Spark.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase.
  • Load data from various data sources into HDFS using Flume.
  • Experience installing, supporting, and administering CentOS 5.7/6.2 64-bit operating systems.
  • Worked on Cloudera to analyze data present on top of HDFS.
  • Worked extensively on Hive and PIG.
  • Worked on large sets of structured, semi-structured and unstructured data.
  • Experience in installing NameNode high availability, deploying Hadoop, and understanding how queries run in Hadoop.
  • Use of Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
  • Coordinated with technical team for production deployment of software applications for maintenance.
  • Wrote Python scripts for monitoring purposes.
  • Good knowledge of reading data from Cassandra and writing to it.
  • Provided operational support services relating to Hadoop infrastructure and application installation.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
  • Handled the imports and exports of data onto HDFS using Flume and Sqoop.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Participated in development and execution of system and disaster recovery processes.
  • Formulated procedures for installation of Hadoop patches, updates and version upgrades.
  • Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
  • Set up automated processes to send alerts in case of predefined system and application level issues.
  • Set up automated processes to send notifications on any deviation from predefined resource utilization thresholds.
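A minimal sketch of the threshold check behind the alerting described above; the metric names and limits are illustrative assumptions, and a real setup would source the percentages from monitoring rather than hard-code them.

```shell
# Hypothetical utilization check: prints ALERT when a metric crosses its limit.
check_util() {
  local name="$1" pct="$2" limit="$3"
  if [ "$pct" -ge "$limit" ]; then
    echo "ALERT: $name at ${pct}% (limit ${limit}%)"
  else
    echo "OK: $name at ${pct}%"
  fi
}

# Example readings (invented values):
check_util "hdfs-capacity" 91 85
check_util "cpu" 40 90
```

In production the ALERT line would be piped to mail or a paging tool instead of stdout.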

Environment: Linux, Shell Scripting, Java (JDK 1.7), Tableau, Map Reduce, Teradata, SQL server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, Zookeeper and HBase.
