Sr. Cloudera/ Hadoop Administrator Resume

SUMMARY

  • 8+ years of IT experience specializing in Hadoop administration, AWS infrastructure setup, DW/BI consulting, DevOps, and software testing.
  • Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on-premises and in the cloud, using Hortonworks and Cloudera distributions.
  • Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL.
  • Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
  • Exposure to configuring high-availability clusters.
  • Collected and aggregated large volumes of log data using Apache Flume and stored it in HDFS for further analysis.
  • Upgraded CDH from 4.5 to 5.3, 5.3 to 5.7, and 5.7 to 5.9.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop (a sample command is sketched after this list).
  • Experience in data processing using Hive and Impala.
  • Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
  • Knowledge of data processing using Apache Spark.
  • Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Working knowledge of Kafka for real-time data streaming and event-driven architectures.
  • Experience in setting up automatic and manual failover control using ZooKeeper and quorum journal nodes.
  • Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
  • Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
  • Configured backups and recovery from NameNode failures.
  • Experience in performance tuning of Hadoop cluster using various JVM metrics.
  • Experience in creating small sandbox Hadoop clusters for development POCs.
  • Knowledge of minor and major upgrades of Hadoop and eco-system components.
  • Hands-on experience across the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
  • ETL, data modeling, and data analytics for risk and compliance domains.
  • Managed users, groups, roles and policies using IAM.
  • Analyzed and tuned performance of Spark jobs on EMR, matching instance types to the type and size of the input being processed.
  • Created an internal HIPAA website for centralized access to policies, procedures, forms, and other HIPAA materials.
  • Hands-on experience in deploying AWS services using Terraform.
  • Coordinated with UNIX system administrators to provision new servers for the Hadoop cluster.
  • Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
  • Experienced in DNS, NIS, NIS+, NFS, FTP, Samba, LDAP/AD, remote access, security management, and system troubleshooting.
  • Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
  • Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling in MapR.
  • Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
  • Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
  • Provided security and authorization with Apache Ranger, where Ranger Admin handles policy administration and Ranger Usersync adds new users to the cluster.
  • Handled security permissions for users on Linux, HDFS, and Hue.
  • Experience in writing shell scripts in ksh, bash, and Perl for automating database, application, backup, and scheduling processes.
  • Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Hands on experience in Linux admin activities on RHEL & Ubuntu.
  • Conceptual knowledge of load, performance, and stress testing.
  • Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
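
As a minimal illustration of the Sqoop import/export workflow referenced above (the JDBC URL, credentials, table names, and HDFS paths are placeholders, not actual project values):

    # Import a relational table into HDFS, then export processed results back out.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/processed/orders_summary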

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi

Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navigator Encrypt, SSL/TLS, Cloudera Manager, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, Pig Latin

Frameworks: MVC, Struts, Spring, Hibernate

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Business Intelligence Tools: Talend, Informatica, Tableau

Databases: Oracle … DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, Ant, SBT

Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8/10

Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow

Other Tools: GitHub, Informatica 8.6, DataStage, Maven, Puppet, HIPAA, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic

PROFESSIONAL EXPERIENCE

Sr. Cloudera/ Hadoop Administrator

Confidential

Responsibilities:

  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Monitoring and support through Nagios and Ganglia.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Created MapR DB tables and involved in loading data into those tables.
  • Maintained the operations, installation, and configuration of 100+ node clusters on the MapR distribution.
  • Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7 and 6.2 64-bit operating systems and was responsible for maintaining the cluster.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
  • Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
  • Cloudera Navigator installation and configuration using Cloudera Manager.
  • Configured rack awareness and performed JDK upgrades using Cloudera Manager.
  • Installed and configured Sentry for Hive authorization using Cloudera Manager.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration, and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an enterprise level.
  • Used Hive, created Hive tables, and loaded data from the local file system into HDFS.
  • Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 250+ servers and developing manifests.
  • Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated scripts such as Terraform.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Configured AWS IAM and Security Groups.
  • Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
  • Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for in-depth monitoring.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Responsible for copying 400 TB of HDFS snapshot data from the production cluster to the DR cluster (see the DistCp sketch after this list).
  • Responsible for copying a 210 TB HBase table from the production cluster to the DR cluster.
  • Created SOLR collections and replicas for data indexing.
  • Administered 150+ Hadoop servers, handling Java version updates, security patches, OS upgrades, and hardware-related outages.
  • Upgraded Ambari from 2.2.0 to 2.4.2.0 and SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
  • Implemented cluster security using Kerberos and HDFS ACLs (see the sketch after this list).
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process data.
  • Involved in loading data from the UNIX file system into HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
  • Investigated the root cause of critical and P1/P2 tickets.
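
A minimal sketch of the snapshot-based DR copy noted above, assuming hypothetical NameNode hosts, ports, and paths:

    # Snapshot a production directory, then mirror the snapshot to the DR cluster with DistCp.
    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse dr_copy_1
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/warehouse/.snapshot/dr_copy_1 \
      hdfs://dr-nn:8020/data/warehouse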
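
A minimal sketch of Kerberos authentication combined with HDFS ACLs as mentioned above; the principal, keytab path, group, and directory are placeholders:

    # Authenticate with the HDFS service keytab, then grant a group read/execute access via ACLs.
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
    hdfs dfs -setfacl -m group:analysts:r-x /data/secure
    hdfs dfs -getfacl /data/secure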

Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, GoldenGate, EM Cloud Control, Exadata X2/X3 machines, Toad, MySQL, PostgreSQL, Teradata.

Hadoop Administrator

Confidential

Responsibilities:

  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
  • Worked on a live 110 node Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
  • Played a key role, along with other teams in the company, in deciding the hardware configuration for the cluster.
  • Regularly commissioned and decommissioned nodes as disk failures occurred on the MapR File System.
  • Implemented MapR token-based security.
  • Resolved submitted tickets and P1 issues, troubleshooting errors and documenting resolutions.
  • Added new DataNodes when needed and ran the balancer (see the sketch after this list).
  • Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
  • Configured Kerberos and installed the MIT Kerberos ticketing system.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installing and configuring CDAP, an ETL tool in the development and Production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Connected to HDFS using third-party tools such as Teradata SQL Assistant via the ODBC driver.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
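
A minimal sketch of the node-addition and rebalancing step referenced above; the utilization threshold is an assumption:

    # After adding the new DataNode to the NameNode's include file, refresh the node list
    # and rebalance block distribution across the cluster (10% utilization threshold).
    hdfs dfsadmin -refreshNodes
    hdfs balancer -threshold 10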

Environment: 110+ nodes, approximately 5 PB of data, Hortonworks, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Storm, Cobbler.

Hadoop Administrator

Confidential

Responsibilities:

  • Used the Cloudera distribution for the Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase.
  • Loaded data from various data sources into HDFS using Flume.
  • Worked on Cloudera to analyze data on top of HDFS.
  • Restored and migrated Cloudera clusters using Cloudera Manager tools.
  • Worked on large sets of structured, semi-structured, and unstructured data.
  • Experience in installing NameNode high availability when deploying Hadoop; understand how queries run in Hadoop.
  • Used Sqoop to import and export data between HDFS and RDBMSs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
  • Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
  • Handled imports and exports of data into and out of HDFS using Flume and Sqoop.
  • Migrated the NameNode from one server to another.
  • Performed Hive backup and disaster recovery using Cloudera backup tools.
  • Performed HDFS data backup and disaster recovery using Cloudera BDR.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Formulated procedures for installation of Hadoop patches, updates and version upgrades.
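
A minimal sketch of creating and loading a Hive table through Beeline, as referenced above; the JDBC URL, table definition, and input path are placeholders:

    # Create a Hive table, load staged data into it, and run a simple aggregate query.
    beeline -u "jdbc:hive2://hiveserver:10000/default" \
      -e "CREATE TABLE IF NOT EXISTS web_logs (host STRING, ts STRING, url STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'" \
      -e "LOAD DATA INPATH '/data/staging/web_logs' INTO TABLE web_logs" \
      -e "SELECT host, COUNT(*) AS hits FROM web_logs GROUP BY host"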

Environment: HDFS, Cloudera, MapReduce, JSP, JavaBeans, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, UNIX shell scripting.
