Sr. Cloudera/Hadoop Administrator Resume
SUMMARY
- 8+ years of experience in IT specializing in Hadoop administration, AWS infrastructure setup, DW/BI consulting, DevOps, and software testing.
- Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on premise and in the cloud, using Hortonworks and Cloudera distributions.
- Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL.
- Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
- Exposure to configuring high-availability clusters.
- Collected and aggregated large volumes of log data using Apache Flume and stored it in HDFS for further analysis.
- Upgraded CDH versions from 4.5 to 5.3, 5.3 to 5.7 and 5.7 to 5.9.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (see the Sqoop sketch after this summary).
- Experience in data processing using Hive and Impala.
- Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
- Knowledge of data processing using Apache Spark.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Working knowledge of Kafka for real-time data streaming and event-based architectures.
- Experience in setting up automatic and manual NameNode failover using ZooKeeper and Quorum Journal Nodes.
- Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari.
- Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
- Configured backups and performed recovery from NameNode failures.
- Experience in performance tuning of Hadoop cluster using various JVM metrics.
- Experience in creating small sandbox Hadoop clusters for POCs performed by development teams.
- Knowledge of minor and major upgrades of Hadoop and eco-system components.
- Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
- ETL, data modeling, and data analytics for risk & compliance domains.
- Managed users, groups, roles and policies using IAM.
- Analyzed and tuned performance of Spark jobs on EMR, understanding the type and size of the input processed and matching it to specific instance types.
- Created an internal HIPAA website for centralized access to policies, procedures, forms, and other HIPAA materials.
- Hands-on experience in deploying AWS services using Terraform.
- Coordinated with UNIX system administrators to provision new servers for the Hadoop cluster.
- Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experienced in DNS, NIS, NFS, FTP, NIS+, Samba Server, LDAP/AD remote access, security management, and system troubleshooting.
- Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
- Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling in MapR.
- Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Provided security and authorization with Apache Ranger, where Ranger Admin provides policy administration and Usersync adds new users to the cluster.
- Handled security permissions for users on Linux, HDFS, and Hue.
- Experience in writing shell scripts in ksh, bash, and Perl for automating database, application, backup, and scheduling processes.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands on experience in Linux admin activities on RHEL & Ubuntu.
- Conceptual knowledge of load testing, performance testing, and stress testing.
- Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
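A minimal sketch of the kind of Sqoop import/export referenced in the summary above; the JDBC connection string, credentials, tables, and HDFS paths are hypothetical placeholders, not taken from any specific project.

```bash
# Hypothetical import: pull a MySQL table into HDFS as delimited text files
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Hypothetical export: push processed HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/processed/orders_summary \
  --num-mappers 4
```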
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navigator Encrypt, SSL/TLS, Cloudera Manager, Hortonworks
No SQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle … DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8/10
Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow
Other Tools: GitHub, Informatica 8.6, DataStage, Maven, Puppet, HIPAA, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
PROFESSIONAL EXPERIENCE
Sr. Cloudera/ Hadoop Administrator
Confidential
Responsibilities:
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Monitoring and support through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created MapR DB tables and involved in loading data into those tables.
- Maintained the operations, installation, and configuration of 100+ node clusters with the MapR distribution.
- Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7 and 6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Cloudera Navigator installation and configuration using Cloudera Manager.
- Cloudera rack awareness configuration and JDK upgrade using Cloudera Manager.
- Sentry installation and configuration for Hive authorization using Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in the setup, configuration, and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an enterprise level.
- Used Hive, created Hive tables, and loaded data from the local file system into HDFS.
- Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 250+ servers, and involved in developing manifests.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated scripts such as Terraform.
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDD's.
- Configured AWS IAM and Security Groups.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Set up a test cluster with new services such as Grafana and integrated it with Kafka and HBase for intensive monitoring.
- Worked with Spark on improving performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Responsible for copying a 400 TB HDFS snapshot from the production cluster to the DR cluster (see the DistCp sketch after this list).
- Responsible for copying 210 TB of HBase tables from production to the DR cluster.
- Created SOLR collection and replicas for data indexing.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Upgraded Ambari from 2.2.0 to 2.4.2.0 and migrated SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process the data.
- Involved in loading data from the UNIX file system into HDFS. Drove root cause analysis (RCA) efforts for high-severity incidents.
- Investigated the root cause of critical and P1/P2 tickets.
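A minimal sketch of the snapshot-based DistCp copy to the DR cluster mentioned above; the NameNode hostnames, snapshot name, and paths are hypothetical placeholders.

```bash
# Hypothetical: enable and take a snapshot on the production cluster
hdfs dfsadmin -allowSnapshot /data/warehouse
hdfs dfs -createSnapshot /data/warehouse snap-prod-dr

# Copy the snapshot contents to the DR cluster, preserving block size,
# ownership, and permissions; -update only copies changed files on reruns
hadoop distcp -update -pbugp \
  hdfs://prod-nn:8020/data/warehouse/.snapshot/snap-prod-dr \
  hdfs://dr-nn:8020/data/warehouse
```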
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, Golden Gate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.
Hadoop Administrator
Confidential
Responsibilities:
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Worked on a live 110 node Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
- Played a key role, along with other teams in the company, in deciding the hardware configurations for the cluster.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
- Implemented MapR token-based security.
- Resolved submitted tickets and P1 issues, troubleshot errors, and documented the resolutions.
- Added new DataNodes when needed and ran the HDFS balancer (see the sketch after this list).
- Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
- Configured Kerberos and installed the MIT Kerberos ticketing system.
- Secured the Hadoop cluster against unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among the cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via an ODBC driver.
- Responsible for building scalable distributed data solutions using Hadoop.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
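A minimal sketch of adding a DataNode and rebalancing, as referenced above; the hostname and include-file path are hypothetical, and on an Ambari-managed HDP cluster the DataNode service itself would normally be started from Ambari.

```bash
# Hypothetical: register the new host in the HDFS include file
echo "datanode15.example.com" >> /etc/hadoop/conf/dfs.include

# Tell the NameNode to re-read its include/exclude files
sudo -u hdfs hdfs dfsadmin -refreshNodes

# Rebalance block placement across DataNodes; the threshold is the allowed
# percentage deviation in disk usage between nodes
sudo -u hdfs hdfs balancer -threshold 10
```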
Environment: Over 110 nodes, approximately 5 PB of data, Hortonworks, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Storm, Cobbler.
Hadoop Administrator
Confidential
Responsibilities:
- Used the Cloudera distribution for the Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and in databases such as HBase.
- Load data from various data sources into HDFS using Flume.
- Worked on Cloudera to analyze data present on top of HDFS.
- Restored and migrated Cloudera clusters using Cloudera Manager tools.
- Worked on large sets of structured, semi-structured and unstructured data.
- Experience in installing NameNode high availability when deploying Hadoop; understands how queries run in Hadoop.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the Beeline sketch after this list).
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
- Handled imports and exports of data into HDFS using Flume and Sqoop.
- Migrated the NameNode from one server to another.
- Performed Hive backup and disaster recovery using Cloudera backup tools.
- HDFS data backup and Disaster recovery using Cloudera BDR.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
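A minimal sketch of creating and loading a Hive table through Beeline, as referenced above; the HiveServer2 URL, table definition, and paths are hypothetical placeholders.

```bash
# Hypothetical Hive DDL/DML run through Beeline against HiveServer2
beeline -u "jdbc:hive2://hiveserver.example.com:10000/default" -e "
CREATE TABLE IF NOT EXISTS web_logs (
  ip     STRING,
  ts     STRING,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA INPATH '/data/raw/web_logs' INTO TABLE web_logs;

-- Simple aggregation; on this cluster Hive compiles the query into MapReduce jobs
SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status;
"
```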
Environment: HDFS, Cloudera, MapReduce, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, Yarn, Eclipse, Unix Shell Scripting.