- 8+ years of IT experience specializing in Hadoop administration, AWS infrastructure setup, DW/BI consulting, DevOps, and software testing.
- Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on-premises and in the cloud, using Hortonworks and Cloudera distributions.
- Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL Server.
- Good understanding of cluster capacity planning and configuring cluster components based on requirements.
- Exposure to configuring high-availability clusters.
- Collected and aggregated large volumes of log data using Apache Flume and stored it in HDFS for further analysis.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (a sample invocation follows this summary).
- Experience in data processing using Hive and Impala.
- Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
- Knowledge of data processing using Apache Spark.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Working knowledge of Kafka for real-time data streaming and event-driven architecture.
- Experience in setting up automatic and manual failover control using ZooKeeper and quorum journal nodes (see the failover commands after this summary).
- Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster monitoring, and troubleshooting.
- Configured backups and performed recovery from NameNode failures.
- Experience in performance tuning of Hadoop cluster using various JVM metrics.
- Experience in creating small Hadoop sandbox clusters for POCs performed by development.
- Knowledge of minor and major upgrades of Hadoop and ecosystem components.
- Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
- Performed ETL, data modeling, and data analytics for risk and compliance domains.
- Managed users, groups, roles, and policies using IAM.
- Analyzed and tuned performance of Spark jobs in EMR, sizing specific instance types to the type and volume of the input being processed.
- Created an internal HIPAA website for centralized access to policies, procedures, forms, and other HIPAA materials.
- Hands-on experience in deploying AWS services using Terraform.
- Coordinated with UNIX system administrators to add and provision new servers for the Hadoop cluster.
- Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experienced in DNS, NIS, NIS+, NFS, FTP, Samba, LDAP/AD, remote access, security management, and system troubleshooting.
- Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
- Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling in MapR.
- Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Experience in writing shell scripts in ksh, bash, and Perl to automate database, application, backup, and scheduling tasks.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands-on experience in Linux administration on RHEL and Ubuntu.
- Provided security and authorization with Apache Ranger, where Ranger Admin provides policy administration and Usersync adds new users to the cluster.
- Handled security permissions for users on Linux, HDFS, and Hue.
- Experience in Black Box Testing.
- Expertise in writing, reviewing, and executing test cases.
- Conceptual knowledge of load, performance, and stress testing.
- Strong experience in system administration, installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
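For illustration, a minimal sketch of a typical Sqoop import/export of this kind; the host, database, and table names are hypothetical:

    # Pull a table from MySQL into HDFS (hypothetical connection details)
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Push processed results back to the relational side
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/out/order_summary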
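The failover controls mentioned above generally come down to commands like these, assuming NameNodes registered as nn1 and nn2:

    # Check which NameNode is active, then fail over manually
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -failover nn1 nn2

    # One-time ZooKeeper initialization for automatic failover (ZKFC)
    hdfs zkfc -formatZK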
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, NavEncrypt, SSL/TLS, Cloudera Navigator, Hortonworks
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle … DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8/10
Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow
Other Tools: GitHub, Informatica 8.6, DataStage, Maven, Puppet, HIPAA, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
Sr. Cloudera/Hadoop Administrator
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop log files as part of administration and troubleshooting.
- Monitored and supported the clusters through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Experience with Cloudera Navigator and Unravel for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
- Created Hive tables and loaded data from the local file system into HDFS.
- Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 250+ servers and developing Puppet manifests.
- Developed Chef Cookbooks to manage systems configuration.
- Created EC2 instances and built large multi-node Hadoop clusters in the AWS cloud from scratch using automated Terraform scripts.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Configured AWS IAM and Security Groups.
- Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for intensive monitoring.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Used Docker CE to launch YARN containers inside Docker containers and deployed Hadoop and Spark clusters using Docker.
- Responsible for copying an 800 TB HDFS snapshot from the production cluster to the DR cluster (see the distcp sketch after this section).
- Responsible for copying a 210 TB HBase table from production to the DR cluster.
- Created Solr collections and replicas for data indexing.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Upgraded Ambari from 2.2.0, and upgraded SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
- Implemented cluster security using Kerberos and HDFS ACLs (see the ACL commands after this section).
- Involved in cluster-level security: perimeter (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience in setting up Test, QA, and Prod environments; wrote Pig Latin scripts to analyze and process data.
- Involved in loading data from the UNIX file system to HDFS; drove root cause analysis (RCA) efforts for high-severity incidents.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Worked hands-on with the ETL process, importing data from various sources and performing transformations.
- Requirement Analysis, effort estimation and formalization of change requests.
- Responsible for executing test cases and preparing test logs.
- Investigated root causes of critical and P1/P2 tickets.
Environment: Cloudera CDH 5.12.0/5.4.5, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, Python, Java, Puppet/Chef, Red Hat/SUSE Linux, LDAP/AD, Windows 2003/2008, UNIX, NoSQL, Oracle 9i/10g/11g RAC on Solaris/Red Hat, Oracle Enterprise Manager (OEM), shell scripting, GoldenGate, EM Cloud Control, Exadata X2/X3, Toad, MySQL, PostgreSQL, Teradata.
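The production-to-DR snapshot copy mentioned above is sketched below; the NameNode hosts (prod-nn, dr-nn) and the path /data are hypothetical:

    # Allow snapshots on the path, then take a read-only snapshot on production
    hdfs dfsadmin -allowSnapshot /data
    hdfs dfs -createSnapshot /data snap-dr

    # Copy the frozen snapshot contents to the DR cluster, preserving attributes
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/.snapshot/snap-dr \
      hdfs://dr-nn:8020/data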
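The HDFS ACL work follows this general pattern (requires dfs.namenode.acls.enabled=true); the user and path below are hypothetical:

    # Grant one user read/execute access beyond the POSIX owner/group bits
    hdfs dfs -setfacl -m user:analyst:r-x /data/secure

    # Verify the effective ACL entries
    hdfs dfs -getfacl /data/secure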
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Worked on a live 52-node Hadoop cluster running Hortonworks Data Platform (HDP 2.2).
- Upgraded HDP (Hortonworks Data Platform) multiple times, starting from HDP 1.7 through HDP 2.2.9, HDP 2.3, and HDP 2.4.3.
- Played a key role in deciding the hardware configurations for the cluster, together with other teams in the company.
- Resolved submitted tickets and P1 issues, troubleshooting errors and documenting resolutions.
- Added new DataNodes when needed and ran the balancer (see the commands after this section).
- As Hadoop admin, monitored cluster health daily, tuned performance-related configuration parameters, and backed up configuration XML files.
- Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
- Configured Kerberos and installed the MIT KDC (ticketing system); the principal-creation commands appear after this section.
- Secured the Hadoop cluster from unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Monitored the Hadoop cluster and proactively optimized and tuned it for performance.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via the ODBC driver.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on NameNode recovery, capacity planning, and slots configuration.
- Responsible for setting log retention policies and configuring the trash interval.
- Performed cluster maintenance, including commissioning and decommissioning of nodes.
- Performed major and minor upgrades to the Hadoop cluster.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Involved in improving the performance and optimization of the existing algorithms using Spark.
Environment: Hortonworks, Ambari 2.2, HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Tez, MongoDB, MySQL, Jenkins, and RHEL.
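A sketch of the DataNode-addition and rebalance flow referenced above, assuming the new host is already in the NameNode's include file:

    # Make the NameNode re-read its include/exclude host lists
    hdfs dfsadmin -refreshNodes

    # Move blocks until every DataNode is within 10% of average utilization
    hdfs balancer -threshold 10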
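Principal and keytab creation on an MIT KDC generally looks like the following; the realm, host, and keytab path are hypothetical:

    # Create a service principal with a random key
    kadmin.local -q "addprinc -randkey hdfs/node01.example.com@EXAMPLE.COM"

    # Export the key to a keytab so the service can log in without a password
    kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/node01.example.com@EXAMPLE.COM"

    # Verify that the keytab works
    kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/node01.example.com@EXAMPLE.COM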
- Installed and configured Cloudera CDH 4.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Used the Cloudera distribution for the Hadoop ecosystem; converted MapReduce jobs into Spark transformations and actions.
- Configured Spark Streaming to receive real-time data from Kafka, store the stream in HDFS, and persist it in databases such as HBase.
- Loaded data from various sources into HDFS using Flume.
- Experience in installing, supporting, and administering CentOS 5.7/6.2 64-bit operating systems.
- Worked on Cloudera to analyze data stored in HDFS.
- Worked extensively with Hive and Pig.
- Worked on large sets of structured, semi-structured and unstructured data.
- Experience in installing NameNode high availability when deploying Hadoop; understands how queries run in Hadoop.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a beeline sketch follows this section).
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Coordinated with technical team for production deployment of software applications for maintenance.
- Able to write Python scripts for monitoring purposes.
- Good knowledge of reading data from and writing data to Cassandra.
- Provided operational support services relating to Hadoop infrastructure and application installation.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
- Handled imports and exports of data into and out of HDFS using Flume and Sqoop.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Participated in development and execution of system and disaster recovery processes.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Set up automated processes to send alerts in case of predefined system and application level issues.
- Set up automated processes to send notifications on any deviation from predefined resource utilization thresholds.
Environment: Linux, shell scripting, Java (JDK 1.7), Tableau, MapReduce, Teradata, SQL Server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, ZooKeeper, and HBase.
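The Hive table creation and loading above typically runs through beeline; the server host, schema, and HDFS path here are hypothetical:

    # Create a Hive table and load raw files from HDFS into it
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
      CREATE TABLE IF NOT EXISTS web_logs (
        ts STRING, host STRING, url STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
      LOAD DATA INPATH '/data/raw/web_logs' INTO TABLE web_logs;"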