
Hadoop Administrator Resume

El Segundo, CA

SUMMARY:

  • 8 years of professional IT experience, including proven experience in Hadoop administration on Cloudera (CDH), Hortonworks (HDP), vanilla Hadoop, and MapR distributions, and 3+ years of experience in AWS, Kafka, Elasticsearch, DevOps, and Linux administration.
  • Proficient in Shell, Python, Ruby, YAML, and Groovy scripting as well as Terraform.
  • Configured Elastic Load Balancing (ELB) for routing traffic between zones, and used Route 53 with failover and latency routing options for high availability and fault tolerance.
  • Site Reliability Engineering responsibilities for a Kafka platform that scales to 2 GB/sec and 20 million messages/sec.
  • Worked on analyzing data with Hive and Pig.
  • Combined views and reports into interactive dashboards in Tableau Desktop that were presented to Business Users, Program Managers, and End Users.
  • Used Bash and Python, including Boto3, to supplement the automation provided by Ansible and Terraform for tasks such as encrypting EBS volumes backing AMIs and scheduling Lambda functions for routine AWS tasks (an illustrative sketch follows this list).
  • Experienced in authoring POM.xml files, performing releases with Maven release plugin, modernization of Java projects, and managing Maven repositories.
  • Configured Elasticsearch for log collection and Prometheus and CloudWatch for metric collection.
  • Performed branching, tagging, and release activities on version control tools such as SVN and GitHub.
  • Implemented and managed DevOps infrastructure architecture with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
  • Experience architecting, designing, and implementing large-scale distributed data processing applications built on distributed key-value stores over Hadoop, HBase, Hive, MapReduce, and YARN, and other Hadoop ecosystem components such as Hue, Oozie, Spark, Sqoop, Pig, and ZooKeeper.
  • Commissioned DataNodes as data volumes grew and decommissioned them when hardware degraded.
  • Experience implementing NameNode high availability and Hadoop cluster capacity planning; experience in benchmarking and in backup and disaster recovery of NameNode metadata and of important, sensitive data residing on the cluster.
  • Experience in creating S3 buckets and managing bucket policies, and in utilizing S3 and Glacier for storage, backup, and archival in AWS.
  • Experience in setting up and maintaining Auto Scaling AWS stacks.
  • Team player and self-starter with effective communication, motivation, and organizational skills, attention to detail, a focus on business process improvement, and the ability to meet deadlines on or ahead of schedule.
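
Illustrative sketch (placeholder AMI ID, region, and name, not values from any engagement above): one way EBS-backed AMI encryption can be automated with the AWS CLI, by copying an existing image with encryption enabled on the backing snapshots.

    # Copy an AMI so that its backing EBS snapshots are encrypted (placeholder values).
    SRC_AMI="ami-0123456789abcdef0"
    SRC_REGION="us-east-1"

    aws ec2 copy-image \
      --source-image-id "$SRC_AMI" \
      --source-region "$SRC_REGION" \
      --region "$SRC_REGION" \
      --name "encrypted-copy-of-$SRC_AMI" \
      --encrypted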

WORK EXPERIENCE:

Hadoop Administrator

Confidential, El Segundo, CA

Responsibilities:

  • Designed and implemented new Kafka clusters by configuring topics in all environments.
  • Successfully secured the Kafka cluster with Kerberos.
  • Implemented Kafka security features using SSL, initially without Kerberos; later set up Kerberos with users and groups for finer-grained security, enabling more advanced security features.
  • Worked on installing and configuring CDH 5.8, 5.9, and 5.10 Hadoop clusters on AWS using Cloudera Director and Cloudera Manager.
  • Managed, monitored, and troubleshot the Hadoop cluster using Cloudera Manager.
  • Installed and configured RHEL6 EC2 instances for Production, QA and Development environment.
  • Installed Kerberos for authentication of application and Hadoop service users.
  • Used cron jobs to back up Hadoop service databases to S3 buckets (a backup sketch follows this list).
  • Supported technical team in management and review of Hadoop logs.
  • Assisted in creation of ETL processes for transformation of Data from Oracle and SAP to Hadoop Landing Zone.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Commissioned and decommissioned DataNodes on the current Hadoop cluster.
  • Used AWS S3 and local hard disks (HDFS) as the underlying file systems for Hadoop.
  • Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
  • Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
  • Used Cloudera Navigator for data governance: auditing and lineage.
  • Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
  • Created the AWS VPC network for the installed instances and configured the Security Groups and Elastic IPs accordingly.
  • Monitoring performance and tuning configuration of services in Hadoop Cluster.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Deployed Spark Cluster and other services in AWS using console.
  • Installed Kerberos-secured Kafka clusters (without wire encryption) on Dev and Prod and set up Kafka ACLs on them.
  • Set up an unauthenticated Kafka listener in parallel with the Kerberos (SASL) listener and tested non-authenticated (anonymous) users alongside Kerberos users.
  • Integrated LDAP configuration, including securing Ambari servers with LDAP and managing authorization and permissions for users and groups.
  • Implemented Knox, Ranger, Spark, and SmartSense in the Hadoop cluster.
  • Installed HDP 2.6 in all environments.
  • Installed Ranger in all environments as a second level of security on Kafka brokers.
  • Involved in Data Ingestion Process to Production cluster.
  • Worked with the Oozie job scheduler.
  • Worked on Spark transformation processes, RDD operations, and DataFrames; validated the Spark plug-in for the Avro data format (receiving gzip-compressed data and producing Avro data into HDFS files).
  • Installed Docker to run ELK, InfluxDB, and Kerberos.
  • Involved in defining test automation strategy and test scenarios; created automated test cases and test plans and executed tests using Selenium WebDriver and Java. Architected a Selenium framework with integrations for API, database, and mobile automation.
  • Executed and maintained Selenium test automation scripts.
  • Created databases in InfluxDB, worked on the interface created for Kafka, and checked the measurements in the databases.
  • Created a Bash script that used awk-formatted text to send metrics to InfluxDB (a metrics sketch follows this list).
  • Enabled InfluxDB and configured it as a data source in the Grafana interface.
  • Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 on the Prod machine in Docker containers.
  • Created a cron job that executes a program to start the ingestion process; the data is read in, converted to Avro, and written to HDFS files.
  • Designed the data flow ingestion chart and process.
  • Set up a new Grafana dashboard with real-time consumer lag in the Dev and PP clusters, pulling only the consumer-lag metric and sending it to InfluxDB (via a script in crontab). Worked on Oracle database schema DDL issues at the time of the Ambari upgrade.
  • Upgraded HDP 2.5 to 2.6 in all environments, along with software patches and upgrades.
  • Worked on the Kafka backup index and Log4j appenders to minimize logs, and pointed Ambari server logs to NAS storage.
  • Deployed Data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari Server in the cloud.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Assigned access to users and supported logins by multiple users.
  • Tested all services, such as Hadoop, ZooKeeper, Spark, HiveServer2, and Hive Metastore.
  • Worked on SNMP trap issues in the production cluster; worked on heap optimization and changed some configurations for hardware optimization.
  • Worked with Ambari Views in production.
  • Implemented Rack Awareness in Production Environment.
  • Worked on disk space issues in the production environment by monitoring how fast space was filling and reviewing what was being logged, and created a long-term fix for the issue (minimizing INFO, DEBUG, FATAL, and audit logs).
  • Worked on Nagios Monitoring tool.
  • Installed Kafka Manager for tracking consumer lag and monitoring Kafka metrics; also used it for adding topics, partitions, etc.
  • Worked with the Hortonworks support team on Grafana consumer-lag issues (currently no consumer-lag metrics are generated in the Grafana visualization within HDP).
  • Generated consumer-group lag from Kafka using its API.
  • Installed and configured Ambari Log Search, which under the hood requires a Solr instance; it collects and indexes all cluster-generated logs in real time and displays them in one interface.
  • Installed Ansible 2.3.0 in the production environment.
  • Maintained the Elasticsearch cluster by adding more partitioned disks; this increases disk-write throughput by letting Elasticsearch write to multiple disks at the same time, while the segments of a given shard are written to the same disk.
  • Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling-upgrade process, using Ansible to deploy the new packages to the Prod cluster.
  • Created visualizations in Kibana.
  • Deployed Kibana with Ansible and connected it to the Elasticsearch cluster; tested Kibana and ELK by creating a test index and injecting sample data into it.
  • Tested Kafka ACLs with anonymous users and with different hostnames.
  • Created HBase tables to store data in variable formats coming from different applications.
  • Worked on Production Support Issues.
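
Illustrative sketch of a cron-driven service-database backup to S3 of the kind referenced above; the database name, credentials file, schedule, user, and bucket are placeholders, and it assumes mysqldump and the AWS CLI are installed on the host.

    # /etc/cron.d/metastore-backup (placeholder paths and names)
    # Dump the Hive metastore database nightly and push the compressed dump to S3.
    0 2 * * * hadoop  mysqldump --defaults-extra-file=/etc/backup/.my.cnf hive_metastore \
      | gzip \
      | aws s3 cp - s3://example-hadoop-backups/hive_metastore/$(date +\%F).sql.gz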
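
Illustrative sketch of a Bash-plus-awk metric feed into InfluxDB 1.x over its HTTP write API, as referenced above; the host, database, and measurement names are placeholders.

    #!/usr/bin/env bash
    # Format a disk-usage metric with awk and write it to InfluxDB 1.x (placeholder URL).
    INFLUX_URL="http://influxdb.example.com:8086/write?db=hadoop_metrics"

    # Builds an InfluxDB line-protocol record such as:  disk_used,host=node01 value=73
    df -P / | awk -v host="$(hostname -s)" \
      'NR==2 {gsub("%","",$5); printf "disk_used,host=%s value=%s\n", host, $5}' \
      | curl -s -XPOST "$INFLUX_URL" --data-binary @-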

Hadoop Administrator

Confidential - Pleasanton, CA

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop and Cloudera Hadoop clusters for application development and Hadoop tools like Hive, Pig, Hbase, Zookeeper and Sqoop.
  • Worked on 4 Hadoop clusters for different teams, supporting 50+ users of the Hadoop platform, providing training to make Hadoop easy to use, and keeping users updated on best practices.
  • Implemented Hadoop security on the Hortonworks cluster using Kerberos and two-way SSL.
  • Experience with Hortonworks and Cloudera CDH4 and CDH5 distributions.
  • Installed Kerberos-secured Kafka clusters (without wire encryption) on Dev and Prod and set up Kafka ACLs on them.
  • Set up an unauthenticated Kafka listener in parallel with the Kerberos (SASL) listener and tested non-authenticated (anonymous) users alongside Kerberos users (a configuration sketch follows this list).
  • Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working alongside the operations team to move the non-secured cluster to a secured cluster.
  • Contributed to building hands-on tutorials for the community to learn how to use Hortonworks Data Platform (powered by Hadoop) and Hortonworks Dataflow (powered by NiFi) covering categories such as Hello World, Real-World use cases, Operations.
  • Managed a 350+ node CDH cluster with 4 petabytes of data using Cloudera Manager on Red Hat Linux 6.5.
  • Experienced with deployments, maintenance and troubleshooting applications on Microsoft Azure Cloud infrastructure.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Implemented Azure APIM modules for public-facing, subscription-based authentication and implemented circuit breakers for fatal system errors.
  • Experience in creating and configuring Azure Virtual Networks (VNets), subnets, DHCP address blocks, DNS settings, security policies, and routing.
  • Created Web App Services and deployed ASP.NET applications through Microsoft Azure Web App services. Created Linux virtual machines using VMware Virtual Center.
  • Responsible for software installation, configuration, upgrades, backup and recovery, commissioning and decommissioning of data nodes, cluster setup, and daily cluster performance and health monitoring across different Hadoop distributions (Hortonworks and Cloudera).
  • Worked with application teams to install operating system, updates, patches, version upgrades as required.
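
Illustrative sketch of running a Kerberos (SASL) listener in parallel with an unauthenticated listener and granting a topic ACL, as referenced above; the broker host, ports, ZooKeeper quorum, topic, and principal are placeholders, and the property and tool names follow the Kafka versions shipped with HDP-era clusters.

    # Append a dual-listener setup to the broker config (placeholder host and ports).
    {
      echo 'listeners=SASL_PLAINTEXT://broker01.example.com:6667,PLAINTEXT://broker01.example.com:6668'
      echo 'security.inter.broker.protocol=SASL_PLAINTEXT'
      echo 'sasl.enabled.mechanisms=GSSAPI'
      echo 'sasl.kerberos.service.name=kafka'
    } >> /etc/kafka/server.properties

    # Grant a Kerberos principal read access to a test topic (placeholder names).
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk01.example.com:2181 \
      --add --allow-principal User:appuser --operation Read --topic test-topic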

Hadoop Administrator

Confidential, TX

Responsibilities:

  • Worked on 4 Hadoop clusters for different teams, supporting 50+ users of the Hadoop platform, resolving the tickets and issues they ran into, providing training to make Hadoop easy to use, and keeping users updated on best practices.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Managed a 350+ node CDH cluster with 4 petabytes of data using Cloudera Manager on Red Hat Linux 6.5.
  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning, and slots configuration.
  • Involved in migrating ETL processes from Oracle to Hive to test easier data manipulation.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop (a sketch follows this list).
  • Worked on installing Cloudera Manager and CDH, installing the JCE policy files, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
  • Monitored the cluster for performance, networking, and data integrity issues.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Created test strategies, test plans, and test cases to provide test coverage across various products, systems, and platforms.
  • Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
  • Installed the OS and administered the Hadoop stack with the Cloudera CDH distribution (with YARN), including configuration management, monitoring, debugging, and performance tuning.
  • Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
  • Scripted Hadoop package installation and configuration to support fully automated deployments.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
  • Worked on Hive for further analysis and for generating and transforming files from different analytical formats to text files.
  • Created Hive External tables and loaded the data in to tables and query data using HQL.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Monitoring Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
  • Maintaining the Cluster by adding and removing of nodes using tools like Ganglia, Nagios, and Cloudera Manager.
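
Illustrative sketch of a Sqoop import from Oracle into HDFS and Hive like the one referenced above; the JDBC URL, credentials file, table, and target paths are placeholders.

    # Import an Oracle table into HDFS and register it in Hive (placeholder connection details).
    sqoop import \
      --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
      --username etl_user \
      --password-file /user/etl_user/.oracle.password \
      --table CUSTOMERS \
      --target-dir /data/landing/customers \
      --num-mappers 4 \
      --hive-import \
      --hive-table analytics.customers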

Buzz words: Hadoop, MapReduce, Hive, PIG, Sqoop, Spark, Oozie, Flume, HBase, Nagios, Ganglia, Hue, Cloudera Manager, Zookeeper, Cloudera, Oracle, Kerberos and RedHat.

Hadoop Administrator

Confidential, San Ramon, CA

Responsibilities:

  • Implement and test integration of BI (Business Intelligence) tools with Hadoop stack.
  • Monitored and managed a 152-node Hadoop cluster in production with 24x7 on-call support.
  • Automated scripts for onboarding new users to Hadoop applications and set up Sentry authorization.
  • Completed Talend and Pentaho training and installed the tools for POC in R&D line.
  • Installed and configured Confluent Kafka in R&D line. Validated the installation with HDFS connector and Hive connectors.
  • Installed and configured data science tools such as R, H2O, and Dataiku for analytics.
  • Part of a POC for the Oracle GoldenGate big data connector to Hadoop.
  • Wrote automation scripts and Python code for better Hadoop monitoring and maintenance.
  • Proactively addressed big data application performance issues for business-critical applications such as Npevent (Network Platform events).
  • Capacity Planning for Bigdata platform.
  • Configure Cloudera Manager Triggers and Nagios plugins to automate alerts for critical Hadoop services.
  • Configured YARN Dynamic resource management and fair scheduler allocation for better resource management and allocation in Hadoop.
  • Added 76 new nodes to the existing Hadoop cluster and rebalanced the cluster to distribute data across nodes (a rebalancing sketch follows this list).
  • Monitored Hadoop jobs running in Tidal and troubleshot them in case of failure.
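
Illustrative sketch of rebalancing HDFS after adding DataNodes, as referenced above; the bandwidth and threshold values are examples, not settings from this cluster, and the commands are typically run as the hdfs superuser.

    # Allow DataNodes to move blocks faster during the rebalance (bytes/sec), then run the balancer.
    hdfs dfsadmin -setBalancerBandwidth 104857600
    hdfs balancer -threshold 10   # stop once node utilization is within 10% of the cluster average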

Linux/Unix Administrator

Confidential, Palo Alto, CA

Responsibilities:

  • Installing, tuning, troubleshooting and patching of Red Hat Enterprise Linux servers.
  • Installation, configuration, and maintenance of web servers, application servers, and database servers on Linux Servers.
  • Implemented Bash, Perl, and Python scripting.
  • Scheduled various regular, periodic, future and queue tasks by using crontab.
  • Performed process administration and management, such as monitoring and starting/stopping/killing various processes and sub-processes.
  • Used Ansible server and workstation to manage and configure nodes.
  • Developed automation scripting in Python (core) using Ansible to deploy and manage Java.
  • Created and updated Ansible playbooks and files, and packages stored in the GIT repository.
  • Connected the continuous integration system with the Git version control repository to continually build as check-ins come in from developers. Provisioned and built CHP (compute host platform) Red Hat Linux 6.1 virtual machines serving as cloud-based compute servers.
  • Automated build and deployment to the WildFly server using Jenkins and Rundeck.
  • Utilized Amazon Web Services (AWS), including EC2, to maintain, configure, install, and scale servers and systems in the Amazon cloud (Datapipe Managed AWS).
  • Installed and configured applications such as NFS under VCS control for HA capability.
  • Expanded file systems using Logical Volume Manager (a sketch follows this list).
  • Performed Disk Mirrors and RAID 0, 1 and 5 levels on several Linux servers.
  • File system management in Sun Solaris and Linux environments.
  • EMC disk configuration for web, application, and database servers.
  • Performance tuning and troubleshooting of web, application, and database servers.
  • Planned, scheduled and Implemented OS patches on both Solaris & Linux boxes as a part of proactive maintenance.
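
Illustrative sketch of the LVM-based file system expansion referenced above; the volume group, logical volume, and size are placeholders (resize2fs applies to ext filesystems; an XFS filesystem would use xfs_growfs instead).

    # Grow the logical volume by 20 GiB, then grow the ext4 filesystem to fill it (online).
    lvextend -L +20G /dev/vg_data/lv_app
    resize2fs /dev/vg_data/lv_app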

Linux/Unix Administrator

Confidential

Responsibilities:

  • Storage management using JBOD, RAID Levels 0, 1, Logical Volumes, Volume Groups and Partitioning.
  • Analyzed the performance of Linux systems to identify memory, disk I/O, and network problems.
  • Performed reorganization of disk partitions, file systems, hard disk addition, and memory upgrade.
  • Administration of Red Hat 4.x and 5.x, which includes installation, testing, tuning, upgrading, and loading patches, and troubleshooting both physical and virtual server issues.
  • Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5, and migrated servers between ESX hosts and Xen servers.
  • Log and resource monitoring via scripts on Linux servers (a monitoring sketch follows this list).
  • Maintained and monitored Local area network and Hardware Support and Virtualization on RHEL server (Through Xen & KVM Server).
  • Administration of VMware virtual Linux server and resizing of LVM disk volumes as required.
  • Responded to all Linux system problems 24x7 as part of the on-call rotation and resolved them in a timely manner.
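
Illustrative sketch of a small cron-run log and resource monitoring script of the kind referenced above; the log path and the 80% disk-usage threshold are placeholders.

    #!/usr/bin/env bash
    # Log basic resource usage on a schedule (placeholder path and threshold).
    LOGFILE=/var/log/resource-monitor.log
    {
      date '+%F %T'
      df -Ph | awk '0+$5 >= 80 {print "WARN: " $6 " at " $5}'   # filesystems above 80% full
      free -m | awk '/^Mem:/ {print "mem_used_mb=" $3 " mem_free_mb=" $4}'
    } >> "$LOGFILE"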
