Hadoop Admin Resume
San Francisco, CA
SUMMARY:
- Proven experience in Hadoop administration on Cloudera (CDH), Hortonworks (HDP), vanilla Apache Hadoop, and MapR distributions, along with experience in AWS, Kafka, Elasticsearch, DevOps, and Linux administration.
- Experience as a Software Engineer across IT technologies, with good working knowledge of Java and the Big Data Hadoop ecosystem.
- Experience with Hadoop infrastructure components including MapReduce, Hive, Oozie, Sqoop, HBase, Pig, HDFS, YARN, Spark, and Impala on configuration projects in direct client-facing roles.
- Good knowledge of data structures, algorithms, object-oriented design, and data modeling.
- Good knowledge of core Java programming, including Collections, Generics, exception handling, and multithreading.
- Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
- Experience integrating Kafka with Spark Streaming for real-time data processing.
- Good knowledge of the design and implementation of big data pipelines.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
- Knowledge of implementing ETL/ELT processes with MapReduce, Pig, and Hive.
- Hands-on experience with major components in the Hadoop ecosystem, including Hive, HBase, HBase & Hive integration, Sqoop, and Flume, plus knowledge of the MapReduce/HDFS framework.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Strong knowledge of creating and monitoring Hadoop clusters on VMs, with Hortonworks Data Platform 2.1 & 2.2 and CDH3/CDH4 via Cloudera Manager, on Linux and Ubuntu.
- Knowledge of MS SQL Server … and Oracle ….
- Strong knowledge in Software Development Life Cycle (SDLC)
- Strong understanding of Agile and Waterfall SDLC methodologies.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good knowledge of creating reports using QlikView and Qlik Sense.
- Experienced in installing, configuring, and administering Hadoop clusters.
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH).
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Servers: WebLogic Server, WebSphere, and JBoss.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.
Database: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Incident Management, Release Management, Change Management.
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Hadoop Admin
Responsibilities:
- Administration and monitoring of Hadoop clusters.
- Worked on the Hadoop upgrade from 4.5 to 5.2.
- Monitor Hadoop cluster job performance and capacity planning.
- DevOps configuration management with Ansible.
- Installed Ansible 2.3.0 in the production environment.
- Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling-upgrade process, using Ansible to deploy the new packages in the production cluster (a sketch follows this list).
- Implemented and managed the DevOps infrastructure architecture with Terraform, Jenkins, Puppet, and Ansible; responsible for CI and CD infrastructure, process, and deployment strategy.
- Applied DevOps and Continuous Delivery methodologies to existing build and deployment strategies as part of release process implementation.
- Used Apache NiFi for ingestion of data from IBM MQ (message queues).
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Used Apache NiFi to copy data from the local file system into HDP.
- Worked with NiFi to manage the flow of data from source systems to HDFS.
- Experience with job workflow scheduling and data-flow tools such as NiFi.
- Ingested data into HDFS using NiFi with different processors and developed custom input adaptors.
- Created a POC on Hortonworks and suggested best practices in terms of the HDP and HDF platforms and NiFi.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS (see the sketch after this list).
- Generated consumer-group lag metrics from Kafka using its API; used Kafka for building real-time data pipelines between clusters.
- Ran log aggregation, website activity tracking, and the commit log for distributed systems using Apache Kafka.
- Integrated Apache Kafka for data ingestion.
- Experience integrating Kafka with Spark Streaming for real-time data processing.
- Removed retired nodes in particular security groups from Nagios monitoring.
- Designed and implemented topic configuration in the new Kafka cluster in all environments.
- Secured the Kafka cluster with Kerberos and also implemented Kafka security features using SSL without Kerberos; for more fine-grained security, set up Kerberos users and groups to enable more advanced security features.
- Responsible for managing and scheduling jobs on Hadoop Cluster
- Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
- Performed dynamic updates of Hadoop YARN and MapReduce memory settings.
- Worked with the DBA team to migrate the Hive and Oozie metastore databases from MySQL to RDS.
- Worked with the Fair and Capacity schedulers: created new queues, added users to queues, increased mapper and reducer capacity, and administered, viewed, and submitted MapReduce jobs.
- Experience in administration and maintenance of source control management systems such as Git and GitHub.
- Hands-on experience installing and administering CI tools like Jenkins.
- Experience in integrating Shell scripts using Jenkins
- Installed and configured Puppet, including installation and configuration of the Puppet master, agent nodes, and an admin control workstation.
- Worked with modules, classes, and manifests in Puppet.
- Experience in creating Docker images
- Used containerization technologies like Docker to build clusters and orchestrate container deployment.
- Operations: custom shell scripts, VM and environment management.
- Experience working with Amazon EC2, S3, and Glacier.
- Created multiple groups and set permission policies for various groups in AWS.
- Experience creating lifecycle policies in AWS S3 for backups to Glacier (a sketch follows this list).
- Experience in maintaining, executing, and scheduling build scripts to automate DEV/PROD builds.
- Configured Elastic Load Balancers with EC2 Auto scaling groups.
- Created monitors, alarms, and notifications for EC2 hosts using CloudWatch.
- Launched Amazon EC2 instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Worked with the IAM service, creating new IAM users and groups and defining roles, policies, and identity providers.
- Experience assigning MFA in AWS using IAM and on S3 buckets.
- Defined AWS Security Groups which acted as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
- Used Amazon Route 53 to manage DNS zones and to provide public DNS names for Elastic Load Balancer IPs.
- Used default and custom VPCs to create private cloud environments with public and private subnets.
- Loaded data from Oracle, MS SQL Server, MySQL, and flat-file databases into HDFS and Hive.
- Fixed NameNode partition failures, un-rotated fsimage issues, and MR jobs failing with too many fetch failures, and troubleshot common Hadoop cluster issues.
- Implemented Puppet manifest files for automated orchestration of Hadoop and Cassandra clusters.
- Maintained GitHub repositories for configuration management.
- Configured the distributed monitoring system Ganglia for Hadoop clusters.
- Managed cluster coordination services through ZooKeeper.
- Configured and deployed a NameNode High Availability Hadoop cluster with SSL and Kerberos enabled.
- Dealt with restarts of several services and killed processes by PID to clear alerts.
- Monitored log files of several services and cleared files in case of disk-space issues on the affected nodes.
- Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
- Currently working as an admin on the Hortonworks (HDP 2.2.4.2) distribution for 4 clusters ranging from POC to PROD.
- Handled cluster administration, releases, and upgrades; managed multiple Hadoop clusters on the Hortonworks distribution, the largest at 7 PB (400+ nodes) with PAM enabled.
- Experience in Python Scripting.
- Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows.
- Used Change management and Incident management process following organization guidelines.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Extensive experience in cluster planning, installing, configuring, and administering Hadoop clusters for major distributions such as Cloudera and Hortonworks.
- Installed, upgraded, and managed Hadoop clusters on Hortonworks.
- Hands on experience using Cloudera and Hortonworks Hadoop Distributions.
- Worked on installing and configuring CDH 5.8, 5.9, and 5.10 Hadoop clusters on AWS using Cloudera Director and Cloudera Manager.
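The Kafka and Spark Streaming work above can be illustrated with a minimal sketch; the topic name, consumer group, jar, class, and HDFS paths below are hypothetical placeholders rather than actual project values, and the Kafka integration package must match the cluster's Spark/Scala build.

```sh
#!/usr/bin/env bash
# Check consumer-group lag with the stock Kafka CLI (broker list, group, and topic are placeholders).
kafka-consumer-groups.sh \
  --bootstrap-server broker1:9092,broker2:9092 \
  --describe --group events-consumer-group

# Submit a Spark Streaming job that reads the same topic and lands the stream in HDFS.
spark-submit \
  --master yarn --deploy-mode cluster \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0 \
  --class com.example.KafkaToHdfsJob \
  /opt/jobs/kafka-to-hdfs.jar \
  --topic events \
  --output hdfs:///data/raw/events \
  --checkpoint hdfs:///checkpoints/events
```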
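A rough sketch of the Elasticsearch 5.3.0-to-5.3.2 rolling upgrade driven from Ansible, assuming a hypothetical es_prod inventory group, yum-based nodes, and HTTP access on port 9200; the real playbook and inventory would differ.

```sh
#!/usr/bin/env bash
# Upgrade one Elasticsearch node at a time so the cluster stays available.
for node in $(ansible es_prod --list-hosts | tail -n +2 | tr -d ' '); do
  # Disable shard allocation so shards are not rebalanced while the node restarts.
  curl -s -XPUT "http://${node}:9200/_cluster/settings" \
       -H 'Content-Type: application/json' \
       -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'

  # Deploy the new package and restart Elasticsearch on just this node via Ansible modules.
  ansible "$node" -b -m yum -a "name=elasticsearch-5.3.2 state=present"
  ansible "$node" -b -m service -a "name=elasticsearch state=restarted"

  # Re-enable allocation and wait for the cluster to report green before moving on.
  curl -s -XPUT "http://${node}:9200/_cluster/settings" \
       -H 'Content-Type: application/json' \
       -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
  curl -s "http://${node}:9200/_cluster/health?wait_for_status=green&timeout=10m" > /dev/null
done
```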
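A minimal sketch of an S3 lifecycle policy that transitions backups to Glacier, assuming a hypothetical bucket name, prefix, and retention periods.

```sh
#!/usr/bin/env bash
# Transition objects under backups/ to Glacier after 30 days and expire them after 365.
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-backup-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "backups-to-glacier",
        "Filter": { "Prefix": "backups/" },
        "Status": "Enabled",
        "Transitions": [
          { "Days": 30, "StorageClass": "GLACIER" }
        ],
        "Expiration": { "Days": 365 }
      }
    ]
  }'
```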
Confidential, Oak Brook, IL
Hadoop Administrator
Responsibilities:
- Worked on 4 Hadoop clusters for different teams, supporting 50+ users of the Hadoop platform, providing training to make Hadoop usage simple, and keeping users updated on best practices.
- Implemented an instance of Zookeeper for Kafka Brokers.
- Implemented Hadoop security on the Hortonworks cluster using Kerberos and two-way SSL.
- Experience with Hortonworks, Cloudera CDH4 and CDH5 distributions
- Contributed to building hands-on tutorials for the community to learn how to use Hortonworks Data Platform (powered by Hadoop) and Hortonworks Dataflow (powered by NiFi) covering categories such as Hello World, Real-World use cases, Operations.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Responsible for building a cluster on HDP 2.3 with Hadoop 2.2.0 using Ambari.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Involved in performance testing of the production cluster using TeraGen, TeraSort, and TeraValidate (a sketch follows this list).
- Implemented commissioning and decommissioning of data nodes.
- Involved in importing and exporting data into HDFS and Hive using Sqoop (see the example after this list).
- Experienced in managing and reviewing Hadoop log files.
- Supported MapReduce programs running on the cluster.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
- Managed a 350+ node HDP 2.3 cluster with 4 petabytes of data using Ambari 2.0 on Linux CentOS 7.
- Implemented the Fair Scheduler on the ResourceManager to allocate a fair amount of resources to small jobs.
- Installed and configured Hive, including the Hive Metastore, HiveServer2, and HCatalog.
- Created the method and process for the Kerberos KDC cluster setup.
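The TeraGen/TeraSort/TeraValidate benchmark run can be sketched as follows; the examples-jar path and HDFS directories are placeholders that vary by distribution.

```sh
#!/usr/bin/env bash
# Stock MapReduce examples jar (HDP-style path shown; adjust for CDH/Apache installs).
EXAMPLES_JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar

# Generate 1 TB of synthetic data (10 billion 100-byte rows).
hadoop jar "$EXAMPLES_JAR" teragen 10000000000 /benchmarks/teragen

# Sort the generated data, then validate that the sorted output is fully ordered.
hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```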
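A minimal Sqoop import/export sketch for the HDFS/Hive data movement above, assuming a hypothetical MySQL source, table names, and credentials.

```sh
#!/usr/bin/env bash
# Import a MySQL table into a Hive table; -P prompts for the password.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import --hive-table analytics.orders \
  --num-mappers 4

# Export an aggregated HDFS/Hive directory back to a MySQL table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /user/hive/warehouse/analytics.db/order_summary \
  --num-mappers 4
```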
Confidential, St. Louis, MO
Big Data Operations Engineer - Consultant
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Used the Hadoop cluster as a staging environment for data from heterogeneous sources during the data import process.
- Configured High Availability on the NameNode for the Hadoop cluster as part of the disaster recovery roadmap.
- Configured Ganglia and Nagios to monitor the cluster and was on call with the EOC for support.
- Involved in working on cloud architecture.
- Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
- Implemented commissioning and decommissioning of data nodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed the Ambari server on the cloud instances.
- Set up security using Kerberos and AD on Hortonworks clusters.
- Designed and allocated HDFS quotas for multiple groups (see the example after this list).
- Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data from many different sources into HDFS.
- Involved in database backup and recovery, database connectivity, and security.
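A short sketch of allocating HDFS quotas for a group directory; the path and limits are illustrative assumptions, not the actual values used.

```sh
#!/usr/bin/env bash
# Cap the number of namespace objects and the raw space a group's directory can consume.
hdfs dfsadmin -setQuota 1000000 /data/marketing      # max files + directories
hdfs dfsadmin -setSpaceQuota 10t /data/marketing     # max raw space, including replication

# Verify the quotas and the current usage against them.
hdfs dfs -count -q -h /data/marketing
```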
Confidential, Chicago, IL
Hadoop Admin/ Linux Administrator
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Day-to-day management of user access and permissions; installing and maintaining Linux servers.
- Created volume groups, logical volumes, and partitions on the Linux servers, and created and mounted file systems.
- Experienced in installation and configuration of Cloudera CDH4 in the testing environment.
- Resolved tickets submitted by users and P1 issues, troubleshooting and resolving the errors.
- Balanced HDFS manually to decrease network utilization and increase job performance (see the sketch after this list).
- Responsible for building scalable distributed data solutions using Hadoop.
- Performed major and minor upgrades to the Hadoop cluster.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method for remote installation of Linux.
- Monitoring the System activity, Performance, Resource utilization.
- Developed and optimized the physical design of MySQL database systems.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Performed Red Hat Package Manager (RPM) and YUM package installations, patch and other server management.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
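A minimal sketch of the manual HDFS balancing mentioned above; the bandwidth value and threshold are assumptions to tune per cluster and network.

```sh
#!/usr/bin/env bash
# Raise the per-DataNode balancer bandwidth (~100 MB/s) so the rebalance finishes faster,
# then balance until no DataNode deviates more than 10% from the cluster-average utilization.
hdfs dfsadmin -setBalancerBandwidth 104857600
hdfs balancer -threshold 10
```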
Confidential
Linux/Unix Administrator
Responsibilities:
- Experience installing, upgrading, and configuring Red Hat Linux 4.x, 5.x, and 6.x using Kickstart servers and interactive installation.
- Responsible for creating and managing user accounts, security, rights, disk space, and process monitoring in Solaris, CentOS, and Red Hat Linux.
- Performed administration and monitored job processes using associated commands
- Managed routine system backups, scheduled jobs, and enabled cron jobs.
- Maintained and troubleshot network connectivity.
- Managed patch configuration, version control, and service packs, and reviewed connectivity issues related to security problems.
- Configured DNS, NFS, FTP, remote access, and security management, including server hardening.
- Installed, upgraded, and managed packages via the RPM and YUM package managers.
- Performed Logical Volume Management maintenance.
- Experience administering, installing, configuring and maintaining Linux
- Created Linux virtual machines using VMware Virtual Center.
- Administered VMware Infrastructure Client 3.5 and vSphere 4.1.
- Installed firmware upgrades and kernel patches, and performed system configuration and performance tuning on Unix/Linux systems.
- Installed Red Hat Linux 5/6 using Kickstart servers and interactive installation.
- Supported an infrastructure environment comprising RHEL and Solaris.
- Installation, Configuration, and OS upgrades on RHEL 5.X/6.X/7.X, SUSE 11.X, 12.X.
- Implemented and administered VMware ESX 4.x, 5.x, and 6 for running Windows, CentOS, SUSE, and Red Hat Linux servers on development and test hosts.
- Created, extended, reduced, and administered Logical Volume Manager (LVM) volumes in the RHEL environment (see the sketch after this list).
- Responsible for large-scale Puppet implementation and maintenance, including Puppet manifest creation, testing, and implementation.
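A sketch of typical LVM create/extend/reduce operations in a RHEL environment; the device, volume-group, logical-volume, and mount-point names and sizes are placeholders.

```sh
#!/usr/bin/env bash
# Create a physical volume, volume group, and 50G logical volume, then format and mount it.
pvcreate /dev/sdb
vgcreate data_vg /dev/sdb
lvcreate -n app_lv -L 50G data_vg
mkfs.ext4 /dev/data_vg/app_lv
mount /dev/data_vg/app_lv /app

# Extend the logical volume by 20G and grow the ext4 filesystem online.
lvextend -L +20G /dev/data_vg/app_lv
resize2fs /dev/data_vg/app_lv

# Reduce to 40G: ext4 must be unmounted, checked, and shrunk before the LV is reduced.
umount /app
e2fsck -f /dev/data_vg/app_lv
resize2fs /dev/data_vg/app_lv 40G
lvreduce -L 40G /dev/data_vg/app_lv
mount /dev/data_vg/app_lv /app
```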
