
Hadoop Admin Resume


SUMMARY

  • Around 7 years of hands-on experience in deploying and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (HIVE, PIG, SQOOP, OOZIE, FLUME, HCATALOG, ZOOKEEPER, HBASE) using Cloudera Manager and Hortonworks Ambari.
  • Hands-on experience in Big Data technologies/frameworks like Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume and Oozie.
  • Experience in deploying and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (HIVE, PIG, SQOOP, OOZIE, FLUME, HBASE, ZOOKEEPER, KAFKA, Spark, Spark2, NIFI) using Cloudera Manager and Ambari.
  • Experienced with deployments, maintenance and troubleshooting applications on Microsoft Azure Cloud infrastructure.
  • Proficiency with application servers such as WebSphere, WebLogic, JBoss and Tomcat.
  • Performed administrative tasks on Hadoop Clusters using Cloudera/Hortonworks.
  • Hands-on experience with Hadoop clusters on Hortonworks (HDP), Cloudera (CDH3, CDH4), Oracle Big Data and YARN distribution platforms.
  • Experience in designing, configuring and managing backup and disaster recovery for Hadoop data.
  • Experience in administering Tableau and Greenplum database instances in various environments.
  • Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
  • Good experience in creating various database objects like tables, stored procedures, functions, and triggers using SQL, PL/SQL, and DB2.
  • Responsible for configuring, managing and administering VPCs, EC2, RDS, CloudFront, CloudWatch, S3 and ELB, and for providing application deployment support with Chef on AWS Cloud.
  • Implemented an OpenVPN solution to connect remote users to the AWS VPC and the on-premises DC, and was responsible for administering and maintaining it.
  • Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using EC2 instances. Experience in managing the Hadoop MapR infrastructure with MCS.
  • Good understanding of deploying Hadoop clusters using automated Puppet scripts.
  • Worked on NoSQL databases including Hbase, Cassandra and MongoDB.
  • Designed and implemented security for Hadoop clusters with Kerberos secure authentication.
  • Hands-on experience with Nagios and Ganglia for cluster monitoring.
  • Strong experience in system administration, installation, upgrading, patches, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring and fine-tuning on Linux (RHEL) systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Admin

Roles and Responsibilities:

  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
  • Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters across MapR, Hortonworks and Cloudera Hadoop distributions.
  • Developed custom aggregate functions using Spark SQL to create tables as per the data model and performed interactive querying
  • Experience developing iterative algorithms using Spark Streaming in Scala to build near real-time dashboards
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure, including KDC server setup and management. Managed and supported Hadoop services including HDFS, Hive, Impala and Spark. Maintained the architecture of a 30-node Hadoop Innovation Cluster with SQRRL, Spark, Puppet and HDP 2.2.4.
  • Configured, installed and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with AWS S3 as the default file system for the cluster.
  • Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
  • Experience with AWS Cloud (EC2, S3 & EMR)
  • Deployed and monitored scalable infrastructure on Amazon Web Services (AWS) and handled configuration management using Puppet.
  • Experience with AWS CloudFormation to create compute and EC2 database instances and to automate management of those databases in the cloud. Used the ClearPass Policy Manager module as an instance in this cloud to deploy configuration files to several nodes.
  • Integrated Attunity and Cassandra with CDH Cluster.
  • Experience in working with AWS (Amazon Web Services) like S3, EMR and EC2.
  • Working to implement MapR Streams to facilitate real-time data ingestion to meet business needs.
  • Responsible for gathering the business requirements for the Initial POCs to load the enterprise data warehouse data to Greenplum databases.
  • Experience in managing multi-tenant Cassandra clusters on public cloud environment - Amazon Web Services (AWS)-EC2, Rackspace and on private cloud infrastructure - OpenStack cloud platform.
  • Expertise in RDBMS like MS SQL Server, MySQL, Greenplum and DB2
  • Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and also set up Kafka ACLs.
  • Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, SOLR and the HBase indexer for ingestion, with SOLR and HBase for real-time querying.
  • Managed Incident queue and resolved all Production Issues for P1, P2 and P3 tickets.
  • Worked on the STAF automation framework using Selenium WebDriver and API drivers to convert P1, P2 and P3 test cases to Java.
  • Designed and implemented topic configuration in the new Kafka cluster across all environments.
  • Generated consumer-group lags from Kafka using its API; Kafka was used for building real-time data pipelines between clusters (a lag-reporting sketch follows this list).
  • Used Apache NiFi for ingestion of data from IBM MQ (message queues).
  • Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
  • Started using Apache NiFi to copy the data from local file system to HDP
  • Worked with Nifi for managing the flow of data from source to HDFS.
  • Experience in job workflow scheduling and in scheduling tools like NiFi.
  • Ingested data into HDFS using Nifi with different processors, developed custom Input Adaptors
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Converted complex Oracle stored procedures code to Spark and Hive using Python and Java.
  • Involved in database migrations to transfer data from one database to another and in the complete virtualization of many client applications.
  • Extensive experience working with Oracle, DB2, SQL Server and MySQL databases; scripting to deploy monitors and checks and to automate critical system-admin functions.
  • Expertise in using Sqoop to connect to Oracle, MySQL, SQL Server and Teradata and move the pivoted data into Hive or HBase tables.
  • Responsible for loading data files from various external sources like Oracle and MySQL into the staging area in MySQL databases.
  • Experience in data analysis and in developing scripts for Pig and Hive.
  • Proficient in PL/SQL programming in Oracle 11g, 10g and 9i, including Oracle architecture, the data dictionary and DBMS packages.
  • Experience with Data flow diagrams, Data dictionary, Database normalization theory techniques, Entity relation modeling and design techniques.
  • Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
  • Experienced in Administration, Installing, Upgrading and Managing distributions of Hadoop clusters with MapR 5.1 on a cluster of 200+ nodes in different environments such as Development, Test and Production (Operational & Analytics) environments.
  • Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
  • Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberized environments.
  • Installed and configured Hadoop, Map Reduce, HDFS (Hadoop Distributed File System), developed multiple Map Reduce jobs in java for data cleaning.
  • Experience in managing the Hadoop cluster with IBM Big Insights, Hortonworks Distribution Platform.
  • Regular maintenance of commissioning/decommissioning nodes as disk failures occur, using the MapR file system.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration in MapR Control System (MCS).
  • Experience in innovative, and where possible, automated approaches for system administration tasks.
  • Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (a minimal sketch follows this list). Mentored the EQM team in creating Hive queries to test use cases.
  • Sqoop configuration of JDBC drivers for the respective relational databases, controlling parallelism, the distributed cache and the import process, compression codecs, importing data into Hive and HBase, incremental imports, configuring saved jobs and passwords, the free-form query option, and troubleshooting.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, Flume sources, sinks, channels and interceptors, defined channel selectors to multiplex data into different sinks, and set log4j properties.
  • Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
  • Maintaining the Operations, installations, configuration of 150+ node cluster with MapR distribution.
  • Monitoring the health of the cluster and setting up alert scripts for memory usage on the edge nodes.
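The Spark Streaming micro-batching mentioned above can be illustrated with a minimal PySpark sketch; the socket source, host/port and 10-second batch interval are placeholders for the example, not details from the projects described here.

    # Minimal PySpark Streaming sketch: the incoming stream is cut into
    # 10-second micro-batches, and each batch is handed to the Spark engine
    # as an RDD for ordinary batch processing.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="MicroBatchSketch")
    ssc = StreamingContext(sc, batchDuration=10)          # 10-second batches

    lines = ssc.socketTextStream("localhost", 9999)       # placeholder source
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()                                       # print each batch's result

    ssc.start()
    ssc.awaitTermination()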
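The consumer-group lag reporting mentioned above can likewise be sketched with the kafka-python client (an assumption; any Kafka client that exposes end and committed offsets works the same way). Broker address, group id and topic name are placeholders.

    # Per-partition consumer-group lag = end offset minus committed offset.
    # Broker address, group id and topic name are illustrative placeholders.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(
        bootstrap_servers="broker1:9092",
        group_id="reporting-group",        # group whose lag is being measured
        enable_auto_commit=False,
    )

    topic = "events"
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)

    for tp in partitions:
        committed = consumer.committed(tp) or 0
        print(f"{topic}[{tp.partition}] lag={end_offsets[tp] - committed}")

    consumer.close()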

Hadoop Admin

Confidential - Oak Brook IL

Roles and Responsibilities:

  • Created stored procedures in MySQL Server to perform result-oriented tasks
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
  • Experience with cloud AWS/EMR and Cloudera Manager (also Hadoop directly on EC2, non-EMR).
  • Monitored workload, job performance and capacity planning using Cloudera Manager; worked on rack-aware configuration and within AWS.
  • Involved in Analyzing system failures, identifying root causes and recommended course of actions. Imported logs from web servers with Flume to ingest the data into HDFS.
  • Interacted with Cloudera support, logged issues in the Cloudera portal and fixed them per the recommendations. Fine-tuned Hive jobs for optimized performance.
  • Working experience in supporting and deploying in an AWS environment. Used Flume and a spool directory to load data from the local system into HDFS.
  • Retrieved data from HDFS into relational databases with Sqoop (a small export wrapper sketch follows this list). Parsed, cleansed and mined useful and meaningful data in HDFS using MapReduce for further analysis.
  • Implemented custom interceptors for flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
  • Maintaining the Operations, installations, configuration of 150+ node cluster with MapR distribution.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
  • Monitored the servers and Linux scripts regularly, performed troubleshooting steps, and tested and installed the latest software on servers for end users. Responsible for patching Linux servers and applying patches to the cluster. Responsible for building scalable distributed data solutions using Hadoop.
  • Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability. Also done major and minor upgrades to the Hadoop cluster.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Commissioned and decommissioned the Data Nodes in the cluster in case of the problems.
  • Involved in architecting our storage service to meet changing requirements for scaling, reliability, performance and manageability.
  • Experience in scripting languages such as Python, Perl and shell.
  • Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters across MapR, Hortonworks and Cloudera Hadoop distributions.
  • Experience with bulk-load tools such as DW Loader and with moving data from PDW to the Hadoop archive.
  • Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, handling data stored on a single platform in YARN.
  • Debug and solve the major issues with Cloudera manager by interacting with the Cloudera team.
  • To analyze data migrated to HDFS, used Hive data warehouse tool and developed Hive queries.
  • Monitor Hadoop cluster job performance and capacity planning.
  • Built Cassandra Cluster on both the physical machines and on AWS.
  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
  • Currently working as an admin on the Hortonworks (HDP 2.2.4.2) distribution for 4 clusters ranging from POC to PROD.
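The Sqoop transfers between HDFS and relational databases referenced above are normally launched from the command line; below is a minimal Python wrapper around such an export, offered as a sketch only. The JDBC URL, credentials file, table and export directory are placeholders.

    # Illustrative wrapper that shells out to a Sqoop export, pushing an HDFS
    # directory back into a relational table. All connection details below are
    # placeholders, not values from any real cluster.
    import subprocess

    def sqoop_export(jdbc_url, username, password_file, table, export_dir, mappers=4):
        cmd = [
            "sqoop", "export",
            "--connect", jdbc_url,
            "--username", username,
            "--password-file", password_file,   # keep credentials off the command line
            "--table", table,
            "--export-dir", export_dir,
            "--num-mappers", str(mappers),
        ]
        subprocess.run(cmd, check=True)         # raises CalledProcessError on failure

    if __name__ == "__main__":
        sqoop_export(
            jdbc_url="jdbc:mysql://dbhost:3306/reporting",
            username="etl_user",
            password_file="/user/etl/.db_password",
            table="daily_summary",
            export_dir="/warehouse/daily_summary",
        )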

Hadoop Admin

Confidential - San Jose, CA

Roles and Responsibilities:

  • Installation and configuration of Linux for new build environment.
  • Day-to-day user access and permissions, and installing and maintaining Linux servers.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted the file systems.
  • Experienced in installing and configuring Cloudera CDH4 in the testing environment.
  • Resolved tickets submitted by users and P1 issues, troubleshooting and resolving the errors.
  • Validated web services manually and through groovy script automation using SOAP UI.
  • Implementing End to End automation tests by consuming the APIs of different layers.
  • Involved in using Postman tool to test SOA based architecture for testing SOAP services and REST API.
  • Used Maven to build and run the Selenium automation framework.
  • Framework used to send the automation reports over email.
  • Balanced HDFS manually to decrease network utilization and increase job performance (a balancer sketch follows this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Done major and minor upgrades to the Hadoop cluster.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Performed stress and performance testing and benchmarking for the cluster.
  • Commissioned and decommissioned the Data Nodes in the cluster in case of the problems.
  • Debug and solve the major issues with Cloudera manager by interacting with the Cloudera team.
  • Involved in estimating and setting up the Hadoop cluster on Linux.
  • Prepared PIG scripts to validate the Time Series Rollup Algorithm.
  • Responsible for support and troubleshooting of MapReduce jobs and Pig jobs, and for maintaining incremental loads on a daily, weekly and monthly basis.
  • Implemented Oozie workflows for Map Reduce, Hive and Sqoop actions.
  • Channeled MapReduce outputs based on requirements using Partitioners.
  • Performed scheduled backup and necessary restoration.
  • Build and maintain scalable data using the Hadoop ecosystem and other open source components like Hive and HBase.
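The manual HDFS balancing called out earlier in this list is typically driven through the hdfs balancer CLI; the following is a small illustrative Python wrapper, with the bandwidth cap and the 10% utilization threshold as assumed example values.

    # Wrapper around the HDFS balancer CLI used for manual rebalancing; the
    # bandwidth cap and the 10% threshold are illustrative values only.
    import subprocess

    def set_balancer_bandwidth(bytes_per_sec):
        # Limit how much network bandwidth each DataNode spends on balancing.
        subprocess.run(
            ["hdfs", "dfsadmin", "-setBalancerBandwidth", str(bytes_per_sec)],
            check=True,
        )

    def run_balancer(threshold_pct=10):
        # Move blocks until each DataNode is within threshold_pct of the
        # cluster-average utilization.
        subprocess.run(["hdfs", "balancer", "-threshold", str(threshold_pct)], check=True)

    if __name__ == "__main__":
        set_balancer_bandwidth(50 * 1024 * 1024)   # roughly 50 MB/s per DataNode
        run_balancer(threshold_pct=10)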

Linux/Unix Administrator

Confidential - Glendale, CA

Roles and Responsibilities:

  • Experience installing, upgrading and configuring RedHat Linux 4.x, 5.x, 6.x using Kickstart Servers and Interactive Installation
  • Responsible for creating and managing user accounts, security, rights, disk space and process monitoring in Solaris, CentOS and Redhat Linux.
  • Experience in writing Scripts in Bash for performing automation of various tasks.
  • Experience in writing Shell scripts using bash for process automation of databases, applications, backup and scheduling to reduce both human intervention and man hours.
  • Remote system administration via tools like SSH and Telnet
  • Extensive use of crontab for job automation.
  • Installed and configured Selenium WebDriver, TestNG and Maven, and created Selenium automation scripts in Java using TestNG prior to the next quarterly release.
  • Developed Python scripts (automation scripts) for stability testing (a minimal sketch follows this list).
  • Experience administering, installing, configuring and maintaining Linux
  • Creates Linux virtual machines using VMware Virtual Center; administers VMware Infrastructure Client 3.5 and vSphere 4.1.
  • Installs Firmware Upgrades, kernel patches, systems configuration, performance tuning on Unix/Linux systems
  • Installing Red Hat Linux 5/6 using kickstart servers and interactive installation.
  • Supporting infrastructure environment comprising of RHEL and Solaris.
  • Installation, Configuration, and OS upgrades on RHEL 5.X/6.X/7.X, SUSE 11.X, 12.X.
  • Implemented and administered VMware ESX 4.x 5.x and 6 for running the Windows, Centos, SUSE and Red Hat Linux Servers on development and test servers.
  • Create, extend, reduce and administration of Logical Volume Manager (LVM) in RHEL environment.
  • Responsible for large-scale Puppet implementation and maintenance. Puppet manifests creation, testing and implementation.
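As an illustration of the kind of Python automation scripts used for stability checks above, here is a minimal stand-alone disk-utilization check; the mount points, the 85% threshold and the script path are assumptions for the example, not details from the environments described.

    # Small health check suitable for scheduling from cron; mount points and
    # the 85% alert threshold are illustrative placeholders.
    import shutil
    import socket
    import sys

    MOUNTS = ["/", "/var", "/data"]   # placeholder mount points
    THRESHOLD_PCT = 85                # alert above this utilization

    def main():
        failures = []
        for mount in MOUNTS:
            usage = shutil.disk_usage(mount)
            pct = usage.used * 100 / usage.total
            if pct >= THRESHOLD_PCT:
                failures.append(f"{mount} at {pct:.1f}%")
        if failures:
            print(f"{socket.gethostname()}: disk alert -> {', '.join(failures)}")
            sys.exit(1)               # non-zero exit lets cron mail or monitoring flag it
        print("all filesystems below threshold")

    if __name__ == "__main__":
        main()

Scheduled from crontab, an entry such as */15 * * * * /usr/bin/python3 /usr/local/bin/disk_check.py (path is a placeholder) would run the check every 15 minutes.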

Linux/Unix Administrator

Confidential

Roles and Responsibilities:

  • Experience installing, upgrading and configuring RedHat Linux 4.x, 5.x, 6.x using Kickstart Servers and Interactive Installation
  • Responsible for creating and managing user accounts, security, rights, disk space and process monitoring in Solaris, CentOS and Redhat Linux
  • Performed administration and monitored job processes using the associated commands; managed routine system backups, scheduled jobs and enabled cron jobs
  • Maintaining and troubleshooting network connectivity
  • Manages patch configuration, version control and service packs, and reviews connectivity issues related to security problems
  • Configures DNS, NFS, FTP, remote access, security management and server hardening
  • Installs, upgrades and manages packages via RPM and YUM package management
  • Logical Volume Management (LVM) maintenance (a sketch follows this list)
  • Experience administering, installing, configuring and maintaining Linux
  • Creates Linux Virtual Machines using VMware Virtual Center
  • Administers VMware Infrastructure Client 3.5 and vSphere 4.1
  • Installs Firmware Upgrades, kernel patches, systems configuration, performance tuning on Unix/Linux systems
  • Puppet implementation and maintenance. Puppet manifests creation, testing and implementation.
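A typical piece of the LVM maintenance noted above is growing a logical volume and its filesystem; the sketch below wraps the standard lvextend/resize2fs commands, with the volume group, logical volume and added size as placeholder values (for XFS, xfs_growfs would replace resize2fs).

    # Sketch of extending a logical volume and growing its ext3/ext4 filesystem;
    # VG/LV names and the added size are placeholders.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def extend_lv(vg, lv, extra_size="5G"):
        device = f"/dev/{vg}/{lv}"
        run(["lvextend", "-L", f"+{extra_size}", device])  # grow the logical volume
        run(["resize2fs", device])                         # grow the filesystem to match

    if __name__ == "__main__":
        extend_lv(vg="vg_data", lv="lv_app", extra_size="5G")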
