
Senior Hadoop Administrator Resume


Dallas, TX

SUMMARY:

  • 7+ years of professional IT experience, including proven experience in Hadoop administration: deploying, maintaining, monitoring and upgrading Hadoop clusters using Cloudera (CDH) and Hortonworks (HDP) distributions.
  • Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, HBase) and NoSQL administration.
  • Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace and OpenStack.
  • Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop cluster using Nagios and Ganglia.
  • Developed Splunk infrastructure and related solutions as per automation toolsets.
  • Prepared, arranged and tested Splunk search strings and operational strings.
  • Trained Splunk security team members for complex search strings and ES modules.
  • Experience in installing Hadoop cluster using different distributions of Apache Hadoop, Cloudera, Hortonworks and MapR.
  • Experience configuring AWS EC2 instances, EMR clusters with S3 buckets, Auto Scaling groups and CloudWatch.
  • Good experience in understanding clients' Big Data business requirements and translating them into Hadoop-centric solutions.
  • Analyzing clients' existing Hadoop infrastructure, identifying performance bottlenecks and tuning performance accordingly.
  • Installed, configured and maintained HBase.
  • Experienced in using REST APIs for accessing and analyzing the data in HBase tables from EMR cluster.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), Spark (with Python and Scala), and custom MapReduce programs in Java.
  • Expertise in designing tables in Hive and MySQL, and in moving data between databases and HDFS through Sqoop imports and exports.
  • Involved in creating custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin.
  • Hands on Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Experience in NoSQL column-oriented databases like HBase, MongoDB and Cassandra, and their integration with Hadoop clusters.
  • Good knowledge of configuring source control management tools like CVS, VSS, SVN and GitHub; also experienced in using wiki and Confluence for documentation.
  • Configured cloud infrastructures utilizing AWS services (RDS, DynamoDB, VPC, Route53, CloudFormation).
  • Worked with Sqoop in importing and exporting data between HDFS/Hive and databases like MySQL and Oracle (see the Sqoop sketch after this list).
  • Knowledge of importing and exporting data between Hadoop and RDBMS databases using Sqoop.
  • Experienced assisting in creation of ETL processes for transformation of data sources from existing RDBMS systems.
  • Defining job flows in Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Experience in configuring Zookeeper to provide Cluster coordination services.
  • Loading logs from multiple sources directly into HDFS using tools like Flume.
  • Good experience in performing minor and major upgrades.
  • Experience in benchmarking, performing backup and recovery of Name node metadata and data residing in the cluster.
  • Implemented one of the data source transformations in Spark using Scala.
  • Designed the Cassandra data model and its integration with Spark.
  • Familiar with commissioning and decommissioning of nodes on a Hadoop cluster.
  • Adept at configuring Name Node High Availability.
  • Participated in development and execution of system and disaster recovery processes.
  • Worked with Puppet for application deployment.
  • Well experienced in building servers like DHCP, PXE with kick-start, DNS and NFS and used them in building infrastructure in a Linux Environment.
  • Experienced in Linux administration tasks like IP management (IP addressing, subnetting, Ethernet bonding and static IPs).
  • Strong knowledge on Hadoop HDFS architecture and Map-Reduce framework.
  • Experience in deploying and managing multi-node development, testing and production clusters.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, and generating and managing keytab files for each service using keytab tools (see the Kerberos sketch after this list).
  • Worked on setting up NameNode high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Experienced in managing Linux platform servers.
  • Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
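The following is a minimal illustrative sketch of the kind of Sqoop import referenced in the MySQL/Oracle bullet above; the JDBC URL, credentials, table names and HDFS paths are placeholders, not values from any actual engagement.

    # Import a MySQL table into HDFS as text files (placeholder host, database and table)
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Import the same table directly into a Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/salesdb \
      --username etl_user -P \
      --table orders \
      --hive-import --hive-table staging.orders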
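Below is a minimal Kerberos sketch of the principal/keytab workflow described in the security bullet above, assuming an MIT KDC; the realm, host name and keytab path are placeholders.

    # On the KDC: create a service principal for the NameNode host (placeholder realm/host)
    kadmin.local -q "addprinc -randkey nn/namenode01.example.com@EXAMPLE.COM"

    # Export the principal's keys into a keytab file
    kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and test authentication with it
    klist -kt /etc/security/keytabs/nn.service.keytab
    kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM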

TECHNICAL SKILLS:

Hadoop Ecosystem: MapReduce, HDFS, HBase, Zookeeper, Flume, Sqoop, Hive, Pig, Oozie, Ambari, Ganglia, Spark, Scala

Hadoop Management: Cloudera Manager, Apache Ambari, Ganglia, Nagios.

Hadoop Distributions: Cloudera (CDH4, CDH5), Hortonworks (HDP 2.2 to HDP 2.4).

Programming Languages: C, Core Java, Python, Unix Shell.

Analysis Tools/Reporting Tools: SAS Enterprise Miner Studio, Tableau, SSRS, Crystal Reports.

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL.

Operating System: Windows XP/Vista/7/8.x/10, Ubuntu, Fedora, Debian, Red Hat Linux, CentOS, Mac.

Database: MySQL, PL/SQL, MS-SQL Server, Oracle 10g/11g.

Security: Kerberos, Knox, Ranger.

PROFESSIONAL EXPERIENCE:

Senior Hadoop Administrator

Confidential, Auburn Hills, MI

Responsibilities:

  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Implementation of MapR (Zookeeper, CLDB, YARN, HDFS, Spark, Impala, MCS, Oozie, Hive, HBase).
  • Good knowledge on architecture, planning and preparing the nodes, data ingestion, disaster recovery, high availability, management and monitoring.
  • Day-to-day responsibilities include resolving Hadoop developer issues, providing prompt solutions to reduce impact, and documenting them for future reference.
  • Scheduling and configuring backups/snapshots/mirroring to maintain the backup of cluster data.
  • Collaborating with application teams to install components and apply Hadoop updates, patches and version upgrades when required.
  • Perform cluster validation and run various pre-install and post install tests.
  • Deploying Hadoop cluster, add/remove nodes, tracking of jobs, monitoring critical nodes of the cluster.
  • Used Hue to create, maintain, monitor and run various Hadoop jobs such as Hive queries and Oozie workflows.
  • Configuring Splunk user roles through LDAP integration.
  • Advanced Searching, Reporting and scheduling alerts with Splunk.
  • Configuring and clustering Splunk Indexers, Search heads.
  • Troubleshooting Splunk Forwarders on end machines.
  • Onboarding new users into Splunk and assigning appropriate roles.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screen Hadoop cluster job performances and capacity planning.
  • Monitor Hadoop cluster connectivity and security.
  • Setting up project and volume setups for new Hadoop projects.
  • Implemented SFTP for projects to transfer data from external servers to the Hadoop servers (see the transfer-script sketch after this list). Managed and reviewed Hadoop log files.
  • Working with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Managing routine system backups, scheduling jobs such as disabling and enabling cron jobs, and enabling system and network logging of servers for maintenance, performance tuning and testing.
  • Set up user and group login IDs, printing parameters, network configuration, passwords, and user and group quotas; resolved permission issues.
  • Scaling up and adding nodes to the cluster and validating the node configurations.
  • Designed and Implemented Icinga monitoring for complete Hadoop setup, including architecture planning, host-grouping and alert setup.
  • Migrated all Nagios alerts to Icinga alerts.
  • Automating manual tasks through cron jobs and scripting.
  • Installing and administering Splunk for monitoring, analyzing and visualizing machine data.
  • Working with the Splunk developer team to create Hadoop dashboards with various metrics.
  • Attending weekly management calls to review missed SLAs, identify the underlying issues and provide timely solutions.
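A minimal transfer-script sketch of the SFTP-to-Hadoop data movement and cron automation mentioned above; the server, user, paths and schedule are placeholders and the key-based SFTP login is an assumption.

    #!/bin/bash
    # pull_external_feed.sh - fetch files from an external SFTP server and load them into HDFS
    set -euo pipefail

    LANDING=/data/landing/external_feed
    HDFS_DIR=/data/raw/external_feed/$(date +%Y%m%d)

    # Batch-mode SFTP pull (placeholder host/user/path; assumes key-based authentication)
    sftp -b - feeduser@sftp.partner.example.com <<'EOF'
    lcd /data/landing/external_feed
    get /outgoing/*.csv
    EOF

    # Land the files in a dated HDFS directory
    hdfs dfs -mkdir -p "$HDFS_DIR"
    hdfs dfs -put -f "$LANDING"/*.csv "$HDFS_DIR"/

    # Example cron entry (daily at 02:30):
    # 30 2 * * * /opt/scripts/pull_external_feed.sh >> /var/log/pull_external_feed.log 2>&1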

Environment: MCS, HDFS, YARN, HBase 1.1.2, Pig, Spark, Oozie 4.2.0, Impala, Hue, Splunk, Linux, OMSA, Nagios, Icinga, HDP 2.4.2, Hive 1.2.1, Flume 1.5.2, MapReduce, Sqoop 1.4.6, Kafka, Shell scripting, Zookeeper 3.4.6, Ambari 2.2.1, AWS EMR

Senior Hadoop Administrator

Confidential, Arlington, VA

Responsibilities:

  • Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
  • Used Hive, created Hive tables and loaded data from the local file system into HDFS (see the Hive sketch after this list).
  • Installed and configured the Hortonworks Data Platform.
  • Created user accounts and granted users access to the Hadoop cluster.
  • Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect to running nodes and data.
  • Responsible for HBase REST server administration, backup and recovery.
  • Monitoring and controlling local file system disk space usage, log files, cleaning log files with automated scripts.
  • Configured Splunk for log monitoring, log rotation and activity monitoring.
  • Configured remote access to Splunk and sent CLI commands to remote servers.
  • Migrated Splunk configuration file to multiple remote servers.
  • Enabled Log debugging in Splunk.
  • Experienced in Troubleshooting Splunk search quotas, monitor Inputs, WMI Issues, Splunk crash logs and Alert scripts.
  • As a Hadoop admin, monitoring cluster health status on daily basis, tuning system performance related configuration parameters, backing up configuration xml files.
  • Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they could read data from HDFS without any issues.
  • Involved in upgrading Hadoop Cluster from HDP 1.3 to HDP 2.0.
  • Experience in configuring Java components using Spark.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Prepared Oozie workflows to run multiple Hive and Pig jobs independently, triggered by time and data availability (see the Oozie sketch after this list). Supported data analysts in running MapReduce programs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Responsible for deploying patches and remediating vulnerabilities.
  • Experience in setting up Test, QA, and Prod environment.
  • Involved in loading data from UNIX file system to HDFS.
  • Led root cause analysis (RCA) efforts for high-severity incidents.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Handled importing data from various data sources, performed transformations.
  • Coordinating with On-call Support if human intervention is required for problem solving.
  • Documenting the procedures performed for the project development.
  • Work with various Admin teams (Teradata, UNIX & Informatica) to migrate code from one environment to another environment.
  • Coordinate with QA team during testing phase.
  • Provide application support to production support team.
  • Assigning tasks to offshore team and coordinate with them in successful completion of deliverables.
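A minimal Hive sketch of the table creation and local-to-HDFS load referenced above; the database, table, columns and file path are placeholders.

    # Create a Hive table and load a local tab-delimited file into it (placeholder names/paths)
    hive -e "
    CREATE DATABASE IF NOT EXISTS weblogs;
    CREATE TABLE IF NOT EXISTS weblogs.access_log (
      ip     STRING,
      ts     STRING,
      url    STRING,
      status INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH '/data/staging/access_log.tsv'
    INTO TABLE weblogs.access_log;
    "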
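A minimal Oozie sketch of submitting the Hive/Pig workflows mentioned above through the Oozie CLI; the Oozie URL, HDFS application path and property values are placeholders.

    # job.properties (placeholder values)
    # nameNode=hdfs://namenode01.example.com:8020
    # jobTracker=resourcemanager01.example.com:8050
    # oozie.wf.application.path=${nameNode}/user/etl/workflows/hive_pig_daily

    # Submit and start the workflow
    oozie job -oozie http://oozieserver.example.com:11000/oozie \
      -config job.properties -run

    # Check workflow status (substitute the job id returned by the previous command)
    oozie job -oozie http://oozieserver.example.com:11000/oozie -info <job-id>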

Environment: NoSQL, Oracle 9i/10g/11g RAC on Solaris/Red Hat, Exadata X2/X3 machines, Cloudera CDH 4.3.2 (Apache Hadoop), Toad, MySQL, Oracle Enterprise Manager (OEM), RMAN, Shell scripting, GoldenGate, Red Hat/SUSE Linux, EM Cloud Control, HDFS, MapReduce, Hive, Pig, HBase, Flume, Sqoop, Zookeeper.

Hadoop Administrator

Confidential, Dallas, TX

Responsibilities:

  • Responsible for architecting the Hadoop environment and integrating with other components.
  • Responsible for installing and upgrading the Hadoop environment and integrating with other components.
  • Responsible for day-to-day activities which includes HDFS support and maintenance, Cluster maintenance, creation/removal of nodes, Cluster Monitoring/ Troubleshooting, Manage and review Hadoop log files, Backup and restoring, capacity planning.
  • Worked with Hadoop developers and operating system admins in designing scalable supportable infrastructure for Hadoop.
  • Administered and supported the Hortonworks distribution.
  • Responsible for Operating system and Hadoop Cluster monitoring using tools like Nagios, Ganglia, and Cloudera Manager.
  • HA Implementation of Name node replication to avoid single point of failure.
  • Involved in troubleshooting issues on the Hadoop ecosystem, understanding of systems capacity, bottlenecks, basics of memory, CPU, OS, storage, and networks.
  • Involved in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Developed the design for data migration from one cluster to another using DistCp (see the DistCp sketch after this list).
  • Responsible for scheduling jobs in Hadoop using FIFO, Fair scheduler and Capacity scheduler.
  • Possess good Linux and Hadoop System Administration skills, networking, shell scripting and familiarity with open source configuration management and deployment tools such as Puppet or Chef.
  • Performed system administration of UNIX servers running Solaris 2.7/8, managing Sun Solaris, Compaq and Linux workstations and servers.
  • Installed patches and other software packages; managed disks and file systems through Solstice DiskSuite on Solaris and other logical volume managers on other flavors of UNIX.
  • Built data platforms, pipelines and storage systems using Apache Kafka, Apache Storm and search technologies such as Elasticsearch.
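A minimal DistCp sketch of the cluster-to-cluster copy referenced above; the NameNode host names and paths are placeholders.

    # Copy a dataset from the source cluster to the target cluster,
    # preserving block size and permissions
    hadoop distcp \
      -pbp \
      -update \
      hdfs://source-nn.example.com:8020/data/warehouse/orders \
      hdfs://target-nn.example.com:8020/data/warehouse/orders

    # Re-running with -update copies only files that changed since the last run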

Environment: Hadoop distributions (CDH 4.7, CDH 5.3 and HDP 2.1), Hortonworks, Red Hat Linux 6.x/7.x, Solaris 11, Shell scripting, Nagios, Ganglia monitoring, Kerberos, Python scripting, Java, Hive, Pig, Sqoop, Flume, HBase, Zookeeper, Oozie, YARN, Cloudera Manager.

Hadoop Administrator

Confidential, TX

Responsibilities:

  • Worked on a large cluster, maintaining over 400 nodes in a high-availability environment.
  • Worked on the Hortonworks Distribution, one of the major contributors to Apache Hadoop.
  • Handled Hadoop cluster installation, configuration and maintenance, cluster monitoring and troubleshooting, and data transfers from RDBMS to HDFS.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Using Hadoop cluster as a staging environment for the data from heterogeneous sources in data import process.
  • Involved in Hadoop Cluster environment that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, Troubleshooting.
  • Configured High Availability on the name node for the Hadoop cluster - part of the disaster recovery roadmap.
  • Adding new Nodes to an existing cluster, recovering from a Name Node failure.
  • Decommissioned and commissioned nodes on the running cluster (see the decommissioning sketch after this list).
  • Scripting Hadoop package installation and configuration to support fully automated deployments.
  • Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs for analyzing the data loaded in Hadoop from various data sources.
  • Used Sqoop to import data into HDFS from MySQL and Access databases and vice-versa.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
  • Involved in database backup and recovery, database connectivity and security.
  • Performed data completeness, correctness, data transformation and data quality testing; instrumental in building scalable distributed data solutions using the Hadoop ecosystem.
  • Developed several advanced Map Reduce programs to process data files received.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experience in providing support to data analyst in running Pig and Hive queries.
  • Managing and reviewing Hadoop log files.
  • Creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Installed and configured Hive and wrote Hive UDFs; experienced in large-scale data processing on an Amazon EMR cluster.
  • Efficiently handled Hadoop admin and user commands for administration.
  • Supported technical team members for automation, installation and configuration tasks.
  • Wrote shell scripts to monitor the health of Hadoop component services and respond to any warning or failure conditions.
  • Stored unstructured data in a semi-structured form on HDFS using HBase.
  • Used Change management and Incident management process following the company standards.
  • Implemented partitioning, dynamic partitions and bucketing in Hive (see the partitioning sketch after this list).
  • Installed, configured and maintained Hadoop cluster.
  • Implemented Oozie workflows for Map Reduce, Hive and Sqoop actions.
  • Implemented the new SmartSense feature in HDP 2.3.
  • Involved in data migration from Oracle database to MongoDB.
  • Created HBase tables to store variable data formats of data coming from different applications.
  • Experience in managing and reviewing Hadoop log files.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Channelized Map Reduce outputs based on requirement using Partitioners.
  • Involved in estimation and setting-up Hadoop Cluster in Linux.
  • Wrote custom shell scripts to back up NameNode metadata periodically (see the backup sketch after this list). Troubleshot the cluster and configured the required properties in HDFS.
  • Coordinating with system owners, database team, storage team, network team, hardware team, and other teams as when required to resolve issues.
  • Providing root cause analysis for system-related failures and coordinating with required teams.
  • Performing planned and Break fix changes in Infrastructure.
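A minimal decommissioning sketch of the HDFS side of taking a node out of a running cluster, as referenced above; the exclude-file path and host names are placeholders and assume dfs.hosts.exclude is already configured in hdfs-site.xml.

    # Add the node to the exclude file referenced by dfs.hosts.exclude (placeholder path/host)
    echo "datanode17.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read the include/exclude lists;
    # blocks are then re-replicated off the decommissioning node
    hdfs dfsadmin -refreshNodes

    # Watch decommissioning progress for that node
    hdfs dfsadmin -report | grep -A3 datanode17.example.com

    # After commissioning new nodes, rebalance data across the cluster (threshold in percent)
    hdfs balancer -threshold 10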
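A minimal partitioning sketch of the partitioned/bucketed Hive tables and dynamic-partition inserts mentioned above; the database, tables and columns are placeholders.

    hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS sales.transactions (
      txn_id   STRING,
      amount   DOUBLE,
      store_id INT
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (store_id) INTO 16 BUCKETS
    STORED AS ORC;

    -- Dynamic-partition insert from a staging table (partition column last in the SELECT)
    INSERT OVERWRITE TABLE sales.transactions PARTITION (txn_date)
    SELECT txn_id, amount, store_id, txn_date
    FROM sales.transactions_staging;
    "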
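A minimal backup sketch of the periodic NameNode metadata backup script mentioned above; the backup location, retention and schedule are placeholders.

    #!/bin/bash
    # backup_nn_metadata.sh - pull the latest fsimage from the NameNode for offline backup
    set -euo pipefail

    BACKUP_DIR=/backup/hadoop/namenode/$(date +%Y%m%d)
    mkdir -p "$BACKUP_DIR"

    # Download the most recent fsimage from the NameNode (run as the HDFS superuser)
    hdfs dfsadmin -fetchImage "$BACKUP_DIR"

    # Keep only the last 14 days of backups
    find /backup/hadoop/namenode -maxdepth 1 -type d -mtime +14 -exec rm -rf {} +

    # Example cron entry (daily at 01:00):
    # 0 1 * * * /opt/scripts/backup_nn_metadata.sh >> /var/log/backup_nn_metadata.log 2>&1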

Environment: CentOS, CDH 5.4.5, Oracle, MS SQL Server, Zookeeper 3.4.6, Oozie 4.1.0, MapReduce, YARN 2.6.1, Nagios, REST APIs, Amazon Web Services, HDFS, Sqoop 1.4.6, Hive 1.2.1, Pig 0.15.0
