
Hadoop Administrator Resume

TX

SUMMARY

  • Strong exposure to Big Data architecture; effectively managed and monitored Hadoop ecosystems.
  • Built, deployed, and managed large-scale Hadoop-based data infrastructure.
  • Capacity planning and architecture setup for Big Data applications.
  • Strong exposure to automating maintenance tasks in Big Data environments through the Cloudera Manager API.
  • Experience troubleshooting HBase shell/API errors and Pig, Hive, and MapReduce job failures.
  • Extensive hands-on experience writing complex MapReduce jobs and Pig scripts and performing Hive data modeling.
  • Expertise in troubleshooting complex system issues such as high load, memory, and CPU usage, and providing solutions based on root-cause analysis.
  • Configured resource management in Hadoop through dynamic resource allocation.
  • Maintained and managed a 300+ node Hadoop environment with 24x7 on-call support.
  • Configured YARN queues based on the Capacity Scheduler for resource management.
  • Collected log data from various sources and integrated it into HDFS using Flume.
  • Implemented POCs on migrating to Spark Streaming to process live data.
  • Good understanding of Hortonworks Data Platform (HDP) and the Ambari tool.
  • Worked with data serialization formats such as Avro, Parquet, JSON, and CSV for converting complex objects into byte sequences.
  • Architected, designed, and maintained high-performing ELT/ETL processes.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper, and of NoSQL databases such as HBase and Cassandra.
  • Good knowledge of writing Spark applications using Python and Java.
  • Solid experience creating build scripts using Maven and Ant.
  • Strong command of relational databases: MySQL, Oracle, SQL Server, and MS Access.
  • Experience implementing the Kerberos authentication protocol in Hadoop for data security.
  • Planned, tracked, and released program enhancements and bug fixes in JIRA.
  • Good knowledge of Oracle 9i/10g/11g databases and strong skills in writing SQL queries and scripts.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
  • Experience building S3 buckets and managing bucket policies, and using S3 and Glacier for storage and backup on AWS (a sample CLI sketch follows this list).
  • Experience working with Splunk, covering both administration and search/dashboard development.
  • Ability to lead a team of developers and coordinate smooth project delivery.
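
A minimal sketch of the S3/Glacier backup setup referenced above, assuming the AWS CLI; the bucket name, prefix, and 30-day retention window are hypothetical placeholders for the project-specific values.

    # Create a backup bucket (name hypothetical) and add a lifecycle rule that
    # transitions objects under backups/ to Glacier after 30 days.
    aws s3 mb s3://example-hadoop-backups

    aws s3api put-bucket-lifecycle-configuration \
      --bucket example-hadoop-backups \
      --lifecycle-configuration '{
        "Rules": [{
          "ID": "archive-backups",
          "Status": "Enabled",
          "Filter": {"Prefix": "backups/"},
          "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
        }]
      }'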

TECHNICAL SKILLS

Big Data Ecosystem: Hortonworks, Cloudera, HDFS, HBase, Hadoop MapReduce, ZooKeeper, YARN, Hive, Spark, Impala, Pig, Sqoop, Flume, Oozie.

Database: MySQL, NoSQL, Oracle.

Scripting Languages: Python, UNIX shell scripting

Operating Systems: RHEL, CentOS, UNIX, Linux, VMware, Windows.

Hadoop Distributions: Cloudera Manager, Ambari

Cloud Computing: AWS, Azure.

ETL Tools: Informatica, Talend

Testing Tools: HP QC, Jira, OneJira.

NoSQL Databases: HBase, Cassandra

PROFESSIONAL EXPERIENCE

Confidential, TX

Hadoop Administrator

Responsibilities:

  • Upgraded CDH to 5.14 and worked on resolving post-upgrade issues.
  • After the 5.14 upgrade, users had issues connecting to Hue, so connections to Hive were established through external applications such as Oracle SQL Developer.
  • Applied patches to the Hadoop ecosystem because Sentry ACLs were out of sync and permissions on /user/hive/warehouse were being lost.
  • Migrated code from SVN to Bitbucket repositories.
  • Worked on automating the manual code elevation process.
  • Set up Jenkins jobs for all manual deployment activities.
  • Set up a Git branching strategy to automate deployment across environments in line with SDLC standards.
  • Designed and wrote shell scripts to automate deployments using Git and store releases in Nexus repositories.
  • Configured resource pools so that various teams use cluster resources uniformly.
  • Developed Bash shell scripts to automate routine activities such as disk space monitoring and Kerberos ticket renewals (see the sketch after this list).
  • Set up Hive replication alerts via API for application teams that are not on Cloudera Manager.
  • Used Informatica ETL tools such as PowerCenter Designer, PowerCenter Workflow Manager, and Informatica Developer for development activities.
  • Managed Impala scripts to develop the marketing application.
  • Supported around 100 developers across different teams with their issues.
  • Provided 24x7 production support for almost two weeks per month.
  • Installed Anaconda Python in the environment, configured Jupyter, and provided developers with easy access to it.
  • Scheduled jobs using the Control-M job scheduler.
  • Developed a merge script for small HDFS files, which improves Hive/Impala query performance and saves HDFS space (a sketch follows the Environment line below).
  • Responsible for troubleshooting issues in the execution of MapReduce and Spark jobs by inspecting and reviewing log files.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Handled data imports from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts, setting up Kerberos principals, and testing HDFS, Hive, Impala, and MapReduce access for the new users.
  • Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running nodes or data.
  • Monitored and controlled local file system disk space usage and log files, cleaning logs with automated scripts.
  • Used Control-M to schedule jobs for workflow automation.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Worked hands-on with ETL processes, handling data imports from various sources and performing transformations.
  • Documented all changes and POC steps and posted them to Confluence.
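
A minimal sketch of the routine-maintenance scripting mentioned above (disk space check plus Kerberos ticket renewal); the mount point, threshold, alert address, keytab path, and realm are hypothetical.

    #!/usr/bin/env bash
    # Alert when disk usage on the data mount crosses a threshold.
    THRESHOLD=85
    MOUNT=/data
    USED=$(df -P "$MOUNT" | awk 'NR==2 {gsub("%","",$5); print $5}')
    if [ "$USED" -gt "$THRESHOLD" ]; then
      echo "Disk usage on $MOUNT is ${USED}%" | mail -s "Disk alert: $(hostname)" hadoop-admins@example.com
    fi

    # Renew the service Kerberos ticket from its keytab so scheduled jobs keep valid credentials.
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM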

Environment: UNIX, Linux, HDFS, MapReduce, Spark, Hive, NFS4, Sqoop, CDH, Oracle 10g/11g, Control-M, JIRA, OneJira, ServiceNow, Informatica, Shell Scripting, Apache Hadoop, MySQL.
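
A simplified sketch of the HDFS small-file merge approach noted in the responsibilities above; the directory names are hypothetical, and the real script verified the merged copy before removing the originals.

    # Merge many small part files into a single HDFS file to cut down NameNode
    # metadata and speed up Hive/Impala scans.
    hdfs dfs -getmerge /data/landing/orders /tmp/orders_merged.txt
    hdfs dfs -mkdir -p /data/merged/orders
    hdfs dfs -put -f /tmp/orders_merged.txt /data/merged/orders/orders.txt
    rm -f /tmp/orders_merged.txt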

Confidential, New York

Hadoop Admin

Responsibilities:

  • Set up a new CDH Hadoop cluster for POC purposes and installed third-party tools.
  • Strong exposure to configuration management tools such as Ansible for configuration deployment.
  • Automated maintenance tasks in the Big Data environment through the Cloudera Manager API (see the sketch after this list).
  • Exposure to cloud-based Hadoop deployment on AWS; built Hadoop clusters using EC2 and EMR.
  • Migrated the dev cluster from on-premises to AWS EMR and benchmarked its performance in EMR.
  • Configured MySQL replication for high availability and used it as the external database for CDH services.
  • Installed and configured tools such as SAS Viya and Securonix in the Hadoop environment.
  • Configured high availability for Hadoop services and set up load balancers for Big Data services.
  • Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive, and MapReduce, with over 300 nodes and multiple petabytes of data storage.
  • Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux.
  • Changed configurations based on user requirements to improve job performance.
  • Installed and configured the application performance management tool Unravel and integrated it with the CDH Hadoop cluster.
  • Set up automation scripts to spin up and add new virtual edge nodes to the Hadoop cluster for customers.
  • Managed the CDH cluster with LDAP and Kerberos integration.
  • Automated scripts to onboard new users to Hadoop applications and set up Sentry authorization.
  • Troubleshot complex Hadoop job failures and provided solutions.
  • Assisted with the design of core scripts to automate Splunk maintenance and alerting tasks; supported Splunk on UNIX, Linux, and Windows platforms; and assisted with automation of processes and procedures.
  • Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
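
A minimal sketch of the Cloudera Manager API automation referenced above, using curl against the CM REST API; the host, credentials, cluster name, service name, and API version are hypothetical placeholders for the actual environment.

    #!/usr/bin/env bash
    # List service health for a cluster through the Cloudera Manager REST API.
    # CM_USER and CM_PASS are expected to be set in the environment.
    CM_HOST=cm-host.example.com
    CLUSTER=Cluster1
    curl -s -u "$CM_USER:$CM_PASS" \
      "http://$CM_HOST:7180/api/v19/clusters/$CLUSTER/services" | python -m json.tool

    # The same API exposes command endpoints, e.g. restarting a named service:
    # curl -s -u "$CM_USER:$CM_PASS" -X POST \
    #   "http://$CM_HOST:7180/api/v19/clusters/$CLUSTER/services/hive/commands/restart"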

Environment: Hadoop 2.0.0, MapReduce, HDFS, Linux, Hue, YARN, Kerberos, Sentry, AWS, SAS Viya, Securonix, Splunk, Unravel, LDAP, ServiceNow.

Confidential, MI

Jr. Hadoop Administrator

Responsibilities:

  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Designed and created stories for application development and testing.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using the Cloudera distribution.
  • Handled data imports from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (a representative Sqoop command appears after the Environment line below).
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Created and cloned Linux virtual machines and templates using VMware Virtual Client 4.0 and migrated servers across ESX hosts.
  • Worked with Nagios, Ganglia, and Cloudera Manager to monitor the Hadoop cluster.
  • Worked with REST APIs to ingest data from a third-party vendor.
  • Production experience in large environments using configuration management tools such as Chef and Ansible.
  • Supported a Chef environment with 100+ servers and was involved in developing manifests.
  • Created user accounts and granted users access to the Hadoop cluster.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the sketch after this list).
  • Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running nodes or data.
  • Monitored and controlled local file system disk space usage and log files, cleaning logs with automated scripts.
  • Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from numerous sources such as flat files, XML documents, and databases.
  • Wrote Oozie jobs for workflow automation.
  • Monitored cluster health daily, tuned performance-related configuration parameters, and backed up configuration XML files.
  • Moved log files generated from various sources to HDFS for further processing.
  • Prepared the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Supported data analysts in running MapReduce programs.
  • Loaded data from the UNIX file system into HDFS.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Coordinated with on-call support when human intervention was required for problem solving.
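
A simplified sketch of the new-user onboarding flow described above; the username, group, realm, and admin principal are hypothetical.

    # Create the Linux account and Kerberos principal for a new user.
    useradd -g hadoopusers jdoe
    kadmin -p admin/admin@EXAMPLE.COM -q "addprinc jdoe@EXAMPLE.COM"

    # Create the HDFS home directory (run with HDFS superuser credentials).
    hdfs dfs -mkdir -p /user/jdoe
    hdfs dfs -chown jdoe:hadoopusers /user/jdoe

    # Smoke-test access as the new user.
    kinit jdoe@EXAMPLE.COM
    hdfs dfs -ls /user/jdoe
    hive -e "show databases;"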

Environment: Windows 2000/2003, UNIX, Linux, HDFS, MapReduce, Hive, HBase, Sqoop, CDH, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Ganglia, Nagios, JIRA, Shell Scripting, Apache Hadoop, Cassandra, MySQL.
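
A representative Sqoop import of the kind referenced in this role; the MySQL host, database, table, and target directory are hypothetical.

    sqoop import \
      --connect jdbc:mysql://mysql-host.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4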
