Senior Hadoop Administrator/Developer Resume

New York City, NY

SUMMARY:

  • 5+ years of experience in IT, including 4 years of experience in a Hadoop environment as an administrator.
  • Hands-on experience deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, Spark, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
  • Installation, configuration, maintenance, monitoring, and troubleshooting of Hadoop clusters.
  • Assisted in the design, development, and architecture of the Hadoop ecosystem.
  • Hands-on experience configuring Hadoop clusters in professional environments and on VMware, Amazon Web Services (AWS) EC2 instances, and IBM Bluemix.
  • Worked on optimizing volumes and AWS EC2 instances, and created multiple VPCs.
  • Set up NameNode High Availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
  • In-depth knowledge and good understanding of the Hadoop daemons: NameNode, DataNode, Secondary NameNode, ResourceManager, and NodeManager.
  • Experience commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance (a sample command sequence appears after this summary).
  • Knowledge of multiple distributions/platforms (Apache, Cloudera, Hortonworks).
  • Experience dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem; handled importing data from various sources, performed transformations using Hive and Pig, and loaded the data into HBase.
  • Knowledge of programming in MapReduce, Hive, and Pig.
  • Knowledge of NoSQL databases such as HBase and Cassandra.
  • Exposure to setting up data import and export tools such as Sqoop to move data between RDBMS and HDFS.
  • Advanced knowledge of the Fair and Capacity schedulers and experience configuring schedulers on the cluster.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
  • Supported technical team for automation, installation and configuration tasks.
  • Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and provided performance tuning accordingly.
  • Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux.
  • Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
  • Experience in providing security for Hadoop Cluster with Kerberos.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
  • Experience in HDFS data storage and support for running MapReduce jobs.
  • Working knowledge of Oozie, a workflow scheduler used to manage Pig, Hive, and Sqoop jobs.
  • Excellent interpersonal and communication skills, technically competent and result-oriented with problem solving and leadership skills.
  • Research-oriented, motivated, proactive self-starter with strong technical and analytical skills.
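
The node commissioning, decommissioning, and balancing work above typically reduces to a few HDFS admin commands. A minimal sketch, assuming the exclude-file path and hostname below are illustrative (the real path comes from dfs.hosts.exclude in hdfs-site.xml):

    # Decommission a DataNode: add its hostname to the excludes file, then
    # ask the NameNode to re-read it.
    echo "worker05.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch decommissioning progress and overall cluster health.
    hdfs dfsadmin -report

    # After commissioning new nodes, rebalance block placement so no DataNode
    # deviates more than 10% from the average utilization.
    hdfs balancer -threshold 10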

TECHNICAL SKILLS:

Hadoop Tools: IBM BigInsights (4.2), Cloudera Manager (CDH5), Hortonworks HDP 2.x, Ambari, HDFS, Oozie, Hive, Pig, Sqoop, Flume, Spark, Kafka.

Other Utility Tools: WinSCP, PuTTY

Operating System: Windows, UNIX, RHEL, CentOS

Databases: Oracle 11g, MySQL, MS SQL Server; NoSQL databases: HBase

Scripting: Linux Shell Scripting, HTML Scripting

Programming: C, Java, MapReduce, SQL, Hive, Pig.

PROFESSIONAL EXPERIENCE:

Senior Hadoop Administrator/Developer

Confidential, New York City, NY

Responsibilities:

  • Installing and configuring the IBM BigInsights Hadoop cluster for the development and QA environments.
  • Preparing shell scripts to create the local and HDFS file systems.
  • Creating encryption zones to enable Transparent Data Encryption (TDE) in the production environment (a sample key and zone setup follows this list).
  • Creating users and groups using the FreeIPA user management tool.
  • Enabling Kerberos on the cluster, generating keytabs for the process users, and adding the corresponding kinit commands to the users' bash profiles.
  • Setting the umask for users both on the local RHEL file system and in HDFS.
  • Creating Hive databases/schemas and tables, and configuring storage-based authorization and impersonation for Hive.
  • Setting up the Hive truststore for Beeline connectivity.
  • Synchronizing the Hive tables with Big SQL using HCatalog and querying the tables using Data Server Manager (DSM).
  • Setting proper ACLs on both the local and HDFS file systems to prevent access by unwanted users and groups.
  • Creating Capacity Scheduler YARN queues and allocating a percentage of cluster resources to each queue.
  • Validating the final production cluster setup in the IBM Bluemix cloud environment.
  • Automating the data-fetching accelerators from multiple data sources/servers using Oozie workflows.
  • Involved in the build and deployment of applications on the production cluster.
  • Writing Oozie workflows for the jobs, scheduling them with a defined frequency, and automating the entire process (a sample job submission follows this list).
  • Checking logs to diagnose the issues behind failed jobs and clearing old logs.
  • Monitoring the jobs in production, troubleshooting failed jobs, and configuring e-mail notifications for failed jobs using SendGrid.
  • Performance tuning of the cluster.
  • Migration of the setup from the IBM BigInsights 4.1 cluster to the 4.2 cluster and replication of the cluster settings.
  • Listing the pre-defined alerts in the cluster and setting e-mail notifications for the high-priority alerts.
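
A minimal sketch of the encryption-zone setup referenced above, assuming a Hadoop KMS is already in place; the key name and HDFS path are illustrative:

    # Create an encryption key in the KMS, then an encryption zone backed by it.
    hadoop key create prodZoneKey

    # The target directory must exist (and be empty) before it becomes a zone.
    hdfs dfs -mkdir -p /data/secure
    hdfs crypto -createZone -keyName prodZoneKey -path /data/secure

    # Verify the zone is registered.
    hdfs crypto -listZones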

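A minimal sketch of how the Oozie jobs mentioned above are typically submitted from the shell; the Oozie server URL and property file names are placeholders:

    # Submit and start a workflow; job.properties points to the workflow.xml
    # already deployed in HDFS.
    oozie job -oozie http://oozie-host.example.com:11000/oozie \
              -config job.properties -run

    # For recurring runs, the same command points at a coordinator.properties
    # whose coordinator.xml defines the frequency and data dependencies.
    oozie job -oozie http://oozie-host.example.com:11000/oozie \
              -config coordinator.properties -run

    # Check the status of a submitted job.
    oozie job -oozie http://oozie-host.example.com:11000/oozie -info <job-id>
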
Environment: RHEL, IBM BigInsights (4.2), MapReduce, HIVE, PIG, Oozie, HCatalog, BigSQL, DataServerManager, Kerberos, KNOX, SQOOP.

Senior Hadoop Administrator

Confidential, Bethpage, NY

Responsibilities:

  • Hands-on experience with the installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as development, test, and production.
  • Hands-on experience upgrading Cloudera from CDH 5.3 to CDH 5.4.
  • Good experience addressing cluster audit findings and tuning configuration parameters.
  • Deployed high availability on the Hadoop cluster using quorum journal nodes.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
  • Good experience with Hadoop Ecosystem components such as Hive, HBase, Sqoop, Oozie.
  • Demonstrated an understanding of the concepts, best practices, and functions required to implement a Big Data solution in a corporate environment.
  • Helped design scalable Big Data clusters and solutions.
  • Commissioned and decommissioned nodes from time to time.
  • Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
  • Imported data using Sqoop to load data from MySQL into HDFS on a regular basis (a sample import command follows this list).
  • Used Spark Streaming to consume topics from Kafka, a distributed messaging source, and periodically push batches of data to Spark for real-time processing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Monitored and controlled local file system disk space usage and log files, cleaning logs with automated scripts.
  • As a Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files (a sample daily check follows this list).
  • Implemented Rack Awareness for data locality optimization.
  • Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues.
  • Implemented NameNode metadata backup using NFS.
  • Worked with network and Linux system engineers/admins to define optimal network configurations, server hardware, and operating systems.
  • Production support responsibilities included cluster maintenance.
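
A minimal sketch of the recurring MySQL-to-HDFS load mentioned above; the connection string, credentials, table, and target directory are illustrative:

    # Import a MySQL table into HDFS with 4 parallel mappers.
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4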

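A sketch of the kind of daily health check referred to above, assuming a standard CDH configuration layout; the backup path is illustrative:

    # Capacity, live/dead DataNodes, and decommissioning status.
    hdfs dfsadmin -report

    # Read-only file-system integrity check; flags missing or corrupt blocks.
    hdfs fsck /

    # Back up the active configuration XMLs before any tuning change.
    tar czf /backup/hadoop-conf-$(date +%F).tar.gz /etc/hadoop/conf
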
Environment: RHEL, CDH 5.4, HDFS, Hue, Oozie, Hive, Sqoop, ZooKeeper, Spark, Kafka, UNIX scripts, YARN, Capacity Scheduler, Kerberos, Oracle, MySQL, Ganglia, Nagios.

Senior Hadoop Administrator

Confidential, Palo Alto, CA

Responsibilities:

  • Deployed a Hadoop cluster using Hortonworks distribution HDP integrated with Nagios and Ganglia.
  • Monitored workload, job performance and capacity planning using Ambari.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Performed operating system installation, Hadoop version updates using automation tools.
  • Deployed high availability on the Hadoop cluster using quorum journal nodes.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Installed, configured, and maintained HBase.
  • Designed access authorization for users using SSSD integrated with Active Directory.
  • Integrated Kerberos on all the clusters with the company's Active Directory and created user groups and permissions for authorized access into the cluster.
  • Configured Ganglia, which included installing the gmond and gmetad daemons that collect the metrics across the distributed cluster and present them in real-time dynamic web pages, which further helps with debugging and maintenance.
  • Configured Oozie for workflow automation and coordination.
  • Implemented a rack-aware topology on the Hadoop cluster.
  • Implemented Kerberos security in all environments.
  • Implemented the Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for each service, and managing keytabs using the keytab tools (a sample principal/keytab workflow follows this list).
  • Defined file system layout and data set permissions.
  • Good experience troubleshooting production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp (a sample command follows this list).
  • Regular ad-hoc execution of Hive and Pig queries depending upon the use cases.
  • Commissioning and decommissioning of nodes depending upon the amount of data.
  • Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
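
A minimal sketch of the principal and keytab workflow described above, using MIT Kerberos admin tools; the realm, principal, and keytab path are placeholders:

    # Create a service principal with a random key in the KDC.
    kadmin.local -q "addprinc -randkey hdfs/namenode01.example.com@EXAMPLE.COM"

    # Export its key to a keytab and lock down the file permissions.
    kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.service.keytab hdfs/namenode01.example.com@EXAMPLE.COM"
    chown hdfs:hadoop /etc/security/keytabs/hdfs.service.keytab
    chmod 400 /etc/security/keytabs/hdfs.service.keytab

    # Verify the keytab and obtain a ticket with it.
    klist -kt /etc/security/keytabs/hdfs.service.keytab
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/namenode01.example.com@EXAMPLE.COM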

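A sketch of the remote-cluster backup mentioned above; the cluster hostnames and paths are illustrative:

    # Copy a dataset to the remote cluster, updating changed files and
    # preserving block size, replication, and permissions (-pbrp).
    hadoop distcp -update -pbrp \
      hdfs://prod-nn.example.com:8020/data/warehouse \
      hdfs://dr-nn.example.com:8020/data/warehouse
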
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Eclipse, Hortonworks Ambari, WinSCP, PuTTY.

Hadoop Administrator/Developer

Confidential, San Antonio, TX

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Wrote shell scripts to monitor the health of the Hadoop daemon services and respond accordingly to any warning or failure conditions (a simplified example script follows this list).
  • Managing and scheduling Jobs on a Hadoop cluster.
  • Deployed Hadoop clusters in multiple deployment modes.
  • Implemented NameNode backup using NFS for high availability.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive external tables, loaded data into the tables, and queried the data using HQL.
  • Wrote shell scripts for rolling day-to-day processes and automated them.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Created a local YUM repository for creating and updating packages.
  • Implemented rack awareness for data locality optimization.
  • Worked with big data developers, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, and Flume.
  • Installed Ambari on the existing Hadoop cluster.
  • Designed and allocated HDFS quotas for multiple groups (sample quota commands follow this list).
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Developed application component APIs using core Java.
  • Worked with support teams to resolve performance issues.
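
A simplified version of the kind of daemon health-check script described above; the daemon list and alert address are illustrative:

    #!/bin/bash
    # Alert if any expected Hadoop daemon is not running on this node.
    ALERT_TO="hadoop-admins@example.com"
    DAEMONS="NameNode SecondaryNameNode DataNode JobTracker TaskTracker"

    for d in $DAEMONS; do
        if ! jps | grep -qw "$d"; then
            echo "$(date): $d is DOWN on $(hostname)" \
                | mail -s "Hadoop daemon alert: $d down" "$ALERT_TO"
        fi
    done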

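A sketch of the quota allocation mentioned above (Hadoop 1.x command syntax); the group directory and limits are placeholders:

    # Cap the directory at 1 million names and 10 TB of raw space
    # (the space quota counts replicated bytes).
    hadoop dfsadmin -setQuota 1000000 /user/groups/marketing
    hadoop dfsadmin -setSpaceQuota 10t /user/groups/marketing

    # Review current usage against the quotas.
    hadoop fs -count -q /user/groups/marketing
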
Environment: Hadoop 1.x, Hive, Pig, HBase, Ambari, Sqoop, Flume, NFS, MySQL, WinSCP, PuTTY.

Linux/Hadoop Administrator

Confidential, San Diego, CA

Responsibilities:

  • Involved in the installation and maintenance of Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, and HBase.
  • Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, and backup and disaster recovery systems and procedures.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
  • Responsible for gathering requirements, performing analysis, and formulating requirement specifications from consistent inputs/requirements.
  • Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Developed servlets and JSPs as application controllers.
  • Designed and developed HTML front-end screens and validated forms using JavaScript.
  • Deployed the web application on the WebLogic server.
  • Used JDBC for database connectivity.
  • Developed the SQL queries needed for database transactions.
  • Involved in testing, implementation and documentation.
  • The front end was built with JSPs, JavaScript, and HTML.
  • Integrated data from multiple data sources.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that execute independently based on time and data availability.

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, HBase, Shell Scripting.
