Senior Hadoop Administrator/Developer Resume
New York City, NY
SUMMARY:
- 5+ years of experience in IT, including 4 years administering Hadoop environments.
- Hands-on experience deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, Spark, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
- Installation, configuration, maintenance, monitoring, and troubleshooting of Hadoop clusters.
- Assisted in the design, development, and architecture of the Hadoop ecosystem.
- Hands-on experience configuring Hadoop clusters in professional environments and on VMware, Amazon Web Services (AWS) EC2 instances, and IBM Bluemix.
- Worked on optimizing volumes and AWS EC2 instances and created multiple VPCs.
- Set up NameNode High Availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes (a brief command sketch follows this summary).
- In-depth knowledge and good understanding of the Hadoop daemons: NameNode, DataNode, Secondary NameNode, ResourceManager, and NodeManager.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
- Knowledge of multiple distributions/platforms (Apache, Cloudera, Hortonworks).
- Experience in dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem; handled importing from various data sources, performed transformations using Hive and Pig, and loaded data into HBase.
- Knowledge of programming in MapReduce, Hive, and Pig.
- Knowledge of NoSQL databases such as HBase and Cassandra.
- Experience setting up data import/export tools such as Sqoop to move data between RDBMS and HDFS.
- Advanced knowledge of the Fair and Capacity Schedulers and of configuring schedulers on a cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Supported technical team for automation, installation and configuration tasks.
- Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and provided performance tuning accordingly.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux.
- Strong experience in system administration: installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
- Experience in providing security for Hadoop Cluster with Kerberos.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Working knowledge of Oozie, a workflow scheduler used to manage Pig, Hive, and Sqoop jobs.
- Excellent interpersonal and communication skills; technically competent and results-oriented with problem-solving and leadership skills.
- Research-oriented, motivated, proactive self-starter with strong technical and analytical skills.
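For the NameNode High Availability work noted above, a minimal sketch of the related operational commands, assuming hypothetical NameNode IDs nn1 and nn2 (the IDs and the one-time ZKFC initialization are assumptions, not details of any specific cluster below):

    # Illustrative only -- NameNode IDs (nn1, nn2) are hypothetical.
    # Check which NameNode is active and which is standby.
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # Automatic failover is handled by the ZooKeeper Failover Controllers (ZKFC);
    # the HA state znode is initialized once before the controllers are started.
    hdfs zkfc -formatZK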
TECHNICAL SKILLS:
Hadoop Tools: IBM BigInsights (4.2), Cloudera Manager (CDH5), Hortonworks HDP 2.x, Ambari, HDFS, Oozie, Hive, Pig, Sqoop, Flume, Spark, Kafka.
Other Utility Tools: WinSCP, PuTTY
Operating System: Windows, UNIX, RHEL, CentOS
Databases: Oracle 11g, MySQL, MS SQL Server; NoSQL: HBase
Scripting: Linux Shell Scripting, HTML Scripting
Programming: C, Java, MapReduce, SQL, Hive, Pig.
PROFESSIONAL EXPERIENCE:
Senior Hadoop Administrator/Developer
Confidential, New York City, NY
Responsibilities:
- Installing and configuring the IBM BigInsights Hadoop cluster for the development and QA environments.
- Preparing shell scripts to create the local and HDFS file systems.
- Creating encryption zones to enable Transparent Data Encryption (TDE) in the production environment (a command sketch follows this list).
- Creating users and groups using the FreeIPA user management tool.
- Enabling Kerberos in the cluster, generating keytabs for the process users, and adding them to the users' bash profiles.
- Setting the umask for users on both the local RHEL file system and HDFS.
- Creating Hive databases/schemas and tables, and configuring storage-based authorization and impersonation for Hive.
- Setting up the Hive truststore for Beeline connectivity.
- Synchronizing the Hive tables with Big SQL using HCatalog and querying the tables using Data Server Manager (DSM).
- Setting proper ACLs on both the local and HDFS file systems to prevent access by unauthorized users and groups.
- Creating Capacity Scheduler YARN queues and allocating a percentage of cluster resources to each queue.
- Validating the final production cluster setup in the IBM Bluemix cloud environment.
- Automating the data-fetching accelerators from multiple data sources/servers using Oozie workflows.
- Involved in building and deploying applications on the production cluster.
- Writing Oozie workflows for the jobs, scheduling them at a defined frequency, and automating the entire process.
- Checking logs to diagnose failed jobs and clearing logs.
- Monitoring jobs in production, troubleshooting failed jobs, and configuring e-mail notifications for failures using SendGrid.
- Performance tuning of the cluster.
- Migrating the setup from an IBM BigInsights 4.1 cluster to a 4.2 cluster and replicating the cluster settings.
- Listing the pre-defined alerts in the cluster and setting e-mail notifications for high-priority alerts.
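A minimal sketch of the encryption-zone and ACL commands referenced above, assuming a key managed by the Hadoop KMS and hypothetical path, key, and group names (all illustrative, not from the actual project):

    # Hypothetical key, path, and group names -- illustrative only.
    # Create an encryption zone for Transparent Data Encryption (TDE).
    hadoop key create projectKey
    hdfs dfs -mkdir -p /data/secure
    hdfs crypto -createZone -keyName projectKey -path /data/secure

    # Restrict the zone with HDFS ACLs so only the intended group can read it.
    hdfs dfs -setfacl -m group:analytics:r-x /data/secure
    hdfs dfs -setfacl -m other::--- /data/secure
    hdfs dfs -getfacl /data/secure    # verify the resulting ACLs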
Environment: RHEL, IBM BigInsights 4.2, MapReduce, Hive, Pig, Oozie, HCatalog, Big SQL, Data Server Manager, Kerberos, Knox, Sqoop.
Senior Hadoop Administrator
Confidential, Bethpage, NY
Responsibilities:
- Hands-on experience with installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters across development, test, and production environments.
- Hands-on experience upgrading Cloudera from CDH 5.3 to CDH 5.4.
- Good experience addressing cluster audit findings and tuning configuration parameters.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Implemented the Capacity Scheduler to share cluster resources among users' MapReduce jobs.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Sqoop, Oozie.
- Demonstrated an understanding of the concepts, best practices, and functions required to implement a Big Data solution in a corporate environment.
- Helped design scalable Big Data clusters and solutions.
- Commissioned and decommissioned nodes as needed.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Imported data from MySQL into HDFS on a regular basis using Sqoop (an illustrative command follows this list).
- Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Monitored and controlled local file system disk space usage and log files, cleaning logs with automated scripts.
- As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
- Implemented Rack Awareness for data locality optimization.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues and to help the developers resolve them.
- Implemented NameNode backup using NFS.
- Worked with network and Linux system engineers/admins to define optimal network configurations, server hardware, and operating system settings.
- Production support responsibilities included cluster maintenance.
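A minimal sketch of such a recurring Sqoop import, assuming a hypothetical MySQL host, table, and target directory (all names illustrative):

    # Hypothetical connection string, table, and target directory -- illustrative only.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --incremental append \
      --check-column order_id \
      --last-value 0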
Environment: RHEL, CDH 5.4, HDFS, Hue, Oozie, Hive, Sqoop, ZooKeeper, Spark, Kafka, UNIX scripts, YARN, Capacity Scheduler, Kerberos, Oracle, MySQL, Ganglia, Nagios.
Senior Hadoop Administrator
Confidential, Palo Alto, CA
Responsibilities:
- Deployed a Hadoop cluster using Hortonworks distribution HDP integrated with Nagios and Ganglia.
- Monitored workload, job performance and capacity planning using Ambari.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Installed, configured, and maintained HBase.
- Designed user access authorization using SSSD integrated with Active Directory.
- Integrated Kerberos on all clusters with the company's Active Directory and created user groups and permissions for authorized access to the cluster.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics across the distributed cluster and present them in real-time dynamic web pages that aid debugging and maintenance.
- Configured Oozie for workflow automation and coordination.
- Implemented rack-aware topology on the Hadoop cluster.
- Implemented Kerberos security in all environments.
- Implemented the Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for every service, and managing keytabs using keytab tools (see the command sketch after this list).
- Defined file system layout and data set permissions.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Regular ad-hoc execution of Hive and Pig queries depending on the use case.
- Commissioned and decommissioned nodes depending on the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
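A minimal sketch of the principal and keytab workflow on an MIT Kerberos KDC, assuming a hypothetical EXAMPLE.COM realm, host names, and keytab paths (illustrative only):

    # Hypothetical realm, hosts, and paths -- illustrative only.
    # On the KDC: create a service principal per Hadoop service and host.
    kadmin.local -q "addprinc -randkey nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "addprinc -randkey dn/datanode01.example.com@EXAMPLE.COM"

    # Export keytabs for the service principals.
    kadmin.local -q "xst -k /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/dn.service.keytab dn/datanode01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and authenticate with it.
    klist -kt /etc/security/keytabs/nn.service.keytab
    kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM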
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Eclipse, Hortonworks Ambari, WinSCP, PuTTY.
Hadoop Administrator/Developer
Confidential, San Antonio, TX
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Wrote shell scripts to monitor the health of the Hadoop daemon services and respond to warning or failure conditions (a sketch of such a check appears after this list).
- Managed and scheduled jobs on the Hadoop cluster.
- Deployed Hadoop clusters in the standard modes (standalone, pseudo-distributed, and fully distributed).
- Implemented NameNode backup using NFS for high availability.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Created local YUM repository for creating and updating packages.
- Implemented Rack awareness for data locality optimization.
- Worked with big data developers, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, and Flume.
- Installed Ambari on existing Hadoop cluster.
- Designed and allocated HDFS quotas for multiple groups.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Developed application component APIs using core Java.
- Worked with support teams to resolve performance issues.
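A minimal sketch of such a daemon health check, assuming a hypothetical notification address (illustrative, not the original script):

    #!/usr/bin/env bash
    # Illustrative health check: alert if a Hadoop daemon is not running on this node.
    ADMIN_EMAIL="hadoop-admins@example.com"    # hypothetical address
    DAEMONS="NameNode DataNode JobTracker TaskTracker"

    for daemon in $DAEMONS; do
        if ! jps | grep -qw "$daemon"; then
            echo "$(date): $daemon is not running on $(hostname)" \
                | mail -s "Hadoop daemon down: $daemon" "$ADMIN_EMAIL"
        fi
    done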
Environment: Hadoop 1.x, Hive, Pig, HBase, Ambari, Sqoop, Flume, NFS, MySQL, WinSCP, PuTTY.
Linux/Hadoop Administrator
Confidential, San Diego, CA
Responsibilities:
- Involved in the installation and maintenance of Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, and HBase.
- Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, and backup and disaster recovery systems and procedures.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Responsible for gathering requirements, performing analysis, and formulating requirements specifications from consistent inputs.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Developed servlets and JSPs as application controllers.
- Designed and developed HTML front-end screens and validated forms using JavaScript.
- Deployed the web application on a WebLogic server.
- Used JDBC for database connectivity.
- Deployed necessary SQL queries for database transactions.
- Involved in testing, implementation and documentation.
- The front end was built with JSPs, JavaScript, and HTML.
- Integrated data from multiple data sources.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability (an illustrative submission sketch follows this list).
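A minimal sketch of submitting such a time- and data-triggered workflow through the Oozie CLI, assuming hypothetical host names, HDFS paths, and dates (illustrative only):

    # Hypothetical hosts, paths, and dates -- illustrative only.
    cat > job.properties <<'EOF'
    nameNode=hdfs://namenode01:8020
    jobTracker=jobtracker01:8021
    oozie.coord.application.path=${nameNode}/user/etl/coordinators/daily-hive-pig
    start=2014-01-01T00:00Z
    end=2014-12-31T00:00Z
    EOF

    # Submit and start the coordinator, then check its status.
    oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run
    oozie job -oozie http://oozieserver:11000/oozie -info <coordinator-job-id>   # id printed by -run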
Environment: Hadoop cluster, HDFS, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, HBase, shell scripting.