Senior Hadoop Administrator/Developer Resume
New York City, NY
SUMMARY:
- 5+ years of experience in IT, including 4 years administering Hadoop environments.
- Hands-on experience deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, Spark, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
- Installation, configuration, maintenance, monitoring, and troubleshooting of Hadoop clusters.
- Assisted in the design, development, and architecture of the Hadoop ecosystem.
- Hands-on experience configuring Hadoop clusters in professional environments and on VMware, Amazon Web Services (AWS) EC2 instances, and IBM Bluemix.
- Worked on optimizing volumes and AWS EC2 instances and created multiple VPCs.
- Set up NameNode High Availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes (a brief command sketch follows this summary).
- In-depth knowledge and good understanding of the Hadoop daemons: NameNode, DataNode, Secondary NameNode, ResourceManager, and NodeManager.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
- Knowledge of multiple distributions/platforms (Apache, Cloudera, Hortonworks).
- Experience in dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem; handled importing from various data sources, performed transformations using Hive and Pig, and loaded data into HBase.
- Knowledge of programming in MapReduce, Hive, and Pig.
- Knowledge of NoSQL databases such as HBase and Cassandra.
- Experience setting up data import/export tools such as Sqoop to move data between RDBMS and HDFS.
- Advanced knowledge of the Fair and Capacity Schedulers and of configuring schedulers on a cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Supported technical team for automation, installation and configuration tasks.
- Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and provided performance tuning accordingly.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux.
- Strong experience in system administration: installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
- Experience in providing security for Hadoop Cluster with Kerberos.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Working knowledge of Oozie, a workflow scheduler used to manage Pig, Hive, and Sqoop jobs.
- Excellent interpersonal and communication skills; technically competent and results-oriented with problem-solving and leadership skills.
- Research-oriented, motivated, proactive self-starter with strong technical and analytical skills.
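For the NameNode High Availability work noted above, a minimal sketch of the related operational commands, assuming hypothetical NameNode IDs nn1 and nn2 (the IDs and the one-time ZKFC initialization are assumptions, not details of any specific cluster below):

    # Illustrative only -- NameNode IDs (nn1, nn2) are hypothetical.
    # Check which NameNode is active and which is standby.
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # Automatic failover is handled by the ZooKeeper Failover Controllers (ZKFC);
    # the HA state znode is initialized once before the controllers are started.
    hdfs zkfc -formatZK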
TECHNICAL SKILLS:
Hadoop Tools: IBM BigInsights (4.2), Cloudera Manager (CDH5), Hortonworks HDP 2.x, Ambari, HDFS, Oozie, Hive, Pig, Sqoop, Flume, Spark, Kafka.
Other Utility Tools: WinSCP, PuTTY
Operating System: Windows, UNIX, RHEL, CentOS
Databases: Oracle 11g, MySQL, MS SQL Server; NoSQL: HBase
Scripting: Linux Shell Scripting, HTML Scripting
Programming: C, Java, MapReduce, SQL, Hive, Pig.
PROFESSIONAL EXPERIENCE:
Senior Hadoop Administrator/Developer
Confidential, New York City, NY
Responsibilities:
- Installing and configuring the IBM BigInsights Hadoop cluster for the development and QA environments.
- Preparing shell scripts to create the local and HDFS file systems.
- Creating encryption zones to enable Transparent Data Encryption (TDE) in the production environment (a command sketch follows this list).
- Creating users and groups using the FreeIPA user management tool.
- Enabling Kerberos in the cluster, generating keytabs for the process users, and adding them to the users' bash profiles.
- Setting the umask for users on both the local RHEL file system and HDFS.
- Creating Hive databases/schemas and tables, and configuring storage-based authorization and impersonation for Hive.
- Setting up the Hive truststore for Beeline connectivity.
- Synchronizing the Hive tables with Big SQL using HCatalog and querying the tables using Data Server Manager (DSM).
- Setting proper ACLs on both the local and HDFS file systems to prevent access by unauthorized users and groups.
- Creating Capacity Scheduler YARN queues and allocating a percentage of cluster resources to each queue.
- Validating the final production cluster setup in the IBM Bluemix cloud environment.
- Automating the data-fetching accelerators from multiple data sources/servers using Oozie workflows.
- Involved in building and deploying applications on the production cluster.
- Writing Oozie workflows for the jobs, scheduling them at a defined frequency, and automating the entire process.
- Checking logs to diagnose failed jobs and clearing logs.
- Monitoring jobs in production, troubleshooting failed jobs, and configuring e-mail notifications for failures using SendGrid.
- Performance tuning of the cluster.
- Migrating the setup from an IBM BigInsights 4.1 cluster to a 4.2 cluster and replicating the cluster settings.
- Listing the pre-defined alerts in the cluster and setting e-mail notifications for high-priority alerts.
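A minimal sketch of the encryption-zone and ACL commands referenced above, assuming a key managed by the Hadoop KMS and hypothetical path, key, and group names (all illustrative, not from the actual project):

    # Hypothetical key, path, and group names -- illustrative only.
    # Create an encryption zone for Transparent Data Encryption (TDE).
    hadoop key create projectKey
    hdfs dfs -mkdir -p /data/secure
    hdfs crypto -createZone -keyName projectKey -path /data/secure

    # Restrict the zone with HDFS ACLs so only the intended group can read it.
    hdfs dfs -setfacl -m group:analytics:r-x /data/secure
    hdfs dfs -setfacl -m other::--- /data/secure
    hdfs dfs -getfacl /data/secure    # verify the resulting ACLs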
Environment: RHEL, IBM BigInsights 4.2, MapReduce, Hive, Pig, Oozie, HCatalog, Big SQL, Data Server Manager, Kerberos, Knox, Sqoop.
Senior Hadoop Administrator
Confidential, Bethpage, NY
Responsibilities:
- Hands-on experience with installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters across development, test, and production environments.
- Hands-on experience upgrading Cloudera from CDH 5.3 to CDH 5.4.
- Good experience addressing cluster audit findings and tuning configuration parameters.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Implemented the Capacity Scheduler to share cluster resources among users' MapReduce jobs.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Sqoop, Oozie.
- Demonstrated an understanding of the concepts, best practices, and functions required to implement a Big Data solution in a corporate environment.
- Helped design scalable Big Data clusters and solutions.
- Commissioned and decommissioned nodes as needed.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Imported data from MySQL into HDFS on a regular basis using Sqoop (an illustrative command follows this list).
- Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Monitored and controlled local file system disk space usage and log files, cleaning logs with automated scripts.
- As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
- Implemented Rack Awareness for data locality optimization.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues and to help the developers resolve them.
- Implemented NameNode backup using NFS.
- Worked with network and Linux system engineers/admins to define optimal network configurations, server hardware, and operating system settings.
- Production support responsibilities included cluster maintenance.
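A minimal sketch of such a recurring Sqoop import, assuming a hypothetical MySQL host, table, and target directory (all names illustrative):

    # Hypothetical connection string, table, and target directory -- illustrative only.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --incremental append \
      --check-column order_id \
      --last-value 0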
Environment: RHEL, CDH 5.4, HDFS, Hue, Oozie, Hive, Sqoop, ZooKeeper, Spark, Kafka, UNIX scripts, YARN, Capacity Scheduler, Kerberos, Oracle, MySQL, Ganglia, Nagios.
Senior Hadoop Administrator
Confidential, Palo Alto, CA
Responsibilities:
- Deployed a Hadoop cluster using Hortonworks distribution HDP integrated with Nagios and Ganglia.
- Monitored workload, job performance and capacity planning using Ambari.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Installed, configured, and maintained HBase.
- Designed user access authorization using SSSD integrated with Active Directory.
- Integrated Kerberos on all clusters with the company's Active Directory and created user groups and permissions for authorized access to the cluster.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics across the distributed cluster and present them in real-time dynamic web pages that aid debugging and maintenance.
- Configured Oozie for workflow automation and coordination.
- Implemented rack-aware topology on the Hadoop cluster.
- Implemented Kerberos security in all environments.
- Implemented the Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for every service, and managing keytabs using keytab tools (see the command sketch after this list).
- Defined file system layout and data set permissions.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Regular ad-hoc execution of Hive and Pig queries depending on the use case.
- Commissioned and decommissioned nodes depending on the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
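A minimal sketch of the principal and keytab workflow on an MIT Kerberos KDC, assuming a hypothetical EXAMPLE.COM realm, host names, and keytab paths (illustrative only):

    # Hypothetical realm, hosts, and paths -- illustrative only.
    # On the KDC: create a service principal per Hadoop service and host.
    kadmin.local -q "addprinc -randkey nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "addprinc -randkey dn/datanode01.example.com@EXAMPLE.COM"

    # Export keytabs for the service principals.
    kadmin.local -q "xst -k /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/dn.service.keytab dn/datanode01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and authenticate with it.
    klist -kt /etc/security/keytabs/nn.service.keytab
    kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM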
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Eclipse, Hortonworks Ambari, WinSCP, PuTTY.
Hadoop Administrator/Developer
Confidential, San Antonio, TX
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Wrote shell scripts to monitor the health of the Hadoop daemon services and respond to warning or failure conditions (a sketch of such a check appears after this list).
- Managed and scheduled jobs on the Hadoop cluster.
- Deployed Hadoop clusters in the standard modes (standalone, pseudo-distributed, and fully distributed).
- Implemented NameNode backup using NFS for high availability.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Created local YUM repository for creating and updating packages.
- Implemented Rack awareness for data locality optimization.
- Worked with big data developers, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, and Flume.
- Installed Ambari on existing Hadoop cluster.
- Designed and allocated HDFS quotas for multiple groups.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Developed application component APIs using core Java.
- Worked with support teams to resolve performance issues.
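A minimal sketch of such a daemon health check, assuming a hypothetical notification address (illustrative, not the original script):

    #!/usr/bin/env bash
    # Illustrative health check: alert if a Hadoop daemon is not running on this node.
    ADMIN_EMAIL="hadoop-admins@example.com"    # hypothetical address
    DAEMONS="NameNode DataNode JobTracker TaskTracker"

    for daemon in $DAEMONS; do
        if ! jps | grep -qw "$daemon"; then
            echo "$(date): $daemon is not running on $(hostname)" \
                | mail -s "Hadoop daemon down: $daemon" "$ADMIN_EMAIL"
        fi
    done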
Environment: Hadoop 1.x, Hive, Pig, HBase, Ambari, Sqoop, Flume, NFS, MySQL, WinSCP, PuTTY.
Linux/Hadoop Administrator
Confidential, San Diego, CA
Responsibilities:
- Involved in the installation and maintenance of Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, and HBase.
- Proactively monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, and backup and disaster recovery systems and procedures.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Responsible for gathering requirements, performing analysis, and formulating requirements specifications from consistent inputs.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Developed servlets and JSPs as application controllers.
- Designed and developed HTML front-end screens and validated forms using JavaScript.
- Deployed the web application on a WebLogic server.
- Used JDBC for database connectivity.
- Deployed necessary SQL queries for database transactions.
- Involved in testing, implementation and documentation.
- The front end was built with JSPs, JavaScript, and HTML.
- Integrated data from multiple data sources.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability (an illustrative submission sketch follows this list).
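A minimal sketch of submitting such a time- and data-triggered workflow through the Oozie CLI, assuming hypothetical host names, HDFS paths, and dates (illustrative only):

    # Hypothetical hosts, paths, and dates -- illustrative only.
    cat > job.properties <<'EOF'
    nameNode=hdfs://namenode01:8020
    jobTracker=jobtracker01:8021
    oozie.coord.application.path=${nameNode}/user/etl/coordinators/daily-hive-pig
    start=2014-01-01T00:00Z
    end=2014-12-31T00:00Z
    EOF

    # Submit and start the coordinator, then check its status.
    oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run
    oozie job -oozie http://oozieserver:11000/oozie -info <coordinator-job-id>   # id printed by -run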
Environment: Hadoop cluster, HDFS, Hive, Pig, Sqoop, Linux, Hadoop MapReduce, HBase, shell scripting.