Sr. Hadoop Administrator Resume
PA
SUMMARY:
- Overall 7+ years of experience in software analysis, design, development, and maintenance across diversified areas of client-server, distributed, and embedded applications.
- Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, ZooKeeper, Oozie, Solr, Sentry, Kerberos, Centrify DC, Falcon, Hue, Kafka, and Storm).
- Experience with Cloudera Hadoop clusters running CDH 5.6.0 with Cloudera Manager 5.7.0.
- Experienced with Hortonworks Hadoop clusters running HDP 2.4 with Ambari 2.2.
- Hands-on day-to-day operation of the environment, with knowledge and deployment experience across the Hadoop ecosystem.
- Configured various property files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements (see the configuration sketch at the end of this summary).
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Experience in installing, configuring, and optimizing Cloudera Hadoop versions CDH 3, CDH 4.x, and CDH 5.x in multi-cluster environments.
- Commissioned and decommissioned cluster nodes and handled data migration; also involved in setting up a DR cluster with BDR replication and implementing wire encryption and encryption for data at rest.
- Implemented TLS (Level 3) security on all CDH services along with Cloudera Manager.
- Implemented Dataguise analytics over the secured cluster.
- Successfully implemented Blue-Talend integration and Greenplum migration.
- Ability to plan and manage HDFS storage capacity and disk utilization.
- Assisted developers with troubleshooting MapReduce and BI jobs as required.
- Provided granular ACLs for local file datasets as well as HDFS URIs; maintained role-level ACLs.
- Cluster monitoring and troubleshooting using tools such as Cloudera Manager, Ganglia, Nagios, and Ambari Metrics.
- Managed and reviewed HDFS data backups and restores on the production cluster.
- Implemented new Hadoop infrastructure, OS integration, and application installation. Installed OS (RHEL 6, RHEL 5, CentOS, and Ubuntu) and Hadoop updates, patches, and version upgrades as required.
- Implemented and maintained LDAP and Kerberos security as designed for the cluster.
- Expert in setting up Hortonworks (HDP 2.4) clusters with and without Ambari 2.2.
- Experienced in setting up Cloudera (CDH 5.6) clusters using packages as well as parcels with Cloudera Manager 5.7.0.
- In-depth understanding of Hadoop architecture and its components such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Solid understanding of all phases of development using multiple methodologies, i.e., Agile with JIRA and Kanban boards, along with the Remedy and ServiceNow ticketing tools.
- Expertise in Red Hat Linux tasks including upgrading RPMs using YUM, kernel upgrades, and configuring SAN disks, multipath, and LVM file systems.
- Created and maintained user accounts, profiles, security, rights, disk space, and process monitoring; handled and generated tickets via the BMC Remedy ticketing tool.
- Configured UDP, TLS/SSL, HTTPD, HTTPS, FTP, SFTP, SMTP, SSH, Kickstart, Chef, Puppet, and PDSH.
- Strong overall experience in system administration: installation, upgrades, patches, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
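To make the property-file work above concrete, here is a minimal sketch of writing a basic hdfs-site.xml; the directory layout and property values are illustrative placeholders, and real settings depend on the cluster:

```bash
#!/usr/bin/env bash
# Minimal sketch: write a basic hdfs-site.xml.
# Paths and values below are hypothetical, not production settings.
HADOOP_CONF=/etc/hadoop/conf

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- default block replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- NameNode metadata directories (two disks for redundancy) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn,file:///data/2/dfs/nn</value>
  </property>
</configuration>
EOF
```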
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Sqoop, Flume, ZooKeeper, Kafka, Mahout, Oozie, CDH, HDP
Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP UFT, Selenium, Kerberos, JUnit
Programming Languages: Shell Scripting (Bash, CSH), Puppet, Python, Java
QA Methodologies: Waterfall, Agile, V-model
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
Java Frameworks: MVC, Apache Struts2.0, Spring and Hibernate
Domain Knowledge: GSM, WAP, GPRS, CDMA and UMTS (3G)
Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS
Application Servers: Apache Tomcat, Web Logic Server, Web Sphere, JBoss
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra
Operating Systems: Linux, UNIX, Mac OS, Windows
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Administrator
Confidential, PA
Responsibilities:
- Working on 4 Hadoop clusters for different teams, supporting 50+ users of the Hadoop platform: resolving the tickets and issues they run into, training users to keep Hadoop usage simple, and updating them on best practices.
- Installed, configured, and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Managed a 350+ node CDH 5.2 cluster with 4 petabytes of data using Cloudera Manager on Red Hat Linux 6.5.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Upgraded the Hadoop cluster from CDH 4.7 to CDH 5.2.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration (a decommissioning sketch follows this list).
- Involved in migrating ETL processes from Oracle to Hive to validate easier data manipulation.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Installed Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager server, and enabled Kerberos using the wizard.
- Monitored the cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created test strategies, test plans, and test cases to provide test coverage across various products, systems, and platforms.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Installed the OS and administered the Hadoop stack on the CDH 5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Scripting Hadoop package installation and configuration to support fully-automated deployments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
- Worked on Hive for further analysis and for transforming files from different analytical formats to text files.
- Created Hive external tables, loaded data into the tables, and queried data using HQL.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Monitored the Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
- Maintained the cluster by adding and removing nodes using tools like Ganglia, Nagios, and Cloudera Manager.
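As noted above, a minimal sketch of the DataNode decommissioning flow, assuming dfs.hosts.exclude in hdfs-site.xml already points at the excludes file; the hostname and path are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: gracefully decommission a DataNode. Assumes the NameNode's
# dfs.hosts.exclude points at $EXCLUDES; hostname and path are placeholders.
EXCLUDES=/etc/hadoop/conf/dfs.exclude
NODE=datanode42.example.com

echo "$NODE" >> "$EXCLUDES"
hdfs dfsadmin -refreshNodes                 # NameNode starts re-replicating blocks
hdfs dfsadmin -report | grep -A 3 "$NODE"   # wait for "Decommissioned" status
```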
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Spark, Oozie, Flume, HBase, Nagios, Ganglia, Hue, Cloudera Manager, ZooKeeper, Cloudera, Oracle, Kerberos, and Red Hat 6.5
Hadoop Administrator
Confidential, Atlanta, GA
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster setup: installation, configuration, and monitoring of the Hadoop cluster on Cloudera.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Involved in loading data from UNIX file system to HDFS.
- Experience in deploying versions of MRv1 and MRv2 (YARN).
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Spark, Kafka, Oozie, Pig, and Hive.
- Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among users' MapReduce jobs.
- Installed and configured Spark on multi node environment.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning.
- Expertise in recommending hardware configuration for Hadoop cluster.
- Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution.
- Managing and reviewing Hadoop and HBase log files.
- Experience with UNIX and Linux, including shell scripting.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables (see the Sqoop sketch after this list).
- Built automated set up for cluster monitoring and issue escalation process.
- Administered, installed, upgraded, and managed Hadoop distributions (CDH 3, CDH 4 with Cloudera Manager, and Hortonworks), Hive, and HBase.
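As one illustration of the Sqoop loads above, a sketch of a DB2-to-Hive import into a partitioned table; the JDBC URL, credentials, table, and partition value are hypothetical:

```bash
# Sketch: import a DB2 table into a partitioned Hive table with Sqoop.
# Connection string, schema, and partition value are illustrative placeholders.
sqoop import \
  --connect jdbc:db2://db2host.example.com:50000/SALES \
  --username etl_user -P \
  --table TRANSACTIONS \
  --hive-import \
  --hive-table sales.transactions \
  --hive-partition-key load_date \
  --hive-partition-value 2015-06-01 \
  --num-mappers 8
```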
Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Red Hat Linux, Cloudera Manager, Hortonworks.
Hadoop Administrator
Confidential, Yonkers, NY
Responsibilities:
- Build/deploy/configure/maintain multiple Hadoop clusters in production, staging, and development environments that process over 2 TB of events per day.
- Build/deploy/configure/maintain multiple real-time clusters consisting of Apache products: Flume, Storm, Mesos, Spark, and Kafka.
- Created scripts to form EC2 clusters for training and for processing.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior, such as call frequency and top calling customers.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Created Cassandra Advanced Data Modeling course for DataStax.
- Secured production environments by setting up and deploying Kerberos in the Hadoop cluster.
- Support application administrators, developers, and users. Release and support real-time products from development through production.
- Participate in various Proof of Concept projects to evaluate new technologies for data collection and analysis.
- Identify necessary alerts and remediation process for new components.
- Imported data from MySQL server to HDFS using Sqoop.
- Managed the day-to-day operations of the cluster for backup and support.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs (a workflow sketch follows this list).
- Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
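A minimal sketch of driving interdependent jobs with Oozie as described above; the Oozie endpoint, HDFS application path, and the example job id are hypothetical:

```bash
# Sketch: submit and monitor an Oozie workflow. All hosts, ports, the HDFS
# application path, and the job id are illustrative placeholders.
cat > job.properties <<'EOF'
nameNode=hdfs://nn.example.com:8020
jobTracker=rm.example.com:8032
oozie.wf.application.path=${nameNode}/user/hadoop/workflows/ingest
EOF

OOZIE_URL=http://oozie.example.com:11000/oozie
oozie job -oozie "$OOZIE_URL" -config job.properties -run    # prints a job id
oozie job -oozie "$OOZIE_URL" -info 0000001-150101000000000-oozie-oozi-W   # example id
```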
Environment: Ambari, HBase, Hive, Pig, Sqoop, Apache Ranger, Splunk, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper, RegEx, JSON.
Hadoop Administrator
Confidential, Houston, TX
Responsibilities:
- Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Experience with Azure components and APIs.
- Thorough knowledge of the Azure IaaS and PaaS platforms.
- Managed an Azure-based SaaS environment.
- Worked with Azure Data Lake and Data Factory.
- Worked on configuring Hadoop cluster on AWS.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Used Hive, created Hive tables, and loaded data from the local file system to HDFS.
- Created user accounts and granted users access to the Hadoop cluster.
- Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect on running nodes or data.
- Experience with Oracle OBIEE.
- Responsible for HBase REST server administration, backup and recovery.
- As a Hadoop admin, monitored cluster health status on a daily basis, tuned system performance-related configuration parameters, and backed up configuration XML files (a backup sketch follows this list).
- Monitored all MapReduce Read Jobs running on the cluster using Cloudera Manager and ensured that they could read the data and write to HDFS without any issues.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Set up the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Supported Data Analysts in running MapReduce Programs.
- Responsible for deploying patches and remediating vulnerabilities.
- Provided highly available and durable data using AWS S3 data store.
- Experience in setting up Test, QA, and Prod environment.
- Involved in loading data from UNIX file system to HDFS.
- Drove root cause analysis (RCA) efforts for high-severity incidents.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Worked hands-on with ETL processes; handled importing data from various data sources and performed transformations.
- Documented the procedures performed during project development.
- Assigned tasks to the offshore team and coordinated with them for successful completion of deliverables.
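A minimal sketch of the configuration-file backup routine mentioned above, assuming hypothetical paths and a 30-day retention window:

```bash
#!/usr/bin/env bash
# Sketch: nightly tarball of Hadoop configuration XMLs with 30-day retention.
# Paths and retention period are illustrative placeholders.
CONF_DIR=/etc/hadoop/conf
BACKUP_DIR=/var/backups/hadoop-conf
STAMP=$(date +%Y%m%d)

mkdir -p "$BACKUP_DIR"
tar czf "$BACKUP_DIR/conf-$STAMP.tar.gz" -C "$CONF_DIR" .
find "$BACKUP_DIR" -name 'conf-*.tar.gz' -mtime +30 -delete   # prune old backups
```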
Environment: Red Hat/SUSE Linux, EM Cloud Control, Cloudera 4.3.2, HDFS, Hive, Sqoop, ZooKeeper, HBase, MapReduce, Pig, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Exadata Machines X2/X3, HDP, Toad, MySQL Plus, Oracle Enterprise Manager (OEM), RMAN, Shell Scripting, Golden Gate, Azure platform, HDInsight.
Hadoop Administrator
Confidential, Houston, TX
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, ZooKeeper, Cassandra, and Sqoop.
- Implemented High Availability NameNodes using Quorum Journal Managers and ZooKeeper Failover Controllers.
- Managed a 350+ node HDP 2.3 cluster with 4 petabytes of data using Ambari 2.0 on CentOS 7 Linux.
- Familiar with Hadoop security involving LDAP, Kerberos, and Ranger.
- Strong experience using Ambari to administer large Hadoop clusters (>100 nodes).
- After data transformation is done, the transformed data is moved to a Spark cluster, where it is set to go live on the application using Spark Streaming and Kafka.
- Configured LDAP user management access.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a health-check sketch follows this list).
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System).
- Collected log data from web servers and integrated it into HDFS using Flume.
- Set up Kerberos locally on a 5-node POC cluster using Ambari, evaluated cluster performance, and did an impact analysis of Kerberos enablement.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Tuned complex SQL queries and debugged Talend job code for performance enhancement.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Created Hive external tables, loaded data into the tables, and queried data using HQL.
- Performed data analysis by running Hive queries.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS and Hive access.
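In the spirit of the health-check scripts above, a minimal sketch that alerts when an expected Hadoop daemon is down or HDFS looks unhealthy; the daemon list and mail address are hypothetical, and it assumes jps can see the daemon JVMs (e.g., run as root or the service user):

```bash
#!/usr/bin/env bash
# Sketch: alert if expected Hadoop daemons are down or HDFS is unhealthy.
# Daemon list and alert address are placeholders; assumes jps visibility
# into the daemon JVMs.
DAEMONS="NameNode DataNode ResourceManager NodeManager"
ALERT=hadoop-ops@example.com

for d in $DAEMONS; do
  jps | grep -qw "$d" || \
    echo "$(hostname): $d is not running" | mail -s "Hadoop daemon down" "$ALERT"
done

# fsck prints "Status: HEALTHY" on a clean namespace
hdfs fsck / 2>/dev/null | grep -q 'Status: HEALTHY' || \
  echo "$(hostname): HDFS fsck did not report HEALTHY" | \
  mail -s "HDFS health warning" "$ALERT"
```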
Environment: Hive, Pig, HBase, ZooKeeper, Cassandra, Hortonworks HDP 2.3, Python, Scala, Kafka, Spark, Talend, shell scripts, Flume, Sqoop, Oracle, BODS, and HDFS.
Hadoop Administrator
Confidential, El Segundo, CA
Responsibilities:
- Handle the installation and configuration of a Hadoop cluster.
- Build and maintain scalable data pipelines using the Hadoop ecosystem and other open-source components like Hive and HBase.
- Handle the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitor the data streaming between web sources and HDFS.
- Monitor the Hadoop cluster functioning through monitoring tools.
- Closely monitor and analyze MapReduce job executions on the cluster at the task level.
- Provide inputs to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Change cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Handle upgrades and patch updates.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a log-scan sketch follows this list).
- Responsible for building scalable distributed data solutions using Hadoop.
- Commission or decommission DataNodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and manage NameNode HA and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Set up checkpoints to gather system statistics for critical setups.
- Hold regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
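A minimal sketch of the automated log-scan-and-alert process above; the log directory, error patterns, and mail address are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: scan daemon logs for predefined error patterns and mail an alert.
# Log directory, patterns, and recipient are illustrative placeholders.
LOG_DIR=/var/log/hadoop-hdfs
PATTERNS='FATAL|OutOfMemoryError|Connection refused'
ALERT=hadoop-ops@example.com

matches=$(grep -hE "$PATTERNS" "$LOG_DIR"/*.log 2>/dev/null | tail -n 20)
[ -n "$matches" ] && \
  printf '%s\n' "$matches" | mail -s "$(hostname): Hadoop log errors" "$ALERT"
```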
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera
Linux Administrator
Confidential
Responsibilities:
- Installed Red Hat Linux using Kickstart.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Managed systems' routine backups, scheduled jobs, enabled cron jobs, and enabled system and network logging of servers for maintenance.
- Performed RPM and YUM package installations, patching, and other server management.
- Installed and configured Logical Volume Manager (LVM) and RAID (an LVM sketch follows this list).
- Documented all setup procedures and system-related policies (SOPs).
- Provided 24/7 technical support to Production and development environments.
- Administered DHCP, DNS, and NFS services in Linux.
- Created and maintained user accounts, profiles, security, rights, disk space, and process monitoring.
- Provided technical support by troubleshooting day-to-day issues with various servers on different platforms.
- Diagnosed and solved hardware and OS issues and provided root cause analysis.
- Ran prtdiag -v to make sure all memory and boards were online and to check for failures.
- Supported Linux and Sun Solaris Veritas clusters.
- Notified the server owner if there was a failover or crash, and also notified Unix/Linux Server Support L3.
- Checked for core files and, if they existed, sent them to Unix/Linux Server Support for core file analysis.
- Monitored CPU loads, restarted processes, and checked file systems.
- Installed, upgraded, and applied patches for UNIX, Red Hat Linux, and Windows servers in clustered and non-clustered environments.
- Helped install systems using Kickstart.
- Installed and maintained Windows 2000 and XP Professional, DNS, DHCP, and WINS for the Bear Stearns domain.
- Used LDAP to authenticate users in Apache and other user applications.
- Performed remote administration using Terminal Services, VNC, and pcAnywhere.
- Created/removed Windows accounts using Active Directory.
- Reset user passwords on Windows Server 2003 using the dsmod command-line tool.
- Provided end-user technical support for applications.
- Created, maintained, and updated documentation.
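A minimal sketch of the LVM work above: carving a logical volume from a new disk and mounting it; the device name, volume names, size, and mount point are hypothetical, and the commands are run as root:

```bash
#!/usr/bin/env bash
# Sketch: create an LVM logical volume on a new disk and mount it.
# Device, VG/LV names, size, and mount point are illustrative; run as root.
pvcreate /dev/sdb                    # initialize the physical volume
vgcreate datavg /dev/sdb             # create a volume group on it
lvcreate -L 50G -n datalv datavg     # carve a 50 GB logical volume
mkfs.ext4 /dev/datavg/datalv         # build a filesystem on it
mkdir -p /data
mount /dev/datavg/datalv /data       # add to /etc/fstab to persist across reboots
```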
Environment: DNS, TCP/IP, DHCP, LDAP, Linux, Unix, Shell