Sr. Hadoop Administrator Resume
Cupertino, CA
SUMMARY:
- 10 years of experience in IT specializing in Hadoop Administration, AWS infrastructure setup, DevOps and Software testing.
- Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, ZooKeeper, Oozie, Solr, Sentry, Kerberos, Centrify DC, Falcon, Hue, Kafka, Storm).
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Strong experience in system administration, installation, upgrading, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
- Experience in installing, configuring, and optimizing Cloudera Hadoop versions CDH3, CDH 4.x, and CDH 5.x in multi-cluster environments.
- Provided granular ACLs for local file datasets as well as HDFS URIs, including role-level ACL maintenance (a minimal sketch appears after this summary).
- Cluster monitoring and troubleshooting using tools such as Cloudera Manager, Ganglia, Nagios, and Ambari Metrics.
- Experience with Cloudera Hadoop clusters running CDH 5.6.0 and Cloudera Manager 5.7.0.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Implemented TLS security (Level 3) on all CDH services along with Cloudera Manager.
- Configured UDP, TLS/SSL, HTTPD, HTTPS, FTP, SFTP, SMTP, SSH, Kickstart, Chef, Puppet, and PDSH.
- Experienced with Hortonworks Hadoop clusters running HDP 2.4 and Ambari 2.2.
- Assisted developers with troubleshooting MapReduce and BI jobs as required.
- Expertise in Red Hat Linux tasks including upgrading RPMs and the kernel using YUM, and configuring SAN disks, multipath, and LVM file systems.
- Built various automation plans from an operations standpoint.
- Skilled in Apache Hadoop, MapReduce, Pig, Impala, Hive, HBase, ZooKeeper, Sqoop, Flume, Oozie, Kafka, Storm, Spark, JavaScript, and J2EE.
- Good experience in creating various database objects like tables, stored procedures, functions, and triggers using SQL, PL/SQL and DB2.
- Hands-on experience with ZooKeeper and ZKFC for managing and configuring NameNode failover scenarios.
- Well versed with installation, configuration, managing and supporting Hadoop cluster using various distributions like Apache Hadoop, Cloudera-CDH and Hortonworks HDP.
- Expert in Linux Performance monitoring, kernel tuning, Load balancing, health checks and maintaining compliance with specifications.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Good knowledge of the Fair and Capacity schedulers and of configuring schedulers on a cluster.
- Implemented DB2/LUW replication, federation, and partitioning (DPF).
- Areas of expertise include database installation/upgrade and backup/recovery.
- Experience in dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem.
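The granular HDFS ACL maintenance referenced above can be illustrated with a minimal shell sketch; the dataset path, group, and user names below are hypothetical, and ACLs are assumed to be enabled on the cluster.

    # ACLs must be enabled in hdfs-site.xml: dfs.namenode.acls.enabled=true
    # Grant a role group read/execute on a dataset directory (hypothetical path/group)
    hdfs dfs -setfacl -R -m group:bi_analysts:r-x /data/warehouse/sales

    # Give a service user write access to the staging area without changing base permissions
    hdfs dfs -setfacl -m user:etl_svc:rwx /data/warehouse/sales/staging

    # Set a default ACL so newly created files inherit the role-level access
    hdfs dfs -setfacl -m default:group:bi_analysts:r-x /data/warehouse/sales

    # Verify the effective ACLs
    hdfs dfs -getfacl /data/warehouse/sales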
TECHNICAL SKILLS:
BIG Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Hue, Knox, NiFi
BIG Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Navigator, Hortonworks
No SQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Confidential DB2, SQL Server, MySQL, Teradata
Tools: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
Operating Systems: Sun Solaris, HP-UX, RedHat Linux, Ubuntu Linux, and Windows XP/Vista/7/8/10
Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic
PROFESSIONAL EXPERIENCE:
Sr. HADOOP ADMINISTRATOR
Confidential, Cupertino, CA
Responsibilities:
- Worked extensively on EDP platform administration, support, and maintenance for TCF Bank, involving various Big Data services such as Hadoop, HDFS, MapReduce, YARN, Hive, Spark, Kafka, Sqoop, and HBase.
- Automation & job scheduling using Oozie & UNIX Shell Scripts.
- Deployed Hadoop cluster of Hortonworks Distribution and installed ecosystem components: HDFS, YARN, Zookeeper, HBase, Hive, MapReduce, Pig, Kafka, Storm and Spark in Linux servers using Ambari.
- Implemented real-time/near-real-time data streaming patterns using Kafka and Spark Streaming.
- Performed integrations of EDP Platform with tools used in TCF Bank such as Informatica, SAS, RapidMiner, Tableau etc.
- Migrated the EDP platform from on-premises infrastructure to the AWS cloud platform.
- Integrated Customer Master MDM with EDP Platform.
- Worked on importing and exporting data between Confidential and DB2 sources and HDFS/Hive using Sqoop (a sketch follows this list).
- Expertise with NoSQL databases like HBase, Cassandra, DynamoDB (AWS) and MongoDB.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
- Excellent experience working on Hadoop operations for the ETL infrastructure alongside other BI teams such as TD and Tableau.
- Managed and reviewed Hadoop log files.
- Set up automated 24x7x365 monitoring and escalation infrastructure for Hadoop cluster using Nagios Core and Ambari.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs such as MapReduce, Pig, Hive, and Sqoop as well as system specific jobs such as Java programs and Shell scripts.
- Worked on configuring F5 load balancers for Ranger, NiFi, and Oozie.
- Configured and automated SSL/TLS for Ambari, Hive, HBase, Ranger, Ambari Metrics, Oozie, Knox, Spark, and NiFi.
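As a rough illustration of the Sqoop-based DB2 ingestion noted above, a minimal sketch follows; the JDBC URL, credentials file, database, and table names are hypothetical.

    # Import a DB2 table into a Hive table (hypothetical host, schema, and credentials)
    sqoop import \
      --connect jdbc:db2://db2-prod.example.com:50000/EDWDB \
      --username sqoop_svc \
      --password-file /user/sqoop_svc/.db2.password \
      --table CUSTOMER_TXN \
      --hive-import \
      --hive-table edp_stage.customer_txn \
      --num-mappers 4

    # Export aggregated results from HDFS back to DB2 (hypothetical paths/tables)
    sqoop export \
      --connect jdbc:db2://db2-prod.example.com:50000/EDWDB \
      --username sqoop_svc \
      --password-file /user/sqoop_svc/.db2.password \
      --table CUSTOMER_SUMMARY \
      --export-dir /data/edp/customer_summary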
Environment: Hue, Oozie, Eclipse, HBase, Flume, Splunk, Linux, Java, Hibernate, JDK, Kickstart, Puppet, PDSH, Chef, GCC 4.2, Git, Cassandra, AWS, NoSQL, RedHat, CDH 4.x, Impala, MySQL, MongoDB, Nagios, ZooKeeper.
Hadoop Administrator
Confidential, San Francisco, CA
Responsibilities:
- Managed Confidential's mission-critical Hadoop clusters at production scale, primarily the Cloudera distribution.
- Involved in capacity planning with respect to the growing data size and the existing cluster size.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase and other NoSQL databases, Flume, Oozie, and Sqoop.
- Monitored system metrics and logs for any problems.
- Installed and configured Hadoop ecosystem components such as Spark, HBase, Hive, and Pig as per requirements.
- Tuned heap sizes to avoid disk spills and OOM issues.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Good knowledge of NoSQL databases such as HBase and MongoDB.
- Shell scripting for Linux/Unix Systems Administration and related tasks. Point of Contact for Vendor escalation.
- Hands-On experience in setting up ACL (Access Control Lists) to secure access to the HDFS file system.
- Commissioned and decommissioned Hadoop cluster nodes, including balancing HDFS block data (see the sketch at the end of this list).
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to the Hadoop ecosystem.
- Expertise in using Sqoop to connect to Confidential, MySQL, SQL Server, and TD sources and move the pivoted data into Hive or HBase tables.
- Used Sqoop and DistCp utilities for data copying and data migration.
- Managed end-to-end data flow from sources to a NoSQL database (MongoDB) using Oozie.
- Identified disk space bottlenecks; installed Nagios Log Server and integrated it with the production cluster to aggregate service logs from multiple nodes, and created dashboards for key service logs to improve analysis of historical log data.
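A minimal sketch of the node decommissioning and HDFS rebalancing mentioned above; the hostname and exclude-file path are hypothetical and assume dfs.hosts.exclude already points at that file.

    # Add the node being decommissioned to the HDFS exclude file (hypothetical host/path)
    echo "worker-node-17.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read the include/exclude lists
    hdfs dfsadmin -refreshNodes

    # Watch progress until the node reports "Decommissioned"
    hdfs dfsadmin -report | grep -A 3 worker-node-17

    # After commissioning new nodes, rebalance blocks across DataNodes;
    # -threshold is the allowed deviation in disk-usage percentage
    hdfs balancer -threshold 10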
Environment: CDH 5.8.3, HBase, Hive, Pig, Sqoop, Yarn, Apache Oozie workflow scheduler, Flume, Zookeeper.
Hadoop Administrator
Confidential, Redwood City, CA
Responsibilities:
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Performed problem determination, security hardening, and shell scripting.
- Performed performance monitoring and tuning using top, prstat, sar, vmstat, netstat, jps, iostat, etc. (a small sketch follows this list).
- Experience with backup and recovery software such as NetBackup in Linux environments.
- Installed, Configured, Maintained Apache Hadoop Clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
- Moved data from MySQL databases to HDFS and vice versa using Sqoop.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Implemented nine nodes CDH4 Hadoop cluster on Ubuntu LINUX.
- Involved in loading data from LINUX file system to HDFS.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Deployed UDFs developed by developers to implement business logic in Hadoop.
- Worked on Hive/HBase vs. RDBMS comparisons; imported data into Hive and created internal and external tables, partitions, indexes, views, queries, and reports for BI data analysis.
- Configured and maintained FTP, DNS, NFS, and DHCP servers; configured, maintained, and troubleshot local development servers.
- Monitored services and network behavior using Nagios.
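A small shell sketch of the routine performance checks listed above, combining the named utilities into one snapshot; the output location is hypothetical.

    #!/bin/bash
    # Quick health snapshot of a Hadoop node (hypothetical output location)
    OUT=/var/log/node-health/$(hostname)-$(date +%Y%m%d-%H%M).log
    mkdir -p "$(dirname "$OUT")"
    {
      echo "== Load and top processes =="
      top -b -n 1 | head -20
      echo "== Memory / swap activity =="
      vmstat 1 5
      echo "== Disk I/O =="
      iostat -x 1 3
      echo "== Network sockets =="
      netstat -tunap | head -30
      echo "== Running Hadoop JVMs =="
      jps -l
    } > "$OUT" 2>&1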
Environment: HDFS, Map-Reduce, MAPR, Hive, Sqoop, PIG, Tableau, Cloudera Manager, Flume, SQL Server, RedHat and CentOS.
Hadoop Administrator
Confidential, Buffalo, NY
Responsibilities:
- Worked as Administrator for Monsanto's Hadoop Cluster (180 nodes).
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Hands-on experience working with ecosystem components like Hive, Pig scripts, Sqoop, MapReduce, YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
- Wrote Flume configuration files to store streaming data in HDFS.
- As an admin, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Performed analytics using MapReduce, Hive, and Pig on HDFS data, sent the results back to MongoDB databases, and updated information in collections.
- Used a RESTful web services API to connect to MapR tables; the database connection was developed through the RESTful web services API.
- As Hadoop admin, responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD clusters.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, monitoring, etc.
- Involved in loading data from the UNIX file system to HDFS, and created custom Solr query components to enable optimal search matching.
- Installed and configured CDH-Hadoop environment and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to stream data from a Twitter source into HDFS, filtered by popular hashtags (see the sketch at the end of this list).
- Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
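To illustrate the Kafka setup above, a minimal sketch of creating the topic and smoke-testing it with the console producer and consumer; the ZooKeeper quorum, broker host, and topic name are hypothetical, and the Java producer/consumer code itself is not reproduced here.

    # Create a topic for the tweet stream (hypothetical ZooKeeper quorum and topic name)
    kafka-topics.sh --create \
      --zookeeper zk1.example.com:2181 \
      --replication-factor 3 --partitions 6 \
      --topic twitter_hashtags

    # Smoke-test: publish a sample message with the console producer...
    echo '{"hashtag":"#hadoop","text":"test"}' | \
      kafka-console-producer.sh --broker-list kafka1.example.com:9092 --topic twitter_hashtags

    # ...and read it back with the console consumer
    kafka-console-consumer.sh --bootstrap-server kafka1.example.com:9092 \
      --topic twitter_hashtags --from-beginning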
Environment: CDH 4.7, Hadoop 2.0.0, HDFS, MapReduce, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, ZooKeeper 3.4.5, Hue 2.5.0, Jira, WebLogic 8.1, Kafka, YARN, Impala, Chef, Pig, scripting, MySQL, Red Hat Linux, CentOS, and other UNIX utilities.
Linux Administrator
Confidential, NYC, NY
Responsibilities:
- Worked on Linux Kick-start OS integration, DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH, and LDAP integration.
- Network traffic control, IPsec, QoS, VLAN, proxy, and RADIUS integration on Cisco hardware via Red Hat Linux software.
- Created and managed logical volumes (a minimal LVM sketch follows this list). Used Java JDBC to load data into MySQL.
- Maintained the MySQL server and handled authentication for the required database users.
- Installing and updating packages using YUM.
- Installed patches and updates on servers.
- Performed virtualization on RHEL servers (through Xen and KVM).
- Created Virtual server on Citrix Xen Server based host and installed operating system on Guest Servers.
- Successfully migrated virtual machines from the legacy VMware vSphere 4.1 environment to VMware vSphere 5.1.
- Configuration, implementation and administration of Clustered servers on SUSE Linux environment.
- Configure and maintained FTP, DNS, NFS and DHCP servers.
- Performed configuration of standard Linux and network protocols such as SMTP, DHCP, DNS, LDAP, NFS, HTTP, SNMP, and others.
- Tuned kernel parameters for better performance of applications like Confidential.
- Implemented multi-tier application provisioning in OpenStack cloud, integrating it with Puppet.
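A minimal sketch of the logical volume management noted above; the device name, volume group, logical volume size, and mount point are hypothetical.

    # Initialize a new disk for LVM (hypothetical device)
    pvcreate /dev/sdb

    # Create a volume group and a 100 GB logical volume for application data
    vgcreate vg_data /dev/sdb
    lvcreate -L 100G -n lv_appdata vg_data

    # Create a filesystem, mount it, and persist the mount across reboots
    mkfs.ext4 /dev/vg_data/lv_appdata
    mkdir -p /appdata
    mount /dev/vg_data/lv_appdata /appdata
    echo "/dev/vg_data/lv_appdata /appdata ext4 defaults 0 2" >> /etc/fstab

    # Later, grow the volume and filesystem online if space runs low
    lvextend -L +20G /dev/vg_data/lv_appdata
    resize2fs /dev/vg_data/lv_appdata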
Environment: LINUX, FTP, Shell, UNIX, VMware, NFS, TCP/IP, Puppet, Confidential, Red Hat Linux.
Linux Administrator
Confidential, Louisville, KY
Responsibilities:
- Analysis of business requirement and design for the modules allocated.
- Involved in Coding, Unit testing, Code review and solving real time issues.
- Interacted with the clients to gather the enhancement details.
- Prepared unit test plans, Release Notes.
- Developed stored procedures and performed various database-related activities such as taking backups and restoring them in SQL Server 2005.
- Developed code for consuming web service.
- Consumed Web Services (SOAP, WSDL) for communicating with other application and components.
- Used Microsoft ADO.NET to access data from database in a web application.
- Used SQL server 2005 for writing Stored Procedures, Views and Triggers.
- Managed disk file systems, server performance, user creation, and the granting of file access permissions.
- Troubleshooting the backup issues by analyzing the NetBackup logs.
- Configured and troubleshot various services such as NFS, SSH, Telnet, and FTP on the UNIX platform.
- Monitored disk, CPU, memory, and overall performance of servers (a small monitoring sketch follows this list).
- Managed file systems and troubleshot file system issues.
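A small sketch of the disk and memory monitoring described above; the usage threshold and mail recipient are hypothetical, and a local mail command is assumed to be available.

    #!/bin/bash
    # Alert when any filesystem crosses a usage threshold (hypothetical threshold/recipient)
    THRESHOLD=85
    ADMIN_EMAIL=unix-admins@example.com

    df -hP | awk 'NR>1 {gsub("%","",$5); print $5, $6}' | while read used mount; do
      if [ "$used" -ge "$THRESHOLD" ]; then
        echo "$(hostname): $mount is at ${used}% used" | \
          mail -s "Disk usage alert on $(hostname)" "$ADMIN_EMAIL"
      fi
    done

    # Quick memory and swap snapshot for the daily report
    free -m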
Environment: Red Hat Linux, Solaris, Windows 2003, EMC SAN, Weblogic, Windows NT/2000, Apache, Web Sphere, and JBOSS, System authentication, NFS, DNS, SAMBA.