Hadoop Administrator Resume
Charlotte, NC
PROFESSIONAL SUMMARY
- Around 7 years of Professional experience in IT background which includes 4 years in Hadoop Technologies and extensive experience in Linux flavors.
- Experience in operating and managing large clusters with 250+ nodes and 2+ Petabytes of storage.
- Hands on experience in setting up, configuring Hadoop ecosystem components like Hadoop, MapReduce, HDFS, Yarn, HBase, IMPALA, OOZIE, HIVE, SQOOP, PIG, SPARK, FLUME, KAFKA, SENTRY services.
- Experience in planning, implementing, testing and documenting teh performance benchmarking to Hadoop platform.
- Experience in both On - Premises and Cloud space: AWS, GCP, and AZURE.
- Experience with securing Hadoop clusters by implementing Kerberos KDC installation, LDAP Integration, data transport encryption with TLS, and data-at-rest encryption with Cloudera Navigator Encrypt.
- Experience on Design, configure and manage teh backup and disaster recovery using Data Replication, Snapshots, and Cloudera BDR utilities.
- Implemented Role based authorization for HDFS, HIVE, and IMPALA using Apache Sentry.
- Good Knowledge on Implementing and using Cluster monitoring tools like Cloudera Manager, Ganglia and Nagios.
- Experienced in implementing and supporting auditing tools like Cloudera Navigator.
- Knowledge on implementing external autantication with Identity providers like Okta, IDP using SAML.
- Experience in implementing High Availability features for services like NameNode, HUE, and IMPALA.
- Hands-on experience in Deploying and using automation tools like PUPPET for cluster configuration management.
- Experience in creating cookbooks/playbooks and documentations for Installation, upgrades and support projects.
- Experience in documenting standard practices and compliance policies.
- Participated and lead teh upgrades of CDH4 and CDH5.
- Fix teh issues by interacting with dependent and support teams and log teh cases based on teh priorities.
- Assisted in tuning teh performance of teh Hadoop ecosystem as well as monitoring.
- Hands on experience in performing functional testing and helps application teams/users to incorporate third party tools with Hadoop environment.
- Experience in analyzing Log files and finding teh root cause and tan involved in analyzing failures, identifying root causes and taking/recommending course of actions.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing job counters and application logs files.
- Experienced in tuning performance for various services
- Experienced in job scheduling and monitoring tools like Control M, Nagios and Ganglia.
- Additional responsibilities include interacting with offshore team on a daily basis, communicating teh requirement and delegating teh tasks to offshore/on-site team members and reviewing their delivery.
- Good Experience in managing Linux platform servers
- TEMPEffective problem solving skills and ability to learn and use new technologies/tools quickly
- Good scripting knowledge in Bash shell scripting.
- TEMPHas good experience, excellent communication and interpersonal skills which contribute to timely completion of project deliverables well ahead of schedule.
- Experience in providing on-call and weekend production support
TECHNICAL SKILLS:
Big Data Technologies: Hortonworks, HDFS, Yarn, Hive, Map Reduce, Cassandra, Pig, Hcatalog, Phoenix, Falcon, Apache NiFi, Scoop, Zookeeper, Mahout, Flume, Oozie, Avro, HBase, MapReduce, HDFS, Storm, Cloudera.
Scripting Languages: Shell Scripting, Puppet, Scripting, Python, Bash, CSH, Ruby, PHP
Databases: Oracle 11g, MySQL, MS SQL Server, Hbase, Cassandra, MongoDB
Networks: HTTP, HTTPS, FTP, UDP, TCP/TP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, WebLogic Server, Web Sphere
Security: Kerberos, Knox.
Reporting Tools: Cognos, Hyperion Analyzer, OBIEE & BI+
Analytic Tool: Elastic search-Logstash-Kibana
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential, Charlotte, NC
Responsibilities:
- Created a new project solution based on teh company's technology direction ensured that infrastructure services are projected based on current standard.
- Installation, Configuration and Management of Greenplum Pivotal Hadoop cluster.
- Upgrading teh cluster from CDH 4.x to CDH 5.x.
- Implemented HA for NameNode and HUE using Cloudera manager
- Created and configured cluster monitoring service activity monitor, service monitor, report manager, event server and alert publisher.
- Created cookbooks/playbooks and documentations for special tasks
- Configured HA proxy for IMPALA service
- Writing desktop scripts for synchronizing teh data within and across clusters.
- Created snapshot's for in cluster backup of teh data instance.
- Created SQOOP scripts for ingesting data from Transactional systems to Hadoop.
- Integrated Tableau, Teradata, DB2, ORACLE via ODBC/JDBC drivers with Hadoop
- Worked with application teams to install teh operating system, Hadoop updates, patches, version upgrades as required.
- Created scripts for automating balancing data across teh cluster using teh HDFS load balancer utility.
- Created POC for implementing streaming use case with Kafka and HBase services.
- Working experience of maintaining MySQL database creation and setting up teh users and maintain teh backup of databases.
- Implemented Kerberos Security Autantication protocol for existing cluster.
- Integrated is an existing LLE and Production cluster with LDAP.
- Implemented TLS for CDH Services and for Cloudera Manager.
- Working with data delivery teams to set up new Hadoop users. dis job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive.
- Managed teh backup and disaster recovery for Hadoop data. Coordinated root cause analysis efforts to minimize future system issues
- Served as lead technical infrastructure Architect and Big Data subject matter expert.
- Deployed Big Data solutions in teh cloud. Built, configured, monitored and managed end to end Big Data applications on Amazon Web Services (AWS)
- Screen Hadoop cluster job performances and capacity planning
- Spinning clusters in Azure using Cloudera director. Implemented dis for POC for teh cloud migration project.
- Leveraged AWS cloud services such as EC2, auto-scaling and VPC to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts
- Created bash scripts frequently, depending on teh project requirements
- Guided application teams on choosing teh right file formats in Hadoop file systems Text, Avro, Parquet and compression techniques such as Snappy, bz2,LZO
- Improved communication between teams in teh matrix environment which led to increase in number of simultaneous projects and average billable hours
- Substantially improved all areas of teh software development life cycle for teh company products, introducing frameworks, methodologies, reusable components and best practices to teh team
- Implemented VPC, Auto scaling, S3, EBS, ELB, Cloud Formation templates and Cloud Watch services from AWS
Environment: Over 200 nodes, Approximately 2 PB of data, Cloudera’s distribution Hadoop (CDH) 5.5, HA name node, map reduce, Yarn, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Control-M, Oozie, Hue, White elephant, Ganglia, Nagios, HBase, Cassandra, Kafka, Storm, Cobbler, Puppet.
Hadoop Administrator
Confidential, Irvine, CA
Responsibilities:
- Installation, configuration and Administration of CDH5.x Hadoop cluster
- Run teh benchmark tools to test teh cluster performance
- Configure teh Hadoop properties based on teh benchmark result
- Tuning teh cluster based on teh benchmark results
- Monitoring System Metrics and logs for any problems
- Currently working as admin in Hortonworks (HDP) distribution for 4 clusters ranges from POC to PROD.
- Installation and configuration of Hadoop ecosystem components like HBase, Hive, Pig, Sqoop, Spark, Zookeeper etc. as per requirement.
- Provided support to users for diagnosing, reproducing and fixing Hadoop related issues.
- Work on CDH environments and took ownership of problem isolation and resolution, and whenever case arises do teh bug reporting.
- Ensure that critical user issues are addressed quickly and TEMPeffectively.
- Apply troubleshooting techniques to provide solutions to our user's individual needs.
- Troubleshoot, diagnose and potentially escalate user inquiries during their engineering and operations efforts.
- Investigate product related issues both for individual customers and for common trends that may arise.
- Setting up new Hadoop users with HDFS maintenance and support. Keeping a track of Hadoop Cluster connectivity and security.
- Worked on setting up Apache NiFi and used NiFi in orchestrating data pipeline
Environment: Cloudera Manager, HDFS, Yarn, Spark, Kafka, Hive, Pig, Sqoop, Kerberos, Sentry, NiFi, Hortonworks HDP, Oracle, Netezza, Tableau, Python, Java 8.0, Log4J, GIT, AWS, S3, EC2, JIRA.
Linux Hadoop Administrator
Confidential
Responsibilities:
- Worked on Capacity Planning, Configuration, Operationalizing, Management small to medium sized BIGDATA Hadoop Clusters
- Built 80 node & 20 node HDP Hadoop clusters using Ambari
- Supported in administering 150 nodes HDP Hadoop cluster using Ambari
- Defined Hardware and Software prerequisites for small to medium Hadoop clusters
- Currently working as admin in Hortonworks (HDP) distribution for 4 clusters ranges from POC to PROD.
- Performed Performance tuning of Hadoop cluster components: HDFS, MapReduce2, YARN, Hive, HBase,
- Performed Cluster Upgrade (HDP 2.2.4.2, to HDP 2.3, Ambari 2.0 to Ambari 2.2.1.x)
- Performed backup and monitor of Postgres: Ambari and MySQL: Hive, Oozie, Ranger databases
- Performed upgrading teh clusters using Rolling Upgrade and Express Upgrade
- Enabled services (Kafka, Spark) to suit teh real-time data processing requirements for Ad works data sets
- Monitor Hadoop cluster job performance and capacity planning
- Configure Hadoop security aspects including Kerberos setup and RBAC authorization using Ranger
- Installed and configured Knox Gateway to simplify access and enhance security of Hadoop Cluster.
- Worked with teh business team in understanding and implementing various BIGDATA Hadoop use cases and POC's
- Performed diagnosing, troubleshooting and fixing teh Hadoop infrastructure issues
- Configured Rack Awareness on HDP clusters
- Extensively worked in using Distcp to data transfer Terabytes of data between teh secured and unsecured clusters
Environment: HDP2.3,2.2.4.2/Ambari 2.1.1.x, 2.0, Bash, Ansible, Python, Centos 6.x, Nagios, Graphite, HDFS, Yarn, Kafka, Spark, Hive, Pig, Sqoop, Oozie, Hortonworks HDP, Knox, Kerberos, Ranger, Cassandra, Java 1.7, Log4J, JUnit, MRUnit, Git, JIRA
Linux Administrator
Confidential
Responsibilities:
- Responsible for all aspects of Solaris (8, 10) and Red Hat Linux System Administration (5-7) and support for all Fujitsu Prime-powers, Oracle/Sun T&M series, HP-DLs, Sun Fire, Sun Series (v240, v490, v880) servers.
- Worked on daily basis on user access and permissions, Installations and Maintenance of Linux Servers.
- Installed Cent OS using Pre-Execution environment boot and Kick start method on multiple servers, remote installation of Linux using PXE boot.
- Monitored System activity, Performance and Resource utilization.
- Maintained Raid-Groups and LUN Assignments as per agreed design documents.
- Performed all System administration tasks like cron jobs, installing packages and patches.
- Used LVM extensively and created Volume Groups and Logical volumes.
- Performed RPM and YUM package installations, patch and other server management.
- Configured Linux guests in a VMware ESX environment.
- Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS, LDAP.
- Performed scheduled backup and necessary restoration.
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Troubleshot and fixed teh issues at User level, System level and Network level by using various tools and utilities. Schedule backup jobs by implementing cron job schedule during non-business hour.
Environment: Solaris 8,9,10, Red Hat Linux AS/EL 4.0, AIX 5.2, 5.3, Sun E10k, E25K, E4500, SonicMQ 7.0, VxFS 4.1, VxVM 4.1, SVM, DB2 Connect, Perl, Shell, Veritas, Open Office, C, MySQL, Oracle, DB2.
