Sr. Big Data/Hadoop Administrator Resume
Roswell, GA
SUMMARY:
- Hadoop Certified Administrator (HDPCA) with 9 years of experience, including 4.5 years in Big Data platforms and technologies. Background includes extensive Hadoop consulting and multi-node Hadoop cluster setup on cloud platforms and on-premise, covering design, implementation, tuning, securing, upgrading, management and troubleshooting.
- Experience in designing, installing, configuring, securing, tuning, supporting and managing Hadoop clusters using Hortonworks and Cloudera distributions.
- Experience in installing a wide range of Big Data components such as HDFS, Hive, HBase, Pig, Oozie, Impala, Hue, ZooKeeper, Flume, Kafka, Solr, Knox, Storm, NiFi, Ranger, Atlas and Spark.
- Hands-on experience in implementing HDP, HDF and a full suite of security features including Kerberos KDC, Knox, AD/LDAP integration, Ranger and data encryption.
- Ability to implement solutions with AWS Virtual Private Cloud (VPC), Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Block Storage (EBS), Elastic File System (EFS), Amazon Relational Database Service (RDS), CloudWatch and IAM policies/roles.
- Hands-on experience in building Hadoop clusters on-premise (bare metal) as well as on cloud offerings (Amazon, Azure).
- Knowledge of microservice-based solutions such as containers, Docker and Kubernetes.
- Experience in installing HDF stack components (Schema Registry, SAM, NiFi, Kafka) and configuring their security components.
- Good knowledge of setting up CI/CD pipelines.
- Hands-on experience in implementing HDP and HDF service authorization through Ranger policies.
- Hands-on experience in tuning YARN components to achieve high performance in Big Data clusters.
- Screening Hadoop cluster job performance and performing capacity planning.
- Connecting Hadoop clusters with pre-existing data integration and visualization tools such as Talend, Tableau and Spotfire.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Experience in configuring and tuning components such as HDFS, YARN, Hive, Spark and Zeppelin Notebook.
- Hands-on experience in administering Linux systems to deploy Hadoop clusters and monitoring them with Ambari and Cloudera Manager.
- Hands-on experience in setting up HDP and HDF clusters using Hortonworks best practices.
- Well versed in importing structured data from RDBMSs such as MySQL and Oracle into HDFS and Hive using Sqoop (a minimal import sketch follows this summary).
- Hands-on experience in deploying and administering large-scale distributed data stores using HBase.
- Experience in configuring Phoenix Query Server to connect to HBase and execute queries.
- Familiar with writing Oozie workflows and job controllers for job automation.
- Good shell scripting ability for automation.
- Expertise in performance analysis, troubleshooting, and debugging.
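To illustrate the Sqoop-based RDBMS ingestion mentioned above, here is a minimal sketch; the MySQL host, database, table and HDFS/Hive targets are hypothetical placeholders, and connection details would differ per environment.

```sh
# Minimal Sqoop import sketch (hypothetical host, database, table and paths).
sqoop import \
  --connect jdbc:mysql://mysql-host.example.com:3306/salesdb \
  --username etl_user \
  --password-file /user/etl_user/.mysql.password \
  --table orders \
  --target-dir /data/raw/salesdb/orders \
  --hive-import \
  --hive-table salesdb.orders \
  --num-mappers 4
```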
TECHNICAL SKILLS:
Operating Systems: Red Hat Linux 5.x/6.x/7.x, CentOS 6.x/7.x, Microsoft Windows 9 / 2008, Windows XP
Big Data Domains: Hortonworks HDP 1.x/2.x/3.x, Cloudera CDH 4.x/5.x, HDF 3.1.1, HDFS, MapReduce, YARN, HBase, Hue, Hive, Pig, Flume, Sqoop, Oozie, ZooKeeper, Impala, Tez, Spark, Knox, Solr, Kafka, Storm, NiFi
Scheduler: Oozie, Tidal, Autosys
Languages: Java, Python, SQL, HQL
RDBMS: Oracle 11g/10g/9i, MS SQL Server 2000, MySQL, PostgreSQL
NoSQL Databases: HBase
Security: ACL, Kerberos, Ranger
Scripting: Shell Scripting, Python
Monitoring/Management Tools: Cloudera Manager, Hortonworks Ambari, Nagios, ITRS
Cloud Platforms: AWS
Tracking Tools: JIRA, Service Now
Software Development Tools: Eclipse, NetBeans
EXPERIENCE:
Confidential, Roswell, GA
Sr Bigdata/Hadoop Administrator
Responsibilities:
- Propose and deploy new hardware and software environments required for Hadoop services and expand existing environments to resolve data processing problems.
- Deployed Hadoop cluster of Hortonworks Distribution and installed ecosystem components: HDFS, YARN, Zookeeper, HBase, Hive, MapReduce, Pig, Kafka, Storm, Ranger, Knox and Spark on Linux servers using Ambari.
- Implemented HDP, HDF and a full suite of security features including Kerberos KDC, Knox, AD/LDAP integration, Ranger and data encryption.
- Develop and automate processes for maintenance of the environment.
- Designed and deployed HDP/HDF clusters on AWS using services such as AWS Virtual Private Cloud, Elastic Compute Cloud (EC2), Simple Storage Service (S3), Amazon Relational Database Service (RDS), CloudWatch, Elastic Block Storage (EBS), Elastic File System (EFS) and IAM policies/roles.
- Designed and implemented Disaster Recovery cluster for Hadoop PROD Cluster.
- Analyze user requirements and problems to automate or improve existing systems, and schedule workflows using Apache Oozie.
- Implemented High Availability and automatic failover infrastructure to overcome a single point of failure for the NameNode, utilizing ZooKeeper services and QJM.
- Ensure proper resource utilization across different development teams and processes.
- Designed and implemented HDF cluster deployment with Nifi Service and enabled security components.
- Understand the client's IT security needs and ensure that a proper authentication and authorization model is designed, tested and implemented.
- Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among MapReduce jobs submitted by users.
- Monitored Hadoop jobs and reviewed logs of failed jobs to debug issues based on the errors.
- Moved data from Oracle, Teradata and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Implement security measures for all aspects of the Hadoop cluster (SSL, disk encryption, role-based access via Apache Ranger policies).
- Setting up data retention policies and space quotas based on space availability (see the quota sketch after this list).
- Ensure all Hadoop related services maintain availability in a high-demand environment.
- Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
- Monitored cluster stability, used tools to gather statistics and improved performance.
- Developed Shell Scripts for system management and automation.
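A minimal sketch of the space-quota and retention housekeeping referenced above; the directory paths, quota size and 90-day window are hypothetical.

```sh
# Set a 10 TB space quota on a hypothetical project directory.
hdfs dfsadmin -setSpaceQuota 10t /data/projects/analytics

# Report current quota and usage for the directory.
hdfs dfs -count -q -h /data/projects/analytics

# Simplistic retention sweep: delete landing files older than 90 days
# (they go to the HDFS trash if trash is enabled).
CUTOFF=$(date -d '90 days ago' +%Y-%m-%d)
hdfs dfs -ls /data/raw/landing | awk -v cutoff="$CUTOFF" '$6 < cutoff {print $8}' \
  | xargs -r -n 50 hdfs dfs -rm -r
```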
Environment: RHEL OS 6.6/6.8, CentOS 7, HDP (2.x, 3.x), HDF (3.x), Azure HDI, HDFS, MapReduce, Tez, YARN, Pig, Hive, HBase, Sqoop, Oozie, Zookeeper, Ambari, Kafka, Spark, Storm, NIFI, Kerberos, LDAP/AD, SSL, Ranger.
Confidential, Framingham, MA
Sr Hadoop Administrator
Responsibilities:
- Understood the existing Enterprise Data Warehouse setup and provided design and architecture suggestions for converting it to the Hadoop ecosystem.
- Deployed a Hadoop cluster of Hortonworks Distribution and installed ecosystem components: HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Storm and Spark on Linux servers using Ambari.
- Set up automated 24x7x365 monitoring and escalation infrastructure for Hadoop cluster using Nagios Core and Ambari.
- Designed and implemented Disaster Recovery cluster for Hadoop PROD Cluster.
- Performed HDP upgrades on non-prod and production clusters.
- Set up Big Data tools such as Arcadia and MemSQL.
- Integrated Hadoop cluster with Active Directory and enabled Kerberos for Authentication.
- Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among MapReduce jobs submitted by users.
- Set up Linux users and tested HDFS, Hive, Pig and MapReduce access for the new users.
- Monitored Hadoop jobs and reviewed logs of failed jobs to debug issues based on the errors.
- Optimized Hadoop cluster components (HDFS, YARN, Hive and Kafka) to achieve high performance.
- Moved data from Oracle, Teradata and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Installed and configured Apache NIFI.
- Provided support to data analysts in running Pig and Hive queries.
- Troubleshot Spark jobs running in standalone/cluster mode.
- Responsible for the design and creation of Hive tables and worked on performance optimizations such as ORC storage, compression, partitioning and bucketing in Hive.
- Set up data retention policies based on space availability.
- Installed Ranger and configured service policies.
- Worked with Linux server admin team in administering the server Hardware and operating system.
- Coordinated with vendors for Hadoop and Big Data tools.
- Provided User, Platform and Application support on Hadoop Infrastructure.
- Applied Patches and Bug Fixes on Hadoop Cluster.
- Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
- Conducted Root Cause Analysis and resolved production problems and data issues.
- Performed Disk Space management to the users and groups in the cluster.
- Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
- Performed backup and recovery procedures when upgrading the Hadoop stack.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs such as MapReduce, Pig, Hive, and Sqoop as well as system specific jobs such as Java programs and Shell scripts.
- Designed and implemented a Kafka cluster with separate nodes for brokers and provided operational support (a topic-creation sketch follows this list).
- Monitored cluster stability, used tools to gather statistics and improved performance.
- Identified disk space bottlenecks; installed Nagios and integrated it with the PRD cluster to aggregate service logs from multiple nodes, and created dashboards for important service logs to support analysis of historical log data.
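A minimal sketch of routine Kafka operations on the HDP-era cluster described above; the ZooKeeper quorum, broker host and topic/group names are hypothetical, and newer Kafka releases use --bootstrap-server for topic commands instead of --zookeeper.

```sh
# Create a topic replicated across the dedicated broker nodes (hypothetical names).
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create \
  --zookeeper zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181 \
  --replication-factor 3 --partitions 6 --topic app.events

# Verify partition leaders and in-sync replicas.
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --describe \
  --zookeeper zk1.example.com:2181 --topic app.events

# Check consumer lag for a hypothetical consumer group.
/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
  --bootstrap-server broker1.example.com:6667 --describe --group app-events-consumers
```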
Environment: RHEL OS 6.6/6.8, HDP (2.3.2, 2.5.0), HDFS, MapReduce, Tez, YARN, Pig, Hive, HBase, Sqoop, Oozie, Zookeeper, Ambari, Nagios Core, Kafka, Spark, Storm, Kerberos, Ranger, Tidal.
Confidential, Seattle, WA
Hadoop Administrator
Responsibilities:
- Installed and configured Hadoop cluster of HDP Distribution (HDP 2.2.0) using Ambari.
- Integrated Hadoop cluster with Zookeeper cluster and achieved NameNode High Availability.
- Maintained Hadoop ecosystem security by Installing and configuring Ranger.
- Creating new user accounts and assigning pools for the application usage.
- End-to-end support for DEV and PRD clusters.
- Cluster maintenance using Ambari, including adding nodes and decommissioning dead nodes.
- Balanced the Hadoop cluster for storage consistency (see the balancer sketch after this list).
- Tuned YARN components to achieve high performance for MapReduce jobs.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked with application teams to install Hadoop Updates, Patches, version Upgrades as required.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Experienced in production support, which involves resolving user incidents ranging from Sev1 to Sev5.
- Worked with Big Data developers and designers in troubleshooting MapReduce job failures and issues with Hive, Pig and Flume.
- Migrated structured data from multiple RDBMS servers to Hadoop platform using Sqoop.
- Familiar with Bucketing and Partitioning for Hive performance improvement.
- Orchestrated Sqoop scripts, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
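A minimal sketch of the balancing and node-decommissioning work noted above; the hostname and exclude-file path are hypothetical, and in practice these steps are usually driven through Ambari.

```sh
# Rebalance HDFS so no DataNode deviates more than 10% from average utilization.
hdfs balancer -threshold 10

# Decommission a DataNode: add it to the exclude file referenced by
# dfs.hosts.exclude (hypothetical path), then have the NameNode re-read it.
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Watch decommissioning progress before removing the node.
hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"
```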
Environment: RedHat (OS 6.6), HDP 2.2.0, HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, Zookeeper, MySQL, Ambari, Nagios.
Confidential
Software Engineer
Responsibilities:
- Gathering requirements from the client and implementing working solutions.
- Helping customers write SQL queries for their day-to-day needs.
- Writing shell scripts to automate manual tasks.
- Developing and maintaining Java code to fix production issues.
- Analyzing job failures and developing fixes to ensure data delivery to production without delay.
- Working closely with Confidential order management systems such as ARBOL.
- Raising and working on JIRA tickets raised for modifications or changes to any process.
- Tracking order flow between exchanges and hubs and providing customers with needed details.
- Sending test orders, approving, executing, releasing for ticketing through client simulator.
- Resolving L3-level production issues escalated from support teams within Confidential.
- Generating QC reports of data sets and presenting the customers with required information.
Environment: UNIX Shell scripting, Core Java, MySQL, Client Simulator, Autosys, ITRS, PuTTY, Aqua Data Studio
Confidential
Software Engineer
Responsibilities:
- Monitoring the ITRS tool to handle incidents and service requests and resolve them within SLA.
- Resolving all major/critical application incidents in production within SLA.
- Monitoring servers to make sure system resources are utilized effectively.
- Participating in new releases in production.
- Checking log files to ensure the application is available at all times (a minimal check script follows this list).
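A minimal sketch of the kind of availability and log check described above; the process name, log path and alert address are hypothetical.

```sh
#!/bin/bash
# Hypothetical availability check: alert if the application process is down
# or if recent ERROR entries appear in its log.
APP_NAME="orderflow"                   # hypothetical process name
LOG_FILE="/var/log/orderflow/app.log"  # hypothetical log path
ALERT_TO="oncall@example.com"          # hypothetical alert address

if ! pgrep -f "$APP_NAME" > /dev/null; then
    echo "$APP_NAME is not running on $(hostname)" | mailx -s "$APP_NAME DOWN" "$ALERT_TO"
fi

if tail -n 500 "$LOG_FILE" | grep -q "ERROR"; then
    tail -n 500 "$LOG_FILE" | grep "ERROR" \
      | mailx -s "$APP_NAME errors on $(hostname)" "$ALERT_TO"
fi
```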
Environment: UNIX Shell scripting, MySQL, Autosys, ITRS, PuTTY