Sr. Hadoop Administrator Resume
Irving, TX
SUMMARY:
- 15+ years of experience in the design, development, and implementation of robust technology systems, with specialized expertise in Hadoop administration and Linux administration.
- 4 years of experience in Hadoop Administration & Big Data Technologies and 3 years of experience in Linux administration.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Good experience with the design, management, configuration, and troubleshooting of distributed production environments based on Apache Hadoop, HBase, and related technologies.
- Experience in the Hadoop ecosystem including HDFS, Hive, Pig, HBase, and Sqoop, and knowledge of the MapReduce framework.
- Working experience on designing and implementing complete end to end Hadoop Infrastructure.
- Good experience in Hadoop cluster capacity planning and in designing NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker layouts.
- Hands-on experience in installation, configuration, management and development of big data solutions using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions.
- Good experience in designing, configuring and managing backup and disaster recovery for Hadoop data.
- In-depth knowledge of the modifications required to static IP configuration (interfaces) and hosts files, setting up password-less SSH, and Hadoop configuration for cluster setup and maintenance.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Responsible for collecting information from, and configuring, network devices such as servers, printers, hubs, switches, and routers on an Internet Protocol (IP) network.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup and creating realms/domains.
- Extensive experience in data analysis using tools like Syncsort and HZ along with shell scripting and UNIX.
- Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands on experience in analyzing Log files for Hadoop and eco system services and finding root cause.
- Experience on Commissioning, Decommissioning, Balancing and Managing Nodes and tuning server for optimal performance of the cluster.
- As an admin, involved in cluster maintenance, troubleshooting, monitoring, and following proper backup & recovery strategies.
- Good experience in setting up Linux environments: password-less SSH, creating file systems, disabling firewalls and SELinux, tuning swappiness, and installing Java.
- Good Experience in Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions.
- Installed and configured Hadoop ecosystem components such as Pig and Hive.
- Hands-on experience in installing, configuring and managing Hue and HCatalog.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes and vice-versa (an illustrative Sqoop sketch follows this list).
- Hands-on experience with ZooKeeper and ZKFC for managing and configuring NameNode failover scenarios.
- Hands-on experience in Linux admin activities on RHEL & CentOS.
- Experience in deploying Hadoop 2.0 (YARN).
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud environments (Amazon Web Services EC2) and on private cloud infrastructure (the OpenStack cloud platform).
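To illustrate the Sqoop transfers referenced above, here is a minimal bash sketch assuming a hypothetical MySQL source (dbserver.example.com, salesdb) and illustrative HDFS paths; real connection strings, table names, and mapper counts would differ per environment.

```bash
#!/usr/bin/env bash
# Hypothetical example: import an RDBMS table into HDFS, then export results back.
# Host, database, table, and path names are illustrative placeholders.

# Import a MySQL table into HDFS (4 parallel mappers, tab-delimited output)
sqoop import \
  --connect jdbc:mysql://dbserver.example.com:3306/salesdb \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --fields-terminated-by '\t' \
  --num-mappers 4

# Export processed results from HDFS back to an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbserver.example.com:3306/salesdb \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/processed/orders_summary \
  --input-fields-terminated-by '\t'
```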
KEY COMPETENCE:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, HBase, Cassandra, HCatalog, Phoenix, Falcon, Sqoop, Flume, Zookeeper, Mahout, Oozie, Avro, Storm, CDH 5.3, CDH 5.5
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Scripting Languages: Shell Scripting (Bash, CSH), Python, Puppet.
Programming Languages: C, Java, SQL, and PL/SQL.
Front End Technologies: HTML, XHTML, XML.
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.
NoSQL Databases: HBase, Cassandra, MongoDB
Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.
Security: Kerberos, Ranger.
WORK EXPERIENCE:
Sr. Hadoop Administrator
Confidential - IRVING, TX
Responsibilities:
- Worked on setting up Hadoop cluster for the Production Environment.
- Supported 200+ servers and 50+ users on the Hadoop platform, resolving tickets and issues, training users to keep Hadoop usage simple, and keeping them updated on best practices.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Installed, configured and deployed a 50-node MapR Hadoop cluster for Development and Production.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
- Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Configured, installed and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with AWS S3 as the default file system for the cluster.
- Experience in installing, configuring, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users. This includes setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig and MapReduce access for the new users (a hedged onboarding sketch follows this list).
- Used Informatica Power Center to create mappings, mapplets, User defined functions, workflows, worklets, sessions and tasks.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
- Used Informatica Data Explorer (IDE) to find hidden data problems.
- Utilized Informatica Data Explorer (IDE) to analyze legacy data for data profiling.
- Development of Informatica mappings and workflows using Informatica 7.1.1.
- Worked on identifying and eliminating duplicates in datasets through IDQ 8.6.1 components.
- Optimized the full-text search function by connecting MongoDB and Elasticsearch.
- Utilized the AWS framework for content storage and Elasticsearch for document search.
- Developed a framework for automation testing of Elasticsearch index validation using Java and MySQL.
- Created User defined types to store specialized data structures in Cloudera.
- Wrote a technical paper and created slideshow outlining the project and showing how Cloudera can be potentially used to improve performance.
- Set up monitoring tools for Hadoop monitoring and alerting; monitored and maintained the Hadoop, HBase and ZooKeeper services.
- Wrote scripts to automate application deployments and configurations; performed Hadoop cluster performance tuning and monitoring; troubleshot and resolved Hadoop cluster related system problems.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screen Hadoop cluster job performances and capacity planning.
- Monitored Hadoop cluster connectivity and security, and managed and monitored Hadoop log files.
- Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.
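A hedged sketch of the new-user onboarding flow described above (Linux account, Kerberos principal, HDFS/Hive smoke test). The user name jdoe, the EXAMPLE.COM realm, the admin principal, the group, and the quota value are assumptions for illustration, not the client's actual configuration.

```bash
#!/usr/bin/env bash
# Hypothetical onboarding for a new Hadoop user "jdoe"; realm, group, and paths are placeholders.

# 1. Linux account on the gateway node (assumes a "hadoopusers" group exists)
useradd -m -G hadoopusers jdoe

# 2. Kerberos principal in the KDC (EXAMPLE.COM realm is an assumption)
kadmin -p admin/admin@EXAMPLE.COM -q "addprinc jdoe@EXAMPLE.COM"

# 3. HDFS home directory with correct ownership and an illustrative space quota
hdfs dfs -mkdir -p /user/jdoe
hdfs dfs -chown jdoe:hadoopusers /user/jdoe
hdfs dfsadmin -setSpaceQuota 500g /user/jdoe

# 4. Smoke test as the new user: authenticate, then verify HDFS and Hive access
su - jdoe -c '
  kinit jdoe@EXAMPLE.COM
  hdfs dfs -ls /user/jdoe
  hive -e "SHOW DATABASES;"
'
```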
Environment: Hortonworks Hadoop, Cassandra, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Cloudera, SAS, SPSS, Unix Shell Scripts, ZooKeeper, SQL, MapReduce, Pig.
Sr. Hadoop Administrator
Confidential - Oak Brook, IL
Responsibilities:
- Worked as Hadoop Admin responsible for all aspects of clusters totaling 100 nodes, ranging from POC (Proof-of-Concept) to PROD clusters.
- Worked as admin on the Cloudera (CDH 5.5.2) distribution for clusters ranging from POC to PROD.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
- Day-to-day responsibilities included solving developer issues, deploying and moving code from one environment to another, providing access to new users, providing quick solutions to reduce impact, documenting them, and preventing future issues.
- Added and removed components through Cloudera Manager.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
- Served as Level 2/3 SME for the current Big Data clusters at the client site and set up standard troubleshooting techniques.
- Implemented and Configured High Availability Hadoop Cluster (Quorum Based) for HDFS, IMPALA and SOLR.
- Extensively worked on Impala to compare its processing time with that of Apache Hive for batch applications, in order to implement the former in the project. Extensively used Impala to read, write and query Hadoop data in HDFS.
- Prepared ad-hoc Phoenix queries on HBase.
- Created secondary index tables on HBase tables using Phoenix.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Used Flume with a spooling-directory source to load data from the local file system into HDFS (an illustrative agent configuration follows this list).
- Experience in Chef, Puppet or related tools for configuration management.
- Retrieved data from HDFS into relational databases with Sqoop.
- Parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis; fine-tuned Hive jobs for optimized performance.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database and Sqoop.
- Created collections and configurations and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
- Worked with Phoenix, a SQL layer on top of HBase, to provide a SQL interface over the NoSQL database.
- Created and truncated HBase tables in Hue and took backups of submitter ID(s).
- Configured and managed permissions for users in Hue.
- Troubleshot, debugged and fixed Talend-specific issues while maintaining the health and performance of the ETL environment.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
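An illustrative Flume agent configuration for the spooling-directory load noted above; the agent name, spool directory, and HDFS path are hypothetical placeholders rather than the project's actual settings.

```bash
#!/usr/bin/env bash
# Hypothetical Flume agent: watch a local spool directory and land files in HDFS.
cat > /etc/flume-ng/conf/spool-agent.conf <<'EOF'
agent1.sources  = spoolSrc
agent1.channels = memCh
agent1.sinks    = hdfsSink

agent1.sources.spoolSrc.type     = spooldir
agent1.sources.spoolSrc.spoolDir = /var/landing/incoming
agent1.sources.spoolSrc.channels = memCh

agent1.channels.memCh.type     = memory
agent1.channels.memCh.capacity = 10000

agent1.sinks.hdfsSink.type                   = hdfs
agent1.sinks.hdfsSink.hdfs.path              = /data/landing/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType          = DataStream
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.channel                = memCh
EOF

# Start the agent against the configuration above
flume-ng agent --name agent1 \
  --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/spool-agent.conf
```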
Environment: HDFS, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, SOLR, Storm, Knox, Cloudera Manager, Red Hat, MySQL and Oracle.
Hadoop Administrator
Confidential
Responsibilities:
- Working on multiple projects spanning from Architecting Hadoop Clusters, Installation, Configuration and Management of Hadoop Cluster.
- Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
- Developed custom Writable MapReduce Java programs to load web server logs into HBase using Flume.
- Worked on the Hadoop CDH upgrade from CDH3.x to CDH4.x.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
- Developed entire data transfer model using Sqoop framework.
- Integrated Kafka with Flume in a sandbox environment using a Kafka source and Kafka sink.
- Configured a Flume agent with a syslog source to receive data from syslog servers.
- Implemented NameNode HA services to make Hadoop highly available (a brief verification sketch follows this list).
- Exported data from RDBMS to Hive/HDFS and from Hive/HDFS back to RDBMS using Sqoop.
- Installed and managed multiple Hadoop clusters - Production, stage, development.
- Installed and managed a 150-node production cluster with 4+ PB of storage.
- Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
- Involved in analyzing system failures on production and lab clusters, identifying root causes, and recommending courses of action.
- Designed the Cluster tests before and after upgrades to validate the cluster status.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using Cloudera Manager.
- Documented and prepared run books of systems processes and procedures for future references.
- Performed Benchmarking and performance tuning on the Hadoop infrastructure.
- Automated data loading between production and disaster recovery cluster.
- Migrated hive schema from production cluster to DR cluster.
- Worked on migrating applications from relational database systems by doing POCs.
- Helping users and teams with incidents related to administration and development.
- Onboarded new users migrated to our clusters and trained them on best practices.
- Guided users in development and worked closely with developers on preparing a data lake.
- Migrated data from SQL Server to HBase using Sqoop.
- Scheduled data pipelines for automation of data ingestion in AWS.
- Utilized the AWS framework for content storage and Elasticsearch for document search.
- Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling end business analysts to write HQL queries.
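A brief sketch of the kind of commands used to verify a NameNode HA setup like the one above; the service IDs nn1 and nn2 are assumed values from a typical hdfs-site.xml, not the actual cluster configuration.

```bash
#!/usr/bin/env bash
# Hypothetical HA verification: nn1/nn2 are the dfs.ha.namenodes IDs configured
# in hdfs-site.xml; substitute the cluster's real service IDs.

# Check which NameNode is currently active and which is standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Spot-check overall HDFS health and capacity
hdfs dfsadmin -report | head -n 20

# Manually fail over from nn1 to nn2 during planned maintenance
hdfs haadmin -failover nn1 nn2
```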
Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health monitoring, security, Red Hat Linux, Impala, Cloudera
Hadoop Administrator
Confidential
Responsibilities:
- Involved in 24X7 Production support, Build and Migration Assignments.
- Worked primarily on RHEL 4/5, HP-UX, and Solaris operating systems.
- Performed text processing and network programming with Perl scripting.
- Involved in migration activities using Red Hat LVM, Solaris LVM, Veritas and EMC Open Migrator.
- Installed OAS (Oracle Application Server) on Solaris 9 and configured it with the Oracle database.
- Wrote shell and Perl scripts for job automation.
- Tuned kernel parameters based on application/database requirements (see the sysctl sketch after this list).
- Used Veritas File System (VxFS) and Veritas Volume Manager (VxVM) to configure RAID 1 and RAID 5 storage systems on Sun Solaris.
- File system tuning, growing, and shrinking with Veritas File system 3.5/4.x.
- Installed and configured GFS cluster for holding databases.
- Manage user accounts for the team access for Red Hat Satellite Server.
- Build channels and pull the packages from master Red Hat Satellite Server.
- Troubleshooting hardware, software and configuration problems for various protocols and topologies.
- Configured OpenLDAP on Red Hat Linux systems.
- Setup optimal RAID levels (fault tolerance) for protected data storage in NAS environments.
- Installed and configured DHCP, DNS (BIND, MS), web (Apache, IIS), mail (SMTP, IMAP and POP3) and file servers.
- Created new slices, mounted new file systems and unmounted file systems.
- Expertise in troubleshooting the systems and managing LDAP, DNS, DHCP and NIS.
- Worked with different directory services such as Microsoft Active Directory and Tivoli Directory Server with LDAP.
- Worked on making DNS entries to establish connection from server to DB2 database.
- Performed patching, backups on multiple environments of Solaris, Linux and VMware.
- Assisted other UNIX administrators when help was needed (i.e. creating UNIX accounts, writing scripts to perform system administration functions, responding to trouble tickets, etc.).
- Involved in preparation of functional and system specifications. Estimated storage requirements for applications.
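A minimal sketch of the kernel-parameter tuning mentioned above, assuming database-oriented shared-memory and network settings; the parameter values shown are illustrative, not the ones actually applied.

```bash
#!/usr/bin/env bash
# Hypothetical kernel tuning for a database host (run as root); values are
# placeholders and would be sized from the application/database vendor's guidance.
cat >> /etc/sysctl.conf <<'EOF'
# Shared memory for the database SGA
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
# Wider local port range and larger socket buffers for heavy connection load
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
EOF

# Apply the settings without a reboot and verify one of them
sysctl -p
sysctl kernel.shmmax
```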
Environment: Red Hat Linux (RHEL 3/4/5), Solaris, Logical Volume Manager, Sun & Veritas Cluster Server, Global File System, Red Hat Cluster Servers.
Hadoop Administrator
Confidential
Responsibilities:
- Worked on 4 Hadoop clusters for different teams, supporting 50+ users on the Hadoop platform, resolving tickets and issues, training users to keep Hadoop usage simple, and keeping them updated on best practices.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Managed a 350+ node CDH 5.2 cluster with 4 petabytes of data using Cloudera Manager on Red Hat Linux 6.5.
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Upgraded the Hadoop cluster from CDH4.7 to CDH5.2.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning, and slots configuration.
- Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on installing Cloudera Manager and CDH, installing the JCE policy files, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
- Monitored the cluster for performance, networking and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created test strategies, test plans and test cases to provide test coverage across various products, systems, and platforms.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Install OS and administrated Hadoop stack with CDH5 (with YARN) Cloudera distribution including configuration management, monitoring, debugging, and performance tuning.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Scripting Hadoop package installation and configuration to support fully-automated deployments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
- Worked on Hive for further analysis and for transforming files from different analytical formats into text files.
- Created Hive external tables, loaded data into the tables, and queried the data using HQL (an illustrative HQL sketch follows this list).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Monitoring Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
- Maintained the cluster by adding and removing nodes using tools like Ganglia, Nagios, and Cloudera Manager.
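An illustrative sketch of the Hive external-table pattern mentioned above, run through the Hive CLI; the table name, columns, and HDFS location are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical external table over data already landed in HDFS; names are placeholders.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  event_time STRING,
  user_id    STRING,
  url        STRING,
  status     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/raw/web_logs';

-- Query the external table with HQL
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status
ORDER BY hits DESC
LIMIT 10;
"
```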
Environment: Hadoop, MapReduce, Hive, PIG, Sqoop, Spark, Oozie, Flume, HBase, Nagios, Ganglia, Hue, Cloudera Manager, Zookeeper, Cloudera, Oracle, Kerberos and RedHat 6.5.
Big Data Security Administrator
Confidential
Responsibilities:
- Designed and implemented end to end big data platform solution on AWS.
- Manage Hadoop clusters in production, development, Disaster Recovery environments.
- Implemented SignalHub, a data science tool, and configured it on top of HDFS.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Ranger KMS, Falcon, SmartSense, Storm, and Kafka.
- Recovering from node failures and troubleshooting common Hadoop cluster issues.
- Implemented a multi-tenant Hadoop cluster and onboarded tenants to the cluster.
- Achieved data isolation through Ranger policy-based access control.
- Used the YARN Capacity Scheduler to define compute capacity.
- Responsible for building a cluster on HDP 2.5.
- Worked closely with developers to investigate problems and make changes to the Hadoop environment and associated applications.
- Expertise in recommending hardware configurations for Hadoop clusters.
- Installed, upgraded and managed Hadoop clusters on Hortonworks.
- Troubleshot many cloud-related issues such as DataNode down, network failures, login issues and missing data blocks.
- Managing and reviewing Hadoop log files.
- Proven results-oriented person with a focus on delivery.
- Performed Importing and exporting data into HDFS and Hive using Sqoop.
- Managed cluster coordination services through Zookeeper.
- System/cluster configuration and health check-up.
- Continuous monitoring and managing the Hadoop cluster through Ambari.
- Created user accounts and granted users access to the Hadoop cluster.
- Resolved tickets submitted by users, troubleshooting, documenting and resolving the errors.
- Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any impact on running jobs and data (a hedged decommissioning sketch follows this list).
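A hedged sketch of a graceful node removal of the kind described above, using the stock HDFS/YARN exclude-file mechanism; the hostname and exclude-file paths are assumptions and would come from dfs.hosts.exclude and yarn.resourcemanager.nodes.exclude-path in the actual cluster configuration.

```bash
#!/usr/bin/env bash
# Hypothetical graceful decommission of a worker node; paths/hostnames are placeholders.

# 1. Add the host to the HDFS exclude file referenced by dfs.hosts.exclude
echo "worker07.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read its host lists; blocks begin re-replicating
hdfs dfsadmin -refreshNodes

# 3. Watch until the node no longer appears as decommissioning before stopping its DataNode
hdfs dfsadmin -report -decommissioning

# 4. Do the same for YARN so no new containers are scheduled on the host
echo "worker07.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes
```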
Environment: HDFS, Map Reduce, Hive, Pig, Flume, Oozie, Sqoop, HDP2.5, Ambari 2.4, Spark, SOLR, Storm, Knox, Centos 7 and MySQL.
Hadoop Administrator
Confidential
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
- Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Worked on installing the cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra, and slots configuration.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Monitored multiple cluster environments using Metrics and Nagios.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Dumped data from the MySQL database to HDFS and vice-versa using Sqoop.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Dumped data from one cluster to another using DistCp, and automated the dumping procedure with shell scripts (see the DistCp sketch after this list).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing Data with HIVE and PIG.
- Configured ZooKeeper to implement node coordination for clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
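A minimal sketch of the DistCp copy-and-automate pattern referenced above; the NameNode URIs, paths, and log location are hypothetical placeholders, and the real procedure was wrapped in scheduled shell scripts.

```bash
#!/usr/bin/env bash
# Hypothetical inter-cluster copy; source/target NameNode URIs are placeholders.
SRC="hdfs://prod-nn.example.com:8020/data/events"
DST="hdfs://dr-nn.example.com:8020/data/events"

# -update copies only changed files, -p preserves status attributes such as
# ownership and permissions, -m caps the number of map tasks used for the copy
hadoop distcp -update -p -m 20 "$SRC" "$DST" \
  >> /var/log/distcp_events_$(date +%F).log 2>&1
```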
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Cloudera, Flume, SQL Server, UNIX, RedHat and CentOS.
Linux/Unix Administrator
Confidential
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data (an illustrative Oozie submission sketch follows this list).
- Involved in Development and Implementation of business Applications using Java/J2EE Technologies.
- Use of build script using ANT to generate JAR, WAR, EAR files and for integration testing and unit testing.
- Developed the entire application implementing MVC architecture, integrating JSP with Hibernate and Spring frameworks.
- Created dynamic HTML pages, used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
- Used J2EE Design/Enterprise Integration patterns and SOA compliance for design and development of applications.
- Implemented AJAX functionality using jQuery and JSON to communicate to the server and populate the data on the JSP.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Shared responsibility for administration of Hadoop, Hive and Pig.
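An illustrative sketch of how an Oozie-driven load such as the one above might be submitted, assuming a workflow.xml already deployed to HDFS; the Oozie server URL, NameNode/ResourceManager addresses, and application path are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical Oozie submission; server URL and HDFS application path are placeholders.
cat > job.properties <<'EOF'
nameNode=hdfs://nn.example.com:8020
jobTracker=rm.example.com:8032
oozie.wf.application.path=${nameNode}/apps/daily_load
queueName=default
EOF

# Submit and start the workflow, then poll its status
JOB_ID=$(oozie job -oozie http://oozie.example.com:11000/oozie \
                   -config job.properties -run | cut -d' ' -f2)
oozie job -oozie http://oozie.example.com:11000/oozie -info "$JOB_ID"
```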
Environment: Hadoop 1.x, Hive, Pig, HBase, Sqoop, Flume, Spring, jQuery, Java, J2EE, Hibernate.