Lead Hadoop Admin Resume
Irving, TX
SUMMARY:
- Over 10 years of experience, including 4 years with the Hadoop ecosystem, covering installation and administration of UNIX/Linux servers and configuration of Hadoop ecosystem components in existing clusters.
- Extensive experience in installing, configuring and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
- Experienced in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Experience with the complete software development lifecycle, including design, development, testing and implementation of moderately to highly complex systems.
- Hands-on experience in installation, configuration, support and management of Hadoop clusters using the Apache, Hortonworks, Cloudera and MapR distributions.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes (a sample command sketch appears at the end of this summary).
- Hands-on experience configuring Hadoop clusters in professional environments and on Amazon Web Services (AWS) using EC2 instances.
- Excellent knowledge of NoSQL databases such as HBase and Cassandra. Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Experience in creating custom Lucene/Solr Query components.
- Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Solr, Spark, Kafka, Flume, YARN, Oozie and ZooKeeper.
- Worked with NoSQL databases including Cassandra, HBase and MongoDB, alongside Hive.
- Experience in developing and scheduling ETL workflows in Hadoop using Oozie, with substantial experience writing MapReduce jobs in Java and working with Pig, Flume, ZooKeeper, Hive and Storm.
- Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
- Extensively involved in Test Plan re-design, Test Case re-Creation, Test Automation and Test Execution of web and client server applications as per change requests.
- Experience working with internet technologies such as HTML, JavaScript, VBScript and XML.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux 4, 5 and 6, and familiarity with Solaris 9 & 10 and IBM AIX 6.
- Created a POC to store server log data in Cassandra to identify system alert metrics.
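The Sqoop import/export experience above can be illustrated with a minimal, hedged sketch; the connection string, credentials, table names and HDFS paths are placeholders, not details from any actual engagement:

# Import a table from a relational source into HDFS (all connection details are hypothetical)
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMERS \
  --target-dir /data/raw/customers \
  --num-mappers 4

# Export processed results back to the relational source
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMER_SUMMARY \
  --export-dir /data/processed/customer_summary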
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.4
Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP QTP, HP UFT, Selenium, TestNG, JUnit
Programming Languages: Shell scripting, Puppet, Python, Bash, CSH, Ruby, PHP
QA Methodologies: Waterfall, Agile, V-model
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
Java Frameworks: MVC, Apache Struts 2.0, Spring, Hibernate
Defect Management: JIRA, Quality Center
Domain Knowledge: GSM, WAP, GPRS, CDMA, UMTS (3G)
Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra (DataStax Enterprise 4.6.1)
RDBMS: Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, PL/SQL
Operating Systems: Linux, UNIX, Mac OS, Windows
PROFESSIONAL EXPERIENCE:
Lead Hadoop Admin
Confidential - Irving, TX
Roles and Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, YARN, Hue, Pig, HBase, Sqoop, Flume, Spark, Avro, ZooKeeper, Tableau, Bedrock, generic ODBC/JDBC, etc.) on Hortonworks HDP 2.2.4.2, across 4 clusters ranging from POC to PROD with nearly 100 nodes.
- Provided 24x7 operation support for large scale Hadoop and MongoDB clusters across production, UAT and development environments.
- Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2, and followed standard backup policies to ensure high availability of the cluster.
- Implemented Flume, Spark and Spark Streaming frameworks for real-time data processing. Developed analytical components using Scala, Spark and Spark Streaming. Implemented proofs of concept on the Hadoop and Spark stack and different big data analytic tools, using Spark SQL as an alternative to Impala.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Experienced in administering Big Data clusters with HDFS, Kafka, ZooKeeper, Hive, YARN, Hue, Oozie, etc.
- Imported logs from web servers with Flume to ingest the data into HDFS (an illustrative agent configuration appears at the end of this list).
- Implemented security on Hortonworks Hadoop clusters using Kerberos, working with the operations team to migrate non-secured clusters to secured clusters.
- Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node cluster environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded the data into HDFS.
- Used MySQL as the backing database for cluster services; configured and maintained HDFS, YARN ResourceManager, MapReduce, Hive, HBase, Kafka and Spark.
- Strong understanding and experience of ODBC/JDBC connectivity with various clients such as Tableau and MicroStrategy, as well as server components.
- Monitored and tuned cluster component performance.
- Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Worked across the full project lifecycle, from architecting and installation to configuration and management of Hadoop clusters.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible and Puppet, as well as custom-built tooling, and designed cloud-hosted solutions using the AWS product suite.
- Worked extensively on development projects using Hive, Spark, Pig, Sqoop and GemFire XD throughout the development lifecycle until the projects went into production. Created reporting views in Impala using Sentry policy files.
- Responsible for handler configuration and handler ESB mappings. Integrated Hive with MuleSoft ESB to land data into applications running on Salesforce and vice versa.
- Implemented a dual data center setup for all Cassandra clusters. Performed complex system analyses to improve ETL performance and identified highly critical batch jobs to prioritize.
- Implemented a Spark solution to enable real-time reports from Cassandra data, and was actively involved in designing column families for various Cassandra clusters.
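The Flume-based web-server log ingestion noted in this list can be sketched roughly as follows; the agent, channel and sink names, file paths and HDFS directory are assumptions for illustration only, not the actual project configuration:

# Write a minimal Flume agent definition (names and paths are hypothetical)
cat > /etc/flume/conf/weblogs.conf <<'EOF'
weblogs.sources  = tail-src
weblogs.channels = mem-ch
weblogs.sinks    = hdfs-sink

weblogs.sources.tail-src.type = exec
weblogs.sources.tail-src.command = tail -F /var/log/httpd/access_log
weblogs.sources.tail-src.channels = mem-ch

weblogs.channels.mem-ch.type = memory
weblogs.channels.mem-ch.capacity = 10000

weblogs.sinks.hdfs-sink.type = hdfs
weblogs.sinks.hdfs-sink.channel = mem-ch
weblogs.sinks.hdfs-sink.hdfs.path = /data/raw/weblogs/%Y-%m-%d
weblogs.sinks.hdfs-sink.hdfs.fileType = DataStream
weblogs.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
EOF

# Start the agent; the exact launch command depends on the distribution layout
flume-ng agent --name weblogs --conf /etc/flume/conf --conf-file /etc/flume/conf/weblogs.conf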
ENVIRONMENT: RHEL, Ubuntu, Cloudera Manager, Cloudera Search, CDH4, HDFS, HBase, Hive, Pig, ZooKeeper, MapReduce2 (YARN), PostgreSQL, MySQL, Kafka, Hue, Oozie, QAS, Ganglia, and automated cluster monitoring scripts.
Lead Hadoop Administrator
Confidential - San Ramon, CA
Roles and Responsibilities:
- Hands on Installation and configuration of Hortonworks Data Platform HDP 2.3.4
- Worked on installing production cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration
- Worked on Hadoop administration; responsibilities included software installation, configuration, software upgrades, backup and recovery, cluster setup, and daily cluster performance monitoring, keeping the cluster up and running in a healthy state.
- Implemented security requirements for Hadoop and integrated with Kerberos authentication and authorization infrastructure.
- Designed, developed and implemented connectivity products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem.
- Defined job flows using Oozie to schedule and manage Apache Hadoop jobs.
- Implemented Name Node High Availability on the Hadoop cluster to overcome single point of failure.
- Worked on YARN capacity scheduler by creating queues to allocate resource guarantee to specific groups.
- Worked on importing and exporting data from Oracle database into HDFS and HIVE using Sqoop.
- Monitored and analyzed MapReduce job executions on the cluster at the task level.
- Extensively involved in Cluster Capacity planning, Hardware planning, Performance Tuning of the Hadoop Cluster.
- Wrote automation scripts and set up crontab jobs to maintain cluster stability and health (a simplified sketch appears at the end of this list).
- Installed Ambari on an already existing Hadoop cluster.
- Implemented Rack Awareness for data locality optimization.
- Optimized and tuned the Hadoop environments to meet performance requirements.
- Hands-on experience with the AWS cloud, including EC2 and S3.
- Collaborated with the offshore team.
- Ability to document existing processes and recommend improvements.
- Shared knowledge and assisted other team members as needed.
- Assisted with maintenance and troubleshooting of scheduled processes.
- Participated in development of system test plans and acceptance criteria.
- Collaborated with offshore developers to monitor ETL jobs and troubleshoot failing steps.
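The crontab-based automation mentioned in this list can be illustrated with a simplified health-check sketch; the script path, usage threshold and mail alias are assumptions, and the check is intentionally minimal rather than a production monitoring solution:

# Hypothetical HDFS health-check script wired into cron
cat > /usr/local/bin/hdfs_health_check.sh <<'EOF'
#!/bin/bash
# Mail the ops alias if overall DFS usage exceeds an assumed 80% threshold
REPORT=$(hdfs dfsadmin -report)
USED_PCT=$(echo "$REPORT" | awk '/^DFS Used%/{gsub(/%/,"",$3); print int($3); exit}')
if [ "${USED_PCT:-0}" -gt 80 ]; then
  echo "$REPORT" | mail -s "HDFS usage above 80% on $(hostname)" hadoop-ops@example.com
fi
EOF
chmod +x /usr/local/bin/hdfs_health_check.sh

# Run every 15 minutes (note: 'crontab -' replaces the existing crontab; shown only for illustration)
echo "*/15 * * * * /usr/local/bin/hdfs_health_check.sh" | crontab -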
Environment: Hortonworks HDP 2.3.x, Ambari, Ambari Blueprints, Oozie 4.2, Sqoop 1.4.6, Hive 1.2, MapReduce2, Oracle SQL Developer, Teradata, SVN, SFTP, SSH, Eclipse, JDK 1.7, Maven.
Sr. Hadoop Admin
Confidential, Los Angeles CA
Roles and Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Experience with methodologies such as Agile, Scrum and test-driven development.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra and slots configuration.
- Responsible for developing data pipeline using HDInsight, flume, Sqoop and pig to extract the data from weblogs and store in HDFS.
- Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Installed, configured, and administered a small Hadoop cluster consisting of 10 nodes. Monitored the cluster for performance, networking and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS and Hive access (an illustrative onboarding sequence appears at the end of this list).
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Managed 350+ Nodes HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 and Linux Cent OS 6.5.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Upgraded the Hadoop cluster from CDH4.7 to CDH5.2.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Scripting Hadoop package installation and configuration to support fully-automated deployments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
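The user-onboarding work in this list (Linux users, Kerberos principals, HDFS and Hive checks) could look roughly like the following; the user name, group, realm and keytab path are placeholders and assume an MIT Kerberos KDC:

# Create the OS account and group (names are hypothetical)
groupadd -f hadoopusers
useradd -m -G hadoopusers jdoe

# Create a Kerberos principal and keytab on the KDC
kadmin.local -q "addprinc -randkey jdoe@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/jdoe.keytab jdoe@EXAMPLE.COM"

# Create the user's HDFS home directory as the hdfs superuser
sudo -u hdfs hdfs dfs -mkdir /user/jdoe
sudo -u hdfs hdfs dfs -chown jdoe:hadoopusers /user/jdoe

# Smoke-test HDFS and Hive access as the new user
kinit -kt /etc/security/keytabs/jdoe.keytab jdoe@EXAMPLE.COM
hdfs dfs -ls /user/jdoe
hive -e "SHOW DATABASES;"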
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Ambari 2.0, Linux CentOS, MongoDB, Cassandra, Ganglia and Cloudera Manager.
Hadoop Administrator
Confidential
Roles and Responsibilities:
- Responsible for cluster configuration, maintenance, troubleshooting and tuning.
- Good experience on cluster audit findings and tuning configuration parameters.
- Implemented Kerberos security in all environments.
- Solid Understanding of Hadoop HDFS, Map-Reduce and other Eco-System Projects.
- Installed and configured Phoenix on HDP 2.1. Created views over HBase tables and used SQL queries to retrieve alerts and metadata.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented the Capacity Scheduler to share cluster resources among MapReduce jobs submitted by users.
- Experience configuring the cluster with the FIFO or Fair scheduler.
- Demonstrated an understanding of the concepts, best practices and functions needed to implement a big data solution in a corporate environment.
- Provided input to development teams on efficient utilization of resources such as memory and CPU, based on the runtime statistics of map and reduce tasks.
- Worked on High Availability for Name Node using Cloudera Manager to avoid single point of failure.
- Developed session tasks and workflows using PowerCenter Workflow Manager.
- Managed and reviewed data backups and log files.
- Used Ganglia to monitor the cluster around the clock.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a simplified sketch appears at the end of this list).
- Commissioned and decommissioned nodes as needed.
- Set up and managed NameNode HA and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Worked with network and Linux system engineers to define optimum network configurations, server hardware and operating systems.
- Worked with the Cloudera support team to fine-tune the cluster.
- Production support responsibilities included cluster maintenance.
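The automated log scanning and alerting described in this list can be sketched as a small cron-driven script; the log locations, error patterns and mail alias below are assumptions for illustration only:

#!/bin/bash
# Scan recent system and Hadoop logs for predefined error patterns and alert the ops alias
LOGS="/var/log/hadoop-hdfs/*.log /var/log/messages"
PATTERNS="OutOfMemoryError|Too many open files|Connection refused|Corrupt blocks"
HITS=$(grep -Eh "$PATTERNS" $LOGS 2>/dev/null | tail -n 50)
if [ -n "$HITS" ]; then
  echo "$HITS" | mail -s "Hadoop log alert on $(hostname)" hadoop-ops@example.com
fi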
Environment: RHEL 4.x/5/6, Solaris 9, 10 & 11, HP-UX, CentOS, SUSE 10, 11, VERITAS Volume Manager 3.x/4.x, VERITAS Storage Foundation 5, RedHat Cluster, VERITAS Cluster Server 4.1, Tripwire, NFS, DNS, SAN/NAS, Puppet, Chef, Splunk.
Linux Admin
Confidential
Roles and Responsibilities:
- Provisioned, built and supported Linux servers, both physical and virtual (VMware), for production, QA and developer environments.
- Installed, configured and administered all UNIX/Linux servers, including the design and selection of hardware to support installation and upgrades of Red Hat, CentOS and Ubuntu operating systems.
- Network traffic control, IPsec, QoS, VLAN, proxy and RADIUS integration on Cisco hardware via Red Hat Linux software.
- Responsible for managing Chef client nodes and uploading cookbooks to the Chef server from the workstation.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Responsible for configuring real-time backups of web servers; managed log files for troubleshooting and identifying probable errors.
- Responsible for reviewing all open tickets, resolve and close any existing tickets.
- Document solutions for any issues that have not been discovered previously.
- Worked with file systems including the UNIX file system and the Network File System. Planned, scheduled and implemented OS patches on both Solaris and Linux. Diligently teamed with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Highly experienced in optimizing performance of WebSphere Application Server using Workload Management (WLM).
- Patch management of servers and maintaining server's environment in Development/QA/Staging /Production.
- Performing Linux systems administration on production and development servers (RedHat Linux, Cent OS and other UNIX utilities).
- Installing Patches and packages on Unix/Linux Servers.
- Installation, configuration, upgrades and administration of Sun Solaris and RedHat Linux.
- Installation and Configuration of VMware vSphere client, Virtual Server creation and resource allocation.
- Performance Tuning, Client/Server Connectivity and Database Consistency Checks using different Utilities.
- Shell scripting for Linux/Unix Systems Administration and related tasks. Point of Contact for Vendor escalation
- Installed CentOS using PXE (Pre-Execution Environment) boot and the Kickstart method on multiple servers.
- Monitoring System Metrics and logs for any problems.
- Ran crontab jobs to back up data (a simplified backup sketch appears at the end of this list).
- Applied Operating System updates, patches and configuration changes.
- Maintained the MySQL server and granted database access to the required users.
- Appropriately documented various administrative and technical issues.
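The cron-based data backups and MySQL maintenance in this list can be illustrated with a simplified nightly backup sketch; the destination mount, retention window and credential handling are assumptions, not the actual environment's configuration:

# Hypothetical nightly backup script (paths and retention are placeholders)
cat > /usr/local/bin/nightly_backup.sh <<'EOF'
#!/bin/bash
DEST=/backup/$(hostname)/$(date +%F)
mkdir -p "$DEST"
tar -czf "$DEST/etc.tar.gz" /etc                      # archive system configuration
mysqldump --all-databases > "$DEST/mysql_all.sql"     # assumes credentials in ~/.my.cnf
# Keep roughly two weeks of backups
find /backup/$(hostname) -mindepth 1 -maxdepth 1 -mtime +14 -exec rm -rf {} \;
EOF
chmod +x /usr/local/bin/nightly_backup.sh

# Schedule via the system crontab (runs as root at 01:30)
echo "30 1 * * * root /usr/local/bin/nightly_backup.sh" >> /etc/crontab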
Environment: Red Hat Linux/CentOS 4, 5, 6, Logical Volume Manager, VMware ESX 5.1/5.5, Apache and Tomcat web servers, Oracle 11g/12c, Oracle RAC 12c, HPSM, HPSA.