Hadoop Administrator Resume
St Louis, MO
SUMMARY
- Over 8 years of IT experience, including 4 years with the Hadoop ecosystem, covering installation and administration of UNIX/Linux servers and configuration of Hadoop ecosystem components in existing clusters.
- Hands-on experience installing, configuring, monitoring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, Oozie, Elasticsearch, Apache Spark, and Impala.
- Experience configuring, installing, and managing the MapR, Hortonworks, and Cloudera distributions.
- Apache Solr administration and configuration experience.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Working knowledge of monitoring tools and frameworks such as Splunk, ELK, Zabbix, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
- Experience setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
- Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms.
- Worked on DevOps tools such as Puppet and Git.
- Hands-on experience configuring Hadoop clusters in professional environments and on Amazon Web Services (AWS) using EC2 instances.
- Hands-on experience installing and configuring multiple Ruby versions with RVM (Ruby Version Manager), including bundler install, RVM files, per-project gemsets, and RVM deployment and integration.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure.
- Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Experience with scripting languages such as Python, Perl, and shell.
- Experience with the complete software development life cycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using the Apache, Hortonworks, and Cloudera distributions and MapReduce.
- Extensive experience installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
- Experience administering Kafka and Flume streaming using the Cloudera distribution.
- Hadoop ecosystem: Cloudera, Hortonworks, MapR, HDFS, HBase, YARN, ZooKeeper, Nagios, Hive, Pig, Ambari, Spark, Impala.
- Experience with Ranger and Knox configuration to secure Hadoop services (Hive, HBase, HDFS, etc.).
- Excellent knowledge of NoSQL databases such as HBase and Cassandra.
- Developed automated UNIX shell scripts for RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database maintenance activities (a minimal sketch follows this summary).
- Experienced in developing MapReduce programs with Apache Hadoop for working with big data.
- Installed, configured, and maintained several Hadoop clusters, including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi, in Kerberized environments.
- Proficient with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Experienced with deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure.
- Extensive experience in data analysis using tools such as Syncsort and HZ along with shell scripting and UNIX.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Experience supporting MapR Hadoop installations in Dev, QA, and Production environments.
- Implemented release processes, applying DevOps and continuous delivery methodologies to existing builds and deployments.
- Involved in the architecture of the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
- Experienced in developing and implementing web applications using JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, jQuery, CSS, XML, JDBC, and JNDI.
- Experienced in integrating Isilon-based storage with Hadoop in place of regular HDFS storage systems.
- Responsible for designing highly scalable big data clusters to support various data storage and computation needs across varied big data platforms: Hadoop, Cassandra, MongoDB, and Elasticsearch.
- Proficient in configuring ZooKeeper and Flume for existing Hadoop clusters.
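A minimal sketch of the kind of UNIX shell database maintenance script referenced in this summary, assuming IBM DB2 command-line (CLP) syntax; the database name, schema, and backup paths are hypothetical placeholders.

```bash
#!/bin/bash
# Hypothetical DB2 maintenance sketch: refresh statistics, reorganize tables,
# export table data, and back up the database. All names and paths are placeholders.
DB=SALESDB
SCHEMA=APP
BACKUP_DIR=/backups

db2 connect to "$DB"

# Iterate over the user tables in the schema (db2 -x suppresses column headers).
while read -r TAB; do
    db2 "RUNSTATS ON TABLE ${SCHEMA}.${TAB} WITH DISTRIBUTION AND DETAILED INDEXES ALL"
    db2 "REORG TABLE ${SCHEMA}.${TAB}"
    db2 "EXPORT TO ${BACKUP_DIR}/${TAB}.ixf OF IXF SELECT * FROM ${SCHEMA}.${TAB}"
done < <(db2 -x "SELECT tabname FROM syscat.tables WHERE tabschema = '${SCHEMA}' AND type = 'T'")

db2 connect reset                      # release the connection before an offline backup
db2 "BACKUP DATABASE ${DB} TO ${BACKUP_DIR}"
db2 terminate
```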
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Kafka, Oozie, Avro, HBase, Storm, YARN, Impala, Apache NiFi, CDH 5.3, ALM, TOAD, JIRA, Selenium, TestNG, JUnit
Tools: Teradata, Pentaho, Hortonworks, Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit, DevOps
Databases: Oracle 9i/10g/11g, MySQL, MS SQL Server, IBM DB2, MS Access, PL/SQL, Teradata, Greenplum; NoSQL databases: HBase, MongoDB, Cassandra (DataStax Enterprise 4.6.1), Couchbase, InfluxDB
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential, St. Louis, MO
Responsibilities:
- Worked on a 250-node Hadoop cluster on Cloudera Distribution 5.0.1.
- Tested loading raw data, populating staging tables, and storing refined data in partitioned tables in the EDW.
- Worked with Cloudera Manager on CDH 5.5+/5.8+ and Hortonworks HDP 2.2 distributions.
- Deployed Grafana, backed by Graphite and collectl, to monitor metrics on 750 servers.
- Installed a Kerberos-secured Kafka cluster without encryption on POC VMs and set up Kafka ACLs.
- Created and deployed corresponding SolrCloud collections.
- Integrated Solr with HBase using the Lily HBase Indexer (Key-Value Indexer).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, SOLR, and the HBase indexer for ingestion, with SOLR and HBase for real-time querying.
- Used TIBCO Administrator to manage TIBCO Components, to monitor and manage the deployments.
- Involved in updating scripts and step actions to install Ranger plugins.
- Experience in the setup, configuration, and management of Apache Sentry for role-based authorization and privilege validation for the Hive and Impala services.
- Implemented, documented, configured, wrote queries, developed custom apps, and supported Splunk indexers, indexing, and field extractions using Splunk IFX, forwarders, lightweight forwarders, and Splunk Web / search heads for Splunk 5.x/6.x.
- Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener and verified anonymous access alongside Kerberos-authenticated users (see the broker configuration sketch after this list).
- Performed cluster maintenance, including adding and removing nodes, using Ambari on Hortonworks.
- Used Hive join queries to combine multiple source-system tables and load the results into Elasticsearch tables.
- Worked on the Kafka backup index and a Log4j appender to minimize logs, and pointed Ambari server logs to NAS storage.
- Configured Sqoop JDBC drivers for the respective relational databases and handled parallelism, the distributed cache, the import process, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and passwords, free-form query imports, and troubleshooting (see the Sqoop sketch after this list).
- Migrated Hortonworks HDP and HDF clusters from on-premise to AWS through Cloudbreak.
- Involved in the installation of MapR and the upgrade from MapR 5.0 to MapR 5.2.
- Worked with both CDH4 and CDH5 applications and transferred large datasets between development and production clusters.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data encryption at rest.
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and tuned log4j properties (see the Flume sketch after this list).
- Troubleshot issues in MapReduce job execution by inspecting and reviewing log files.
- Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters, including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi, in Kerberized environments.
- Set up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Assisted in configuration, development and testing of AutoSys JIL and other scripts
- Involved in development and maintenance of AutoSys and related scheduling solutions.
- Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters
- Deployed Grafana Dashboards for monitoring cluster nodes using Graphite as a Data Source and collectl as a metric sender.
- Worked with the Navigator API to export denied-access events on the cluster to prevent security threats.
- Worked on setting up Apache NiFi and used NiFi in orchestrating data pipeline.
- Monitored workload, job performance and capacity planning using Cloudera Manager
- Worked with Hadoop tools such as Flume, Hive, Sqoop, and Oozie on the Hadoop cluster.
- Experience with the workflow scheduling and monitoring tools Rundeck and Control-M.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
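A hedged sketch of how a no-authentication listener can run alongside a Kerberos (SASL) listener on a Kafka broker, as mentioned above; host names, ports, topic, and principal names are assumptions for illustration only.

```bash
#!/bin/bash
# Hypothetical broker settings for running SASL (Kerberos) and anonymous listeners in parallel.
cat >> /etc/kafka/server.properties <<'EOF'
# Kerberos-authenticated listener on 9092, anonymous (no-auth) listener on 9093
listeners=SASL_PLAINTEXT://0.0.0.0:9092,PLAINTEXT://0.0.0.0:9093
advertised.listeners=SASL_PLAINTEXT://broker1.example.com:9092,PLAINTEXT://broker1.example.com:9093
sasl.enabled.mechanisms=GSSAPI
sasl.kerberos.service.name=kafka
EOF

# Example ACL: allow a specific principal to read a topic through a consumer group.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:appuser \
  --operation Read --topic weblogs --group weblog-consumers
```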
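A minimal, hypothetical sketch of the kind of incremental Sqoop import described above; the connection string, password file, table, directory, and column names are placeholders rather than the actual systems.

```bash
#!/bin/bash
# Hypothetical saved Sqoop job: incremental append import from MySQL into a staging
# directory in HDFS, with controlled parallelism and Snappy compression.
sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user --password-file /user/etl/.sqoop.pwd \
  --table orders \
  --target-dir /data/staging/orders \
  --split-by order_id \
  --num-mappers 8 \
  --incremental append \
  --check-column order_id \
  --last-value 0 \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec

# Run the saved job; Sqoop records the new --last-value for the next run.
# A Hive external table can then be defined over /data/staging/orders for querying.
sqoop job --exec orders_incremental
```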
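A minimal Flume agent sketch along the lines of the configuration described above; the host names, paths, and the logType header are illustrative assumptions, with the header assumed to be set by an upstream interceptor.

```bash
#!/bin/bash
# Hypothetical Flume agent: one source multiplexed into HDFS and HBase sinks by header value.
cat > /etc/flume-ng/conf/weblog-agent.conf <<'EOF'
agent1.sources  = weblogSrc
agent1.channels = chHdfs chHbase
agent1.sinks    = hdfsSink hbaseSink

agent1.sources.weblogSrc.type = exec
agent1.sources.weblogSrc.command = tail -F /var/log/httpd/access_log
agent1.sources.weblogSrc.channels = chHdfs chHbase

# Multiplexing selector: route by the logType header (assumed set by an upstream interceptor).
agent1.sources.weblogSrc.selector.type = multiplexing
agent1.sources.weblogSrc.selector.header = logType
agent1.sources.weblogSrc.selector.mapping.access = chHdfs
agent1.sources.weblogSrc.selector.default = chHbase

agent1.channels.chHdfs.type = memory
agent1.channels.chHdfs.capacity = 10000
agent1.channels.chHbase.type = memory
agent1.channels.chHbase.capacity = 10000

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = chHdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://nameservice1/data/weblogs/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true

agent1.sinks.hbaseSink.type = hbase
agent1.sinks.hbaseSink.channel = chHbase
agent1.sinks.hbaseSink.table = weblogs
agent1.sinks.hbaseSink.columnFamily = raw
EOF

# Start the agent with the generated configuration.
flume-ng agent --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/weblog-agent.conf \
  --name agent1 -Dflume.root.logger=INFO,console
```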
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase.
Hadoop Admin
Confidential - San Francisco, CA
Responsibilities:
- Experience configuring, installing, and managing the MapR, Hortonworks, and Cloudera distributions, along with Hive and Spark.
- Monitored Solr statistics dashboards and reviewed the Solr servers.
- Skilled in scheduling recurring Pig and Hive jobs using Rundeck.
- Deployed Grafana Dashboards for monitoring cluster nodes using Graphite as a Data Source and collectl as a metric sender.
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
- Hadoop ecosystem: Cloudera, Hortonworks, MapR, HDFS, HBase, YARN, ZooKeeper, Nagios, Hive, Pig, Ambari, Spark, Impala.
- Worked extensively on Hadoop/MapR platforms.
- Maintained the operations, installation, and configuration of 150+ node clusters with the MapR distribution.
- Implemented MapR token based security.
- Installed and configured Drill, Fuse and Impala on MapR-5.1.
- Implemented a Kerberized Hadoop ecosystem, using Sqoop and NiFi in the Kerberized environment to transfer data from relational databases such as MySQL to HDFS.
- Created MapR-DB tables and loaded data into them.
- Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
- Hands-on working experience with DevOps tools: Chef, Puppet, Jenkins, Git, Maven, and Ansible.
- Installed, upgraded, and configured monitoring tools (MySQL Enterprise Monitor, New Relic, and Datadog APM).
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Administered, installed, upgraded, and managed Hadoop clusters on MapR 5.1 across 100+ nodes in Development, Test, and Production (Operational & Analytics) environments.
- Installed, configured, and deployed an 80+ node MapR Hadoop cluster for Development and Production.
- Created volumes and snapshots through the MapR Control System (see the maprcli sketch after this list).
- Monitored the MapR cluster through ITRS.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration in the MapR Control System (MCS).
- Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
- Monitoring the health of the cluster and setting up alert scripts for memory usage on the edge nodes.
- Good knowledge on Spark platform parameters like memory, cores and executors.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Implemented Cloudera Impala on top of Hive for faster user queries.
- Wrote workflows that include data-cleansing Pig actions and Hive actions.
- Designed a custom Spark REPL application to handle similar datasets.
- Developed Spark SQL jobs to load tables into HDFS and run select queries over them (see the spark-sql sketch after this list).
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Mentored EQM team for creating Hive queries to test use cases.
- Created Hive tables on top of HDFS files and designed queries to run on top.
- Worked with NoSQL databases such as HBase and created Hive tables on top of them.
- Extended Hive and Pig core functionality by designing custom UDFs
- Loaded data from the UNIX file system into HDFS and created custom Solr query components to enable optimum search matching.
- Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities).
- Administered DNS, NFS, DHCP, printing, mail, web, and FTP services for the enterprise.
- Managed UNIX account maintenance, including additions, changes, and removals.
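A short maprcli sketch of the volume and snapshot operations mentioned above; the volume name, mount path, quota, and replication settings are hypothetical.

```bash
#!/bin/bash
# Hypothetical MapR volume and snapshot administration via maprcli.
# Create a volume with a quota and mount it in the cluster namespace.
maprcli volume create -name warehouse.vol -path /data/warehouse -quota 500G -replication 3

# Take a dated snapshot of the volume.
maprcli volume snapshot create -volume warehouse.vol \
  -snapshotname warehouse_$(date +%Y%m%d)

# List existing snapshots to verify.
maprcli volume snapshot list
```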
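A minimal spark-sql sketch of the kind of Spark SQL work described above; the database, table names, and resource settings are illustrative assumptions rather than the actual jobs.

```bash
#!/bin/bash
# Hypothetical Spark SQL job run through the spark-sql CLI on YARN:
# materialize an aggregate of a Hive table as a Parquet table in HDFS.
spark-sql --master yarn \
  --num-executors 10 --executor-memory 4g --executor-cores 2 \
  -e "
    CREATE TABLE IF NOT EXISTS analytics.daily_orders
    STORED AS PARQUET
    AS
    SELECT order_date, COUNT(*) AS order_count
    FROM warehouse.orders
    GROUP BY order_date;
  "
```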
Environment: Hadoop cluster, Kerberos, Linux administration, Kafka, YARN, Spark, HBase, Hive, Impala, SOLR, Java, HDFS, Ambari, Ganglia, Nagios, CentOS, Red Hat Linux, Windows, Cloudera, Hortonworks, MapR.
Hadoop Admin
Confidential - Boston, MA
Responsibilities:
- Developed a data pipeline using Flume, Pig, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in Splunk UI/GUI development and operations roles.
- Implemented FlowFiles, Connections, the Flow Controller, and Process Groups as part of NiFi flows to automate data movement.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Installed various Splunk applications such as Cisco for Splunk, Windows for Splunk, and VMware for Splunk.
- Expertise in virtualization system administration of VMware ESX/ESXi, VMware Server, VMware Lab Manager, vCloud, and Amazon EC2 & S3 web services.
- Involved in migrating a Java test framework to Python Flask.
- Responsible for developing data pipelines using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
- Experience architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Experience with bulk-load tools such as DW Loader and with moving data from PDW to the Hadoop archive.
- Experience in managing the Hadoop infrastructure with Cloudera Manager.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored in a single platform on YARN.
- Worked with data delivery teams to set up new Hadoop users; this included setting up Linux users, creating Kerberos principals, and testing HDFS and Hive access (see the onboarding sketch after this list).
- Performance-tuned a Cassandra cluster to optimize writes and reads.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Supported MapR Hadoop installations in Dev, QA, and Production environments.
- Involved in the architecture of the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
- Diagnosed hardware and software problems, and storage and related system malfunctions.
- Tracked metrics and measures of utilization and performance.
- Applied innovative and, where possible, automated approaches to system administration tasks.
- Leveraged shared resources to provide economies of scale and simplify remote/global support.
- Experience with scripting languages such as Python, Perl, and shell.
- Installed the OS and administered the Hadoop stack on the CDH5 (YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
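A hedged sketch of the Hadoop user onboarding steps described above (Linux account, Kerberos principal, HDFS home directory, Hive smoke test); the user name, group, realm, and host names are placeholders.

```bash
#!/bin/bash
# Hypothetical new-user onboarding on a Kerberized cluster. All names are placeholders.
NEWUSER=jdoe
REALM=EXAMPLE.COM
KEYTAB=/etc/security/keytabs/${NEWUSER}.keytab

# 1. Linux account
useradd -m -G hadoopusers "$NEWUSER"

# 2. Kerberos principal and keytab
kadmin.local -q "addprinc -randkey ${NEWUSER}@${REALM}"
kadmin.local -q "ktadd -k ${KEYTAB} ${NEWUSER}@${REALM}"

# 3. HDFS home directory (as the hdfs superuser)
sudo -u hdfs hdfs dfs -mkdir -p /user/${NEWUSER}
sudo -u hdfs hdfs dfs -chown ${NEWUSER}:${NEWUSER} /user/${NEWUSER}

# 4. Smoke tests: HDFS access and a HiveServer2 connection
kinit -kt "${KEYTAB}" ${NEWUSER}@${REALM}
hdfs dfs -ls /user/${NEWUSER}
beeline -u "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@${REALM}" \
  -e "SHOW DATABASES;"
```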
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Ambari 2.0, Linux CentOS, MongoDB, Cassandra, Ganglia, and Cloudera Manager.
Hadoop Admin
Confidential
Responsibilities:
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes, communicating and escalating issues appropriately.
- Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2 and, as an admin, followed standard backup policies to ensure high cluster availability.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Monitored servers and Linux scripts regularly, performed troubleshooting, and tested and installed the latest software on servers for end users; responsible for patching Linux servers.
- Applied patches to the cluster.
- Added new DataNodes when needed and ran the balancer; responsible for building scalable distributed data solutions using Hadoop (see the decommission/rebalance sketch after this list).
- Worked with the Cassandra database to analyze how data is stored.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability, and performed major and minor upgrades to the Hadoop cluster.
- Upgraded Cloudera Hadoop ecosystem components in the cluster using Cloudera distribution packages.
- Performed stress and performance testing and benchmarking for the cluster.
- Commissioned and decommissioned DataNodes in the cluster when problems occurred.
- Debugged and resolved major Cloudera Manager issues by working with the Cloudera support team.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Supported MapR Hadoop installations in Dev, QA, and Production environments.
- Involved in the architecture of the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
- Diagnosed hardware and software problems, and storage and/or related system malfunctions.
- Tracked metrics and measures of utilization and performance.
- Applied innovative and, where possible, automated approaches to system administration tasks.
- Experience with scripting languages such as Python, Perl, and shell.
- Experience architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Experience with bulk-load tools such as DW Loader and with moving data from PDW to the Hadoop archive.
- Experience managing the Hadoop infrastructure with Cloudera Manager.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored in a single platform on YARN.
- Leveraged shared resources to provide economies of scale and simplify remote/global support.
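A short sketch of the DataNode decommission and rebalancing steps mentioned above, assuming the NameNode's dfs.hosts.exclude property points at /etc/hadoop/conf/dfs.exclude; the host name is a placeholder.

```bash
#!/bin/bash
# Hypothetical DataNode decommission followed by a rebalance after adding new nodes.
EXCLUDE_FILE=/etc/hadoop/conf/dfs.exclude   # path referenced by dfs.hosts.exclude (assumption)

# 1. Decommission a failing DataNode: list it in the exclude file and refresh the NameNode.
echo "datanode07.example.com" >> "$EXCLUDE_FILE"
sudo -u hdfs hdfs dfsadmin -refreshNodes

# 2. Watch re-replication progress until the node reports "Decommissioned".
sudo -u hdfs hdfs dfsadmin -report | grep -B2 -A5 "Decommission"

# 3. After commissioning new DataNodes, rebalance existing blocks across the cluster
#    until every node is within 5% of average utilization.
sudo -u hdfs hdfs balancer -threshold 5
```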
Environment: Flume, Oozie, Cassandra, WebLogic, Pig, Sqoop, HBase, Hive, MapReduce, YARN, Hortonworks, and Cloudera Manager.
Linux/Systems Engineer
Confidential
Responsibilities:
- Patched RHEL 5 and Solaris servers for an EMC PowerPath upgrade for the VMAX migration.
- Configured LVM (Logical Volume Manager) to manage volume groups and logical and physical partitions, and to import new physical volumes.
- Built open-source Nagios Core monitoring tools and OpenVPN/OpenLDAP servers on EC2 instances.
- Maintained and monitored operating system and application patch levels, disk space, memory usage, and user activity on all servers daily; administered Sun Solaris and RHEL systems and managed archiving.
- Set up OpenLDAP server and clients and PAM authentication on Red Hat Linux 6.5/7.1.
- Installed, configured, troubleshot, and maintained Linux servers and the Apache web server, including security configuration and maintenance, scheduled backups, and submitting various types of cron jobs.
- Installed HP OpenView monitoring on servers and worked with monitoring tools such as Nagios and HP OpenView.
- Created, cloned, and migrated VMs on VMware vSphere 4.0/4.1.
- Set up and configured Apache to integrate with IBM WebSphere in a load-balancing environment.
- Worked with RHEL 4.1 and Red Hat Linux on IBM xSeries and HP ProLiant hardware, and with Windows.
- Installed and upgraded OE, Red Hat Linux, and Solaris & SPARC on servers such as HP DL380 G3/G4/G5 and Dell PowerEdge.
- Accomplished system/e-mail authentication using an LDAP enterprise database.
- Implemented a database-enabled intranet web site using Linux, Apache, and a MySQL database backend.
Environment: Linux/UNIX, Red Hat Linux, UNIX shell scripting, SQL Server 2005, XML, Windows 2000/NT/2003 Server.