
Hadoop Administrator Resume


Houston, TX

SUMMARY:

  • Over 7 years of experience, including 3+ years with the Hadoop ecosystem, covering installation and administration of UNIX/Linux servers and configuration of Hadoop ecosystem components in existing cluster projects.
  • Experience in configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions.
  • Hands-on experience installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, Oozie, Apache Spark, and Impala.
  • Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Working knowledge of monitoring tools and frameworks such as Splunk, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
  • Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
  • Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms; also worked with DevOps tools such as Puppet and Git.
  • Hands-on experience configuring Hadoop clusters both in professional on-premises environments and on Amazon Web Services (AWS) EC2 instances.
  • Experience with the complete software development lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
  • Hands-on experience in installation, configuration, support, and management of Hadoop clusters using Apache, Hortonworks, Cloudera, and MapR.
  • Extensive experience installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
  • Experience configuring Ranger and Knox to secure Hadoop services (Hive, HBase, HDFS, etc.); experience administering Kafka and Flume streaming on the Cloudera distribution.
  • Developed automated UNIX shell scripts for RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database maintenance activities (a sketch follows this list).
  • Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberized environments.
  • Experienced in deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure; excellent knowledge of NoSQL databases such as HBase and Cassandra.
  • Experience with large-scale Hadoop clusters, handling all environment builds, including design, cluster setup, and performance tuning.
  • Applied DevOps and continuous delivery methodologies to existing build and deployment processes; experienced with scripting languages such as Python, Perl, and shell.
  • Involved in architecting the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
  • Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as the data source and collectd as the metric sender.
  • Experienced with the workflow scheduling and monitoring tools Rundeck and Control-M.
  • Proficient with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure.
  • Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with big data.
  • Responsible for designing highly scalable big data clusters to support varied data storage and computation needs across platforms such as Hadoop and Elasticsearch.
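
A minimal sketch of the shell-driven DB2 maintenance described above, assuming a hypothetical database SALESDB and placeholder table and backup names; the actual scripts, schedules, and options were site-specific:

    #!/bin/bash
    # Nightly DB2 maintenance sketch; database, tables, and backup path are placeholders.
    DB=SALESDB
    TABLES="STAGING.ORDERS STAGING.CUSTOMERS"

    db2 connect to "$DB" || exit 1
    for TBL in $TABLES; do
        db2 "RUNSTATS ON TABLE $TBL WITH DISTRIBUTION AND DETAILED INDEXES ALL"
        db2 "REORG TABLE $TBL"
    done
    db2 connect reset

    # Online backup after statistics and reorg have run
    db2 "BACKUP DATABASE $DB ONLINE TO /backups/db2 COMPRESS"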

TECHNICAL SKILLS:

Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Kafka, Oozie, Avro, HBase, Storm, CDH 5.3, ALM, TOAD, JIRA, Selenium, TestNG, JUnit, Impala, YARN, Apache NiFi.

Tools: Confidential, Pentaho, Hortonworks, Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit, DevOps.

Databases: RDBMS: Oracle 9i/10g/11g, MySQL, MS SQL Server, MS Access, IBM DB2, PL/SQL, Greenplum, Confidential; NoSQL: HBase, MongoDB, Cassandra (DataStax Enterprise 4.6.1), Couchbase, InfluxDB.

Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss

Programming Languages: Shell scripting (Bash, CSH), Puppet, Python, Ruby, PHP, Perl.

Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia

Java Frameworks: MVC, Apache Struts 2.0, Spring, Hibernate. Defect Management: JIRA, Quality Center.

Testing: Capybara, WebDriver; Testing Frameworks: RSpec, Cucumber, JUnit; SVN.

Operating Systems: Linux (RHEL, Ubuntu), UNIX, Windows (XP/7/8/10).

Networking: TCP/IP Protocol, Switches & Routers, OSI Architecture.

Front End Technologies: HTML, XHTML, CSS, XML

QA Methodologies: Waterfall, Agile, V-model.

WORK EXPERIENCE:

Hadoop Administrator

Confidential, Houston, TX

Responsibilities:

  • Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera Hadoop distributions.
  • Installed a Kerberos-secured Kafka cluster (without wire encryption) for a POC and set up Kafka ACLs (see the sketch after this list).
  • Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
  • Experienced in administering, installing, upgrading, and managing Hadoop distributions with MapR 5.1 on clusters of 200+ nodes across Development, Test, and Production (Operational & Analytics) environments.
  • Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
  • Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberized environments.
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
  • Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
  • Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration in the MapR Control System (MCS).
  • Experience applying innovative and, where possible, automated approaches to system administration tasks.
  • Set up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team in creating Hive queries to test use cases.
  • Configured Sqoop JDBC drivers for the respective relational databases and handled parallelism, the distributed cache, import process control, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and password configuration, free-form query imports, and troubleshooting (see the import sketch after this list).
  • Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and set log4j properties.
  • Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
  • Maintained operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
  • Monitored cluster health and set up alert scripts for memory usage on the edge nodes (a sketch follows this list).
  • Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX variants). Worked with NoSQL databases such as HBase and created Hive tables on top.
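
A hedged sketch of the Kafka ACL setup noted above, assuming a ZooKeeper-backed authorizer and placeholder principal, topic, group, and host names:

    # Grant a hypothetical service principal produce/consume access to one topic
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
      --add --allow-principal User:appsvc \
      --producer --topic events.raw

    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
      --add --allow-principal User:appsvc \
      --consumer --topic events.raw --group events-consumers

    # List ACLs on the topic to verify
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
      --list --topic events.raw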
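
The Sqoop work above typically reduces to commands along these lines; the connection string, table, check column, and mapper count are placeholders, not the actual job definitions:

    # Incremental append import from a hypothetical MySQL source into HDFS,
    # with 8 parallel mappers and Snappy compression.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user --password-file /user/etl/.db_pass \
      --table orders \
      --split-by order_id --num-mappers 8 \
      --incremental append --check-column order_id --last-value 0 \
      --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
      --target-dir /data/staging/orders

    # Saving it as a Sqoop job keeps track of --last-value between runs:
    # sqoop job --create orders_incr -- import <same arguments as above>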
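
A minimal version of the edge-node memory alert script; the threshold and mail alias are assumptions:

    #!/bin/bash
    # Edge-node memory check: mails an alert when free memory drops below a threshold.
    THRESHOLD_MB=2048
    ALERT_TO=hadoop-ops@example.com

    FREE_MB=$(free -m | awk '/^Mem:/ {print $4}')   # "free" column of free -m
    if [ "$FREE_MB" -lt "$THRESHOLD_MB" ]; then
        echo "$(hostname): only ${FREE_MB} MB memory free" \
          | mail -s "Edge node memory alert" "$ALERT_TO"
    fi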

Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, RedHat, Windows, MapR, Sqoop, Cassandra.

Hadoop Administrator

Confidential, Johns Creek, GA

Responsibilities:

  • Worked on a Hadoop cluster with 150 nodes on Cloudera distribution 7.7.
  • Tested loading raw data, populating staging tables, and storing refined data in partitioned tables in the EDW; created and deployed a corresponding SolrCloud collection.
  • Installed a Kerberos-secured Kafka cluster (without wire encryption) for a POC and set up Kafka ACLs.
  • Experience integrating Solr with HBase using the Lily HBase Indexer (Key-Value Indexer).
  • Used TIBCO Administrator to manage TIBCO Components, to monitor and manage the deployments.
  • Experience in setup, configuration, and management of Apache Sentry for role-based authorization and privilege validation for the Hive and Impala services.
  • Implemented, documented, and configured Splunk: wrote queries, developed custom apps, and supported Splunk indexers, indexing, and field extractions using Splunk IFX, forwarders, lightweight forwarders, Splunk Web, and search heads on Splunk 5.x/6.x.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Used Hive join queries to combine multiple source-system tables and load the results into Elasticsearch (see the sketch after this list).
  • Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage.
  • Involved in installation of MapR and upgrade from MapR 5.0 to MapR 5.2.
  • Transferred large datasets back and forth between development and production clusters.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest). Worked with both CDH4 and CDH5 applications.
  • Assisted in configuration, development, and testing of Autosys JIL and other scripts.
  • Involved in development and maintenance of Autosys and related scheduling solutions.
  • Documented system processes and procedures for future reference.
  • Worked on the Navigator API to export denied-access events from the cluster to help prevent security threats.
  • Set up Apache NiFi and used it to orchestrate data pipelines.
  • Worked with Hadoop tools like Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
  • Loaded data from the UNIX file system into HDFS and created custom Solr query components to enable optimal search matching. Created volumes and snapshots through the MapR Control System and supported MapR Hadoop installations in Dev, QA, and Production.
  • Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS (a config sketch follows this list).
  • Involved in architecting the storage service to meet changing requirements for scaling, reliability, performance, and manageability. Designed a custom Spark application to handle similar datasets.
  • Helped leverage shared resources, provide economies of scale, and simplify remote/global support; experienced with metrics and measures of utilization and performance.
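
One way to express the Hive-join-to-Elasticsearch loading described above, assuming the elasticsearch-hadoop Hive storage handler is available and using placeholder table, index, and host names:

    #!/bin/bash
    # Join two hypothetical source tables in Hive and write the result into an
    # Elasticsearch-backed external table (elasticsearch-hadoop connector assumed).
    beeline -u jdbc:hive2://hiveserver:10000 -n etl_user -e "
      ADD JAR /opt/jars/elasticsearch-hadoop.jar;
      CREATE EXTERNAL TABLE IF NOT EXISTS es_customer_orders (
        customer_id STRING, customer_name STRING, order_total DOUBLE)
      STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
      TBLPROPERTIES ('es.resource' = 'customer_orders/doc', 'es.nodes' = 'es1:9200');
      INSERT OVERWRITE TABLE es_customer_orders
      SELECT c.id, c.name, SUM(o.amount)
      FROM src.customers c JOIN src.orders o ON o.customer_id = c.id
      GROUP BY c.id, c.name;"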
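
A minimal Flume agent sketch for the web-log collection described above; the agent name, log path, and HDFS directory are placeholders:

    #!/bin/bash
    # Tail a web server access log into HDFS via a memory channel.
    cat > /etc/flume/conf/weblog-agent.conf <<'EOF'
    weblog.sources  = src1
    weblog.channels = ch1
    weblog.sinks    = hdfs1

    weblog.sources.src1.type = exec
    weblog.sources.src1.command = tail -F /var/log/httpd/access_log
    weblog.sources.src1.channels = ch1

    weblog.channels.ch1.type = memory
    weblog.channels.ch1.capacity = 10000

    weblog.sinks.hdfs1.type = hdfs
    weblog.sinks.hdfs1.channel = ch1
    weblog.sinks.hdfs1.hdfs.path = /data/weblogs/%Y-%m-%d
    weblog.sinks.hdfs1.hdfs.fileType = DataStream
    weblog.sinks.hdfs1.hdfs.useLocalTimeStamp = true
    EOF

    flume-ng agent --name weblog --conf /etc/flume/conf \
      --conf-file /etc/flume/conf/weblog-agent.conf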

Environment: Hadoop, Cloudera 7.7, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase.

Hadoop Admin

Confidential, Englewood, CO

Responsibilities:

  • Experience configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions, along with Hive and Spark.
  • Skilled in scheduling recurring Pig and Hive jobs using Rundeck.
  • Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as the data source and collectd as the metric sender. Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations. Worked extensively on Hadoop/MapR platforms.
  • Worked across the Hadoop ecosystem: Cloudera, Hortonworks, Hadoop, MapR, HDFS, HBase, YARN, ZooKeeper, Nagios, Hive, Pig, Ambari, Spark, and Impala. Installed and configured Drill, FUSE, and Impala on MapR 5.1.
  • Maintained operations, installation, and configuration of 150+ node clusters with the MapR distribution.
  • Implemented a Kerberized Hadoop ecosystem; used Sqoop and NiFi in the Kerberized environment to transfer data from relational databases such as MySQL to HDFS.
  • Managed mission-critical Hadoop clusters and Kafka at production scale, particularly on the Cloudera distribution.
  • Hands-on working experience with DevOps tools: Chef, Puppet, Jenkins, Git, Maven, and Ansible.
  • Installed, upgraded, and configured monitoring tools (MySQL Enterprise Monitor, New Relic, and Datadog APM). Experienced in monitoring the MapR cluster through ITRS.
  • Implemented Cloudera Impala on top of Hive for faster user querying (see the sketch after this list).
  • Wrote workflows that include data-cleansing Pig actions and Hive actions.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top.
  • Created Hive tables on top of HDFS files and designed queries to run on top.
  • Extended Hive and Pig core functionality by designing custom UDFs.
  • Experience with DNS, NFS, DHCP, printing, mail, web, and FTP services for the enterprise.
  • Managed UNIX account maintenance, including additions, changes, and removals.
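
A small example of the Impala-over-Hive querying pattern noted above; the daemon host and table names are assumptions:

    # After a Hive load finishes, refresh Impala's view of the table and query it.
    impala-shell -i impalad-host -q "INVALIDATE METADATA staging.orders;"
    impala-shell -i impalad-host -q "
      SELECT customer_id, SUM(amount) AS total
      FROM staging.orders
      WHERE order_date >= '2016-01-01'
      GROUP BY customer_id
      ORDER BY total DESC
      LIMIT 20;"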

Environment: Kerberos, Linux Admin, Kafka, YARN, Spark, HBase, Hive, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, Nagios, Cloudera, MapR.

Hadoop Admin

Confidential, Boston, MA

Responsibilities:

  • Developed a data pipeline using Flume, Pig, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Implemented FlowFiles, connections, the flow controller, and process groups as part of NiFi flows for automating the movement of data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team in creating Hive queries to test use cases.
  • Installed, configured, and maintained Apache Hadoop clusters for application development and Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Created volumes and snapshots through the MapR Control System (see the maprcli sketch after this list); supported MapR Hadoop installations in Dev, QA, and Production.
  • Installed various Splunk apps such as Cisco for Splunk, Windows for Splunk, and VMware for Splunk. Created MapR-DB tables and was involved in loading data into those tables.
  • Expertise in virtualization system administration of VMware, VMware Server, VMware Lab Manager, cloud, and Amazon EC2 & S3 web services.
  • Involved in migrating a Java test framework to Python Flask.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored on a single platform under YARN.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS and Hive access (a sketch follows this list).
  • Installed the OS and administered the Hadoop stack with the CDH5 (YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
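
The MapR Control System volume and snapshot work above can also be driven from the command line; a sketch with placeholder volume names, paths, and quotas (maprcli syntax as in MapR 5.x):

    # Create a hypothetical volume, set its quota, and take a dated snapshot
    maprcli volume create -name analytics.raw -path /data/analytics/raw \
      -replication 3 -quota 500G

    maprcli volume snapshot create -volume analytics.raw \
      -snapshotname analytics.raw.$(date +%Y%m%d)

    # Verify the volume and list existing snapshots
    maprcli volume info -name analytics.raw
    maprcli volume snapshot list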
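
A hedged sketch of the user-onboarding steps described above (Linux account, Kerberos principal, HDFS home directory, smoke test); the user name, group, realm, keytab path, and HiveServer2 URL are placeholders:

    #!/bin/bash
    USER=jdoe
    REALM=EXAMPLE.COM

    # Linux account and Kerberos principal with a keytab
    useradd -m -G hadoopusers "$USER"
    kadmin -p admin/admin -q "addprinc -randkey ${USER}@${REALM}"
    kadmin -p admin/admin -q "ktadd -k /etc/security/keytabs/${USER}.keytab ${USER}@${REALM}"

    # HDFS home directory owned by the new user
    sudo -u hdfs hdfs dfs -mkdir -p /user/${USER}
    sudo -u hdfs hdfs dfs -chown ${USER}:hadoopusers /user/${USER}

    # Smoke test: authenticate, write to HDFS, run a trivial Hive query
    sudo -u ${USER} kinit -kt /etc/security/keytabs/${USER}.keytab ${USER}@${REALM}
    sudo -u ${USER} hdfs dfs -touchz /user/${USER}/_smoke_test
    sudo -u ${USER} beeline \
      -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@${REALM}" \
      -e "SHOW DATABASES;"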

Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Ambari 2.0, Linux CentOS, MongoDB, Cassandra, Ganglia, Cloudera Manager.

Hadoop Admin

Confidential, Minneapolis, MN

Responsibilities:

  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2; as an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Ambari.
  • Monitored servers and Linux scripts regularly, performed troubleshooting, and tested and installed the latest software on servers for end users. Responsible for patching Linux servers and applying patches to the cluster, and for building scalable distributed data solutions using Hadoop.
  • Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently on time and data availability; also performed major and minor upgrades of the Hadoop cluster.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Commissioned and decommissioned DataNodes in the cluster as problems arose (a sketch follows this list).
  • Involved in architecting the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
  • Experience with scripting languages such as Python, Perl, and shell.
  • Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera Hadoop distributions.
  • Experience with bulk-load tools such as DW Loader, moving data from PDW to the Hadoop archive.
  • Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored on a single platform under YARN.
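
Decommissioning as described above generally follows this pattern, assuming dfs.hosts.exclude already points at the excludes file shown; the hostname and file path are placeholders:

    # Add the problematic DataNode to the excludes file, then refresh
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    sudo -u hdfs hdfs dfsadmin -refreshNodes

    # Watch until the node reports "Decommissioned" before stopping it
    sudo -u hdfs hdfs dfsadmin -report | grep -A3 datanode07.example.com

    # To recommission later: remove the entry and refresh again
    sed -i '/datanode07.example.com/d' /etc/hadoop/conf/dfs.exclude
    sudo -u hdfs hdfs dfsadmin -refreshNodes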

Environment: Flume, Oozie, Cassandra, WebLogic, Pig, Sqoop, HBase, Hive, MapReduce, YARN, Hortonworks Manager.

Linux/Systems Engineer

Confidential

Responsibilities:

  • Patched RHEL 5 and Solaris servers for the EMC PowerPath upgrade for the VMAX migration.
  • Configured LVM (Logical Volume Manager) to manage volume groups and logical and physical volumes and to import new physical volumes (see the sketch after this list).
  • Built open-source Nagios Core monitoring tools and OpenVPN/OpenLDAP servers on EC2 instances.
  • Maintained and monitored all servers' operating system and application patch levels, disk space and memory usage, and user activities on a daily basis; administered Sun Solaris and RHEL systems and managed archiving.
  • Set up OpenLDAP server and clients and PAM authentication on Red Hat Linux 6.5/7.1.
  • Installed, configured, troubleshot, and maintained Linux servers and the Apache web server; configured and maintained security, scheduled backups, and submitted various types of cron jobs.
  • Installed the HP OpenView monitoring tool on servers and worked with monitoring tools such as Nagios and HP OpenView.
  • Created, cloned, and migrated VMs on VMware vSphere 4.0/4.1.
  • Setup and configured Apache to integrate with IBM WebSphere in load balancing environment.
  • Worked with RHEL 4.1, Red Hat Linux, IBM xSeries and HP ProLiant servers, and Windows.
  • Installed and upgraded OE and Red Hat Linux, and Solaris on SPARC, on servers such as HP DL380 G3/G4/G5 and Dell PowerEdge.
  • Accomplished system and e-mail authentication using an enterprise LDAP database.
  • Implemented a database-enabled intranet web site using Linux, Apache, and a MySQL database backend.
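
A minimal sketch of the LVM tasks above; the device, volume group, and logical volume names are placeholders:

    #!/bin/bash
    # Bring a new disk into an existing volume group and grow a logical volume
    # and its filesystem.
    pvcreate /dev/sdc                        # initialize the new physical volume
    vgextend vg_data /dev/sdc                # add it to the existing volume group
    lvextend -L +100G /dev/vg_data/lv_app    # grow the logical volume by 100 GB
    resize2fs /dev/vg_data/lv_app            # grow the ext filesystem online

    # Verify
    pvs; vgs; lvs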

Environment: Linux/UNIX, Red Hat Linux, UNIX shell scripting, SQL Server 2005, XML, Windows 2000/NT/2003 Server.
