
Hadoop (Cloudera) Admin Resume


Chicago, IL

SUMMARY:

  • Over 8 years of experience, including 3+ years with the Hadoop ecosystem, covering installation and administration of UNIX/Linux servers and configuration of Hadoop ecosystem components in existing cluster projects.
  • Monitored job performance, file system and disk-space usage, cluster and database connectivity, and log files; managed backups and security and troubleshot various user issues.
  • Experience in configuring, installing and managing MapR, Hortonworks & Cloudera Distributions.
  • Hands-on experience in installing, configuring, monitoring and using Hadoop components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark and Impala.
  • Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
  • Experience in setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios (a minimal check script is sketched after this list).
  • Hands-on experience configuring Hadoop clusters both in professional on-premises environments and on Amazon Web Services (AWS) using EC2 instances.
  • Good knowledge of data warehousing, ETL development, distributed computing and large-scale data processing.
  • Good knowledge of the design and implementation of big data pipelines.
  • Extensive experience in installing, configuring and administering Hadoop clusters for major distributions such as CDH5 and HDP.
  • Worked on agile projects delivering an end-to-end continuous integration/continuous delivery pipeline by integrating tools like Jenkins, Puppet and AWS for VM provisioning.
  • Hands-on experience in installation, configuration, support and management of Hadoop clusters using Apache, Hortonworks, Cloudera and MapR distributions.
  • Excellent knowledge of NoSQL databases like HBase and Cassandra.
  • Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
  • Experience in job scheduling using the Fair, Capacity and FIFO schedulers, and in inter-cluster data copying with the DistCp tool.
  • Contributed to building hands-on tutorials that show the community how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
  • Experience in designing and implementing secure Hadoop clusters using Kerberos.
  • Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
  • Involved in implementing security on HDP and HDF Hadoop clusters, with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger and NiFi.
  • Responsible for support of Hadoop Production environment which includes Hive, YARN, Spark, Impala, Kafka, SOLR, Oozie, Sentry, Encryption, HBase, etc.
  • Migrating applications from existing systems like MySQL, Oracle, DB2 and Teradata to Hadoop.
  • Good hands-on experience with Linux administration and troubleshooting network- and OS-level issues.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code to production.
  • Monitored multiple Hadoop cluster environments using Nagios; monitored workload, job performance and capacity planning using MapR Control System.
  • Set up clusters and installed all the ecosystem components both through MapR and manually from the command line in a lab cluster.
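
For illustration, the automated Nagios-based escalation mentioned above can be driven by a small shell check of HDFS capacity. This is a minimal sketch under my own assumptions (the hdfs client available on the monitoring host, illustrative thresholds); it is not taken from any specific project below.

    #!/usr/bin/env bash
    # Nagios-style plugin: alert when HDFS used capacity crosses warning/critical thresholds.
    WARN=75   # warning threshold in percent (assumed)
    CRIT=90   # critical threshold in percent (assumed)
    USED=$(hdfs dfsadmin -report 2>/dev/null | awk -F': ' '/^DFS Used%/{gsub(/%/,"",$2); print int($2); exit}')
    if [ -z "$USED" ]; then
      echo "UNKNOWN - could not read DFS Used% from hdfs dfsadmin -report"; exit 3
    elif [ "$USED" -ge "$CRIT" ]; then
      echo "CRITICAL - HDFS ${USED}% used"; exit 2
    elif [ "$USED" -ge "$WARN" ]; then
      echo "WARNING - HDFS ${USED}% used"; exit 1
    else
      echo "OK - HDFS ${USED}% used"; exit 0
    fi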

TECHNICAL SKILLS:

Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.

Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks HDP (Ambari 2.6.5)

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Servers: WebLogic, WebSphere, JBoss; web servers: Tomcat, Nginx.

Programming Languages/Scripting: Java, PL/SQL, Shell Script, Perl, Python.

Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.

Databases: MySQL, Oracle, Teradata; NoSQL: HBase, Couchbase, InfluxDB, MongoDB, Cassandra.

Processes: Systems Administration, Incident Management, Release Management, Change Management.

WORK EXPERIENCE:

Hadoop (Cloudera) Admin

Confidential, Chicago, IL

Responsibilities:

  • Installed and configured the Hadoop ecosystem (HDFS/Spark/Hive/Oozie/YARN) using Cloudera Manager and CDH.
  • Experience in setting up dynamic resource pools for distributing cluster resources among workloads.
  • Implemented workflows using the Apache Oozie framework to automate tasks and developed job-processing scripts as Oozie workflows.
  • Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing cluster.
  • Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Extensively used Cloudera Manager for managing multiple clusters with petabytes of data.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator) and data (encryption at rest).
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed server instances on the AWS platform using Puppet and Chef configuration management.
  • Loaded and transformed large sets of structured data from Oracle and SQL Server into HDFS using Talend Big Data Studio.
  • Provided operational support for the platform and followed best practices to optimize the performance of the environment.
  • Involved in release management process to deploy the code to production.
  • As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files (a sample daily check script is sketched after this list).
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Experience in setting up Hadoop clusters on cloud platforms like AWS.
  • Enabled security to the cluster using Kerberos and integrated clusters with LDAP at Enterprise level.
  • Created and maintained various Shell and Python scripts for automating various processes.
  • Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet. Worked with application teams to install Hadoop updates, patches, version upgrades as required.
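
As a hedged illustration of the daily health checks and configuration backups described above, a routine of this shape could be scheduled from cron; the backup path and configuration directories are assumptions, not details from this engagement.

    #!/usr/bin/env bash
    # Daily cluster check: capacity report, block health, NodeManager status, config backup.
    set -u
    BACKUP_DIR=/var/backups/hadoop-conf/$(date +%F)   # assumed backup location
    mkdir -p "$BACKUP_DIR"

    hdfs dfsadmin -report                                                   # capacity, live/dead DataNodes
    hdfs fsck / 2>/dev/null | grep -E 'Under-replicated|CORRUPT' || true    # block health summary
    yarn node -list 2>/dev/null                                             # NodeManager health

    tar czf "$BACKUP_DIR/hadoop-conf.tgz" /etc/hadoop/conf /etc/hive/conf 2>/dev/null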

Environment: Hadoop, HDFS, Spark, MapReduce, YARN, Pig, Hive, Sqoop, Oozie, Kafka, Linux, AWS, HBase, Cassandra, Kerberos, Scala, Python, Shell Scripting.

Hadoop (Kafka) Administrator

Confidential, Sunnyvale, CA

Responsibilities:

  • Managed critical data pipelines that power analytics for various business units.
  • Responsible for installing, configuring, supporting and managing Hadoop clusters.
  • Worked on performance tuning of Hive SQL; created external tables with proper partitions for efficiency and loaded into HDFS the structured data produced by MapReduce jobs.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Experience with multiple Hadoop distributions like Apache, Cloudera and Hortonworks.
  • Maintained Hortonworks cluster with HDP Stack 2.4.2 managed by Ambari 2.2.
  • Built production and QA clusters with the latest Hortonworks distribution, HDP stack 2.6.1 managed by Ambari 2.5.1, on the AWS cloud.
  • Worked on a Kerberized Hadoop cluster with 250 nodes.
  • Continuously monitored and managed the EMR cluster through the AWS Console.
  • Used Kafka for building real-time data pipelines between clusters.
  • Configured the schema repository for Oozie and Hive on a centralized Azure SQL database.
  • Good understanding on Spark Streaming with Kafka for real-time processing.
  • Worked on Netezza integration with Azure Data Lake.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (an illustrative import command is sketched after this list).
  • Used Hive and created Hive tables, loaded data from Local file system to HDFS.
  • Developed Spark SQL jobs that read data from the data lake using Hive transforms and save it in HBase.
  • Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without any impact on running nodes or data.
  • Configured, automated and maintained CI/CD build and deployment tools with a high degree of standardization for both infrastructure and application-stack automation on the AWS cloud platform.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
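
A hedged example of the Sqoop pull from MySQL into HDFS referenced above; the hostname, database, table, user and target directory are placeholders rather than values from this engagement.

    # Import one MySQL table into HDFS as text files, prompting for the password.
    sqoop import \
      --connect jdbc:mysql://mysql-host.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --as-textfile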

Environment: Hadoop, Hive, Pig, Tableau, Netezza, Oracle, HDFS, MapReduce, YARN, Sqoop, Oozie, Zookeeper, Tidal, CheckMK, Grafana, Vertica

Hadoop Admin

Confidential, Chicago, IL

Responsibilities:

  • Deployed a Hadoop cluster of the Cloudera Distribution and installed ecosystem components: HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm and Spark on Linux servers.
  • Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
  • Deployed Name Node high availability for major production cluster.
  • Installed Kafka cluster with separate nodes for brokers.
  • Experienced in writing the automatic scripts for monitoring the file systems, key MapR services.
  • Configured Oozie for workflow automation and coordination.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp (see the sketch after this list).
  • Installed and configured Confluent Kafka in the R&D environment and validated the installation with the HDFS and Hive connectors.
  • Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
  • Used Sqoop to connect to Oracle, MySQL and Teradata and move the data into Hive/HBase tables.
  • Used Apache Kafka for importing real time network log data into HDFS.
  • Performed disk-space management for users and groups in the cluster.
  • Used Storm and Kafka Services to push data to HBase and Hive tables.
  • Documented slides and presentations on the Confluence page.
  • Used Kafka to allow a single cluster to serve as the central data backbone for a large organization.
  • Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
  • Used the Sqoop and DistCp utilities for data copying and data migration.
  • Worked on end-to-end data flow management from sources to a NoSQL (MongoDB) database using Oozie.
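
A minimal sketch of the regular DistCp backup to a remote cluster mentioned above; the NameNode hosts and paths are placeholders, not values from this project.

    # Incremental copy to the backup cluster: -update copies only changed files,
    # -delete removes targets no longer present at the source, -m caps the map tasks.
    hadoop distcp -update -delete -m 20 \
      hdfs://nn-primary.example.com:8020/data/warehouse \
      hdfs://nn-backup.example.com:8020/backups/warehouse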

Environment: Kafka, HBase, Hive, Pig, Sqoop, YARN, Apache Oozie workflow scheduler, Flume, Zookeeper, Kerberos, Nagios, MapR

Hadoop Admin

Confidential, Englewood, CO

Responsibilities:

  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH4.
  • Good understanding and related experience with Hadoop stack - internals, Hive, Pig and MapReduce, involved in defining job flows.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop Streaming jobs to process terabytes of text data (a minimal streaming job is sketched after this list).
  • Supported MapReduce programs running on the cluster.
  • Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Implemented HBase Co-processors to notify Support team when inserting data into HBase Tables.
  • Solved the small-files problem by processing Sequence files in MapReduce.
  • Built Cassandra clusters both on physical machines and on AWS; automated Cassandra builds, installation and monitoring.
  • Monitor System health and logs and respond accordingly to any warning or failure conditions.
  • Excellent Knowledge and understanding of Cassandra Architecture.
  • Involved in support and monitoring production Linux Systems.
  • Actively involved in the Cassandra migration to a higher version (from 2.0 to 2.2).
  • Monitoring Linux daily jobs and monitoring log management system.
  • Expertise in troubleshooting and able to work with a team to fix large production issues.
  • Expertise in creating and managing DB tables, indexes and views.
  • Created and managed user accounts and permissions at the Linux and database levels.
  • Extracted large data sets from sources with different data formats, including relational databases, XML and flat files, using ETL processing.
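
As a sketch of the Hadoop Streaming jobs over raw text mentioned above, a classic shell-only word count looks like this; the streaming jar path (CDH parcel layout) and the HDFS directories are assumptions.

    # Mapper splits lines into words; the framework sorts by key, so `uniq -c` can act as the reducer.
    hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
      -input /data/raw/logs \
      -output /data/out/wordcount \
      -mapper 'tr -s " " "\n"' \
      -reducer 'uniq -c'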

Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, PIG, ETL, HBase, Zookeeper

Hadoop Admin

Confidential

Responsibilities:

  • Experience in managing scalable Hadoop cluster environments.
  • Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
  • Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Experience in HDFS maintenance and administration.
  • Experience in Name Node HA implementation.
  • Working on architected solutions that process massive amounts of data on corporate and AWS cloud-based servers.
  • Installed the Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs.
  • Hands-on experience in Nagios and Ganglia monitoring tools.
  • Experience in HDFS data storage and support for running Map Reduce jobs.
  • Performing tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files.
  • Installed and configured Hadoop ecosystem components such as Sqoop, Pig, Flume and Hive.
  • Maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop (a sample Flume agent configuration is sketched after this list).
  • Troubleshot hardware issues and worked closely with various vendors on hardware, OS and Hadoop issues.
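
For the Flume-based log loading described above, a single-agent configuration of roughly this shape tails an application log into HDFS; the agent name, file paths and NameNode URI are illustrative assumptions, not details from this project. Contents of an example agent config file (for instance /etc/flume-ng/conf/app-logs.conf):

    # Exec source tails the app log, a memory channel buffers events, the HDFS sink lands them.
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app/app.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://nn.example.com:8020/data/logs/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.channel = c1

The agent is then started with:

    flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/app-logs.conf --name a1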

Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Chef, RHEL, Mahout, Tableau, MicroStrategy, Shell Scripting, Red Hat Linux.

Linux Systems Administrator

Confidential

Responsibilities:

  • Installation, configuration and troubleshooting of Red Hat 4.x, 5.x, Ubuntu 12.x and HP-UX 11.x on various hardware platforms
  • Installed and configured JBoss application server on various Linux servers
  • Configured Kick-start for RHEL (4, and 5), Jumpstart for Solaris and NIM for AIX to perform image installation through network
  • Involved in configuration and troubleshooting of multipathing on Solaris and bonding on Linux servers
  • Worked with Red Hat Linux tools like RPM to install packages and patches for Red Hat Linux server
  • Developed scripts for automating administration tasks like customizing user environment, and performance monitoring and tuning with nfsstat, netstat, iostat and vmstat
  • Performed network and system troubleshooting activities under day to day responsibilities.
  • Created Virtual server on VMware ESX/ESXi based host and installed operating system on Guest Servers.
  • Installed and configured the RPM packages using the YUM Software manager.
  • Involved in developing custom scripts using Shell (bash, ksh) to automate jobs.
  • Defined and developed plans for the Change, Problem and Incident management processes based on ITIL.
  • Extensive use of Logical Volume Manager (LVM): creating volume groups and logical volumes and mirroring disks on HP-UX, AIX and Linux (a representative command sequence is sketched after this list).
  • Worked with clusters; added multiple IP addresses to servers via virtual network interfaces in order to minimize network traffic (load-balancing and failover clusters).
  • User and security administration: adding users to the system and password management.
  • System monitoring and log management on UNIX and Linux servers, including crash and swap management, password recovery and performance tuning.
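
A representative LVM command sequence for the volume-group and logical-volume work described above; the device names, sizes and mount point are placeholders, not values from this role.

    pvcreate /dev/sdb1                         # initialize the physical volume
    vgcreate vg_data /dev/sdb1                 # create a volume group on it
    lvcreate -L 50G -n lv_app vg_data          # carve out a 50 GB logical volume
    mkfs.ext4 /dev/vg_data/lv_app              # build a filesystem on the LV
    mkdir -p /apps && mount /dev/vg_data/lv_app /apps
    lvextend -L +10G /dev/vg_data/lv_app && resize2fs /dev/vg_data/lv_app   # grow it online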

Environment: Red Hat, Ubuntu, JBoss, RPM, VMware, ESXi, LVM, DNS, NIS, HP-UX.
