Hadoop (Cloudera) Admin Resume
Chicago, IL
SUMMARY:
- Over 8 years of experience, including 3+ years with the Hadoop ecosystem, covering installation and administration of UNIX/Linux servers and configuration of Hadoop ecosystem components in existing clusters.
- Monitored job performance, file system/disk-space usage, cluster and database connectivity, and log files; managed backups and security and troubleshot various user issues.
- Experience in configuring, installing and managing MapR, Hortonworks & Cloudera Distributions.
- Hands-on experience in installing, configuring, monitoring, and using Hadoop components like MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Hands-on experience configuring Hadoop clusters in professional environments and on Amazon Web Services (AWS) using EC2 instances.
- Good knowledge on Data Warehousing, ETL development, Distributed Computing, and large scale data processing.
- Good knowledge on implementation and design of big data pipelines.
- Worked on Agile projects delivering an end-to-end continuous integration/continuous delivery pipeline by integrating tools like Jenkins, Puppet, and AWS for VM provisioning.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using the Apache, Hortonworks, Cloudera, and MapR distributions.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions like CDH5 and HDP.
- Excellent knowledge of NoSQL databases like HBase and Cassandra.
- Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
- Experience in job scheduling with the Fair, Capacity, and FIFO schedulers, and in inter-cluster data copying using the DistCp tool.
- Contributed to building hands-on tutorials for the community on how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Involved in implementing security on HDP and HDF Hadoop clusters with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger, and NiFi.
- Responsible for support of Hadoop Production environment which includes Hive, YARN, Spark, Impala, Kafka, SOLR, Oozie, Sentry, Encryption, HBase, etc.
- Migrated applications from existing systems like MySQL, Oracle, DB2, and Teradata to Hadoop.
- Good hands-on experience in Linux administration and troubleshooting network- and OS-level issues.
- Worked with various onshore and offshore teams to understand the data imported from their sources.
- Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code to production.
- Monitored multiple Hadoop cluster environments using Nagios; monitored workload, job performance, and capacity planning using MapR Control System.
- Set up clusters and installed all ecosystem components through MapR and manually through the command line in the lab cluster.
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks HDP (Ambari 2.6.5)
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Servers: WebLogic, WebSphere, JBoss; web servers: Tomcat and Nginx.
Programming Languages/Scripting: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.
Databases: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Systems Administration, Incident Management, Release Management, Change Management.
WORK EXPERIENCE:
Hadoop (Cloudera) Admin
Confidential, Chicago, IL
Responsibilities:
- Installed and configured the Hadoop ecosystem (HDFS/Spark/Hive/Oozie/YARN) using Cloudera Manager and CDH.
- Experience in setting up Dynamic Resource pools for distributing resources between pools.
- Implemented workflows using the Apache Oozie framework to automate tasks and developed job-processing scripts using Oozie workflows.
- Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing cluster.
- Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
- Involved in cluster-level security: perimeter (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Managed server instances on the AWS platform using Puppet and Chef configuration management.
- Loaded and transformed large sets of structured data from Oracle and SQL Server into HDFS using Talend Big Data Studio.
- Involved in providing operational support for the platform and following best practices to optimize the performance of the environment.
- Involved in release management process to deploy the code to production.
- As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files (a monitoring sketch follows this list).
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Experience in setting up Hadoop clusters on cloud platforms like AWS.
- Enabled security to the cluster using Kerberos and integrated clusters with LDAP at Enterprise level.
- Created and maintained various Shell and Python scripts for automating various processes.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet. Worked with application teams to install Hadoop updates, patches, version upgrades as required.
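A minimal sketch of the kind of daily health-check and config-backup script referred to above, assuming a CDH-style layout; the hostnames, paths, and grep patterns are illustrative assumptions, not taken from the original environment:

    #!/bin/bash
    # Illustrative daily health check for a Hadoop cluster (paths are assumptions)
    DATE=$(date +%F)
    BACKUP_DIR=/var/backups/hadoop-conf/$DATE    # hypothetical backup location

    # HDFS capacity, live/dead DataNodes, and block health
    hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes|DFS Used%'
    hdfs fsck / 2>/dev/null | grep -E 'Under-replicated|Corrupt blocks|Status'

    # Back up the configuration XML files before any tuning change
    mkdir -p "$BACKUP_DIR"
    cp /etc/hadoop/conf/*-site.xml "$BACKUP_DIR"/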
Environment: Hadoop, HDFS, Spark, MapReduce, YARN, Pig, Hive, Sqoop, Oozie, Kafka, Linux, AWS, HBase, Cassandra, Kerberos, Scala, Python, Shell Scripting.
Hadoop (Kafka) Administrator
Confidential, Sunnyvale, CA
Responsibilities:
- Managed critical data pipelines that power analytics for various business units.
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Worked on performance tuning of Hive SQL; created external tables with proper partitions for efficiency and loaded the structured data produced by MR jobs into HDFS.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Experience with multiple Hadoop distributions like Apache, Cloudera and Hortonworks.
- Maintained Hortonworks cluster with HDP Stack 2.4.2 managed by Ambari 2.2.
- Built a production and a QA cluster with the latest Hortonworks distribution, HDP stack 2.6.1, managed by Ambari 2.5.1 on the AWS cloud.
- Worked on a Kerberized Hadoop cluster with 250 nodes.
- Continuously monitored and managed EMR clusters through the AWS Console.
- Used Kafka for building real-time data pipelines between clusters (see the sketch at the end of this list).
- Configured the schema repository for Oozie and Hive on a centralized Azure SQL database.
- Good understanding on Spark Streaming with Kafka for real-time processing.
- Worked on Netezza integration with Azure Data Lake.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Used Hive, created Hive tables, and loaded data from the local file system into HDFS.
- Developed Spark SQL jobs that read data from the Data Lake using Hive, transform it, and save it in HBase.
- Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect on running jobs or data.
- Configured, automated, and maintained build and deployment CI/CD tools with a high degree of standardization for both infrastructure and application stack automation on the AWS cloud platform.
- Involved in moving all log files generated from various sources to HDFS for further processing.
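A minimal sketch of the kind of inter-cluster Kafka pipeline mentioned above, using the classic MirrorMaker CLI; the topic name, ZooKeeper host, and property files are assumptions for illustration:

    # Create a replicated topic on the source cluster (pre-KIP-500 style CLI)
    kafka-topics.sh --create --zookeeper zk1:2181 \
      --replication-factor 3 --partitions 6 --topic clickstream

    # Mirror the topic from the source cluster into the target cluster
    kafka-mirror-maker.sh --consumer.config source-cluster.properties \
      --producer.config target-cluster.properties --whitelist 'clickstream'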
Environment: Hadoop, Hive, Pig, Tableau, Netezza, Oracle, HDFS, MapReduce, YARN, Sqoop, Oozie, ZooKeeper, Tidal, CheckMK, Grafana, Vertica
Hadoop Admin
Confidential, Chicago, IL
Responsibilities:
- Deployed a Hadoop cluster of the Cloudera Distribution and installed ecosystem components HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm, and Spark on Linux servers.
- Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
- Deployed Name Node high availability for major production cluster.
- Installed Kafka cluster with separate nodes for brokers.
- Wrote automated scripts for monitoring the file systems and key MapR services.
- Configured Oozie for workflow automation and coordination.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (see the sketch at the end of this list).
- Involved in installing and configuring Confluent Kafka in the R&D line and validated the installation with the HDFS and Hive connectors.
- Implemented high availability and automatic failover infrastructure to overcome the single point of failure for the NameNode, utilizing ZooKeeper services.
- Used Sqoop to connect to Oracle, MySQL, and Teradata and move the data into Hive/HBase tables.
- Used Apache Kafka for importing real time network log data into HDFS.
- Performed disk-space management for the users and groups in the cluster.
- Used Storm and Kafka services to push data to HBase and Hive tables.
- Documented slides and presentations on the Confluence page.
- Used Kafka to allow a single cluster to serve as the central data backbone for a large organization.
- Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
- Used the Sqoop and DistCp utilities for data copying and data migration.
- Worked on end-to-end data flow management from sources to a NoSQL (MongoDB) database using Oozie.
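A minimal sketch of the remote-cluster backup and RDBMS-to-Hive ingestion described above; the cluster names, JDBC connection details, and table names are illustrative assumptions:

    # Copy HDFS data to a remote backup cluster with DistCp, preserving attributes
    hadoop distcp -update -p hdfs://prod-nn:8020/data/warehouse \
      hdfs://backup-nn:8020/data/warehouse

    # Pull an RDBMS table into a Hive table with Sqoop
    sqoop import --connect jdbc:mysql://dbhost:3306/sales --username etl -P \
      --table orders --hive-import --hive-table staging.orders --num-mappers 4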
Environment: Kafka, HBase, Hive, Pig, Sqoop, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper, Kerberos, Nagios, MapR
Hadoop Admin
Confidential, Englewood, CO
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH4.
- Good understanding of and experience with Hadoop stack internals, Hive, Pig, and MapReduce; involved in defining job flows.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a sketch follows this list).
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Supported MapReduce programs running on the cluster.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Solved the small-files problem by processing Sequence files in MapReduce.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
- Monitor System health and logs and respond accordingly to any warning or failure conditions.
- Excellent Knowledge and understanding of Cassandra Architecture.
- Involved in supporting and monitoring production Linux systems.
- Actively involved in Cassandra migration to a higher version (from 2.0 to 2.2).
- Monitored daily Linux jobs and the log-management system.
- Expertise in troubleshooting and able to work with a team to fix large production issues.
- Expertise in creating and managing DB tables, indexes, and views.
- Created and managed user accounts and permissions at the Linux and DB level.
- Extracted large data sets from different sources in different formats, including relational databases, XML, and flat files, using ETL processing.
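A minimal sketch of creating and loading a Hive table whose queries execute as MapReduce jobs, as mentioned above; the table name, schema, and file path are assumptions:

    # Create a partitioned Hive table, load a local file, and run a query
    hive -e "
    CREATE TABLE IF NOT EXISTS web_logs (
      ip STRING, ts STRING, url STRING, status INT)
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    LOAD DATA LOCAL INPATH '/tmp/web_logs_2014-01-01.tsv'
      INTO TABLE web_logs PARTITION (dt='2014-01-01');

    -- This aggregation runs internally as a MapReduce job
    SELECT status, COUNT(*) FROM web_logs WHERE dt='2014-01-01' GROUP BY status;
    "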
Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, PIG, ETL, HBase, Zookeeper
Hadoop Admin
Confidential
Responsibilities:
- Experience in managing scalable Hadoop cluster environments.
- Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Experience in HDFS maintenance and administration.
- Experience in NameNode HA implementation (see the sketch at the end of this list).
- Working on architected solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Hands-on experience in Nagios and Ganglia monitoring tools.
- Experience in HDFS data storage and support for running Map Reduce jobs.
- Performing tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files.
- Installed and configured Hadoop ecosystem components like Sqoop, Pig, Flume, and Hive.
- Maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Troubleshot hardware issues and worked closely with various vendors on hardware/OS and Hadoop issues.
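A minimal sketch of verifying NameNode HA and exercising a manual failover, as referenced above; the service IDs nn1 and nn2 are assumptions for illustration:

    # Check which NameNode is active, then switch the active role to nn2
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    hdfs haadmin -failover nn1 nn2   # automatic failover is otherwise handled by ZKFC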
Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Chef, RHEL, Mahout, Tableau, MicroStrategy, Shell Scripting, Red Hat Linux.
Linux Systems Administrator
Confidential
Responsibilities:
- Installation, configuration, and troubleshooting of Red Hat 4.x/5.x, Ubuntu 12.x, and HP-UX 11.x on various hardware platforms.
- Installed and configured JBoss application server on various Linux servers
- Configured Kickstart for RHEL (4 and 5), Jumpstart for Solaris, and NIM for AIX to perform image installations over the network.
- Involved in configuration and troubleshooting of multipathing on Solaris and bonding on Linux servers.
- Worked with Red Hat Linux tools like RPM to install packages and patches for Red Hat Linux server
- Developed scripts for automating administration tasks like customizing user environments, and for performance monitoring and tuning with nfsstat, netstat, iostat, and vmstat.
- Performed network and system troubleshooting activities under day to day responsibilities.
- Created Virtual server on VMware ESX/ESXi based host and installed operating system on Guest Servers.
- Installed and configured RPM packages using the YUM package manager.
- Involved in developing custom scripts using Shell (bash, ksh) to automate jobs.
- Defined and developed plans for the Change, Problem, and Incident Management processes based on ITIL.
- Extensive use of Logical Volume Manager (LVM): created volume groups and logical volumes and configured disk mirroring on HP-UX, AIX, and Linux (see the sketch at the end of this list).
- Worked with clusters; added multiple IP addresses to servers via virtual network interfaces to balance network traffic (load-balancing and failover clusters).
- User and security administration; added users to the system and managed passwords.
- System monitoring and log management on UNIX and Linux servers, including crash and swap management, password recovery, and performance tuning.
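A minimal sketch of the LVM work described above (volume group, mirrored logical volume, filesystem); the device names, sizes, and mount point are assumptions:

    # Create physical volumes, a volume group, and a mirrored logical volume, then mount it
    pvcreate /dev/sdb /dev/sdc
    vgcreate datavg /dev/sdb /dev/sdc
    lvcreate -L 50G -m 1 -n datalv datavg    # -m 1 keeps one mirror copy
    mkfs.ext4 /dev/datavg/datalv
    mkdir -p /data && mount /dev/datavg/datalv /data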
Environment: Red Hat, Ubuntu, JBoss, RPM, VMware ESX/ESXi, LVM, DNS, NIS, HP-UX.