Hadoop (Cloudera) Admin Resume
Chicago, IL
SUMMARY:
- Over 8 years of experience, including 3+ years with the Hadoop ecosystem, covering installation and administration of UNIX/Linux servers and configuration of Hadoop ecosystem components in existing clusters.
- Monitored job performance, file system/disk-space usage, cluster and database connectivity, and log files; managed backups and security and troubleshot various user issues.
- Experience in configuring, installing and managing MapR, Hortonworks & Cloudera Distributions.
- Hands-on experience in installing, configuring, monitoring, and using Hadoop components like MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Hands-on experience configuring Hadoop clusters in professional environments and on Amazon Web Services (AWS) using EC2 instances.
- Good knowledge on Data Warehousing, ETL development, Distributed Computing, and large scale data processing.
- Good knowledge on implementation and design of big data pipelines.
- Worked on Agile projects delivering an end-to-end continuous integration/continuous delivery pipeline by integrating tools like Jenkins, Puppet, and AWS for VM provisioning.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using the Apache, Hortonworks, Cloudera, and MapR distributions.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions like CDH5 and HDP.
- Excellent knowledge of NoSQL databases like HBase and Cassandra.
- Experience in administration of Kafka and Flume streaming using Cloudera Distribution.
- Experience in job scheduling with the Fair, Capacity, and FIFO schedulers, and in inter-cluster data copying using the DistCp tool.
- Contributed to building hands-on tutorials for the community on how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Involved in implementing security on HDP and HDF Hadoop clusters with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger, and NiFi.
- Responsible for support of Hadoop Production environment which includes Hive, YARN, Spark, Impala, Kafka, SOLR, Oozie, Sentry, Encryption, HBase, etc.
- Migrated applications from existing systems like MySQL, Oracle, DB2, and Teradata to Hadoop.
- Good hands-on experience in Linux administration and troubleshooting network- and OS-level issues.
- Worked with various onshore and offshore teams to understand the data imported from their sources.
- Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new and existing code to production.
- Monitored multiple Hadoop cluster environments using Nagios; monitored workload, job performance, and capacity planning using MapR Control System.
- Set up clusters and installed all ecosystem components through MapR and manually through the command line in the lab cluster.
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks HDP (Ambari 2.6.5)
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Servers: WebLogic, WebSphere, JBoss; web servers: Tomcat and Nginx.
Programming Languages/Scripting: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.
Databases: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Systems Administration, Incident Management, Release Management, Change Management.
WORK EXPERIENCE:
Hadoop (Cloudera) Admin
Confidential, Chicago, IL
Responsibilities:
- Installed and configured the Hadoop ecosystem (HDFS/Spark/Hive/Oozie/YARN) using Cloudera Manager and CDH.
- Experience in setting up Dynamic Resource pools for distributing resources between pools.
- Implemented workflows using the Apache Oozie framework to automate tasks and developed job-processing scripts using Oozie workflows.
- Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing cluster.
- Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
- Involved in cluster-level security: perimeter (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Managed server instances on the AWS platform using Puppet and Chef configuration management.
- Loaded and transformed large sets of structured data from Oracle and SQL Server into HDFS using Talend Big Data Studio.
- Involved in providing operational support for the platform and following best practices to optimize the performance of the environment.
- Involved in release management process to deploy the code to production.
- As a Hadoop admin, monitored cluster health on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files (a monitoring sketch follows this list).
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Experience in setting up Hadoop clusters on cloud platforms like AWS.
- Enabled security to the cluster using Kerberos and integrated clusters with LDAP at Enterprise level.
- Created and maintained various Shell and Python scripts for automating various processes.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet. Worked with application teams to install Hadoop updates, patches, version upgrades as required.
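A minimal sketch of the kind of daily health-check and config-backup script referred to above, assuming a CDH-style layout; the hostnames, paths, and grep patterns are illustrative assumptions, not taken from the original environment:

    #!/bin/bash
    # Illustrative daily health check for a Hadoop cluster (paths are assumptions)
    DATE=$(date +%F)
    BACKUP_DIR=/var/backups/hadoop-conf/$DATE    # hypothetical backup location

    # HDFS capacity, live/dead DataNodes, and block health
    hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes|DFS Used%'
    hdfs fsck / 2>/dev/null | grep -E 'Under-replicated|Corrupt blocks|Status'

    # Back up the configuration XML files before any tuning change
    mkdir -p "$BACKUP_DIR"
    cp /etc/hadoop/conf/*-site.xml "$BACKUP_DIR"/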
Environment: Hadoop, HDFS, Spark, MapReduce, YARN, Pig, Hive, Sqoop, Oozie, Kafka, Linux, AWS, HBase, Cassandra, Kerberos, Scala, Python, Shell Scripting.
Hadoop (Kafka) Administrator
Confidential, Sunnyvale, CA
Responsibilities:
- Managed critical data pipelines that power analytics for various business units.
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Worked on performance tuning of Hive SQL; created external tables with proper partitions for efficiency and loaded the structured data produced by MR jobs into HDFS.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Experience with multiple Hadoop distributions like Apache, Cloudera and Hortonworks.
- Maintained Hortonworks cluster with HDP Stack 2.4.2 managed by Ambari 2.2.
- Built a production and a QA cluster with the latest Hortonworks distribution, HDP stack 2.6.1, managed by Ambari 2.5.1 on the AWS cloud.
- Worked on a Kerberized Hadoop cluster with 250 nodes.
- Continuously monitored and managed EMR clusters through the AWS Console.
- Used Kafka for building real-time data pipelines between clusters (see the sketch at the end of this list).
- Configured the schema repository for Oozie and Hive on a centralized Azure SQL database.
- Good understanding on Spark Streaming with Kafka for real-time processing.
- Worked on Netezza integration with Azure Data Lake.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Used Hive, created Hive tables, and loaded data from the local file system into HDFS.
- Developed Spark SQL jobs that read data from the Data Lake using Hive, transform it, and save it in HBase.
- Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect on running jobs or data.
- Configured, automated, and maintained build and deployment CI/CD tools with a high degree of standardization for both infrastructure and application stack automation on the AWS cloud platform.
- Involved in moving all log files generated from various sources to HDFS for further processing.
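A minimal sketch of the kind of inter-cluster Kafka pipeline mentioned above, using the classic MirrorMaker CLI; the topic name, ZooKeeper host, and property files are assumptions for illustration:

    # Create a replicated topic on the source cluster (pre-KIP-500 style CLI)
    kafka-topics.sh --create --zookeeper zk1:2181 \
      --replication-factor 3 --partitions 6 --topic clickstream

    # Mirror the topic from the source cluster into the target cluster
    kafka-mirror-maker.sh --consumer.config source-cluster.properties \
      --producer.config target-cluster.properties --whitelist 'clickstream'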
Environment: Hadoop, Hive, Pig, Tableau, Netezza, Oracle, HDFS, MapReduce, YARN, Sqoop, Oozie, ZooKeeper, Tidal, CheckMK, Grafana, Vertica
Hadoop Admin
Confidential, Chicago, IL
Responsibilities:
- Deployed a Hadoop cluster of the Cloudera Distribution and installed ecosystem components HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm, and Spark on Linux servers.
- Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
- Deployed Name Node high availability for major production cluster.
- Installed Kafka cluster with separate nodes for brokers.
- Wrote automated scripts for monitoring the file systems and key MapR services.
- Configured Oozie for workflow automation and coordination.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (see the sketch at the end of this list).
- Involved in installing and configuring Confluent Kafka in the R&D line and validated the installation with the HDFS and Hive connectors.
- Implemented high availability and automatic failover infrastructure to overcome the single point of failure for the NameNode, utilizing ZooKeeper services.
- Used Sqoop to connect to Oracle, MySQL, and Teradata and move the data into Hive/HBase tables.
- Used Apache Kafka for importing real time network log data into HDFS.
- Performed disk-space management for the users and groups in the cluster.
- Used Storm and Kafka services to push data to HBase and Hive tables.
- Documented slides and presentations on the Confluence page.
- Used Kafka to allow a single cluster to serve as the central data backbone for a large organization.
- Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
- Used the Sqoop and DistCp utilities for data copying and data migration.
- Worked on end-to-end data flow management from sources to a NoSQL (MongoDB) database using Oozie.
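A minimal sketch of the remote-cluster backup and RDBMS-to-Hive ingestion described above; the cluster names, JDBC connection details, and table names are illustrative assumptions:

    # Copy HDFS data to a remote backup cluster with DistCp, preserving attributes
    hadoop distcp -update -p hdfs://prod-nn:8020/data/warehouse \
      hdfs://backup-nn:8020/data/warehouse

    # Pull an RDBMS table into a Hive table with Sqoop
    sqoop import --connect jdbc:mysql://dbhost:3306/sales --username etl -P \
      --table orders --hive-import --hive-table staging.orders --num-mappers 4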
Environment: Kafka, HBase, Hive, Pig, Sqoop, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper, Kerberos, Nagios, MapR
Hadoop Admin
Confidential, Englewood, CO
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH4.
- Good understanding of and experience with Hadoop stack internals, Hive, Pig, and MapReduce; involved in defining job flows.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a sketch follows this list).
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Supported MapReduce programs running on the cluster.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Solved the small-files problem by processing Sequence files in MapReduce.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
- Monitor System health and logs and respond accordingly to any warning or failure conditions.
- Excellent Knowledge and understanding of Cassandra Architecture.
- Involved in supporting and monitoring production Linux systems.
- Actively involved in Cassandra migration to a higher version (from 2.0 to 2.2).
- Monitored daily Linux jobs and the log-management system.
- Expertise in troubleshooting and able to work with a team to fix large production issues.
- Expertise in creating and managing DB tables, indexes, and views.
- Created and managed user accounts and permissions at the Linux and DB level.
- Extracted large data sets from different sources in different formats, including relational databases, XML, and flat files, using ETL processing.
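A minimal sketch of creating and loading a Hive table whose queries execute as MapReduce jobs, as mentioned above; the table name, schema, and file path are assumptions:

    # Create a partitioned Hive table, load a local file, and run a query
    hive -e "
    CREATE TABLE IF NOT EXISTS web_logs (
      ip STRING, ts STRING, url STRING, status INT)
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    LOAD DATA LOCAL INPATH '/tmp/web_logs_2014-01-01.tsv'
      INTO TABLE web_logs PARTITION (dt='2014-01-01');

    -- This aggregation runs internally as a MapReduce job
    SELECT status, COUNT(*) FROM web_logs WHERE dt='2014-01-01' GROUP BY status;
    "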
Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, PIG, ETL, HBase, Zookeeper
Hadoop Admin
Confidential
Responsibilities:
- Experience in managing scalable Hadoop cluster environments.
- Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Experience in HDFS maintenance and administration.
- Experience in NameNode HA implementation (see the sketch at the end of this list).
- Working on architected solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Hands-on experience in Nagios and Ganglia monitoring tools.
- Experience in HDFS data storage and support for running Map Reduce jobs.
- Performing tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files.
- Installed and configured Hadoop ecosystem components like Sqoop, Pig, Flume, and Hive.
- Maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Troubleshot hardware issues and worked closely with various vendors on hardware/OS and Hadoop issues.
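A minimal sketch of verifying NameNode HA and exercising a manual failover, as referenced above; the service IDs nn1 and nn2 are assumptions for illustration:

    # Check which NameNode is active, then switch the active role to nn2
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    hdfs haadmin -failover nn1 nn2   # automatic failover is otherwise handled by ZKFC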
Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Chef, RHEL, Mahout, Tableau, MicroStrategy, Shell Scripting, Red Hat Linux.
Linux Systems Administrator
Confidential
Responsibilities:
- Installation, configuration, and troubleshooting of Red Hat 4.x/5.x, Ubuntu 12.x, and HP-UX 11.x on various hardware platforms.
- Installed and configured JBoss application server on various Linux servers
- Configured Kickstart for RHEL (4 and 5), Jumpstart for Solaris, and NIM for AIX to perform image installations over the network.
- Involved in configuration and troubleshooting of multipathing on Solaris and bonding on Linux servers.
- Worked with Red Hat Linux tools like RPM to install packages and patches for Red Hat Linux server
- Developed scripts for automating administration tasks like customizing user environments, and for performance monitoring and tuning with nfsstat, netstat, iostat, and vmstat.
- Performed network and system troubleshooting activities under day to day responsibilities.
- Created Virtual server on VMware ESX/ESXi based host and installed operating system on Guest Servers.
- Installed and configured RPM packages using the YUM package manager.
- Involved in developing custom scripts using Shell (bash, ksh) to automate jobs.
- Defined and developed plans for the Change, Problem, and Incident Management processes based on ITIL.
- Extensive use of Logical Volume Manager (LVM): created volume groups and logical volumes and configured disk mirroring on HP-UX, AIX, and Linux (see the sketch at the end of this list).
- Worked with clusters; added multiple IP addresses to servers via virtual network interfaces to balance network traffic (load-balancing and failover clusters).
- User and security administration; added users to the system and managed passwords.
- System monitoring and log management on UNIX and Linux servers, including crash and swap management, password recovery, and performance tuning.
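A minimal sketch of the LVM work described above (volume group, mirrored logical volume, filesystem); the device names, sizes, and mount point are assumptions:

    # Create physical volumes, a volume group, and a mirrored logical volume, then mount it
    pvcreate /dev/sdb /dev/sdc
    vgcreate datavg /dev/sdb /dev/sdc
    lvcreate -L 50G -m 1 -n datalv datavg    # -m 1 keeps one mirror copy
    mkfs.ext4 /dev/datavg/datalv
    mkdir -p /data && mount /dev/datavg/datalv /data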
Environment: Red Hat, Ubuntu, JBoss, RPM, VMware ESX/ESXi, LVM, DNS, NIS, HP-UX.