Cloudera Administrator Resume

Boston, MA

SUMMARY

  • Around 8 years of IT experience, including about 6 years of hands-on experience as a Hadoop Administrator.
  • Hands-on with Cloudera installation and configuration; worked with the team to support the platform and to set up backup and recovery management.
  • Hands-on experience deploying Cloudera automation and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (HIVE, PIG, SQOOP, OOZIE, FLUME, HCATALOG, ZOOKEEPER, HBASE) using Cloudera Manager and Hortonworks Ambari.
  • Hands on experience in installing, configuring, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4), Yarn distributions (CDH 5.X).
  • Hands-on experience with Big Data technologies and frameworks such as Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume and Oozie.
  • Proficiency with the application servers like WebSphere, WebLogic, JBOSS and Tomcat.
  • Supported technical team for automation, installation and configuration tasks.
  • Designed and implemented database software migration procedures, and guidelines.
  • Performed administrative tasks on Hadoop Clusters using Cloudera/HortonWorks.
  • Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, Map Reduce, Sqoop, Apache NiFi, Spark, Kafka, Yarn, Oozie, and Zookeeper.
  • Implemented DB2/LUW replication, federation, and partitioning (DPF).
  • Areas of expertise and accomplishment include: Database Installation/Upgrade, Backup/Recovery,
  • Hands-on experience configuring Hadoop clusters in enterprise environments, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
  • Installed and configured a Hortonworks HDP 2.3.0 using AMBARI 2.1.1 manager.
  • Strong experience with data warehouse tools, including ETL tools such as Ab Initio and Informatica, BI tools such as Cognos, MicroStrategy and Tableau, and relational database systems such as Oracle (PL/SQL); also experienced in Unix shell scripting and in working on AWS EMR instances.
  • Good knowledge and experience in tuning the performance of Hadoop Clusters
  • Set up Name Node High Availability for a major production cluster and designed automatic failover control using Zookeeper and Quorum Journal Nodes.
  • Experience in commissioning, decommissioning, balancing and managing nodes, and in tuning servers for optimal cluster performance.
  • Familiar with writing Oozie workflows and Job Controllers for job automation.
  • Experience in dealing with structured, semi-structured and unstructured data in HADOOP ecosystem.
  • Imported data from various data sources, performed transformations using Hive and Pig, and loaded the data into HBase.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Extracted the data from Teradata into HDFS/Databases/Dashboards using SPARK STREAMING for data analysis.
  • Analysed the client's existing Hadoop infrastructure to understand the performance bottlenecks and provided performance tuning accordingly.
  • Good understanding in Deployment of Hadoop Clusters using Automated Puppet scripts.
  • Extensively involved in Test Plan re-design, Test Case re-Creation, Test Automation and Test Execution of web and client server applications as per change requests.
  • Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop 2.2, HDFS, MapReduce, Hive, Pig, Cloudera, Sqoop, Oozie, Yarn, Apache NiFi, Spark, Kafka, Storm, Ambari 1.0-2.1.1, Flume and Mahout.

Hadoop Management & Security: Hortonworks, Cloudera Manager.

Web Technologies: HTML, XHTML, XML, XSL, CSS, JavaScript

Server Side Scripting: Shell, Perl, Python.

Database: Oracle 10g, Microsoft SQL Server, MySQL, DB2, SQL, RDBMS.

Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python

NoSQL Databases: HBase, MongoDB

Virtualization: VMware, ESXi, vSphere, vCenter Server.

SDLC Methodology: Agile (SCRUM), Waterfall.

Operating Systems: Windows 2000 Server, Windows 2000 Advanced Server, Windows Server 2003, Windows NT, Windows 98/XP, CentOS, Debian, Fedora, UNIX, Linux RHEL

PROFESSIONAL EXPERIENCE

Cloudera Administrator

Confidential, Boston, MA

Responsibilities:

  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioural data and financial histories into HDFS for analysis.
  • Configuring, Maintaining, and Monitoring Hadoop Cluster using Cloudera Manager (CDH5) distribution.
  • Responsible for Cluster configuration maintenance and troubleshooting and tuning the cluster.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Hbase, Zookeeper and Sqoop.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Used Apache NiFi to complete the end-to-end design and development of a flow that acts as the agent between the middleware team and the EBI team.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Involved in migrating ETL processes from Oracle to Hive to evaluate the ease of data manipulation.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Created a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning; upgraded Cloudera from version 5.5 to 6.0.
  • Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux.
  • Integrated the Hadoop cluster with Kerberos for secure authentication and authorization, and monitored connectivity.
  • Built & Deployed Hadoop clusters with different Hadoop components (HDFS, YARN, HBASE and ZOOKEEPER).
  • Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
  • Point of contact for vendor escalation; handled the Cloudera Manager upgrade from version 5.3 to 5.5.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Installed and configured other open-source software such as Pig, Hive, HBase, Flume and Sqoop.
  • Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Deployed Puppet, Puppet Dashboard, and Puppet DB for configuration management to existing infrastructure.
  • Expertise in capacity planning, configuring, and operationalizing small to medium-sized Big Data Hadoop clusters.
  • Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Designed the cluster so that only one secondary name node daemon could be run at any given time.
  • Set up rack-aware configuration and configured client machines and monitoring and management tools.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra. Implemented a multi-data center and multi-rack Cassandra cluster.
  • Knowledge of installing and configuring Cloudera Hadoop in single-node or cluster environments.
  • Partitioned and queried the data in Hive for further analysis by the BI team. Extended the functionality of Hive and Pig with custom UDFs and UDAFs.

Environment: Cloudera 4.3.2, HDFS, Cloudera Impala, Hive, Sqoop, Zookeeper, HBase, Pig, Flume, NoSQL, Windows 2000/2003, Unix, Linux, Java, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Big Data, Cloudera CDH, Apache Hadoop, Toad, MYSQL plus, Oracle Enterprise Manager (OEM), RMAN, Shell Scripting, Red Hat/SUSE Linux.

Cloudera/Hadoop Administrator

Confidential, Irving, TX

Responsibilities:

  • Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Involved in capacity planning, with reference to the growing data size and the existing cluster size.
  • Worked on analysing the Hadoop cluster and different big data analytic tools including Pig, HBase and other NoSQL databases, Flume, Oozie and Sqoop.
  • Experience in designing, implementing and maintaining high-performing Big Data and Hadoop clusters and integrating them with existing infrastructure.
  • Used NoSQL databases such as Cassandra and MongoDB (mongod); designed the table architecture and developed the DAO layer.
  • Deployed the application and tested it on WebSphere Application Servers.
  • Worked with core HDP components such as YARN and HDFS to architect the platform.
  • Configured SSL for Ambari, Ranger, Hive and Knox.
  • Experience in methodologies such as Agile, Scrum, and Test driven development.
  • Created principals for new users in Kerberos; implemented and maintained the Kerberos cluster and integrated it with Active Directory (AD).
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Creating event processing data pipelines and handling messaging services using Apache Kafka.
  • Involved in migrating a Java test framework to Python Flask.
  • Shell scripting for Linux/Unix Systems Administration and related tasks. Point of Contact for Vendor escalation.
  • Monitored and analysed MapReduce jobs, watching for potential issues and addressing them.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Moved data from Oracle, Teradata and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Commissioned and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
  • Configured various property files such as core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.sh based on the job requirements.
  • Used Agile/scrum Environment and used Jenkins, GitHub for Continuous Integration and Deployment.
  • Good knowledge in implementing Name Node Federation and High Availability of Name Node and Hadoop Cluster using Zookeeper and Quorum-Journal Manager.
  • Good knowledge in adding security to the cluster using Kerberos and Sentry.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Hands-On experience in setting up ACL (Access Control Lists) to secure access to the HDFS file system.
  • Analyzed escalated incidents within the Azure SQL database.
  • Captured the data logs from web servers into HDFS using Flume & Splunk for analysis.
  • Fine-tuned the Hadoop cluster by setting the proper number of map and reduce slots for the TaskTrackers.
  • Experience in tuning the heap size to avoid any disk spills and to avoid OOM issues.
  • Familiar with job scheduling using Fair Scheduler so that CPU time is well distributed amongst all the jobs.
  • Experience managing users and permissions on the cluster, using different authentication methods.
  • Involved in regular Hadoop Cluster maintenance such as updating system packages.
  • Experience in managing and analysing Hadoop log files to troubleshoot issues.
  • Good knowledge in NoSQL databases, like HBase, MongoDB, etc.
  • Worked on the Hortonworks Hadoop distribution, managing services such as HDFS and MapReduce2.
  • Installed and configured a CDH 5.0.0 cluster using Cloudera Manager.
  • Implemented automatic failover using Zookeeper and the Zookeeper Failover Controller.
  • Installed and configured Hortonworks distribution HDP 1.3.2 and Cloudera CDH4.
  • Developed scripts for benchmarking with Terasort / Teragen.
  • Worked on commissioning and decommissioning data nodes.
  • Set up Ranger and Knox across all clusters.
  • Upgraded Hortonworks distribution HDP 1.3.2 to HDP 2.2.
  • Provided team members with access to the AWS console and SSH connectivity to the edge node.
  • Monitored and managed the AWS infrastructure through CloudWatch and Nagios.
  • Experience working in AWS Cloud Environment like EC2.
  • Set up and installed a Hadoop (with YARN / MapReduce) cluster and Enterprise Data Warehouse.
  • Built High-Availability (HA) architectures and deployed them with Big Data technologies.
  • Set up the platform build on AWS and implemented Kerberos and LDAP security.
  • Created an AWS data pipeline for ingesting, transforming, and loading data from an S3 bucket to EC2 with SNS, IAM, and CloudWatch services.
  • Used a Cassandra database to transform queries to Hadoop HDFS.
  • Installed and set up Hadoop clusters for development and production environments using Cloudera CDH3, CDH4, Apache Tomcat and Hortonworks Ambari on Python, Red Hat and Windows.
  • Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization.
  • Managing and reviewing Hadoop log files and debugging failed jobs.
  • Tuned the cluster by Commissioning and decommissioning the Data Nodes.
  • Supported cluster maintenance, Backup and recovery for production cluster.
  • Backed up data on regular basis to a remote cluster using distcp
  • Knowledge on supporting data analysis projects using Elastic Map Reduce on the Amazon Web Services (AWS) cloud.
  • Worked on data processing on AWS EC2 Cluster and Fine tuning of Hive jobs for better performance.
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Installed and configured Hadoop and Ecosystem components in Cloudera and Hortonworks environments. Configured Hadoop, Hive and Pig on Amazon EC2 servers.
  • Involved in Analyzing system failures, identifying root causes, and recommended course of actions. Documented the systems processes and procedures for future references.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
  • Configured MySQL Database to store Hive metadata.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Worked with Linux systems and MySQL database on a regular basis.
  • Supported MapReduce programs that ran on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Monitored cluster job performance and was involved in capacity planning.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Involved in Requirement gathering, Business Analysis and translated business requirements into Technical design in Hadoop and Big Data.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Involved in creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Developed shell scripts and automated data management as part of end-to-end integration work.
  • Used Pig as an ETL tool to do transformations, even joins and some pre-aggregations before storing data into HDFS.
  • Developed MapReduce programs for parsing information and loading it into HDFS.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
  • Configured and optimized the Cassandra cluster and developed real-time java based application to work along with the Cassandra database.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Experienced in using Zookeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Used HBase to store the majority of the data, which needs to be divided by region.
  • As a programmer, involved in designing and implementation of MVC pattern.
  • Extensively used XML: process details are stored in the database as XML, and the stored XML is used whenever needed.
  • Part of core team to develop process engine.
  • Developed Action classes and validation using the Struts framework.
  • Created project related documentations like user guides based on role.
  • Implemented modules like Client Management, Vendor Management.
  • Implemented Access Control Mechanism to provide various access levels to the user.
  • Designed and developed the application using J2EE, JSP, Struts, Hibernate, Spring technologies.
  • Coded DAO and Hibernate implementation classes for data access.
  • Coded Spring service classes and transfer objects to pass data between layers.
  • Implemented web services using Axis.
  • Used different features of Struts like MVC, Validation framework and tag library.
  • Created detailed design documents, use cases, and class diagrams using UML.
  • Wrote Ant scripts to build JAR, WAR and EAR files.
  • Developed a standalone Java component that interacts with Crystal Reports on Crystal Enterprise Server to view and schedule reports, store data as XML, and send data to consumers using SOAP.
  • Developed JavaScript for client side validations in JSP.
  • Coordinated with the onsite, offshore and QA team to facilitate the quality delivery from offshore on schedule.
