
Hadoop Cloudera Administrator Resume

NYC, NY

SUMMARY:

  • 9 years of IT experience, including Hadoop, Windows VMware, and Linux administration across the financial and insurance industries, covering client-server, internet technologies, and SOA application integration.
  • Deployed and maintained Hadoop clusters, added and removed nodes using tools such as Cloudera Manager, configured NameNode high availability, and tracked all running Hadoop jobs.
  • Experience importing real-time data into Hadoop using Kafka, implementing Oozie jobs, and scheduling recurring Hadoop jobs with Apache Oozie.
  • Worked closely with the database, network, BI, and application teams to ensure that all big data applications were highly available and performing as expected.
  • Installed operating systems and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access. Experienced in administrative tasks such as Hadoop installation in pseudo-distributed and multi-node cluster modes, and installation of Apache Ambari on Hortonworks Data Platform (HDP 2.5).
  • Strong experience writing shell scripts to automate administrative tasks and automating the WebSphere environment with Perl and Python scripts (a representative monitoring sketch follows this list).
  • Started, stopped, and restarted Cloudera Manager servers whenever changes or errors required it.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Hands-on experience installing and configuring Cloudera, MapR, and Hortonworks clusters and installing Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
  • Performed daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems, and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
  • Experienced in supporting production clusters and troubleshooting issues within the maintenance window to avoid delays.
  • Good understanding of and hands-on experience with Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Good hands-on experience in Linux administration and troubleshooting network- and OS-level issues.
  • Assisted developers with troubleshooting MapReduce and BI jobs as required.
  • Good working knowledge of Linux concepts and building servers ready for Hadoop cluster setup.
  • Extensive experience monitoring servers with tools such as Nagios and Ganglia, covering Hadoop services and OS-level disk, memory, and CPU utilization.
  • Experience using Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume for data storage and analysis.
  • Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
  • Experience migrating an on-premises data center to AWS cloud infrastructure.
  • Experience in AWS CloudFront including creating and managing distributions to provide access to S3 bucket or HTTP server running on EC2 instances.
  • Good working knowledge of Vertica DB architecture, column orientation and High Availability.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
  • Experience installing and configuring Hive, its services, and the Metastore. Exposure to Hive Query Language and to table operations such as importing data, altering tables, and dropping tables.
  • Hadoop ecosystem: Cloudera, Hortonworks, MapR, HDFS, HBase, YARN, ZooKeeper, Nagios, Hive, Pig, Ambari, Spark, and Impala.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
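
A representative sketch of the kind of shell automation described above; the threshold, mount point, and alert alias are hypothetical, and actual scripts varied by environment:

    #!/bin/bash
    # Alert when HDFS or local disk usage crosses a threshold (illustrative values).
    THRESHOLD=80
    ALERT_ADDR="hadoop-ops@example.com"   # placeholder distribution list

    # HDFS usage as a percentage, parsed from the dfsadmin report.
    HDFS_USED=$(hdfs dfsadmin -report | awk -F': ' '/DFS Used%/{print int($2); exit}')

    # Local filesystem usage on the data mount.
    DISK_USED=$(df -P /data | awk 'NR==2 {gsub(/%/,"",$5); print $5}')

    if [ "$HDFS_USED" -ge "$THRESHOLD" ] || [ "$DISK_USED" -ge "$THRESHOLD" ]; then
        echo "HDFS used: ${HDFS_USED}%, /data used: ${DISK_USED}%" \
            | mail -s "Hadoop capacity alert on $(hostname)" "$ALERT_ADDR"
    fi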

TECHNICAL SKILLS:

Big Data Tools: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, Spark, Hortonworks, Ambari.

Hadoop Distributions & Cluster Tools: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.

Operating Systems: UNIX, Linux, Windows Vista, Windows 2003/XP, Windows 7, Windows 8

Servers: WebLogic Server, WebSphere, and JBoss.

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

Databases: MySQL, Teradata, PL/SQL (RDBMS); HBase, MongoDB, Cassandra (NoSQL).

Methodologies & Data Modeling: Waterfall, Agile, V-model; Star-schema and Snowflake modeling, Erwin 4.0, Visio.

Scripting: UNIX shell scripting, Korn shell, SQL*Plus, PL/SQL, HTML.

WORK EXPERIENCE:

Hadoop Cloudera Administrator

Confidential, NYC, NY

Responsibilities:

  • Installed Hadoop and configured multiple nodes on the Cloudera platform.
  • Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line. Performed cluster maintenance and the addition and removal of nodes using tools such as Ambari and Cloudera Manager Enterprise.
  • Handled the installation and configuration of a Hadoop cluster.
  • Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components such as Hive and HBase.
  • Worked on loading data into Azure Cosmos DB through its MongoDB API and DocumentDB API using Storm.
  • Involved in installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data between HDFS and Hive using Sqoop.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data protection (encryption at rest).
  • Handled the data exchange between HDFS and various web sources using Flume and Sqoop.
  • Monitored data streaming between web sources and HDFS through monitoring tools.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
  • Provided input to development on efficient use of resources such as memory and CPU based on the runtime statistics of map and reduce tasks.
  • Installed operating systems and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
  • Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
  • Currently working as Hadoop administrator on the MapR Hadoop distribution for 5 clusters, ranging from POC to production, comprising more than 1000 nodes.
  • Changed cluster configuration properties based on the volume of data being processed by the cluster.
  • Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
  • Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups; excellent working knowledge of SQL and databases.
  • Commissioned and decommissioned DataNodes from the cluster when problems arose (a representative sketch follows this list).
  • Set up automated processes to archive and clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
  • Set up and managed NameNode high availability to avoid single points of failure in large clusters.
  • Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
  • Experience configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions; extensive experience understanding clients' big data business requirements and translating them into Hadoop-centric solutions.
  • Experienced in writing automated scripts for monitoring file systems and key MapR services.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Implemented several scheduled Spark, Hive, and MapReduce jobs on the MapR Hadoop distribution.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
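
A sketch of the DataNode decommissioning flow referenced above; the exclude-file path and hostname are placeholders, and the exact procedure depended on how each cluster was managed:

    # Add the host to the HDFS exclude file referenced by dfs.hosts.exclude
    # (path below is illustrative), then ask the NameNode to re-read it.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch the node drain; it is safe to stop once it reports "Decommissioned".
    hdfs dfsadmin -report | grep -A 2 "datanode07.example.com"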

Environment: Linux, Shell Scripting, Teradata, SQL Server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, ZooKeeper, and HBase.

Confidential

Confidential, Sunnyvale, CA

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Created event processing data pipelines and handled messaging services using Apache Kafka.
  • Monitored the operating system and Hadoop cluster using tools such as Nagios and Ambari.
  • Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
  • Involved in 24/7 production support of the Hadoop system, scheduling jobs with Oozie and Control-M for automated processing of similar data; performed operating system updates and patches as required and configured storage and partitions using Logical Volume Manager.
  • Good understanding of Spark Streaming with Kafka for real-time processing.
  • Installed multi-node clusters on the HDP platform using Ambari; responsible for mounting and unmounting file systems as required.
  • Ingested data received from various database providers onto HDFS using Sqoop for analysis and data processing.
  • Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
  • Ingested data from various file systems into HDFS using UNIX command-line utilities and converted Hive/SQL queries into Spark transformations using Spark RDDs.
  • Implemented and administered data center Cassandra clusters based on knowledge of Cassandra architecture and data modeling.
  • Performed performance tuning of the Kafka cluster (failure and success metrics).
  • Involved in installing and configuring Confluent Kafka in the R&D line and validating the installation with the HDFS and Hive connectors.
  • Used Storm and Kafka Services to push data to HBase and Hive tables.
  • Installed Kafka cluster with separate nodes for brokers.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Responsible for day-to-day activities which includes HDFS support and maintenance, Cluster maintenance, commissioning/decommissioning of nodes, Cluster Monitoring/ Troubleshooting, Manage and review Hadoop log files, Backup and restoring, capacity planning.
  • Worked with Hadoop developers and operating system admins in designing scalable supportable infrastructure for Hadoop.
  • Worked on the Kafka cluster, using MirrorMaker to replicate topics to the Kafka cluster on Azure (a representative sketch follows this list).
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, YARN, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce concepts.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Contributed to building hands-on tutorials for the community on setting up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
  • Involved in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
  • Experience with encryption at rest using Key Trustee Server and encryption over the wire (TLS); also implemented Kafka security features using TLS.
  • Experience installing and configuring distributed messaging systems such as Kafka.
  • Experience implementing Hadoop ACLs and RBAC using Sentry; implemented the Kerberos authentication protocol for the existing cluster.
  • Hands-on experience working with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
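
A minimal sketch of the topic setup and MirrorMaker replication referenced above; hostnames, topic names, and property files are placeholders:

    # Create a topic on the source cluster (ZooKeeper-based tooling, as used on older Kafka releases).
    kafka-topics.sh --create --zookeeper zk01.example.com:2181 \
      --replication-factor 3 --partitions 6 --topic web-activity

    # Mirror the topic to the Azure-side cluster; the consumer and producer configs
    # point at the source and destination brokers respectively.
    kafka-mirror-maker.sh --consumer.config source-consumer.properties \
      --producer.config azure-producer.properties --whitelist "web-activity"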

Environment: HDFS, HBase, Sqoop, Kerberos, Red Hat Linux, Impala, Cloudera Manager, Puppet, Ambari, NiFi, Cassandra, Ganglia, Cloudera, Agile/Scrum.

Hadoop Cloudera Administrator

Confidential, Coppell, TX

Responsibilities:

  • Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
  • Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director.
  • Planning, Installing and Configuring Hadoop Cluster in Cloudera Distributions.
  • Administered, installed, upgraded, and managed Hadoop distributions (CDH5, Cloudera Manager) and HBase. Managed, monitored, and troubleshot the Hadoop cluster using Cloudera Manager.
  • Deployed a Hadoop cluster using CDH4 integrated with Nagios and Ganglia.
  • Performed installation and configuration of Hadoop Cluster of 90 Nodes with Cloudera distribution with CDH4.
  • Good knowledge of adding security to the cluster using Kerberos and Sentry. Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment (a representative sketch follows this list).
  • Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
  • Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
  • Knowledge of bootstrapping, removing, and replicating nodes in Cassandra and Solr clusters.
  • Implemented Impala for data processing on top of Hive. Worked with the NoSQL database HBase to create tables and store data.
  • Maintained the architecture of a 30-node Hadoop innovation cluster with Sqrrl, Spark, Puppet, and HDP 2.2.7.
  • Installed and configured RHEL6 EC2 instances for Production, QA and Development environment.
  • Experience managing the MapR Hadoop infrastructure with MCS. Decommissioned and commissioned DataNodes on the current Hadoop cluster.
  • Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
  • Worked on architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
  • Responsible for architecting Hadoop clusters with the Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
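
An illustrative check of the Kerberos-secured access path described above; the principal, realm, hostnames, and table names are placeholders:

    # Obtain a ticket for an admin principal and verify secure HDFS access.
    kinit hdfs-admin@EXAMPLE.COM
    hdfs dfs -ls /user

    # Refresh Impala's view of a Hive table after new data lands, then run a
    # quick row count through impala-shell with Kerberos authentication (-k).
    impala-shell -i impalad01.example.com -k \
      -q "INVALIDATE METADATA sales.orders; SELECT COUNT(*) FROM sales.orders;"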

Environment: Hadoop, Cloudera, Spark, Hive, HBase, SQL, NiFi, Kafka, Sqoop, Linux, HDFS, MapReduce, MapR, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.

Hadoop Administrator

Confidential, Bordentown, NJ

Responsibilities:

  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
  • Loaded data into cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
  • Installing, Upgrading and Managing Hadoop Cluster on Hortonworks.
  • Possess good Linux and Hadoop System Administration skills, networking, shell scripting and familiarity with open source configuration management and deployment tools such as Ansible.
  • Maintained the architecture of a 30-node Hadoop innovation cluster with Sqrrl, Spark, Puppet, and HDP 2.2.7.
  • Knowledge of configuration management and automation tools like Ansible for non-trivial installation.
  • Effective problem solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
  • Contributed to building hands-on tutorials for the community on setting up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
  • Ran data import and export jobs to copy data to and from HDFS using Sqoop (a representative sketch follows this list).
  • Hands-on experience with cluster upgrades and patch upgrades without data loss and with proper backup plans.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Changed configurations based on user requirements to improve job performance.
  • Worked on evaluating, architecting, and installing/setting up the Hortonworks 2.1/1.8 big data ecosystem, which includes Hadoop, Pig, Hive, Sqoop, etc.
  • Experience in AWS Cloud Front, including creating and managing distributions to provide access to S3 bucket or HTTP server running on EC2 instances.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
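
A sketch of the Sqoop import and export jobs referenced above; the JDBC connection string, table names, and HDFS paths are placeholders:

    # Pull a relational table into HDFS, splitting the work across four mappers.
    sqoop import --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P --table orders \
      --target-dir /data/raw/orders --num-mappers 4

    # Push processed results back out to the same database.
    sqoop export --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P --table orders_summary \
      --export-dir /data/processed/orders_summary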

Environment: Hive, Pig, HBase, Sqoop, Python, Ambari 2.0, Hortonworks, CentOS, MongoDB, Cassandra, Spark, Puppet.

Hadoop Administrator

Confidential, Atlanta, GA

Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
  • Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system. Worked with NoSQL database HBase to create tables and store data.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
  • Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience importing real-time data to Hadoop using Kafka and implementing Oozie jobs.
  • Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python.
  • Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
  • Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to handle data stored in a single platform on YARN.
  • Hands on experience in installing, configuring MapR, Hortonworks clusters and installed Hadoop ecosystem components like Hadoop Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, Zookeeper.
  • Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Developed Pig UDFs to pre-process data for analysis. Worked with business teams and created Hive queries for ad-hoc access.
  • Experienced with Hadoop ecosystems such as Hive, HBase, Sqoop, Kafka, Oozie etc.
  • Responsible for creating Hive tables and partitions, loading data, and writing Hive queries (a representative sketch follows this list).
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume. Created Pig Latin scripts to sort, group, join and filter the enterprise wise data.
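
An illustrative version of the partitioned Hive table work mentioned above; the database, table, columns, and HDFS path are hypothetical:

    # Create a date-partitioned table and load one day of staged data from HDFS.
    hive -e "
    CREATE DATABASE IF NOT EXISTS weblogs;
    CREATE TABLE IF NOT EXISTS weblogs.page_views (
      user_id STRING, url STRING, duration_ms BIGINT)
    PARTITIONED BY (event_date STRING);
    LOAD DATA INPATH '/data/staging/page_views/2016-05-01'
    INTO TABLE weblogs.page_views PARTITION (event_date='2016-05-01');
    "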

Environment: Hadoop, Hive, AWS, Flume, HDFS, Spark, Kafka, Sqoop, Oozie, Hortonworks, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Cassandra.

Linux Admin

Confidential

Responsibilities:

  • Performed system administration, maintenance, and monitoring of various day-to-day operations.
  • Well trained in and worked primarily on RHEL 5.x operating systems.
  • Experienced in installing Linux operating systems, applying read, write, and execute file permissions, and handling file system issues and disk management.
  • Created, managed, and modified user accounts, groups, and access levels on Linux. Worked on package management using RPM and YUM.
  • Expertise in writing Bash scripts, Perl scripts (hashes and arrays), and Python programs for deploying Java applications on bare servers or middleware tools.
  • Provided technical support by troubleshooting issues with various Servers on different platforms.
  • Notified server owners of failovers or crashes and also notified UNIX/Linux Server Support L3.
  • Monitored CPU load, restarted processes, and checked file systems.
  • Installing, Upgrading and applying patches for UNIX, Red Hat/ Linux, and Windows Servers in a clustered and non-clustered environment.
  • Worked on planning and configuring storage using LVM and applying patches on Linux machines (a representative sketch follows this list).
  • Experienced in creating volume groups and logical volumes on Linux.
  • Worked on installation and configuration of Samba, DNS, and Apache servers.
  • Used the tar command for data compression, backup, and recovery.
  • Experienced in developing Perl and shell scripts to automate processes, such as operational testing scripts for log checks, backup and recovery, and failover.
  • Monitored server and application performance, tuned I/O and memory, and installed SSH and configured key-based authentication.
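
A minimal sketch of the LVM provisioning described above; device names, the volume group, and sizes are placeholders:

    # Initialize a new disk for LVM, grow the volume group, and extend the data volume online.
    pvcreate /dev/sdc
    vgextend vg_data /dev/sdc
    lvextend -L +50G /dev/vg_data/lv_data
    resize2fs /dev/vg_data/lv_data   # grow the ext filesystem to match the new LV size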

Environment: Linux, RHEL 5.x, Samba server, DNS server, Apache server.

Windows VMware Admin

Confidential

Responsibilities:

  • Installed and configured IBM HTTP Server and Apache Web Server to transfer HTTP requests to the WebSphere Portal Server through the WebSphere Application Server plug-in in a clustered environment.
  • Administered the WebSphere development, test, stage, and production environments; deployed WebSphere applications to the DEV and TEST environments (a representative sketch follows this list).
  • Configured and enabled global security for WAS Administration Console users and console groups using Active Directory Server as an LDAP user registry.
  • Very strong experience across the complete WebSphere Application Server administration life cycle, including architecture/design discussions, installation of ND and Base versions and fix packs, configuration, deployment, scripting, migration, and troubleshooting on Solaris, AIX, Linux, and Windows 2003/2008 server environments.
  • Responsible for installing RAD 7.5 for development teams, managing patch levels, building out environments, and assisting developers with configurations.
  • Lead for WebSphere Portal 6.1.5/6.1.0.3 migrations in an AIX environment and WebSphere Content Rules Engine setup in a Windows 2003 server environment.
  • Installed the Deployment Manager, primary portal node, secondary portal node, and fix packs.
  • Database Migration from Cloudscape to Oracle for CP and FS instances and to DB2 for Personalization Rules Engine instance.
  • Clustered the Portal Server environment using horizontal clustering across multiple boxes to facilitate high availability, failover support, and load balancing in a production environment.
  • Provided 24x7 on-call support in debugging and fixing issues related to Linux, Solaris, and HP-UX installation/maintenance of hardware and software in production, development, and test environments as an integral part of the UNIX/Linux (RHEL/SUSE) support team.
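
A sketch of the scripted application deployment referenced above; the WebSphere install path, EAR file, and node/server names are placeholders:

    # Write a small wsadmin (Jython) script and run it to deploy an EAR to a
    # development application server, then persist the configuration change.
    printf '%s\n' \
      "AdminApp.install('/tmp/orders_app.ear', '[-node devNode01 -server server1]')" \
      "AdminConfig.save()" > /tmp/deploy_app.py

    /opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -f /tmp/deploy_app.py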

Environment: WebSphere 7.0/6.1.0.13, IHS 6.1/7.0, Shell Scripts, Admin Scripting, J2EE 1.3/1.4/1.5/1.6, IBM HTTP Server 6.0/2.0.47, WebSphere Portal Server 7.0/6.1.0.3, WebSphere MQ 7, XML, Windows 2008 Server.
