Hadoop Administrator Resume
Oak Brook, IL
PROFESSIONAL SUMMARY:
- 8 years of Information Technology experience, with expertise in administration and operations of Big Data and Cloud Computing technologies along with Linux administration.
- Expertise in setting up fully distributed multi-node Hadoop clusters with Apache and Cloudera Hadoop distributions.
- Expertise in AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, Glacier, VPC, ELB, RDS, IAM, CloudWatch, and Redshift.
- Expertise in MIT Kerberos integration and High Availability configuration for Hadoop clusters.
- Strong knowledge of installing, configuring, and using ecosystem components such as Hadoop MapReduce, Oozie, Hive, Sqoop, Pig, Flume, ZooKeeper, and Kafka, along with NameNode recovery and HDFS High Availability. Experienced with Hadoop shell commands and with verifying, managing, and reviewing Hadoop log files.
- Wrote Puppet manifests to provision several pre-prod environments (see the manifest sketch at the end of this summary).
- Wrote Puppet modules to automate the build/deployment process and to improve previously manual processes.
- Designed, installed, and implemented Puppet; good knowledge of automation using Puppet.
- Experience with AWS services including EC2, S3, ELB, IAM, CloudWatch, and VPC.
- Experience understanding Hadoop security requirements and integrating clusters with Kerberos authentication and authorization infrastructure.
- Extensive experience performing administration, configuration management, monitoring, debugging, and performance tuning of Hadoop clusters.
- Performed AWS EC2 instance mirroring, WebLogic domain creations and several proprietary middleware Installations.
- Worked on agile projects delivering an end-to-end continuous integration/continuous delivery pipeline by integrating tools such as Jenkins, Puppet, and AWS for VM provisioning.
- Evaluated performance of EC2 instances (CPU and memory usage) and set up EC2 Security Groups and VPCs.
- Configured and managed Jenkins in various environments (Linux and Windows).
- Administered version control systems (Git) to create daily backups and checkpoint files.
- Created various branches in Git, merged from the development branch to the release branch, and created tags for releases.
- Experience creating, managing, and performing container-based deployments using Docker images containing middleware and applications together.
- Enabled/disabled passive and active checks for hosts and services in Nagios.
- Good knowledge of installing, configuring, and maintaining Chef server and workstations.
- Expertise in provisioning clusters and building Puppet manifest files for various services.
- Excellent knowledge of importing/exporting structured and unstructured data from various sources such as RDBMS, event logs, and message queues into HDFS, using tools such as Sqoop and Flume.
- Expertise in converting non-Kerberized Hadoop clusters to Kerberized clusters.
- Administration and Operations experience with Big Data and Cloud Computing Technologies
- Hands-on experience setting up fully distributed multi-node Hadoop clusters with Apache Hadoop on AWS EC2 instances.
- Hands-on experience with AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, ELB, RDS, IAM, and CloudWatch.
- Performing administration, configuration management, monitoring, debugging, and performance tuning in Hadoop Clusters.
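A minimal sketch of the kind of Puppet provisioning work described above, applied locally for illustration; the package names, service, and account below are hypothetical placeholders rather than the actual production manifests.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: provision a pre-prod node with a minimal Puppet manifest.
# Package names, service, and account are placeholders.

cat > /tmp/preprod.pp <<'EOF'
# JDK and NTP packages that Hadoop nodes typically need.
package { ['java-1.8.0-openjdk', 'ntp']:
  ensure => installed,
}

# Keep time in sync across the cluster.
service { 'ntpd':
  ensure  => running,
  enable  => true,
  require => Package['ntp'],
}

# Dedicated service account for Hadoop daemons.
user { 'hdfs':
  ensure => present,
  shell  => '/bin/bash',
}
EOF

# Apply the manifest locally (agent/master wiring omitted for brevity).
puppet apply /tmp/preprod.pp
```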
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, ZooKeeper, NiFi, Mahout, Flume, Oozie, Avro, HBase, Storm
Hadoop Distributions: Hortonworks, Cloudera
Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP
Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, Weblogic Server, WebSphere
Security & Reporting Tools: Kerberos, Cognos, Hyperion Analyzer, OBIEE & BI+
Automation Tools: Elasticsearch-Logstash-Kibana (ELK), Puppet, Chef, Ansible
WORK EXPERIENCE:
Hadoop Administrator
Confidential, Oak Brook, IL
Responsibilities:
- Installed, configured, and maintained Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Extensively worked with Cloudera Distribution Hadoop (CDH 5.x and CDH 4.x).
- Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning, and slots configuration.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Implemented both major and minor version upgrades of the existing cluster, as well as rollbacks to the previous version.
- Created Hive tables to store the processed results in a tabular format.
- Used Oozie workflows to automate jobs on Amazon EMR.
- Utilized cluster coordination services through ZooKeeper.
- Configured Sqoop and exported/imported data into HDFS.
- Implemented Flume and Spark/Spark Streaming frameworks for real-time data processing.
- Implemented proofs of concept on the Hadoop and Spark stack and different big data analytics tools, using Spark SQL as an alternative to Impala.
- Used Sqoop to import and export data from RDBMS to HDFS and vice-versa.
- Integrated Kerberos into Hadoop to harden the cluster and secure it against unauthorized users.
- Spun up EMR clusters with the required EC2 instance types based on job type and data size.
- Updated CloudFormation templates with IAM roles for S3 bucket access, security groups, subnet IDs, EC2 instance types, ports, and AWS tags. Worked with Bitbucket, Git, and Bamboo to deploy EMR clusters.
- Involved in updating scripts and step actions to install Ranger plugins.
- Debugged Spark job failures and provided workarounds.
- Used Splunk to analyze job logs and Ganglia to monitor servers.
- Involved in enabling SSL for Hue on the on-prem CDH cluster.
- Wrote shell scripts and successfully migrated data from on-prem HDFS to AWS EMR (S3); see the sketch at the end of this section.
Environment: CDH 5.7.6, Cloudera Manager, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Red Hat/CentOS 6.5.
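A minimal sketch of the on-prem HDFS to S3 migration referenced above, assuming a Kerberized source cluster; the paths, bucket name, keytab, and principal are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: copy an on-prem HDFS directory to an S3 bucket used by EMR.
set -euo pipefail

SRC_DIR="/data/warehouse"                  # on-prem HDFS path (placeholder)
DEST_BUCKET="s3a://example-emr-landing"    # target S3 bucket (placeholder)

# Authenticate against the Kerberized cluster before reading HDFS.
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM

# DistCp the data; -update skips files that already exist at the destination.
hadoop distcp -update "${SRC_DIR}" "${DEST_BUCKET}${SRC_DIR}"

# Quick sanity check of the destination listing.
hadoop fs -ls "${DEST_BUCKET}${SRC_DIR}" | head
```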
Hadoop Administrator
Confidential, Jersey City, NJ
Responsibilities:
- Installed the Hortonworks distribution of Hadoop (HDP 2.3.4.0 and 2.4.2.0) on AWS and on physical servers.
- Upgraded HDP from 2.3.4.0 to 2.4.2.0 and Ambari from 2.0 to 2.2.2.0 using the latest repositories.
- Involved in multiple POCs to test data and performance on Hadoop versus SAN.
- Worked with the business intelligence team to build the Hadoop platform infrastructure architecture.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Maintained an understanding of all components of the platform infrastructure in order to analyze the impact of alerts and other system messages.
- Applied networking knowledge to troubleshoot and isolate infrastructure issues.
- Performed performance tuning of Hadoop clusters and Hadoop routines.
- Screened Hadoop cluster job performance and carried out capacity planning.
- Monitored Hadoop cluster connectivity and security.
- Teamed diligently with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
- Loaded large data volumes in a timely manner.
- Developed processes, tools, and documentation in support of production operations, including code migration.
- Created SNMP traps on Hadoop for the HP OV monitoring tool to generate alerts.
- Configured Hadoop to use the ILM/IDV tool to load data from source systems into HDFS.
- Installed and configured Disaster Recovery cluster to replicate the production data.
- Wrote Bash scripts to back up NameNode metadata, the Ambari embedded DB, the Hive metastore, the Oozie DB, etc. (see the sketch at the end of this section).
- Used Sqoop to move data between Teradata and HDFS/Hive in both directions.
- Installed and configured Hue on HDP 2.3.4.0 and 2.4.2.0.
- Loaded sensitive data from SAN to HDFS using ILM/IDV.
Environment: HDFS, Hive, Pig, UNIX, SQL, Java MapReduce, Hadoop cluster, HBase, Sqoop, Oozie, Linux, Hortonworks Hadoop Distribution 2.3.x and 2.4.x, Python, MySQL, Teradata, Ambari 2.2.2.0, Grafana
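A hedged sketch of the backup scripts mentioned above, assuming a PostgreSQL-backed Ambari embedded database and MySQL-backed Hive metastore and Oozie databases; hostnames, credentials, and paths are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical nightly backup sketch for NameNode metadata and metadata databases.
# HIVE_DB_PASS and OOZIE_DB_PASS are expected in the environment.
set -euo pipefail

BACKUP_DIR="/backup/hadoop/$(date +%F)"
mkdir -p "${BACKUP_DIR}/namenode"

# 1. NameNode metadata: fetch the latest fsimage from the active NameNode.
hdfs dfsadmin -fetchImage "${BACKUP_DIR}/namenode"

# 2. Ambari embedded database (PostgreSQL in this sketch).
pg_dump -U ambari ambari > "${BACKUP_DIR}/ambari.sql"

# 3. Hive metastore database (MySQL in this sketch).
mysqldump -u hive -p"${HIVE_DB_PASS}" metastore > "${BACKUP_DIR}/hive_metastore.sql"

# 4. Oozie database (MySQL in this sketch).
mysqldump -u oozie -p"${OOZIE_DB_PASS}" oozie > "${BACKUP_DIR}/oozie.sql"
```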
Hadoop Administrator
Confidential, Brooklyn, NY
Responsibilities:
- Worked as Hadoop Admin responsible for everything related to the clusters: a 90-node PROD cluster, 10-node Dev and UAT clusters, and a 5-node real-time cluster for real-time fraud analysis.
- Provided regular user and application support for highly complex issues involving multiple components such as HDFS, Flume, Kafka, Spark, Hbase and Solr.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Created Kafka topics, provided ACLs to users, and set up the REST mirror and MirrorMaker to transfer data between two Kafka clusters (see the sketch at the end of this section).
- Experienced with AWS services.
- Experienced in taking backups to S3.
- Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing prompt solutions to reduce impact, documenting them, and preventing future issues.
- Added/installed new components and removed them through Cloudera Manager.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Experienced in integrating Kafka with Spark for real-time data processing.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Installed and configured the Kafka cluster and monitored it using Splunk.
- Experienced with Ansible and related tools for configuration management.
- Retrieved data from HDFS into relational databases with Sqoop.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing the Hadoop cluster and different big data analytics and data science tools such as Power BI, Angoss, Tableau, and Cloudera Workbench.
- Created collections and configurations and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
- Created and truncated HBase tables in Hue and took backups of submitter ID(s).
- Configured and managed permissions for users in Hue.
- Added users and granted them database access via Ansible scripts.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: HDFS, MapReduce, Hive, Hue, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop, Spark, Solr, Storm, Knox, Zeppelin, Kafka, HBase, Cloudera Manager, StreamSets, Kudu, Red Hat, MySQL and Oracle.
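A minimal sketch of the Kafka topic, ACL, and MirrorMaker setup referenced above, using the older ZooKeeper-based CLI flags; the hosts, topic, principal, and config file names are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: create a topic, grant producer ACLs, and mirror it to a second cluster.

ZK="zk1.example.com:2181"   # source-cluster ZooKeeper (placeholder)

# Create a topic on the source cluster.
kafka-topics.sh --create --zookeeper "${ZK}" \
  --topic fraud-events --partitions 6 --replication-factor 3

# Grant a producer application write access to the topic.
kafka-acls.sh --authorizer-properties zookeeper.connect="${ZK}" \
  --add --allow-principal User:fraud-app --producer --topic fraud-events

# Mirror the topic to a second cluster (the config files point at source/destination brokers).
kafka-mirror-maker.sh --consumer.config source-consumer.properties \
  --producer.config dest-producer.properties --whitelist "fraud-events"
```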
Hadoop Administrator
Confidential, Charlotte, NC
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Extensively involved in installation and configuration of the Cloudera distribution of Hadoop (CDH 3/4), including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs for data cleaning.
- Involved in building out a Hadoop cluster across a network of 70 nodes.
- Experienced in loading data from UNIX local file system to HDFS.
- Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked on monitoring of VMware virtual environments with ESXi 4 servers and Virtual Center. Automated tasks using shell scripting for doing diagnostics on failed disk drives.
- Configured Global File System (GFS) and Zettabyte File System (ZFS). Troubleshot production servers using the IPMI tool to connect over SOL.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Involved in the installation of CDH 3 and the upgrade from CDH 3 to CDH 4.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Used Hive and created Hive external/internal tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
- Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop (see the sketch at the end of this section).
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive external tables, loaded data into the tables, and queried the data using HQL.
- Created Hive queries to compare raw data with EDW tables and perform aggregations.
- Wrote shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
Environment: Hadoop, HDFS, MapReduce, Impala, Sqoop, HBase, Hive, Flume, Oozie, ZooKeeper, Solr, performance tuning, cluster health monitoring, security, shell scripting, NoSQL (HBase/Cassandra), Cloudera Manager.
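A minimal sketch of the Sqoop import/export flow referenced above; the connection string, schema, and table names are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: import an Oracle table into Hive and export results back out.

# Import an Oracle table into a Hive table (-P prompts for the password).
sqoop import \
  --connect jdbc:oracle:thin:@oradb.example.com:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMERS \
  --hive-import --hive-table analytics.customers \
  --num-mappers 4

# Export aggregated results from HDFS back to an Oracle reporting table.
sqoop export \
  --connect jdbc:oracle:thin:@oradb.example.com:1521/ORCL \
  --username etl_user -P \
  --table CUSTOMER_METRICS \
  --export-dir /user/hive/warehouse/analytics.db/customer_metrics \
  --num-mappers 4
```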
Linux System Administrator
Confidential
Responsibilities:
- Administered Red Hat Enterprise Linux 5.x/4.x, OEL 5.x, & Solaris 9 Servers by testing, tuning, upgrading, patching and troubleshooting both Physical & Virtual server problems.
- Used VERITAS File system and VERITAS Volume Manager 5.0 to configure the RAID 1 & RAID 5 Storage System for more redundancy.
- Installed and maintained regular upgrades of Red Hat Linux Servers using kickstart based network installation.
- Created disk volumes, volume groups, and logical volumes (LVM) for Linux operating systems (see the sketch at the end of this section).
- Installed and Configured Apache Tomcat Web Server.
- Configured Proxy Server (Squid), DNS, FTP and DHCP servers on Red Hat Enterprise Linux and maintained system securities using IPTABLES.
- Developed Perl & Shell scripts for automation of the build and release process. Developed automation scripting in Python to deploy some applications.
- Created LDAP scripts that monitor LDAP connectivity and alert the admin group if the connection is closed.
- Involved in monitoring and troubleshooting network services such as TCP/IP, NFS, DNS, and SMTP on Linux servers, as well as system activity such as CPU, memory, disk, and swap space usage, to avoid performance issues.
- Experience in implementing and configuring network services such as HTTP, DHCP, and TFTP.
- Install and configure DHCP, DNS (BIND, MS), web (Apache, IIS), mail (SMTP, IMAP, POP3), and file servers on Linux servers.
- Built ESXi hosts using multiple technologies including HPSA, VUM, Host Profiles, and PowerCLI scripts. Performed routine maintenance on the VMware environment such as vCenter upgrades, firmware upgrades, and patching.
- Troubleshot backup and restore problems; created LVMs on SAN using Linux utilities.
- Troubleshot Linux network and security-related issues and captured packets using tools such as iptables, firewalls, TCP Wrapper, and Nmap.
- Deployed artifacts such as WAR and EAR files to the WebLogic application server by integrating WLST scripts with shell scripts.
- Upgraded the Red Hat Linux OS on WebSphere, JBoss, and Oracle database servers from V3/V4 to V5.
- Monitored servers, switches, ports etc. with Nagios monitoring tool.
- Responsible for setting up cron job scripts on production servers and implementing passwordless SSH authentication between servers.
Environment: Red Hat Enterprise Linux 5.x/4.x, OEL 5.x, Solaris 9, LVM, RAID, cron jobs, Oracle, MySQL, TCP/IP
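A minimal sketch of the LVM provisioning and cron/passwordless-SSH setup referenced above; device names, sizes, hostnames, and script paths are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: carve a logical volume from a new SAN disk, then set up SSH keys and a cron job.

# LVM: physical volume -> volume group -> logical volume -> filesystem -> mount.
pvcreate /dev/sdb
vgcreate appvg /dev/sdb
lvcreate -L 50G -n applv appvg
mkfs.ext4 /dev/appvg/applv
mkdir -p /app/data
mount /dev/appvg/applv /app/data
echo '/dev/appvg/applv /app/data ext4 defaults 0 2' >> /etc/fstab

# Passwordless SSH between servers for scripted jobs.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id deploy@app02.example.com

# Nightly cron entry running a maintenance script at 02:00.
( crontab -l 2>/dev/null; echo '0 2 * * * /usr/local/bin/healthcheck.sh' ) | crontab -
```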