
Hadoop Administrator Resume


San Ramon, CA

PROFESSIONAL SUMMARY:

  • 7 years of overall experience, including 4 years in Hadoop administration and 3+ years in Linux/Unix systems administration.
  • Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Design Big Data solutions for traditional enterprise businesses. Backup configuration and Recovery from a Name Node failure. Coordinated with delivery teams in migration from traditional DW to Hive.
  • Experience in minor and major upgrades of Hadoop and Hadoop eco system. Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network
  • Experienced in installation, configuration, supporting and monitoring Hadoop clusters using Cloudera, Hortonworks and MapR distributions.
  • Hands on experience in analyzing Log files for Hadoop and eco system services and finding root cause.
  • Experience on Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
  • Experience in installing, configuring and maintaining file sharing servers such as Samba, NFS and FTP, as well as WebSphere and WebLogic application servers, Nagios and Chef.
  • As an admin, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies. Experience in Cassandra upgrades and cluster maintenance.
  • Good experience in setting up Linux environments: passwordless SSH, creating file systems, disabling firewalls, configuring swappiness and SELinux, and installing Java.
  • Good Experience in Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions
  • Installing and configuring Hadoop ecosystem components such as Pig and Hive. Monitoring and helping business analysts write and tune Hive SQL. Hands-on experience in installing, configuring and managing Hue and HCatalog.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
  • Effective problem solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
  • Experience in importing and exporting the logs using Flume. Optimizing performance of Hbase/Hive/Pig jobs.
  • Hands-on experience configuring and managing ZooKeeper and ZKFC for NameNode failure scenarios.
  • Expertise using Talend, Pentaho and Sqoop for ETL operations. Extensive experience in Linux admin activities on RHEL and CentOS. Experience in deploying Hadoop 2.0 (YARN).
  • Set up and configured Git repositories for GitHub on Linux servers and Windows desktops, and used git commands to initialize and update repositories.
  • Good testing practices (unit, integration, system) with automation
  • Expertise in implementing enterprise level security using AD/LDAP, Kerberos, Knox, Sentry and Ranger.
  • Extensive experience in developing SOA middleware based on Fuse ESB and Mule ESB. Configured Elasticsearch, Logstash and Kibana to monitor Spring Batch jobs.
  • Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability
  • Strong skills in managing Red Hat Linux servers, virtualization (VMware, Red Hat Enterprise Virtualization preferred), and system security.
  • Familiar with writing Oozie workflows and Job Controllers for job automation.
  • Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud (Amazon Web Services (AWS) EC2) and on private cloud infrastructure (OpenStack cloud platform).

TECHNICAL SKILLS:

Technology and Tools:

  • HDFS, MapReduce, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, Hue, HCatalog, Storm, Kafka, Key Value Store Indexer, Flume, ELK, Spark, DevOps
  • NoSQL, MySQL, Oracle, SQL Server, HBase, Cassandra, Cloudera Impala
  • HDP Ambari, Cloudera Manager, Hue, SolrCloud
  • Shell Scripting, Python, Java Scripting, Puppet, Ansible, Perl, Chef
  • Apache Tomcat, JBoss and Apache HTTP web server
  • NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office
  • Kerberos, Nagios & Ganglia, Agile, Scrum environments
  • Java, HTML, MVC, Struts, Hibernate, Servlet, Spring, Web services
  • Windows XP, 7, 8, UNIX, Mac, MS-DOS
  • MS Office, MS Project, MS Visio, MS Visual Studio

PROFESSIONAL EXPERIENCE:

Confidential - San Ramon, CA

Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance on MapR and Cloudera clusters: adding and removing nodes, commissioning and decommissioning DataNodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Day-to-day responsibilities included solving developer issues, moving code deployments from one environment to another, providing access to new users, providing immediate fixes to reduce impact, and documenting them to prevent future issues.
  • Adding/installation of new components and removal of them through Cloudera Manager.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Monitored workload, job performance and capacity planning using Cloudera Manager. Involved in Analyzing system failures, identifying root causes, and recommended course of actions.
  • Working as admin on the Cloudera (CDH 5.5.2) distribution for 4 clusters ranging from POC to PROD.
  • Interacting with Cloudera support, logging issues in the Cloudera portal and fixing them as per the recommendations.
  • Worked on Cassandra database on a multi-datacenter cluster. Have Experience in setting up Cassandra clusters.
  • Imported logs from web servers with Flume to ingest the data into HDFS. Used Flume with a spooling directory to load data from the local file system into HDFS.
  • Retrieved data from HDFS into relational databases with Sqoop. Parsed, cleansed and mined useful and meaningful data in HDFS using MapReduce for further analysis. Fine-tuned Hive jobs for optimized performance.
  • Implemented custom interceptors for flume to filter data and defined channel selectors to multiplex the data into different sinks. Worked on NoSQL databases including HBase, Mongo DB, and Cassandra.
  • Implemented multi-data center and multi-rack Cassandra cluster. Configured internode communication between Cassandra nodes and client using SSL encryption.
  • Partitioned and queried the data in Hive for further analysis by the BI team. Extended the functionality of Hive and Pig with custom UDFs and UDAFs. Involved in extracting data from various sources into Hadoop HDFS for processing. Created ETL mappings with Perl, Python and UNIX shell scripting.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hbase database and Sqoop.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Monitoring SOLR transparency’s and reviews the SOLR servers. Creating and deploying a corresponding SolrCloud collection.
  • Creating collections and configurations, Register a Lily HBase Indexer configuration with the Lily HBase Indexer Service. Creating and truncating HBase tables in hue and taking backup of submitter ID(s).
  • Configuring, Managing permissions for the users in hue. Responsible for building scalable distributed data solutions using Hadoop.
  • Built a data access layer using Elasticsearch for the recommendation view for repair events.
  • Recommended changes to underlying core data type mappings in Elasticsearch. Used Marvel Sense for Elasticsearch queries.
  • Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux.
  • Involved in loading data from the Linux file system to HDFS. Created and managed cron jobs. Imported data from different sources such as HDFS/HBase into Spark RDDs.
  • Implementation of Ranger, Ranger plug-ins and Knox security tools. Implemented Kerberos Security Authentication protocol for existing cluster.
  • Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive.
  • Researched applications in use that are not able to use TLS, and the possibility of standing up a proxy LDAP server to support them.
  • Integrated Impala to use the same file and data formats, metadata, security and resource management frameworks.
  • Implemented test scripts to support test-driven development and continuous integration. Worked on tuning the performance of Pig queries.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required. Configured Storm to load data from MySQL to HBase using JMS.
  • Responsible to manage data coming from different sources. Created User defined types to store specialized data structures in Cloudera. Involved in loading data from UNIX file system to HDFS
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team. Installed Oozie workflow engine to run multiple Hive and pig jobs.
  • Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment
  • Built re-usable Hive UDF libraries which enabled various business analysts to use these UDF's in Hive querying.
  • Created Hive external tables for loading the parsed data using partitions.
  • Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
  • Developed and Coordinated deployment methodologies (Bash, Puppet & Ansible). Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.

Environment: HDFS, Map Reduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, SOLR, Storm, Knox, Impala, Red Hat, MySQL and Oracle.

Confidential - Pittsburgh, PA

Hadoop Administrator

Responsibilities:

  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Worked on setting up Hadoop cluster for the Production Environment
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Involved in implementing security on a Hortonworks Hadoop cluster using Kerberos, working with the operations team to move a non-secured cluster to a secured cluster.
  • Coded Java-based ES queries and custom ES query aggregations using the Jest API for searching and accessing data in Elasticsearch.
  • Created the infrastructure design for PROD, Performance and Development Elasticsearch environments.
  • Used Puppet for creating scripts, deployment for servers, and managing changes through Puppet master server on its clients.
  • Configured, installed and monitored MapR Hadoop on 10 AWS EC2 instances and configured MapR on Amazon EMR, making AWS S3 the default file system for the cluster.
  • Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS. Involved in converting Hive/SQL queries into Spark transformations and actions using Spark SQL (RDDs and DataFrames) in Python and Scala.
  • Installed, configured and deployed a 50 node MapR Hadoop Cluster for Development and Production
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
  • Used Informatica Power Center to create mappings, mapplets, User defined functions, workflows, worklets, sessions and tasks. Worked on installation of DataStax Cassandra cluster.
  • Experience in projects involving movement of data from other databases to Cassandra with basic knowledge of Cassandra Data Modeling. Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
  • Implemented the Hadoop Name-node HA services to make the Hadoop services highly available.
  • Exporting data from RDBMS to HIVE, HDFS and HIVE, HDFS to RDBMS by using SQOOP. Installed and managed multiple Hadoop clusters - Production, stage, development.
  • Used Informatica Data Explorer (IDE) to find hidden data problems. Utilized Informatica Data Explorer (IDE) to analyze legacy data for data profiling.
  • Optimized the full-text search function by connecting MongoDB and Elasticsearch. Utilized the AWS framework for content storage and Elasticsearch for document search. Developed a framework for automated testing of Elasticsearch index validation (Java, MySQL).
  • Wrote a technical paper and created slideshow outlining the project and showing how Cloudera can be potentially used to improve performance.
  • Setting up monitoring tools for Hadoop monitoring and alerting. Monitoring and maintaining Hadoop cluster Hadoop/HBase/zookeeper.
  • Supported 200+ servers and 50+ users on the Hadoop platform, resolving the tickets and issues they ran into, helping make Hadoop simpler to use, and keeping them updated on best practices.
  • Wrote scripts to automate application deployments and configurations. Tuned and monitored Hadoop cluster performance. Troubleshot and resolved Hadoop cluster related system problems.
  • As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented the systems processes and procedures for future reference.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines. Screen Hadoop cluster job performances and capacity planning.
  • Monitored Hadoop cluster connectivity and security and also involved in management and monitoring Hadoop log files.
  • Performed automation/configuration management using Chef, Ansible, and Docker based containerized applications.
  • Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, YARN, HBase, Sqoop, Flume, ZooKeeper, Hortonworks, Eclipse, MySQL, Python, Shell Scripting.

Confidential

Hadoop Administrator

Responsibilities:

  • Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
  • Involved in analyzing system failures, identifying root causes, and recommended course of actions and lab clusters. Designed the Cluster tests before and after upgrades to validate the cluster status.
  • Involved in coding JUnit test cases and using Ant for building the application. Implemented the Hadoop NameNode HA services to make the Hadoop services highly available.
  • Regular maintenance, commissioning and decommissioning nodes as disk failures occur, using Cloudera Manager.
  • Documented and prepared run books of systems processes and procedures for future reference.
  • Performed Benchmarking and performance tuning on the Hadoop infrastructure.
  • Automated data loading between production and disaster recovery cluster. Migrated hive schema from production cluster to DR cluster.
  • Integrated Kafka with Flume in sand box Environment using Kafka source and Kafka sink. Configured flume agent with flume syslog source to receive the data from syslog servers.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
  • Working on multiple projects spanning from Architecting Hadoop Clusters, Installation, Configuration and Management of Hadoop Cluster.
  • Implemented authentication and authorization service using Kerberos authentication protocol. Worked on Hadoop CDH upgrade from CDH
  • Led the evaluation of Big Data software such as Splunk and Hadoop for augmenting the warehouse, identified use cases, and led Big Data analytics solution development for the Customer Insights and Customer Engagement teams.
  • Worked on migrating applications from relational database systems by doing POCs. Helped users and teams with incidents related to administration and development.
  • Onboarded new users migrated to our clusters and guided them on best practices. Guided users in development and worked closely with developers on preparing a data lake.
  • Migrated data from SQL Server to HBase using Sqoop. Scheduled data pipelines for automation of data ingestion in AWS. Utilized AWS framework for content storage and Elastic Search for document search.
  • Designed table architecture and developed DAO layer using Cassandra NoSQL database.
  • Determined groups using LDAP and migrated them to LDAPS. Developed various workflows using custom MapReduce, Pig, Hive and scheduled them using Oozie.
  • Responsible for installing, setting up and configuring Apache Kafka and Apache ZooKeeper. Extensive knowledge in troubleshooting code-related issues. Developed a suite of unit test cases for Mapper, Reducer and Driver classes using an MR testing library. Auto-populated HBase tables with data.
  • Designed and coded application components in an agile environment utilizing test driven development approach.

Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health, monitoring security, Red Hat Linux.

Confidential

Linux/Unix Systems Administrator

Responsibilities:

  • Installed, Configured and Maintained Debian/RedHat Servers at multiple Data Centers. Hands on experience working with production servers at multiple data centers.
  • Involved in writing scripts to migrate consumer data from one production server to another production server over the network with the help of Bash and Perl scripting.
  • Installed and configured the monitoring tools Munin and Nagios for monitoring network bandwidth and hard drive status.
  • Configured RedHat Kickstart server for installing multiple production servers.
  • Configuration and administration of DNS, LDAP, NFS, NIS, NIS+ and Sendmail on Red Hat Linux/Debian servers. Automated server building using SystemImager, PXE, Kickstart and JumpStart.
  • Planning, documenting and supporting high availability, data replication, business persistence, and fail-over, fail-back using Veritas Cluster Server in Solaris, RedHat Cluster Server in Linux and HP Service Guard in HP environment.
  • Writing, optimizing, and troubleshooting dynamically created SQL within procedures
  • Creating database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
  • Automated diagnostics on failed disk drives using shell scripting. Configured Global File System (GFS) and Zettabyte File System (ZFS).
  • Troubleshooting production servers with IPMI tool to connect over SOL. Configured system imaging tools Clonezilla and System Imager for data center migration.
  • Configured yum repository server for installing packages from a centralized server. Installed Fuse to mount the keys on every Production server for password-less authentication on Debian servers.
  • Installed and configured a DHCP server to give IP leases to production servers. Managed Red Hat Linux user accounts, groups, directories and file permissions.
  • Implemented the Clustering Topology that meets High Availability and Failover requirement for performance and functionality. Configured, managed ESX VM's with virtual center and VI client.
  • Performance monitoring using sar, iostat, vmstat and mpstat on servers, also logged to the Munin monitoring tool for a graphical view. Working knowledge of TCP/IP protocols, SMTP, HTTP and load balancers.
  • Used LDAP in Active directory to add new user to a directory, remove, modify, and grant privileges and policy.
  • Performed Kernel tuning with the sysctl and installed packages with yum and rpm.
  • Installed and configured PostgreSQL databases on Red Hat/Debian servers. Performed disk management with LVM (Logical Volume Manager).
  • Configuration and administration of Apache Web Server and SSL. Backup management and recovery through Veritas NetBackup (VNB).
  • Password-less setup and agent-forwarding done for SSH login using ssh-keygen tool. Established and maintained network users, user environment, directories, and security.
  • Thoroughly documented the steps involved in data migration on production servers, as well as the testing procedures to run before the migration.
  • Provided 24/7 on call support on Linux Production Servers. Responsible for maintaining security on Red Hat Linux.

Environment: IBM blade servers, WebSphere 5.x/6.x, Apache 1.2/1.3/2.x, Oracle, Logical Volume Manager, Veritas NetBackup 5.x/6.0, VM ESX 3.x/2, RHEL 5.x/4.x, Solaris 8/9/10, Sun Fire.
