
Sr. Hadoop Administrator Resume


NYC, NY

SUMMARY

  • Overall 8+ years of experience in software analysis, design, development and maintenance in diversified areas of client-server, distributed and embedded applications.
  • Cloudera Certified Administrator for Apache Hadoop (CCA-500).
  • Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, Zookeeper, Oozie, Solr, Sentry, Kerberos, Centrify DC, Falcon, Hue, Kafka, Storm).
  • Experience with Cloudera Hadoop clusters running CDH 5.6.0 with Cloudera Manager 5.7.0.
  • Experienced with Hortonworks Hadoop clusters running HDP 2.4 with Ambari 2.2.
  • Hands on day-to-day operation of the environment, knowledge and deployment experience in Hadoop ecosystem.
  • Configured various property files such as core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.sh based upon job requirements.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Experience in installing, configuring and optimizing Cloudera Hadoop versions CDH 3, CDH 4.x and CDH 5.x in multi-cluster environments.
  • Commissioned and decommissioned cluster nodes and performed data migration; also involved in setting up a DR cluster with BDR replication and implemented wire encryption for data at rest.
  • Implemented TLS (Level 3) security over all CDH services along with Cloudera Manager.
  • Implemented Dataguise analytics over a secured cluster.
  • Successfully implemented Blue-Talend integration and Greenplum migration.
  • Ability to plan, manage HDFS storage capacity and disk utilization.
  • Assist developers with troubleshooting MapReduce, BI jobs as required.
  • Provided granular ACLs for local file datasets as well as HDFS URIs; maintained role-level ACLs.
  • Cluster monitoring and troubleshooting using tools such as Cloudera Manager, Ganglia, Nagios, and Ambari Metrics.
  • Manage and review HDFS data backups and restores on Production cluster.
  • Implemented new Hadoop infrastructure, OS integration and application installation; installed OS (RHEL 5/6, CentOS, and Ubuntu) and Hadoop updates, patches and version upgrades as required.
  • Implemented multi-node Hadoop clusters on the Amazon Web Services (AWS) EC2 public cloud and on private cloud infrastructure.
  • Implement and maintain security LDAP, Kerberos as designed for cluster.
  • Expert in setting up Hortonworks (HDP 2.4) clusters with and without Ambari 2.2.
  • Experienced in setting up Cloudera (CDH 5.6) clusters using packages as well as parcels with Cloudera Manager 5.7.0.
  • In-depth understanding/knowledge of Hadoop architecture and various components such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
  • Solid understanding of all phases of development using multiple methodologies, i.e. Agile with JIRA and Kanban boards, along with the ticketing tools Remedy and ServiceNow.
  • Expertise with NoSQL databases like HBase, Cassandra, DynamoDB (AWS) and MongoDB.
  • Expertise handling Red Hat Linux tasks, including upgrading RPMs using YUM, kernel patching, and configuring SAN disks, multipath and LVM file systems.
  • Created and maintained user accounts, profiles, security, rights, disk space and process monitoring; handled and generated tickets via the BMC Remedy ticketing tool.
  • Configured UDP, TLS, SSL, HTTPD, HTTPS, FTP, SFTP, SMTP, SSH, Kickstart, Chef, Puppet and PDSH.
  • Overall Strong experience in system Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance monitoring and Fine-tuning on Linux (RHEL) systems.
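
Capacity planning and disk-utilization management of the kind described above often starts with a simple filesystem check. A minimal sketch (the 80% threshold and the warning format are illustrative assumptions, not taken from this resume):

```shell
#!/bin/sh
# Warn when any mounted filesystem crosses an assumed 80% usage threshold.
THRESHOLD=80
df -P | awk 'NR > 1 {sub("%", "", $5); print $5, $6}' | while read -r pct mount; do
  # Pseudo-filesystems may report "-"; the numeric test quietly skips them.
  if [ "$pct" -gt "$THRESHOLD" ] 2>/dev/null; then
    echo "WARN: $mount at ${pct}% capacity"
  fi
done
```

In practice a script like this would run from cron and feed a monitoring tool such as Nagios rather than echo to stdout.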

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, YARN, Pig, Hive, HBase, Zookeeper, Oozie, Ambari, Kerberos, Knox, Ranger, Sentry, Spark, Tez, Impala, Hue, Storm, Kafka, Flume, Sqoop, Solr.

Hardware: IBM pSeries, PureFlex, RS/6000, IBM Blade servers, HP ProLiant DL360/380, HP Blade servers C6000/C7000.

Operating Systems: Linux, AIX, CentOS, Red Hat, Solaris & Windows.

Networking: DNS, DHCP, NFS, FTP, NIS, LDAP, OpenSSH, Apache, NIM.

Tools & Utilities: ServiceNow, Jira, Remedy, Maximo, Nagios.

Databases: Oracle 10g/11g, 12c, DB2, MySQL, HBase, Cassandra, MongoDB, Teradata.

Virtualization: VMware vSphere

Cluster Technologies: Cloudera, Hortonworks

Cloud Knowledge: OpenStack, AWS, Azure, DigitalOcean.

Scripting & Programming Languages: Shell & Perl programming, Java, Python.

PROFESSIONAL EXPERIENCE

Confidential, NYC, NY

Sr. Hadoop Administrator

Responsibilities:

  • Understood the existing enterprise data warehouse setup and provided design and architecture suggestions for converting to the Hadoop ecosystem.
  • Deployed a Hadoop cluster of the Hortonworks distribution and installed ecosystem components HDFS, YARN, Zookeeper, HBase, Hive, MapReduce, Pig, Kafka, Storm and Spark on Linux servers using Ambari.
  • Set up automated 24x7x365 monitoring and escalation infrastructure for Hadoop cluster using Nagios Core and Ambari.
  • Designed and implemented a Disaster Recovery Plan for Hadoop clusters.
  • Implemented High Availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing Zookeeper services.
  • Integrated the Hadoop cluster with Active Directory and enabled Kerberos for authentication.
  • Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among the MapReduce jobs submitted by users.
  • Set up Linux Users, and tested HDFS, Hive, Pig and MapReduce Access for the new users.
  • Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
  • Optimized Hadoop cluster components (HDFS, YARN, Hive, Kafka) to achieve high performance.
  • Worked with Linux server admin team in administering the server Hardware and operating system.
  • Interacted with Networking team to improve bandwidth.
  • Provided user, platform and application support on the Hadoop infrastructure.
  • Applied patches and bug fixes on the Hadoop cluster.
  • Used the Optimized Row Columnar (ORC) file format as a highly efficient way to store Hive data.
  • Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
  • Conducted Root Cause Analysis and resolved production problems and data issues.
  • Installed and configured R-Hadoop to run the R jobs.
  • Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
  • Performed backup and recovery processes in order to upgrade the Hadoop stack.
  • Used the Sqoop and DistCp utilities for data copying and data migration.
  • Managed end-to-end data flow from sources to a NoSQL (MongoDB) database using Oozie and NiFi.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs such as MapReduce, Pig, Hive, and Sqoop as well as system-specific jobs such as Java programs and shell scripts.
  • Installed Kafka cluster with separate nodes for brokers.
  • Upgraded the POC cluster from HDP 2.4.2 to HDP 2.5.0.
  • Monitored cluster stability, used tools to gather statistics and improved performance.
  • Used Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, for Pig and Hive jobs.
  • Identified disk space bottlenecks; installed Nagios Log Server and integrated it with the PRD cluster to aggregate service logs from multiple nodes, and created dashboards for important service logs for better analysis of historical log data.
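
The Capacity Scheduler configuration mentioned above is typically expressed in capacity-scheduler.xml. A hypothetical sketch with two illustrative queues (the queue names and percentages are assumptions, not the actual production layout):

```xml
<configuration>
  <!-- Illustrative queue layout; real queue names and capacities will differ. -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,analytics</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>40</value>
  </property>
</configuration>
```

Sibling queue capacities under root must sum to 100; YARN picks this file up on a ResourceManager refresh.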

Environment: Hue, Oozie, Eclipse, HBase, Flume, Splunk, Linux, Java (JDK), Hibernate, Kickstart, Puppet, PDSH, Chef, gcc 4.2, Git, Cassandra, NoSQL, Red Hat, CDH 4.x, Impala, MySQL, MongoDB, Nagios.

Confidential, Jackson, Michigan

Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance, monitoring, managing, commissioning and decommissioning DataNodes, troubleshooting, reviewing data backups, and managing and reviewing log files on Hortonworks.
  • Added and installed new components, and removed components, through Cloudera Manager.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Performed major and minor upgrades and patch updates.
  • Implemented the rack-awareness.sh shell script for the rack topology.
  • Installed Hadoop ecosystem components like Pig, Hive, HBase and Sqoop in a cluster.
  • Experience in setting up tools like Ganglia for monitoring Hadoop cluster.
  • Handling the data movement between HDFS and different web sources using Flume and Sqoop.
  • Extracted files from NoSQL database like HBase through Sqoop and placed in HDFS for processing.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Installed and configured Hue in HA mode, pointing to the Hadoop cluster in Cloudera Manager.
  • Have deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment, supporting and managing Hadoop Clusters.
  • Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Working with applications teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Extensively worked on Informatica tool to extract data from flat files, Oracle and Teradata and to load the data into the target database.
  • Responsible for developing data pipeline using HDInsight, Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Commissioned DataNodes as data grew and decommissioned DataNodes from the cluster when hardware degraded.
  • Set up and managed HA NameNode to avoid a single point of failure in large clusters.
  • Worked with data delivery teams to set up new Hadoop and Linux users, setting up Kerberos principals and testing HDFS and Hive access.
  • Held discussions with other technical teams on a regular basis regarding upgrades, process changes, any special processing, and feedback.
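
A rack-awareness script like the one mentioned above is wired into core-site.xml via the net.topology.script.file.name property and must print one rack path per input host. A minimal sketch (the subnet-to-rack mapping is hypothetical):

```shell
#!/bin/sh
# Hypothetical rack-topology script: HDFS passes one or more DataNode IPs
# as arguments and expects one rack path per line in response.
resolve_rack() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;   # assumed subnet-to-rack mapping
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;; # fallback required by HDFS
  esac
}
for host in "$@"; do
  resolve_rack "$host"
done
```

With this in place the NameNode can keep block replicas on separate racks, which is what makes commissioning and decommissioning nodes safe for data availability.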

Environment: Linux, Shell Scripting, Java (JDK 1.7), Tableau, Map Reduce, Teradata, SQL server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, Zookeeper and HBase.

Confidential, Denver CO

Hadoop Administrator

Responsibilities:

  • Cloudera distribution; cluster size ~150 nodes; data: ~300 TB.
  • Involved in improving the scaled data solutions using Hadoop.
  • Involved in determining the correct distribution and infrastructure for the cluster, from the POC activity through setting up production environments along with other test environments.
  • Configured the Cloudera cluster using the automated Cloudera Manager setup and configured all Hadoop development components for running sets of SQL, Pig and Java queries.
  • Assist in Install and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster with latest patches.
  • Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
  • Provided guidance on simple to complex MapReduce jobs using Hive and Pig.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer medical behaviour.
  • Deployed UDFs developed by developers to implement business logic in Hadoop.
  • Monitored services and network behaviour using Nagios.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as and when required.
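
The compression mechanisms mentioned above for making MapReduce jobs use HDFS efficiently are usually enabled in mapred-site.xml. One common sketch using the Snappy codec for intermediate map output (assumed settings, not necessarily the ones used on this cluster):

```xml
<!-- mapred-site.xml fragment: compress intermediate map output with Snappy. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

Compressing map output cuts shuffle traffic at a small CPU cost; Snappy is a common choice because it is splittable-friendly in this role and fast to decompress.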

Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Eclipse, HBase, Flume, Linux, Java, gcc 4.2, Git.

Confidential

Hadoop Administrator

Responsibilities:

  • Involved in architectural design cluster infrastructure, Resource mobilization, Risk analysis and reporting.
  • Commissioned and decommissioned DataNodes and was involved in NameNode maintenance.
  • Installed security using Kerberos on the cluster for AAA (authentication, authorization and auditing).
  • Performed regular backups and cleared logs from HDFS space to utilize DataNodes optimally; wrote shell scripts for time-bound command execution.
  • Edited and configured HDFS and tracker parameters.
  • Scripted requirements using BigSQL and provided time statistics of running jobs.
  • Performed code reviews of simple to complex MapReduce jobs using Hive and Pig.
  • Cluster monitoring using the BigInsights ionosphere tool.
  • Imported data from various data sources and parsed it into structured data region-wise and date-wise; analysed the data by performing Hive queries and running Pig scripts to study customer behaviour.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
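
The time-bound log cleanup scripted above often reduces to a find with a retention window. A minimal sketch (the log directory and the 7-day retention are assumptions):

```shell
#!/bin/sh
# Minimal log-retention sketch; LOG_DIR and the 7-day window are assumptions.
LOG_DIR=${LOG_DIR:-/tmp/hadoop-logs}
RETENTION_DAYS=7
mkdir -p "$LOG_DIR"
# Remove log files untouched for longer than the retention window.
find "$LOG_DIR" -type f -name '*.log' -mtime +"$RETENTION_DAYS" -print -delete
```

Run from cron, this keeps local DataNode disks from filling with stale service logs; cleanup inside HDFS itself would use hdfs dfs commands instead.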

Environment: Linux, Hadoop, BigInsights, Hive, Puppet, Java, C++.

Confidential

Sr. Linux Administrator

Responsibilities:

  • Provided system support for 100+ Red Hat Linux servers, including routine maintenance, patching, system backups and restores, and software and hardware upgrades.
  • Worked on building new SUSE and Red Hat Linux servers, supported lease replacements and implemented system patches using the HP Server Automation tool.
  • Configuration, implementation and administration of clustered servers in a SUSE Linux environment.
  • System administration, System planning, co-ordination and group level and user level management.
  • Create and manage Logical Volumes in Linux.
  • Experience on backup and recovery software like Net-backup on Linux environment.
  • Set up Nagios monitoring software.
  • Supported production systems 24 x 7 on a rotational basis.
  • Resolved Security Access Requests via Peregrine Service center to provide the requested User access related requests.
  • Handling and generating tickets via the BMC Remedy ticketing tool.
  • Performance monitoring and tuning using top, prstat, sar, vmstat, netstat, jps, iostat, etc.
  • Creating new file system, managing & checking data consistency of the file system.
  • Successfully migrated virtual machines from the legacy VMware vSphere 4.1 environment to VMware vSphere 5.1.
  • Documented the procedure for handling obsolete servers, considerably reducing the time and mistakes involved in this process as well as streamlining the existing process.
  • Communicated and Coordinated with customers internal/external for resolving issues for reducing downtime.
  • Disaster Recovery and Planning.
  • Problem determination, Security, Shell Scripting.
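
The performance monitoring with top, vmstat and friends noted above often starts from a one-line snapshot; for example, listing the top memory consumers (the --sort flag assumes GNU procps, as shipped with RHEL):

```shell
#!/bin/sh
# Show the five processes using the most memory (header line plus five rows).
ps aux --sort=-%mem | head -6
```

The same pattern sorted by -%cpu gives the CPU-hog view; iostat and sar cover the disk and historical sides that ps cannot.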

Environment: REDHAT Linux 5, 6, HP Gen 8 Blades and Rack mount Servers, Oracle Virtual Box.

Confidential

Linux/Database Administrator

Responsibilities:

  • Installing and maintaining the Linux servers
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers.
  • Monitoring System Metrics and logs for any problems.
  • Added, removed, and updated user account information, reset passwords, etc.
  • Created and managed Logical Volumes; used Java JDBC to load data into MySQL.
  • Maintained the MySQL server and granted authentication to required users for databases.
  • Installing and updating packages using YUM
  • Patches installation and updating on server.
  • Virtualization on RHEL servers (through Xen and KVM).
  • Resized LVM disk volumes as needed; administered VMware virtual Linux servers.
  • Installation and configuration of Linux for new build environment.
  • Created virtual servers on a Citrix XenServer-based host and installed operating systems on guest servers.
  • Performed remote installation of Linux on multiple servers using PXE boot and the Kickstart method.
  • Monitoring the System activity, Performance, Resource utilization.
  • Configuring NFS, DNS.
  • Updated the YUM repository and Red Hat Package Manager (RPM) packages.

Environment: Linux, CentOS, Ubuntu, FTP, NTP, MySQL.
