We provide IT Staff Augmentation Services!

Hadoop Administrator Resume

Englewood, NJ

PROFESSIONAL SUMMARY:

  • Around 7 years of IT experience, including with 4 years of experience as a Hadoop Administration and along with around 2+ Years of experience in Linux admin related roles.
  • As a Hadoop Administration responsibility include software installation, configuration, software upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, cluster performance and monitoring on daily basis, maintaining cluster on healthy on different Hadoop distributions (Hortonworks & Cloudera)
  • Experience in installation, management and monitoring of Hadoop cluster using Apache, Cloudera Manager.
  • Optimized the configurations of Map Reduce, pig and hive jobs for better performance.
  • Advanced understanding in Hadoop Architecture such as HDFS, Yarn.
  • Strong experience configuring Hadoop Ecosystem tools with including Pig, Hive, Hbase, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark and Storm.
  • Experience in Apache NIFI which is a Hadoop technology and Integrating Apache NIFI and Apache Kafka.
  • Experience in designing, installation, configuration, supporting and managing Hadoop Clusters using Apache, Hortonworks and Cloudera.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra and HBase
  • Expert - level understanding of the AWS cloud computing platform and related services.
  • Experience in managing the Hadoop infrastructure with Cloudera Manager and Ambari.
  • Working experience on Importing and exporting data into HDFS and Hive using Sqoop
  • Working experience on Import & Export of data using ETL tool Sqoop from MySQL to HDFS
  • Strong Knowledge in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Worked with Sqoop to move (import/export) data from a relational database into Hadoop and used FLUME to collect data and populate Hadoop.
  • Worked with HBase to conduct quick look ups (updates, inserts and deletes) in Hadoop.
  • Experience in Backup configuration and Recovery from a Name Node failure.
  • Experience on Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
  • Involved in Cluster maintenance, bug fixing, trouble shooting, Monitoring and followed proper backup & Recovery strategies.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts
  • Worked with business users to extract clear requirements to create business value
  • Exceptionally well organized that demonstrates self-motivation, learning, creativity & initiatives, extremely dedicated & possess skills in actively learning new technologies within short span of time
  • Management of security in Hadoop Clusters using Kerberos, Ranger, Knox, Acl's.
  • Able to interact effectively with other members of the Business Engineering, Quality Assurance, Users and other teams involved with the System Development Life cycle.

TECHNICAL EXPERTISE:

Big Data Technologies: Hadoop, HDFS, Hive, Cassandra, Pig, Scoop, Falcon, Flume, Zookeeper, Yarn, Mahout, Oozie, Avro, HBase, MapReduce, HDFS, Storm

Distributions: Hortonworks, Cloudera

Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia

Testing: Capybara, WebDriver Testing Framework, RSpec, Cucumber, Junit, SVN

Server: WEBrick, Thin, Unicorn, Apache, AWS

Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8/10)

Database & NoSql: Database Systems Oracle 11g/10g, DB2, SQL, My SQL, HBASE, MongoDB, Cassandra

Scripting & security: Shell Scripting, HTML Scripting, Python, Kerberos, Dockers

Security: Kerberos, Ranger, Sentry

Other tools: Redmine, Bugzilla, JIRA, Agile SCRUM, SDLC Waterfall, Kubenetes.

WORK EXPERIENCE:

Hadoop Administrator

Confidential, Englewood, NJ

Responsibilities:

  • Designed and implemented end to end big data platform solution on AWS.
  • Manage Hadoop clusters in production, development, Disaster Recovery environments.
  • Implemented SignalHub a data science tool and configured it on top of HDFS.
  • Provisioning, installing, configuring, monitoring, and maintaining HDFS, Yarn, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, RangerKMS, Falcon, Smart sense, Storm, Kafka.
  • Recovering from node failures and troubleshooting common Hadoop cluster issues.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
  • Automated Hadoop deployment using Ambari blueprints and Ambari REST API’s.
  • Automated Hadoop and cloud deployment using Ansible.
  • Integrated active directory with Hadoop environment.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Configured Ranger for policy-based authorization and fine grain access control to Hadoop cluster.
  • Implemented hdfs encryption for creating encrypted zones in hdfs.
  • Configured Rangerkms to manage encryption keys.
  • Implemented a multitenant Hadoop cluster and on boarded tenants to the cluster.
  • Achieved data isolation through ranger policy-based access control.
  • Used YARN capacity scheduler to define compute capacity.
  • Responsible for building a cluster on HDP 2.5
  • Worked closely with developers to investigate problems and make changes to the Hadoop environment and associated applications.
  • Involved in the development of Chuwka dash board for pipeline monitoring and management
  • Implemented Hortonworks Nifi (HDP 2.5) and recommended a solution to inject data from multiple data sources to HDFS and Hive using Nifi.
  • Expertise in recommending hardware configuration for Hadoop cluster
  • Installing, Upgrading and Managing Hadoop Cluster on Hortonworks
  • Trouble shooting many cloud related issues such as Data Node down, Network failure, login issues and data block missing.
  • Worked with developer teams on Nifi workflow to pick up the data from rest API server, from Data Lake as well as from SFTP server and send that to Kafka broker.
  • Done Proof of Concept in Apache Nifi workflow in place of Oozie to automate the tasks of loading.
  • Managing and reviewing Hadoop log files.
  • Proven results-oriented person with a focus on delivery.
  • Performed Importing and exporting data into HDFS and Hive using Sqoop.
  • Managed cluster coordination services through Zookeeper.
  • System/cluster configuration and health check-up.
  • Continuous monitoring and managing the Hadoop cluster through Ambari.
  • Created user accounts and given users the access to the Hadoop cluster.
  • Resolving tickets submitted by users, troubleshoot the error documenting, resolving the errors.
  • Performed HDFS cluster support and maintenance tasks like Adding and Removing Nodes without any effect to running jobs and data.

Environment: HDFS, Map Reduce, Hive, Pig, Flume, Oozie, Sqoop, HDP2.5, Ambari 2.4, Spark, SOLR, Storm, Knox, Centos 7, Teradata and MySQL.

Hadoop Admin

Confidential, Oak Brook, IL

Responsibilities:

  • Working on developing rapid deployment scripts for deploying Hadoop ecosystems using Puppet platform. These scripts installed Hadoop, Hive, PIG, Oozie, Flume, Zookeeper and other components in Hadoop ecosystems along with Monitoring components for Nagios and Ganglia.
  • Implementation of Kerberized Hadoop Ecosystem. Using Sqoop and Nifi in a Kerberized system to transfer data from relational databases like MySQL to HDFS.
  • Wrote shell scripts to dynamically scale up or scale down the Hadoop data nodes on Rackspace infrastructure using API's.
  • Install, manage and support Linux operating systems, such as RHEL 7 and CentOS 7.
  • Worked with Amazon web services. Created EC2 (Elastic Compute Cloud) cluster instances, setup data buckets on S3 (Simple Storage Service), set EMR (Elastic MapReduce) with Hive scripts to process big data.
  • Worked on Pig and Hiveql. Experienced in data warehouse, schema creation and management.
  • Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Cloudera Manager Enterprise.
  • Building/Maintaining Docker Container clusters managed by Kubernetes, Linux, Bash, GIT, Docker. Utilized Kubenetes and Docker for the runtime environment of the CI/CD system to build, test and deploy
  • Planned, scheduled and Implemented OS patches on Centos / RHEL boxes as a part of proactive maintenance
  • Used NIFI to pull the data from different source and to push the data to HBASE and HIVE
  • Worked on Oozie to run multiple Hive and Pig jobs.
  • Balanced and tuned HDFS, Hive, MapReduce, and Oozie work flows.
  • Worked on installing operating system and Hadoop updates, patches, version upgrades when required.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Monitored and provided support for development and production cluster.
  • Integrated Kerberos Security into the CDH cluster
  • Implemented Performance optimizations for Hadoop ecosystem components like Hive, Pig, SQOOP and OOZIE w.r.t.server infrastructure.
  • Ran the benchmarking tools and identified the bottlenecks tuned the cluster to improve the performance.
  • Implemented schedulers on the Job tracker to share the resources of the cluster for the MapReduce jobs given by the users.
  • Provided L1 and L2 support for the internal team.

Environment: CDH 5.7.1, CDH 5.6.1, RHEL 7, AWS, EC2, S3, EMR, Ganglia, Hadoop, Hive, Oozie, Pig, HDFS, Map Reduce

Hadoop Admin

Confidential, Nashua, NH

Responsibilities:

  • Hadoop administrator in Hortonworks distribution with 6 clusters which included POC clusters and PROD clusters.
  • Big Data DEV & PROD cluster upgraded from HDP 2.3.x to HDP2.5.x
  • Big Data DEV & PROD cluster upgraded from Ambari 2.1.x to 2.5.x.x
  • Monitoring and controlling local file system disk space usage, log files, cleaning log files with automated scripts.
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to Hdfs, Hbase and Hive by integrating with Storm.
  • Backed up data on regular basis to a remote cluster using distcp.
  • Linux-based implementations such as Operating System patching RHEL6.x to 6.8.
  • Responsible for Installation, Configuration, Implementation, Upgradation, Maintenance & Troubleshooting of application servers and good experience in clustering.
  • Creating and maintaining user accounts, profiles, security, rights, disk space and monitoring.
  • Implementation in Hive and its components and troubleshooting if any issues arise with Hive. Published Hive LLAP in development environment.
  • Server Consolidation and Migration of Applications on Oracle and Java Applications.
  • Responsible for cluster availability and experienced on ON-call support.
  • Experienced in production support which involves solving the user incidents varies from sev1 to sev5.
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage & review Hadoop log files.
  • Responsible for Performing Filesystem Checks (fsck) time to time to check any over- replicated blocks, under replicated blocks, miss-replicated blocks, corrupt blocks and missing replicas.
  • Using Nagios to manage and monitor the Cluster performance.

Environment: Hortonworks HDP 2.5.3, Ambari 2.5.0.3, RHEL -6.8, Oracle 12g, MS-SQL, Hdfs, Hive, Zookeeper, Oozie, MapReduce, Yarn, Nagios, Sqoop, Hue.

Hadoop Admin

Confidential, San Jose, CA

Responsibilities:

  • Hadoop administrator in Hortonworks distribution with 5 clusters which included POC clusters and PROD clusters.
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Worked on Hive and its components and troubleshooting if any issues arise with Hive.
  • Responsible for cluster availability and experienced on ON-call support.
  • Experienced in production support which involves solving the user incidents varies from sev1 to sev5.
  • Perform Filesystem Checks (fsck) time to time to check any over replicated blocks, under replicated blocks, misreplicated blocks, corrupt blocks and missing replicas.
  • Hands on experience installing, configuring and deploying Openstack solutions.
  • Linux-based Virtualization implementations such as VM Ware and Xen on Red Hat.
  • Expertise in Installation, Configuration, Implementation, Upgradation, Maintenance & Troubleshooting of application servers and good experience in clustering.
  • Creating and maintaining user accounts, profiles, security, rights, disk space and monitoring.
  • Server Consolidation and Migration of Applications on Oracle and Java Applications
  • Used Nagios to manage and monitor the Cluster performance.

Environment: Cent OS, Oracle, MS-SQL, Zookeeper, Oozie, MapReduce, YARN, Puppet Nagios, Hortonworks HDP 2.3, REST APIs, Amazon webservices, Ambari 2.1.2, Sqoop, Hive.

Linux Hadoop Admin

Confidential

Responsibilities:

  • Installed RedHat Enterprise Linux (RHEL 6) on production servers.
  • Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
  • Benchmarking the cluster using Terasort, TestDFSIO and tuning Hadoop configuration parameter and Java Virtual Machine (JVM).
  • Developed data pipelines with combinations of Hive, Pig and Sqoop jobs scheduled with Oozie.
  • Worked on transferring data between database and HDFS using Sqoop.
  • Worked with Hive data warehouse to analyze the historic data in HDFS to identify issues and behavioral patterns.
  • Created Hive tables as per requirement which were internal and external tables and used static and dynamic partitions to improve efficiency.
  • Implemented UDF's in java for Hive to process the data that can't be performed using Hive in built functions.
  • Worked with AVRO, RegEx and JSON for serialization and de-serialization packed with hive to parse the content of streamed log data and implemented hive custom UDF's.
  • Worked with Pig scripts for advanced analytics.
  • Developed Pig UDF's in java for custom data for various levels of optimization.
  • Worked with Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Designed and implemented custom writable, custom input formats, custom partitioner and custom comparators.

Environment: CentOS, YUM, RPM, DHCP, NFS, NIS, DNS, FTP, Apache, SMTP, Cloudera, Hadoop, LVM, SQL, JAVA, Multipathing, VMWare, Puppet, SAN, Apache-Tomcat, WAS, HBase-Hive.

Hire Now