
Hadoop Administrator Resume


Englewood Cliffs, NJ

SUMMARY:

  • Optimized the configurations of Map Reduce, Pig and Hive jobs for better performance.
  • Advanced understanding of Hadoop architecture, including HDFS and YARN.
  • Strong experience configuring Hadoop ecosystem tools including Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark and Storm.
  • Experience in designing, installing, configuring, supporting and managing Hadoop clusters using Apache, Hortonworks and Cloudera distributions.
  • Handled more than 10 Hadoop clients with efficient delivery based on customer needs.
  • Installed and configured the monitoring tools Munin and Nagios to monitor network bandwidth and hard drive status.
  • Strong experience in Hortonworks and Cloudera distribution of Hadoop and its ecosystem components.
  • Experience in installation, management and monitoring of Apache Hadoop clusters using Cloudera Manager.
  • Strong experience on Red Hat server administration and Kerberos Security.
  • Certified Java/J2EE developer with more than two years of experience in Java/J2EE system design, development, testing, integration, implementation and maintenance of n-tier web-based applications and server-side components in enterprise environments.
  • Strong experience in distributed server operations and ITIL, working with incident management, change management and problem management.
  • At PayPal, worked as lead big data Hadoop administrator on one of the world's largest Hadoop deployments: multiple clusters totaling 1,500+ nodes with 50 petabytes of scalable storage.
  • Responsible for configuring, integrating, and maintaining all Development, QA, Staging and Production PostgreSQL databases within the organization.
  • Responsible for all backup, recovery, and upgrading of all of the PostgreSQL databases.
  • Experience in Datameer as well as the big data Hadoop stack. Experienced in NoSQL databases such as HBase and MongoDB; stored and managed user data in MongoDB.
  • Good experience in installation/upgrade of VMware. Automated server builds using SystemImager, PXE, Kickstart and Jumpstart.
  • Good troubleshooting skills across the Hadoop stack, ETL services, and the Hue and RStudio GUIs used by developers and business users for day-to-day activities.
  • Experience setting up a 15-node cluster in an Ubuntu environment.
  • Expert level understanding of the AWS cloud computing platform and related services.
  • Experience in managing the Hadoop Infrastructure with Cloudera Manager and Ambari.
  • Involved in cluster maintenance, bug fixing, troubleshooting and monitoring, and followed proper backup and recovery strategies.
  • In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
  • Managed security in Hadoop clusters using Kerberos, Ranger, Knox and ACLs.
  • Excellent experience in Shell Scripting.
  • Expertise with Hadoop solution architecture, design and delivery of any data and analytics requirements.
  • Created strategies and plans for data security, disaster recovery and total business continuity.

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, HCatalog, Phoenix, Falcon, Sqoop, Zookeeper, NiFi, Mahout, Flume, Oozie, Avro, HBase, Cassandra, Storm; Hortonworks and Cloudera distributions.

Scripting Languages: Shell scripting, Ruby, Java.

Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB

Network Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP

Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia

Application Servers: Apache Tomcat, Weblogic Server, WebSphere

Security Tools: Kerberos, Ranger, Knox

Reporting Tools: Cognos, Hyperion Analyzer, OBIEE & BI+

ELK Stack: Elasticsearch, Logstash, Kibana

Automation tools: Ansible

PROFESSIONAL EXPERIENCE:

Hadoop Administrator

Confidential - Englewood Cliffs, NJ

Responsibilities:

  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Hands-on experience with major Hadoop ecosystem components including HDFS, the MapReduce framework, YARN, HBase, Hive, Pig, Sqoop and Zookeeper.
  • Experience copying files within or across clusters using the DistCp command-line utility.
  • Installed and configured Hadoop ecosystem tools such as Sqoop, Pig and Hive.
  • Experience importing and exporting data between databases such as MySQL and HDFS/Hive using Sqoop.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
  • Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution.
  • Installed and configured Spark on multi node environment.
  • Monitored workload, job performance and capacity planning.
  • Expertise in recommending hardware configuration for Hadoop cluster.
  • Involved in designing and implementation of secure Hadoop cluster using Kerberos.
  • Managing and reviewing Hadoop and HBase log files.
  • Experience in creating Kafka Topics.
  • Cluster maintenance as well as creation and removal of nodes using Cloudera Manager
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Administration, installation, upgrade and management of Hadoop distributions (CDH4, CDH5, Cloudera Manager), Hive and HBase.
  • Experience analyzing log files for Hadoop and ecosystem services and finding root causes.
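
The DistCp copies and Sqoop imports described above follow a common command shape. A minimal sketch; the NameNode hostnames, database URL, credentials and table names are illustrative placeholders, not values from the actual engagement:

```shell
#!/bin/sh
# Sketch of the DistCp and Sqoop transfer patterns above.
# All hostnames, databases and tables are hypothetical examples.

# Build an intra/inter-cluster copy command. DistCp runs as a MapReduce
# job; -update copies only changed files, -p preserves file attributes.
distcp_cmd() {
    echo "hadoop distcp -update -p" \
         "hdfs://source-nn:8020/data/events" \
         "hdfs://target-nn:8020/data/events"
}

# Build a MySQL -> Hive import with 4 parallel map tasks.
sqoop_import_cmd() {
    echo "sqoop import" \
         "--connect jdbc:mysql://db-host:3306/sales" \
         "--username etl_user -P" \
         "--table orders" \
         "--hive-import --hive-table sales.orders" \
         "--num-mappers 4"
}

# Print the commands; on a real edge node with hadoop and sqoop
# installed you would execute them instead of echoing.
distcp_cmd
sqoop_import_cmd
```

Keeping the commands behind small functions like this makes them easy to drop into cron jobs or wrapper scripts with logging.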

Environment: CDH 5.4.5, Hive 1.2.1, HBase 1.1.2, Flume 1.5.2, MapReduce, Sqoop 1.4.6, Spark 2.1.0, Kafka, Shell Script, Oozie 4.2.0, Zookeeper 3.4.6.

Hadoop Administrator

Confidential - Oak Brook, IL

Responsibilities:

  • Hadoop administrator managing multi-node Hortonworks HDP (2.6.0.3/2.4.2) clusters: 3 clusters for the Dev, Pre-Prod and Prod environments with 200+ nodes and an overall storage capacity of 5 PB.
  • Day-to-day responsibilities include monitoring, troubleshooting incidents, resolving developer issues and Hadoop ecosystem runtime failures, enabling security policies, and managing data storage and compute resources.
  • Automated multiple Cassandra activities, such as running repairs and cleaning snapshots, using shell scripts and cron jobs.
  • Responsible for cluster maintenance, cluster monitoring, troubleshooting, managing and reviewing log files, and providing 24x7 on-call support on a scheduled rotation.
  • Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
  • Installed Apache NiFi to make data ingestion fast, easy and secure from the Internet of Anything with Hortonworks DataFlow.
  • Wrote Apache Spark Scala code to move data between Cassandra and FiloDB tables.
  • Responsibilities include implementing change orders for creating HDFS folders, Hive databases/tables and HBase namespaces; commissioning and decommissioning DataNodes; troubleshooting; and managing and reviewing data backups and log files.
  • Implemented HDP upgrade from 2.4.2 to 2.6.0.3 version.
  • Implemented high availability for the NameNode, ResourceManager, HBase, Hive and Knox services.
  • Installed and configured new Hadoop components, including Atlas, Phoenix and Zeppelin, and upgraded the cluster with proper strategies.
  • Diligently teaming with the infrastructure, network, database and application teams to guarantee high data quality and availability.
  • Aligning with the systems engineering team to propose and help deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Analyzed Linux system performance to identify memory, disk I/O and network problems.
  • Troubleshot issues with Hive, HBase, Pig and Spark/Scala scripts to isolate and fix problems.
  • Screened Hadoop cluster job performance and performed capacity planning.
  • Periodically reviewed Hadoop-related logs, fixing errors and preventing them by analyzing warnings.
  • Good experience troubleshooting production-level issues in the cluster and its functionality.
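
Cassandra maintenance automation like the repair and snapshot cleanup described above is typically a small cron-driven shell script. A minimal sketch; the keyspace name, log path and schedule are hypothetical:

```shell
#!/bin/sh
# Sketch of a cron-driven Cassandra maintenance script like the one
# described above; keyspace and log path are illustrative placeholders.
KEYSPACE="app_ks"
LOG=/var/log/cassandra-maint.log
DRY_RUN=1    # set to 0 only on a real Cassandra node

maint_cmds() {
    # Primary-range repair of one keyspace, then drop its old snapshots.
    echo "nodetool repair -pr $KEYSPACE"
    echo "nodetool clearsnapshot -- $KEYSPACE"
}

if [ "$DRY_RUN" -eq 1 ]; then
    maint_cmds                                   # just show what would run
else
    maint_cmds | while read -r c; do
        $c >>"$LOG" 2>&1                         # run and append to the log
    done
fi
```

Scheduled from cron, e.g. weekly on Sunday at 02:00: `0 2 * * 0 /usr/local/bin/cassandra-maint.sh`. The `-pr` flag repairs only each node's primary token range, so running it on every node in rotation avoids redundant repair work.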

Environment: Hortonworks HDP 2.6.0.3, HBase, Hive, Ambari 2.5.0.3, Linux, Azure Cloud.

Hadoop Administrator

Confidential - Dallas, TX

Responsibilities:

  • Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
  • In depth understanding of Hadoop Architecture and various components such as HDFS, Name Node, Data Node, Resource Manager, Node Manager and YARN / Map Reduce programming paradigm.
  • Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers for their usage.
  • Extensively worked on commissioning and decommissioning of cluster nodes, file system integrity checks and maintaining cluster data replication.
  • Responsible for Installing, setup and Configuring Apache Kafka and Apache Zookeeper.
  • Responsible for efficient operations of multiple Cassandra clusters
  • Implemented a Python script that calculates cycle time from the REST API and fixes incorrect cycle-time data in the Oracle database.
  • Developed a NiFi workflow to pick up data from the data lake and from servers and send it to a Kafka broker.
  • Involved in developing new work flow Map Reduce jobs using Oozie framework.
  • Collected the logs data from web servers and integrated in to HDFS using Flume.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on any failures.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning, and slots configuration.
  • Involved and experienced in Cassandra cluster connectivity and security.
  • Very good understanding of assigning numbers of mappers and reducers to MapReduce clusters.
  • Experienced in migrating ETL processes from Oracle to Hive to test easy data manipulation.
  • Setting up HDFS Quotas to enforce the fair share of computing resources.
  • Strong knowledge of configuring and maintaining YARN schedulers (Fair and Capacity).
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Experience in projects involving movement of data from other databases to Cassandra with basic knowledge of Cassandra Data Modeling.
  • Used ANT as the build tool for building the application and deploying it on the Sun Java System Application Server (SJSAS).
  • Configured explicit partitioning of messages across Kafka servers and distributed consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
  • Supported parallel data loads into Hadoop.
  • Involved in setting up HBase which includes master and region server configuration, High availability configuration, performance tuning and administration.
  • Created user accounts and provided access to the Hadoop cluster.
  • Upgraded cluster from CDH 5.3 to CDH 5.7 and Cloudera manager from CM 5.3 to 5.7.
  • Involved in loading data from UNIX file system to HDFS.
  • Worked on ETL process and handled importing data from various data sources, performed transformations.
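
The daemon health-check shell scripts mentioned above generally compare the running JVM processes against an expected list and alert on anything missing. A minimal sketch; the required-daemon list is an assumption, and on a real node the running list would come from `jps`:

```shell
#!/bin/sh
# Sketch of a Hadoop daemon health check like the scripts described
# above. The process list is passed in as an argument so the logic is
# testable anywhere; on a real node it would come from `jps`.
REQUIRED="NameNode DataNode ResourceManager NodeManager"

missing_daemons() {
    running="$1"            # space-separated list of running daemons
    missing=""
    for d in $REQUIRED; do
        case " $running " in
            *" $d "*) ;;                       # daemon is running
            *) missing="$missing $d" ;;        # daemon is missing
        esac
    done
    echo "$missing"
}

# Real usage on a node with the JDK installed:
#   missing_daemons "$(jps | awk '{print $2}' | tr '\n' ' ')"
m=$(missing_daemons "NameNode DataNode")
[ -n "$m" ] && echo "ALERT: missing:$m"
# prints: ALERT: missing: ResourceManager NodeManager
```

In practice the alert line would be mailed or pushed to Nagios rather than echoed.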

Environment: Hadoop, MapReduce, Shell Scripting, Spark, Pig, Hive, HDFS, YARN, Hue, Sentry, Oozie, Zookeeper, Impala, Solr, Kerberos, cluster health, Puppet, Ganglia, Nagios, Flume, Sqoop, Storm, Kafka, KMS

Hadoop Administrator

Confidential - Brooklyn, NY

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively involved in Installation and configuration of Cloudera distribution, Namenode, Secondary Name Node, Job Tracker, Task Trackers and Data Nodes.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs for data cleaning.
  • Involved in setting up a Hadoop cluster across a network of 70 nodes.
  • Experienced in loading data from UNIX local file system to HDFS.
  • Experienced on Application Servers like BEA WebLogic 8.1/9.2, JBoss 4.2, Apache Tomcat 3.0/5.5, Oracle Application Server 10.1.2 and Sun Java System Application Server.
  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked on monitoring of VMware virtual environments with ESXi 4 servers and Virtual Center. Automated tasks using shell scripting for doing diagnostics on failed disk drives.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs
  • Use of Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Used Hive and created Hive external/internal tables and involved in data loading and writing Hive UDFs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
  • Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Created Hive External tables and loaded the data in to tables and query data using HQL.
  • Created Hive queries to compare the raw data with EDW reference tables and performing aggregates.
  • Wrote shell scripts for rolling day-to-day processes and automated them.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
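
The Hive external-table work described above typically wraps the DDL in a script so it can be replayed across environments. A minimal sketch; the database, table, columns and HDFS path are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch of creating a Hive external table over raw HDFS data, as
# described above; schema and location are illustrative placeholders.
hive_ddl() {
    cat <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs.events (
  event_time STRING,
  user_id    STRING,
  url        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/raw/weblogs';
EOF
}

# On a real gateway node:  hive_ddl | hive
# Here we only print the statement.
hive_ddl
```

Because the table is EXTERNAL, dropping it removes only the Hive metadata; the files under LOCATION stay in HDFS, which is why external tables suit ingestion landing zones.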

Environment: Hadoop, HDFS, MapReduce, Impala, Sqoop, HBase, Hive, Flume, Oozie, Zookeeper, Solr, performance tuning, cluster health monitoring, security, Shell Scripting, NoSQL (HBase/Cassandra), Cloudera Manager.

Linux/Unix Systems Administrator

Confidential

Responsibilities:

  • Day-to-day administration of Sun Solaris and RHEL 4/5, including installation, upgrades, patch management and package management.
  • Managing Systems operations with final accountability for smooth installation, networking, and operation, troubleshooting of hardware and software in Linux environment.
  • Identifying operational needs of various departments and developing customized software to enhance System's productivity.
  • Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
  • Proactively detected computer security violations, collected evidence and presented results to management.
  • Accomplished system/e-mail authentication using an enterprise LDAP database.
  • Implemented a database-enabled intranet web site using Linux and Apache with a MySQL database backend.
  • Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method. Monitored system metrics and logs for problems.
  • Responsible for monitoring overall project and reporting status to stakeholders.
  • Identified repeated production issues by analyzing production tickets after each release, and strengthened the system testing process to prevent those issues from reaching production, enhancing customer satisfaction.
  • Designed and coordinated creation of Manual Test cases according to requirement and executed them to verify the functionality of the application.
  • Manually tested the various navigation steps and basic functionality of the Web based applications.
  • Experience interpreting physical database models and understanding relational database concepts such as indexes, primary and foreign keys, and constraints using Oracle.
  • Writing, optimizing, and troubleshooting dynamically created SQL within procedures
  • Creating database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
  • Responsible for creating virtual environments for the rapid development.
  • Responsible for handling tickets raised by end users, including package installation, login and access issues, and user management (adding, modifying, deleting and grouping users).
  • Responsible for monthly preventive maintenance of the servers, RAID configuration, and resource management using disk quotas.
  • Responsible for change management release scheduled by service providers.
  • Generated weekly and monthly reports on the tickets worked and sent them to management.
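
Firewall rule sets like those described above are commonly kept in a script so they can be reviewed and replayed. A minimal default-deny iptables sketch; the allowed ports are illustrative assumptions, not the actual rule set:

```shell
#!/bin/sh
# Sketch of a default-deny iptables ruleset like the firewall rules
# described above; the allowed ports are hypothetical examples.
DRY_RUN=1    # set to 0 (as root) to actually apply the rules

fw_rules() {
    echo "iptables -P INPUT DROP"                                     # default deny
    echo "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT"
    echo "iptables -A INPUT -i lo -j ACCEPT"                          # loopback
    echo "iptables -A INPUT -p tcp --dport 22 -j ACCEPT"              # SSH
    echo "iptables -A INPUT -p tcp --dport 443 -j ACCEPT"             # HTTPS
}

if [ "$DRY_RUN" -eq 1 ]; then
    fw_rules                                   # print what would be applied
else
    fw_rules | while read -r r; do $r; done    # apply each rule
fi
```

Printing the rules first (dry run) is a cheap way to validate a change before applying it; results would then be checked with vulnerability scanning tools as noted above.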

Environment: UNIX, Solaris, HP UX, Red Hat Linux, Windows, FTP, SFTP
