
Hadoop Administrator Resume


Roseville, CA

PROFESSIONAL SUMMARY:

  • Over 8 years of professional IT experience, including experience with Big Data ecosystem technologies.
  • About 4 years of dedicated experience in Hadoop administration and its components, including HDFS, MapReduce, Apache Pig, Hive, Sqoop, Oozie, and Flume.
  • Proven expertise in Hadoop project implementation and system configuration.
  • Excellent experience with Hadoop architecture and its components, such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN, as well as Sqoop for data migration, Flume for data ingestion, Oozie for scheduling, and ZooKeeper for coordinating cluster resources.
  • Involved in the design and development of technical specifications using Hadoop ecosystem tools.
  • Experienced in administration, testing, and change control processes, including Hadoop administration activities such as installation, configuration, and maintenance of clusters.
  • Expertise in setting up, configuring, and monitoring Hadoop clusters using Cloudera CDH3, CDH4, and Apache Hadoop on RedHat, CentOS, and Windows.
  • Expertise in commissioning, decommissioning, balancing, and managing nodes and tuning servers for optimal cluster performance.
  • Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
  • Rack-aware configuration for quick availability and processing of data.
  • Backup configuration and recovery from NameNode failures.
  • Good Experience in Planning, Installing and Configuring Hadoop Cluster in Apache Hadoop and Cloudera Distributions.
  • Solid experience in Linux administration activities on RHEL and CentOS.
  • Worked on setting up NameNode high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes (see the command sketch after this list).
  • Experience in deploying Hadoop 2.0 (YARN).
  • Excellent command of creating backup, recovery, and disaster recovery procedures and implementing backup and recovery strategies for offline and online backups.
  • Experience in Big Data domains such as shared services (Hadoop clusters, operational models, inter-company chargeback, and lifecycle management).
  • Excellent communication, interpersonal, and analytical skills, and a strong ability to perform as part of a team.
  • Sound knowledge of SQL and UNIX commands.
  • Exceptional ability to learn new concepts; hardworking and enthusiastic.
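
As an illustration of the NameNode high-availability and automatic-failover work noted above, a minimal command-line sketch might look like the following. It assumes the HA properties (dfs.nameservices, the quorum journal shared-edits URI, and dfs.ha.automatic-failover.enabled) are already defined in hdfs-site.xml; the service IDs nn1 and nn2 are hypothetical examples.

    # Initialize the failover state in ZooKeeper (run once from one NameNode host)
    hdfs zkfc -formatZK

    # Copy the active NameNode's metadata to the standby
    hdfs namenode -bootstrapStandby

    # After starting the NameNodes and ZKFC daemons, verify the HA state
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2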

TECHNICAL SKILLS:

Technologies/Tools: Hadoop, HDFS, YARN, Cloudera, Cloudera Manager, HBase, Hive, Pig, Oozie, Sqoop, Flume, Storm, ZooKeeper, AWS, Rackspace, Hortonworks, CDH 4, CDH 5, Shell Scripting, Tableau.

Databases: HiveQL, SQL Server, SQL Profiler, Oracle 9i/10g, Teradata, XML, flat files.

Dimensional Data Modeling & Data Warehousing: Data modeling, star schema modeling, snowflake modeling, fact and dimension tables, physical and logical data modeling; Informatica PowerCenter 9.0/8.6, Informatica Designer, Workflow Manager, Workflow Monitor, OLAP, mapplets, transformations.

Operating Systems: Linux (RedHat, CentOS), Windows Server 2003/2008, Windows 7/8.

PROFESSIONAL EXPERIENCE:

Hadoop Administrator

Confidential, Roseville, CA

Responsibilities:

  • Responsible for designing Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Installed and configured a fully distributed, multi-node Hadoop cluster with a large number of nodes.
  • Provided Hadoop, OS, and hardware optimizations.
  • Set up machines with network controls, static IPs, disabled firewalls, and swap memory.
  • Installed and configured Cloudera Manager for easier management of the existing Hadoop cluster.
  • Administered and supported the Hortonworks distribution.
  • Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Performed operating system installations and Hadoop version updates using automation tools.
  • Configured Oozie for workflow automation and coordination.
  • Implemented a rack-aware topology on the Hadoop cluster (see the topology-script sketch after this list).
  • Imported and exported structured data from different relational databases into HDFS and Hive using Sqoop.
  • Configured ZooKeeper to implement node coordination and clustering support.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Populated HDFS with huge amounts of data using Apache Kafka.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Developed scripts for benchmarking with TeraSort/TeraGen.
  • Implemented the Kerberos authentication protocol for the existing cluster.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp (see the command sketch after this list).
  • Regularly commissioned and decommissioned nodes depending on the volume of data.
  • Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
  • Installed and maintained a Puppet-based configuration management system.
  • Deployed Puppet, Puppet Dashboard, and PuppetDB for configuration management of the existing infrastructure.
  • Used Puppet configuration management to manage the cluster.
  • Experience working with APIs.
  • Involved in installing Cloudera Manager, Hadoop, ZooKeeper, HBase, Hive, Pig, etc.
  • Involved in configuring quorum-based HA for the NameNode, making the cluster more resilient.
  • Integrated Kerberos into Hadoop to harden the cluster and secure it against unauthorized users.
  • Fine-tuned the JobTracker by changing a few properties in mapred-site.xml.
  • Fine-tuned the Hadoop cluster by setting the proper number of map and reduce slots for the TaskTrackers.
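
For the rack-aware topology mentioned above, Hadoop invokes a user-supplied script (referenced by net.topology.script.file.name in core-site.xml) with node addresses and reads rack paths from its output. A minimal, purely illustrative bash sketch follows; the subnets and rack names are hypothetical.

    #!/bin/bash
    # topology.sh - maps DataNode IPs to rack paths for HDFS rack awareness
    # Hadoop passes one or more IPs/hostnames as arguments and reads rack paths from stdout.
    for node in "$@"; do
      case "$node" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done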
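
The DistCp backups and node decommissioning above came down to commands along these lines; the cluster URIs, paths, hostname, and exclude-file location are placeholders and vary by distribution.

    # Incrementally copy a warehouse directory to the remote backup cluster
    hadoop distcp -update hdfs://prod-nn:8020/data/warehouse hdfs://dr-nn:8020/backups/warehouse

    # Decommission a DataNode: add it to the exclude file referenced by
    # dfs.hosts.exclude in hdfs-site.xml, then have the NameNode re-read the file
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Track decommissioning progress
    hdfs dfsadmin -report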

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Kafka, Puppet, Ubuntu.

Hadoop Administrator

Confidential, Union, NJ

Responsibilities:

  • Installed and configured various components of Hadoop ecosystem and maintained their integrity
  • Planned production cluster hardware and software installation and communicated with multiple teams to get it done.
  • Designed, configured and managed the backup and disaster recovery for HDFS data.
  • Experience with UNIX and Linux, including shell scripting.
  • Installed, upgraded, and managed the Hadoop cluster on the Cloudera distribution.
  • Commissioned DataNodes when data volumes grew and decommissioned them when hardware degraded.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
  • Installed and Configured Hive, Pig, Sqoop and Oozie on the HDP cluster.
  • Involved in implementing high availability and automatic failover infrastructure to overcome the single point of failure for the NameNode, utilizing ZooKeeper services.
  • Implemented the HDFS snapshot feature (see the command sketch after this list).
  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the cluster.
  • Ran monthly security checks across the UNIX and Linux environments and installed the security patches required to maintain a high security level for our clients.
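
A brief sketch of the HDFS snapshot workflow referred to above; the directory and snapshot names are examples, and a directory must be made snapshottable by an administrator before snapshots can be taken.

    # Allow snapshots on a directory, then create one
    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse daily-backup

    # List snapshottable directories and restore a file by copying it back out
    hdfs lsSnapshottableDir
    hdfs dfs -cp /data/warehouse/.snapshot/daily-backup/part-00000 /data/warehouse/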

Environment: Hadoop, HDFS, MapReduce, Sqoop, HBase, Hive, Pig, Flume, Oozie, ZooKeeper.

Hadoop Administrator

Confidential, Atlantic City, NJ

Responsibilities:

  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the cluster.
  • Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Installed various Hadoop ecosystem components and Hadoop daemons.
  • Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
  • Configured various property files, such as core-site.xml, hdfs-site.xml, and mapred-site.xml, based upon the job requirements.
  • Involved in loading data from UNIX file system to HDFS.
  • Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, and Hive.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning.
  • Recommended hardware configurations for the Hadoop cluster.
  • Installed, upgraded, and managed the Hadoop cluster on the Cloudera distribution.
  • Troubleshot many cloud-related issues such as DataNodes going down, network failures, and missing data blocks.
  • Managed and reviewed Hadoop and HBase log files.
  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables (see the Sqoop sketch after this list).
  • Developed Hive UDFs to bring all the customer information into a structured format.
  • Developed bash scripts to pull Tlog files from the FTP server and then process and load them into Hive tables.
  • Built an automated setup for cluster monitoring and the issue escalation process.
  • Administered, installed, upgraded, and managed Hadoop distributions (CDH3, CDH4, Cloudera Manager), Hive, and HBase.
  • Provided cluster coordination services through ZooKeeper.
  • Used Hive and Pig to analyze data from HDFS.
  • Wrote Pig scripts to load and aggregate the data.
  • Used Sqoop to load data into the SQL database.
  • Used Java to develop user-defined functions (UDFs) for Pig scripts.
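
As a sketch of the Sqoop loads from Teradata/DB2 into partitioned Hive tables described above: the connection string, credentials, table names, and partition value below are placeholders, and the Teradata JDBC driver is assumed to be on Sqoop's classpath.

    # Illustrative Sqoop import of one day of transactions into a partitioned Hive table
    sqoop import \
      --connect jdbc:teradata://tdprod/DATABASE=SALES \
      --username etl_user -P \
      --table TRANSACTIONS \
      --where "txn_date = '2015-06-01'" \
      --hive-import \
      --hive-table sales.transactions \
      --hive-partition-key txn_date \
      --hive-partition-value 2015-06-01 \
      --num-mappers 4 \
      --target-dir /staging/sales/transactions/2015-06-01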

Environment: Hadoop, HDFS, MapReduce, YARN, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Java.

Hadoop Administrator

Confidential, Weehawken, NJ

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, HBase, ZooKeeper, and Sqoop.
  • Extensively involved in the installation and configuration of the Cloudera distribution of Hadoop: NameNode, Secondary NameNode, ResourceManager, NodeManagers, and DataNodes.
  • Collected log data from web servers and integrated it into HDFS using Flume (see the agent configuration sketch after this list).
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Worked with Kafka on a proof of concept for carrying out log processing on a distributed system.
  • Developed data pipelines using Flume, Sqoop, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Configured various property files, such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh, based upon the job requirements.
  • Worked on the Hue interface for querying data.
  • Automated system tasks using Puppet.
  • Created Hive tables to store the processed results in a tabular format.
  • Utilized cluster coordination services through ZooKeeper.
  • Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Nagios.
  • Configured Sqoop and exported/imported data into HDFS.
  • Configured NameNode high availability and NameNode federation.
  • Loaded data from the UNIX local file system to HDFS.
  • Managed and scheduled jobs on the Hadoop cluster using Oozie.
  • Used Sqoop to import and export data between HDFS and relational databases.
  • Performed data analysis by running Hive queries.
  • Generated reports using the Tableau report designer.
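
A minimal sketch of the Flume log collection mentioned above: one agent tailing a web server log into an HDFS sink. The agent name, file paths, and the simple exec source are illustrative; the real deployment used multiple sources and channel selectors.

    # Write an illustrative single-agent configuration and start the agent
    cat > /etc/flume-ng/conf/weblogs.conf <<'EOF'
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/httpd/access_log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /data/weblogs/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    EOF

    flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/weblogs.conf --name a1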

Environment: HDFS, Cloudera Manager, MapReduce, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Puppet, Tableau, and Java.

Software Developer

Confidential

Responsibilities:

  • Extensively worked on data extraction, transformation, and loading from source to target systems using BTEQ and MultiLoad (see the BTEQ sketch after this list).
  • Involved in developing scripts using Teradata SQL per requirements.
  • Developed simple mappings using Informatica to load dimension and fact tables per star schema techniques.
  • Responsible for creating test cases, test plans, test data and reporting status ensuring accurate coverage of requirements and business processes.
  • Experienced in the SDLC and agile methodologies such as Scrum.
  • Created Informatica mappings to build business rules to load data.
  • Configured and ran the debugger from within mapping designer to troubleshoot the mapping before the normal run of the workflow.
  • Efficiently created and tracked mapping parameters and variables.
  • Involved in performance tuning of the required mappings to improve efficiency while loading data.
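
A rough sketch of the kind of batch BTEQ load step referred to above. The TDPID, credentials, and table names are placeholders; the actual jobs also handled error tables and used MultiLoad for bulk loads.

    # Run a small BTEQ script in batch mode (illustrative only)
    bteq <<'EOF'
    .LOGON tdprod/etl_user,etl_password

    INSERT INTO target_db.daily_sales
    SELECT store_id, sale_date, SUM(amount)
    FROM   staging_db.sales_stg
    GROUP BY store_id, sale_date;

    .IF ERRORCODE <> 0 THEN .QUIT 8
    .LOGOFF
    .QUIT 0
    EOF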

Environment: Informatica 8.6.1, Oracle 10g, Teradata TD12.
