
Hadoop Admin Resume


Irvine, CA

SUMMARY:

  • 8+ years of professional IT experience, including experience with Big Data ecosystem technologies.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Pig, Hive, Sqoop, Oozie, HBase, YARN, NameNode HA, and the MapReduce programming paradigm.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, MapReduce, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig, and Flume.
  • Well versed in installing, configuring, managing, and supporting Hadoop clusters using various distributions such as Apache Hadoop, Cloudera CDH, and Hortonworks HDP.
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Coordinated with technical teams for installation of Hadoop and related third-party applications.
  • Good knowledge of Hadoop, Pig, Hive, Sqoop, and YARN, and of designing and implementing MapReduce jobs to support distributed processing of large data sets on the Hadoop cluster.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS and HBase.
  • Experience in securing Hadoop clusters with Kerberos.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
  • Designed, implemented, and administered centralized authentication systems utilizing LDAP, Kerberos, and x509 certificate authentication for several client projects to manage back-end, developer facing, and customer facing access control.
  • Designed, implemented, and deployed LAMP stack applications (Redmine, MediaWiki, Puppet) and aided in the deployment of in-house LAMP and Tomcat applications and administering the Apache HTTPd, MySQL, and Postgres server resources used by these applications.
  • Created and managed Kerberos secured NFS, AFP, and WebDAV file shares and integrated access controls using LDAP, Kerberos, and x509 certificates to secure these resources.
  • Hands-on programming experience in technologies such as Java, J2EE, HTML, and XML.
  • Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables (a Sqoop sketch follows this list).
  • Experience on Hadoop cluster maintenance, including data and metadata backups, file system checks, commissioning and decommissioning nodes and upgrades.
  • Conducted detailed analysis of system and application architecture components as per functional requirements.
  • Hands-on experience with open-source monitoring tools, including Nagios and Ganglia.
  • Good Knowledge on NoSQL databases such as Cassandra, MongoDB.
  • Monitored and managed Linux servers (hardware profiles, resource usage, service status, etc.), handled server backup/restore and status reporting, and managed user accounts, password policies, and file permissions.
  • Tech-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimating, designing custom solutions, development, leading developers, producing documentation, and production support.
  • Exploited the OLAP analytical power of Teradata using OLAP functions such as RANK, QUANTILE, CSUM, MSUM, and GROUP BY GROUPING SETS to generate detailed reports for the marketing team.
  • Worked with transform components such as Aggregate, Router, Sort, Filter by Expression, Join, Normalize, and Scan; created appropriate DMLs and automated load processes using Autosys.
  • Worked in Agile environment. Participated in scrum meetings/standups.
  • Extensively worked on several ETL assignments to extract, transform, and load data into tables as part of data warehouse development with highly complex data models using relational, star, and snowflake schemas.
  • Experienced in all phases of Software Development Life Cycle (SDLC).
  • Expert knowledge in using various transformation components such as Join, Lookup, Update, Router, Normalize, Denormalize, and partitioning/de-partitioning components.
  • Experience in Data Modeling, Data Extraction, Data Migration, Data Integration, Data Testing and Data Warehousing using Ab Initio.
  • Configured the Informatica environment to connect to different databases using DB Config, Input Table, Output Table, and Update Table components.
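
Code sketch (Sqoop load into a partitioned Hive table): a minimal, hypothetical example of the kind of Sqoop import referenced above; the Teradata host, database, table, credentials, and partition values are placeholders, not details from an actual engagement.

    #!/usr/bin/env bash
    # Hypothetical Sqoop import: pull one day of a Teradata table into a
    # date-partitioned Hive table. Host, schema, table, and user are placeholders.
    sqoop import \
      --connect jdbc:teradata://td-host.example.com/DATABASE=SALES \
      --driver com.teradata.jdbc.TeraDriver \
      --username etl_user -P \
      --table ORDERS \
      --where "ORDER_DATE = DATE '2015-06-01'" \
      --target-dir /user/etl_user/staging/orders_20150601 \
      --num-mappers 4 \
      --hive-import \
      --hive-table sales.orders \
      --hive-partition-key order_date \
      --hive-partition-value '2015-06-01'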

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, YARN, Hadoop HA

Hadoop Distributions: Apache Hadoop, Cloudera CDH, Hortonworks HDP

Scripting Languages: JavaScript, Shell, Python, Perl, Bash

Programming Languages: Java, C++, C, SQL, PL/SQL

Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS

Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss

Databases: Oracle 9i/10g/11g, MS SQL Server, MySQL Server, DB2, HBase, MongoDB, Cassandra

Version Control: PVCS, CVS, VSS

Networking & Protocols: TCP/IP, Telnet, HTTP, HTTPS, FTP, SNMP, LDAP, DNS.

Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8

Data Modelling: Star-Schema Modelling, Snowflake Modelling, Erwin 4.0, Visio

RDBMS: Oracle 11g/10g/9i/8i, Teradata 13.0, Teradata V2R6, Teradata 4.6.2, DB2, MS SQL Server 2000, 2005, 2008.

PROFESSIONAL EXPERIENCE:

Hadoop Admin

Confidential, Irvine, CA

Responsibilities:

  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Extensively worked with Cloudera Distribution Hadoop (CDH) 5.x and 4.x.
  • Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration.
  • Configured various property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on the job requirements.
  • Worked on Hue interface for querying the data.
  • Automating system tasks using Puppet.
  • Wrote a number of shell scripts (Bash and Perl).
  • Modified Puppet manifests as per requirements.
  • Installed Perl, Autosys, and Teradata modules.
  • Worked closely with the DevOps team to support the infrastructure and tools that facilitate product development.
  • Created users, groups, and roles using the Identity and Access Management (IAM) tool.
  • Managed and monitored hardware and networks, and built automation for infrastructure provisioning.
  • Built, managed, and monitored Linux clusters/servers and networks to support product development.
  • Built monitoring and alerting infrastructure to measure performance, utilization, and failures.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Provided timely and reliable support for all production and development environments: deployment, upgrades, operation, and troubleshooting.
  • Implemented both major and minor version upgrades of the existing cluster, as well as rollbacks to the previous version.
  • Implemented a major version upgrade from 4.7.1 to 5.2.3, and minor version upgrades from 5.2.5 to 5.3.3 and from 5.3.3 to 5.4.3.
  • Installed and configured HDP 1.3 for a proof of concept.
  • Experienced in setting up Hortonworks clusters and installing all ecosystem components, both through Ambari and manually from the command line.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2 and followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
  • Created Hive tables to store the processed results in a tabular format.
  • Utilized cluster co-ordination services through ZooKeeper.
  • Worked on a Data Lake architecture to collate all enterprise data into a single place for ease of correlation and data analysis, to find operational and functional issues in the enterprise workflow.
  • Designed ETL flows to get data from various sources, transform for further processing and load in Hadoop/HDFS for easy access and analysis by various tools.
  • Developed multiple Proof-Of-Concepts to justify viability of the ETL solution including performance and compliance to non-functional requirements.
  • Conducted Hadoop training workshops for development teams as well as directors and management to increase awareness.
  • Prepared presentations of solutions to Big Data/Hadoop business cases and presented them to company directors to get the go-ahead on implementation.
  • Collaborated with the Hortonworks team for technical consultation on business problems and validated the proposed architecture/design.
  • Designed an end-to-end ETL flow for a feed with millions of records arriving daily, using the Apache tools/frameworks Hive, Pig, Sqoop, and HBase for the entire ETL workflow.
  • Designed, implemented and managed the Backup and Recovery environment.
  • Understood the security requirements for Hadoop and integrated the cluster with a Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for each service, and managing keytabs with keytab tools (see the kadmin sketch after this list).
  • Configured Sqoop and exported/imported data into HDFS.
  • Managed and scheduled jobs on the Hadoop cluster using Oozie.
  • Configured Name Node high availability and Name Node federation.
  • Used Sqoop to import and export data from RDBMS to HDFS and vice-versa.
  • Performance-tuned the Hadoop cluster to improve efficiency.
  • Configured Quorum-based HA for the NameNode, making the cluster more resilient (see the haadmin sketch after this list).
  • Integrated Kerberos into Hadoop to make the cluster more robust and secure against unauthorized users.
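
Code sketch (Kerberos service principals and keytabs): a minimal sketch of the kadmin work described above, run on the KDC as root; the realm, hostname, and keytab paths are assumed placeholders.

    #!/usr/bin/env bash
    # Create service principals for one worker node and export them to keytabs.
    REALM=EXAMPLE.COM
    HOST=worker01.example.com

    kadmin.local -q "addprinc -randkey hdfs/${HOST}@${REALM}"
    kadmin.local -q "addprinc -randkey yarn/${HOST}@${REALM}"
    kadmin.local -q "addprinc -randkey HTTP/${HOST}@${REALM}"

    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/${HOST}@${REALM} HTTP/${HOST}@${REALM}"
    kadmin.local -q "xst -k /etc/security/keytabs/yarn.service.keytab yarn/${HOST}@${REALM} HTTP/${HOST}@${REALM}"

    # Verify keytab contents and test authentication
    klist -kt /etc/security/keytabs/hdfs.service.keytab
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/${HOST}@${REALM}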
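
Code sketch (NameNode HA checks and manual failover): a minimal sketch assuming a Quorum Journal based HA pair whose NameNodes are registered as nn1 and nn2 in hdfs-site.xml; adjust the identifiers to the actual cluster.

    #!/usr/bin/env bash
    # Check which NameNode is active and which is standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

    # Gracefully fail over from nn1 to nn2 (configured fencing applies)
    hdfs haadmin -failover nn1 nn2

    # Quick post-failover health checks
    hdfs dfsadmin -report | head -n 20
    hdfs fsck / | tail -n 20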

Environment: CDH 5.4.3 and 4.x, Cloudera Manager 5.1.1, Hortonworks HDP 2.x, Ambari 2.2.1, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Kafka, Red Hat/CentOS 6.5, Teradata, ETL tools.

Sr. Hadoop Admin

Confidential, Philadelphia, Pennsylvania

Responsibilities:

  • Installed/Configured/Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively involved in Installation and configuration of Cloudera distribution Hadoop CDH 3.x, CDH 4.x.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a sketch follows this list).
  • Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs for data cleaning.
  • Involved in building out a Hadoop cluster of 70 nodes.
  • Experienced in loading data from UNIX local file system to HDFS.
  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Involved in developing new workflow MapReduce jobs using the Oozie framework.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Worked on upgrading cluster, commissioning & decommissioning of DataNodes, Name Node recovery, capacity planning, and slots configuration.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Built Tableau reporting on top of Hive external tables.
  • Created Pig scripts to run on large bucketed Hive tables, pull the necessary fields, and create Hive tables used as the source for Tableau, as part of performance improvement.
  • Exported data to Teradata using Sqoop.
  • Integrated HBase with Hive.
  • Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Used Sqoop to import and export data between RDBMS and HDFS.
  • Used Hive, created Hive external/internal tables, and was involved in data loading and writing Hive UDFs.
  • Extensively used Bash shell scripting for doing manipulations of the flat files, given by the clients.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
  • Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Created Hive external tables, loaded data into them, and queried the data using HQL (a DDL sketch follows this list).
  • Created Hive queries to compare raw data with EDW reference tables and perform aggregations.
  • Wrote shell scripts for rolling day-to-day processes and automated them.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
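
Code sketch (daemon health-check script): a simplified version of the kind of monitoring script mentioned above; the daemon list, NameNode URL, and alert address are assumptions and depend on node role and cluster layout (it also assumes mailx is configured).

    #!/usr/bin/env bash
    # Check that expected Hadoop daemons are running on this host and that the
    # NameNode web UI responds; alert by mail on failure.
    DAEMONS="NameNode DataNode JobTracker TaskTracker"
    NN_URL="http://namenode.example.com:50070"
    ALERT="hadoop-admins@example.com"

    for d in $DAEMONS; do
      if jps | grep -qw "$d"; then
        echo "OK: $d is running"
      else
        echo "$d is not running on $(hostname)" | mail -s "Hadoop daemon down: $d" "$ALERT"
      fi
    done

    if ! curl -sf "$NN_URL" > /dev/null; then
      echo "NameNode UI unreachable at $NN_URL" | mail -s "NameNode UI check failed" "$ALERT"
    fi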
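
Code sketch (Hive external table over raw HDFS data): a hypothetical DDL of the kind used for the external tables above; the table name, columns, delimiter, and paths are placeholders.

    #!/usr/bin/env bash
    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
      ip     STRING,
      ts     STRING,
      url    STRING,
      status INT
    )
    PARTITIONED BY (log_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/raw/web_logs';

    ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (log_date='2014-10-01')
      LOCATION '/data/raw/web_logs/2014-10-01';

    SELECT status, COUNT(*) AS hits
    FROM web_logs
    WHERE log_date = '2014-10-01'
    GROUP BY status;
    "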

Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3.x, CDH4.x, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux, Tableau, ETL Tools.

Hadoop Admin

Confidential, Plano, Texas

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters and Cloudera Distribution Hadoop CDH for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
  • Performed Installation and configuration of Hadoop Cluster of 90 Nodes with Cloudera distribution with CDH3.
  • Installed the NameNode, Secondary NameNode, JobTracker, DataNodes, and TaskTrackers.
  • Performed benchmarking and analysis using TestDFSIO and TeraSort (see the benchmarking sketch after this list).
  • Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Implemented Rack Awareness for data locality optimization.
  • Dumped data from a MySQL database to HDFS and vice versa using Sqoop.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Created a local YUM repository for installing and updating packages.
  • Copied data from one cluster to another using DistCp and automated the copy procedure using shell scripts (see the DistCp sketch after this list).
  • Implemented Name node backup using NFS.
  • Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and SSH keyless login.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Worked on importing and exporting Data into HDFS and HIVE using Sqoop.
  • Worked on analyzing data with Hive and Pig.
  • Helped in setting up the rack topology in the cluster.
  • Helped with day-to-day operational support.
  • Performed a minor upgrade from CDH3-u4 to CDH3-u6.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Deployed Network file system for Name Node Metadata backup.
  • Designed and allocated HDFS quotas for multiple groups.
  • Configured and deployed the Hive metastore using MySQL and the Thrift server.
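
Code sketch (TestDFSIO and TeraSort benchmarking): a minimal sketch of the benchmark runs described above; the jar locations vary by Hadoop version and distribution, so the paths below are assumptions (on CDH3 they typically live under /usr/lib/hadoop).

    #!/usr/bin/env bash
    TEST_JAR=/usr/lib/hadoop/hadoop-test.jar
    EXAMPLES_JAR=/usr/lib/hadoop/hadoop-examples.jar

    # TestDFSIO: write then read 10 files of 1000 MB each
    hadoop jar "$TEST_JAR" TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar "$TEST_JAR" TestDFSIO -read  -nrFiles 10 -fileSize 1000

    # TeraSort: generate ~10 GB (100M rows x 100 bytes), sort, then validate
    hadoop jar "$EXAMPLES_JAR" teragen 100000000 /benchmarks/terasort-input
    hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/terasort-input /benchmarks/terasort-output
    hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort-output /benchmarks/terasort-report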
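
Code sketch (inter-cluster copy with DistCp): a minimal sketch of the automated DistCp copy mentioned above; the NameNode hostnames, ports, paths, and alert address are placeholders.

    #!/usr/bin/env bash
    # Copy yesterday's data directory from the source cluster to the backup cluster.
    DAY=$(date -d yesterday +%Y-%m-%d)
    SRC=hdfs://nn-prod.example.com:8020/data/events/${DAY}
    DST=hdfs://nn-backup.example.com:8020/data/events/${DAY}

    # -update copies only files that are missing or differ on the destination
    if ! hadoop distcp -update "$SRC" "$DST"; then
      echo "DistCp of ${DAY} failed" | mail -s "DistCp failure ${DAY}" hadoop-admins@example.com
    fi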

Environment: Hadoop distributions (CDH3), HDFS, MapReduce, Hive, Pig, Sqoop, UNIX, Red Hat Linux 6.x/7.x, shell scripting, Nagios, Ganglia monitoring, Kerberos

Unix/Linux System Administrator

Confidential, Middletown, NJ

Responsibilities:

  • Installed, configured and administered RHEL 5/6 on VMware server 3.5.
  • Converted numerous physical servers on Dell R820 hardware into virtual machines for a lab environment.
  • Managed file space, created logical volumes, and extended file systems using LVM (see the LVM sketch after this list).
  • Performed daily maintenance of servers and tuned systems for optimum performance by turning off unwanted peripherals and vulnerable services.
  • Managed RPM packages for Linux distributions.
  • Monitored system performance using top, free, vmstat, and iostat.
  • Set up user and group login IDs, passwords, and ACL file permissions, and assigned user and group quotas.
  • Configured networking including TCP/IP and troubleshooting.
  • Designed Firewall rules to enable communication between servers.
  • Monitored scheduled jobs, workflows, and other day-to-day system administration tasks.
  • Responded to tickets through ticketing systems.
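
Code sketch (extending a filesystem with LVM): a minimal sketch of the LVM work mentioned above; the volume group, logical volume, filesystem type, and mount point are placeholders.

    #!/usr/bin/env bash
    VG=vg_data
    LV=lv_app

    # Confirm free extents are available in the volume group
    vgs "$VG"

    # Grow the logical volume by 20 GB, then grow the ext4 filesystem online
    lvextend -L +20G /dev/${VG}/${LV}
    resize2fs /dev/${VG}/${LV}

    df -h /app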

Environment: Red Hat Enterprise Linux 5/6, VMware Server 3.5, NIS, NFS, DHCP, and Dell R820

UNIX/LINUX Administrator

Confidential

Responsibilities:

  • Installation and configuration of Red Hat Enterprise Linux (RHEL) 5x, 6x Servers on HP, Dell Hardware and VMware virtual environment
  • Installation, setup and configuration of RHEL, CentOS, OEL and VMware ESX on HP, Dell and IBM hardware
  • Installation and Configuration of Sun Enterprise Servers, HP and IBM Blade Servers, HP 9000, RS 6000, IBM P series
  • Expertise in enterprise-class storage, including SCSI, RAID, and Fibre Channel technologies
  • Configured and maintained the virtual server environment using VMware ESX 5.1/5.5 and vCenter
  • Involved in supporting different hardware ranging from rack to blade servers, including Vblock hardware running the ESX environment
  • Configured enterprise UNIX/Linux systems in a heterogeneous environment (Red Hat and SUSE Linux, Solaris, HP-UX) with SAN/NAS infrastructure across multiple sites on mission-critical business systems
  • Created a standard Kickstart-based installation method for RHEL servers; the installation includes all changes required to meet the company's security standards and can be performed over HTTP or via PXE on a separate network segment
  • Added SAN storage using multipath and created physical volumes, volume groups, and logical volumes (see the provisioning sketch after this list)
  • Storage Provisioning, Volume and File system Management using LVM, VERITAS Volume Manager and VERITAS File system (VERITAS Storage Foundation), Configuring ZFS file systems
  • Creating user accounts, user administration, local and global groups on Solaris and Red Hat Linux platform
  • Set up, implemented, configured, and documented backup/restore solutions for disaster/business recovery of clients using TSM backup on UNIX, SUSE, and Red Hat Linux platforms.
  • Installed and configured Netscape, Apache web servers and Samba Server
  • Installed and Configured WebSphere Application servers on Linux and AIX
  • Installation and configuration MySQL on Linux servers.
  • Performed quarterly patching of Linux, AIX, Solaris and HPUX servers on regular schedule
  • Heavily utilized the LAMP stack (Linux, Apache, MySQL, PHP/Perl) to meet customer needs
  • Installation and configuration of Oracle 11g RAC on Red Hat Linux nodes
  • Installed and configured different applications such as Apache, Tomcat, JBoss, xrdp, and WebSphere, and worked closely with the respective teams
  • Setup, Implementation, Configuration of SFTP/FTP servers
  • Set up a JBoss cluster and configured Apache with JBoss on Red Hat Linux; handled proxy serving with Apache and troubleshot Apache, JBoss, and mod_jk issues for clients
  • Installed Jenkins and created users and maintained Jenkins to deploy Java code developed by developers and build framework.
  • System performance monitoring and tuning.
  • Worked on shell and Perl scripts
  • Setting up the lab environment with Tomcat/Apache, configuring the setup with F5 virtual load balancer for customer application
  • Document all system changes and create Standard Operating Procedure (SOP) for departmental use. Resolve all helpdesk tickets and add/update asset records using Remedy Action Request System
  • Provide 24x7 oncall production support on rotation basis.
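
Code sketch (bringing a multipathed SAN LUN under LVM): a minimal sketch of the SAN/LVM provisioning mentioned above; the multipath alias, VG/LV names, size, and mount point are placeholders.

    #!/usr/bin/env bash
    # Rescan the SCSI hosts for the newly presented LUN and confirm its paths
    for h in /sys/class/scsi_host/host*; do echo "- - -" > "$h/scan"; done
    multipath -ll

    # Put the multipath device under LVM and create a filesystem
    pvcreate /dev/mapper/mpatha
    vgcreate vg_san /dev/mapper/mpatha
    lvcreate -n lv_data -L 200G vg_san
    mkfs.ext4 /dev/vg_san/lv_data

    # Mount it and persist the mount across reboots
    mkdir -p /data
    mount /dev/vg_san/lv_data /data
    echo "/dev/vg_san/lv_data /data ext4 defaults 0 0" >> /etc/fstab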

Environment: Red Hat 4.x/5.x, Solaris 9/10, AIX, HP-UX, VMware ESX 5.0/5.1/5.5, vSphere, HP ProLiant DL 380/580 servers, Dell servers (R series), Windows Server 2008, EMC CLARiiON, NetApp, SCSI, VMware Converter, Apache web server, F5 load balancer, Oracle, MySQL, PHP, DNS, DHCP, Bash, NFS, NAS, Spacewalk, WebSphere, WebLogic, Java, Jenkins, JBoss, Tomcat, Kickstart.

Linux Administrator

Confidential

Responsibilities:

  • Performed Red Hat Enterprise Linux, Oracle Enterprise Linux (OEL), Solaris, and Windows Server deployments to build new environments using Kickstart and JumpStart.
  • Performed installation, addition, and replacement of resources such as disks, CPUs, memory, and NIC cards; increased swap space; and maintained Linux/UNIX and Windows servers.
  • Implemented NFS, SAMBA file servers and SQUID caching proxy servers
  • Implemented centralized user authentication using OpenLDAP and Active Directory
  • Worked with VMware ESX Server configured for Red Hat Enterprise Linux.
  • Configured IT hardware: switches, hubs, desktops, and rack servers
  • Structured datacenter stacking, racking, and cabling
  • Installed, configured, troubleshot, and administered VERITAS Volume Manager and Logical Volume Manager, and managed file systems.
  • Monitored system performance, tuned kernel parameters, and added, removed, and administered hosts and users
  • Created and administered user accounts using native tools and managed access using sudo (see the sketch after this list).
  • Actively participated in and supported the migration of 460+ production servers from the old data center to the new data center
  • Involved in using RPM for package management and Patching.
  • Created documentation for datacenter hardware setups, standard operating procedures, and security policies
  • Create and maintain technical documentation for new installations and systems changes as required
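
Code sketch (account creation and limited sudo access): a minimal sketch of the user administration mentioned above; the username, password-aging policy, and permitted commands are placeholders, and it assumes a sudo build that reads /etc/sudoers.d.

    #!/usr/bin/env bash
    # Create the account and enforce a 90-day password expiry with a 14-day warning
    useradd -m -c "App Support" appsupport
    passwd appsupport
    chage -M 90 -W 14 appsupport

    # Grant limited sudo access via a drop-in file, then validate the syntax
    echo 'appsupport ALL=(root) /sbin/service httpd restart, /usr/bin/tail -f /var/log/messages' > /etc/sudoers.d/appsupport
    chmod 440 /etc/sudoers.d/appsupport
    visudo -cf /etc/sudoers.d/appsupport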

Environment: RHEL 3/4/5, Solaris 8/9, ESX 3/4, HP DL 180/360/580 G5, IBM p5 series, Fujitsu M4000, EMC Symmetrix DMX 2000/3000, Linux Satellite Server
