Hadoop Administrator Resume
Marlborough, MA
SUMMARY
- Around 8 years of experience in IT, including over 6 years of hands-on experience as a Hadoop Administrator.
- Experienced in managing Linux servers and in installing, configuring, supporting, and managing Hadoop clusters.
- Performed administrative tasks on Hadoop clusters using Cloudera/Hortonworks.
- Installed and configured Apache Hadoop, Hive, and Pig environments on Amazon EC2 and assisted in the design, development, and architecture of Hadoop and HBase systems.
- Possess excellent Linux and Hadoop system administration skills, networking, shell scripting, and familiarity with open-source configuration management and deployment tools such as Puppet or Chef.
- Hands-on experience with major components of the Hadoop ecosystem, including Hive, Sqoop, and Flume, and knowledge of the MapReduce/HDFS framework.
- Planning, deployment, and tuning of SQL (SQL Server, MySQL) and NoSQL (Elasticsearch, Redis, Memcached) databases.
- Experience in implementing Data Warehousing/ETL solutions for different domains.
- Expertise in implementing enterprise level security using AD/LDAP, Kerberos, Knox, Sentry and Ranger.
- Hands-on programming experience in technologies such as Java, J2EE, JSP, Servlets, SQL, JDBC, HTML, XML, Struts, Web Services, SOAP, REST, Eclipse, and Visual Studio on Windows, UNIX, and AIX.
- Experienced in database server performance tuning and optimization, and in troubleshooting and tuning complex SQL queries.
- Implemented innovative solutions using various Hadoop ecosystem tools like Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, ElasticSearch, Zookeeper, Couchbase, Storm, Solr, Cassandra and Spark.
- Extensive working knowledge of Sqoop and Flume for data processing.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions, including on Amazon Web Services (AWS).
- Expertise in developing solutions using Hadoop and Hadoop ecosystem.
- Loaded data from different data sources such as Teradata and DB2 into HDFS using Sqoop and into partitioned Hive tables (see the Sqoop import sketch at the end of this summary).
- Developed and coordinated deployment methodologies using Bash, Puppet, and Ansible.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, realm/domain creation, and principal management.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades and automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Experienced in developing MapReduce programs using Apache Hadoop for working with big data, with a solid grasp of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Strong working knowledge of business intelligence tools like Tableau and Cognos.
- Supported WebSphere Application Server (WPS) and IBM HTTP/Apache web servers in Linux environments for various projects.
- Supported geographically diverse customers and teams in a 24/7 environment.
- Team player with strong analytical, technical, negotiation, and client relationship management skills.
- Developed Oozie workflows and sub-workflows to orchestrate Sqoop scripts, Pig scripts, and Hive queries; the Oozie workflows were scheduled through Autosys.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Experience in hardware recommendations, performance tuning, and benchmarking.
- Experience in IP management (IP addressing, subnetting, Ethernet bonding, static IP).
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 10/11, and Sun Solaris.
- Experience in Linux storage management, including configuring RAID levels and logical volumes.
- Conducted detailed analysis of system and application architecture components as per functional requirements.
- Ability to work effectively in cross-functional team environments and experience providing training to business users.
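Illustrative of the Sqoop-to-partitioned-Hive loading mentioned above, the sketch below is a minimal, hypothetical example; the JDBC URL, credentials, table, and partition names are placeholders rather than actual project values.

```bash
#!/usr/bin/env bash
# Hypothetical example: import one day of a DB2 source table into a
# date-partitioned Hive table. All names and credentials are placeholders.
LOAD_DATE=2016-01-31

sqoop import \
  --connect jdbc:db2://db2-host:50000/SALESDB \
  --username etl_user \
  --password-file /user/etl_user/.db2.password \
  --table ORDERS \
  --where "ORDER_DATE = '${LOAD_DATE}'" \
  --hive-import \
  --hive-table sales.orders \
  --hive-partition-key load_date \
  --hive-partition-value "${LOAD_DATE}" \
  --num-mappers 4 \
  --target-dir /staging/orders/${LOAD_DATE}
```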
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, Cassandra, Pig, Sqoop, Falcon, Flume, Zookeeper, YARN, Mahout, Oozie, Avro, HBase, MapReduce, Storm, CDH 5.3, CDH 5.4.
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Java Technologies: Java, J2EE, JSP, Servlets, Struts, Hibernate, Spring
Testing: Capybara, WebDriver, RSpec, Cucumber, JUnit, SVN
Server: WEBrick, Thin, Unicorn, Apache, AWS
Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8/10)
Database & NoSQL: Oracle 11g/10g, DB2, SQL, MySQL, HBase, MongoDB, Cassandra
Scripting & Security: Shell Scripting, HTML, Python, Kerberos, Docker
Security: Kerberos, Ranger, Sentry
Other tools: Redmine, Bugzilla, JIRA, Agile SCRUM, SDLC Waterfall.
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential, Marlborough, MA
Responsibilities:
- Managed and scheduled jobs on Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions.
- Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
- Experience in upgrades, patches, and installation of ecosystem products through Ambari.
- Automated the configuration management for several servers using Chef and Puppet.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Involved in installing and configuring Ranger for authorization of users and Hadoop daemons.
- Experience in methodologies such as Agile, Scrum, and Test driven development.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration, data warehousing, migration, and Kafka installation.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Worked closely with infrastructure, network, database, business intelligence, and application teams to ensure business applications were highly available and performing within agreed-upon service levels.
- Implemented concepts of Hadoop eco system such as YARN, MapReduce, HDFS, HBase, Zookeeper, Pig and Hive.
- In charge of installing, administering, and supporting Windows and Linux operating systems in an enterprise environment.
- Accountable for storage, performance tuning and volume management of Hadoop clusters and MapReduce routines.
- Configured Elasticsearch, Logstash, and Kibana to monitor Spring Batch jobs.
- Responsible for setup, configuration, and security of Hadoop clusters using Ranger.
- Monitored Hadoop cluster connectivity and performance.
- Managed and analyzed Hadoop log files.
- Performed file system management and monitoring (see the health-check sketch following this role).
- Performed automation/configuration management using Chef, Ansible, and Docker based containerized applications.
- Managed Nodes, jobs and configuration using HPC Cluster Manager Tool.
- Created POC to store Server Log data into Cassandra to identify System Alert Metrics and Implemented Cassandra connector for Spark in Java.
- Led the evaluation of Big Data software like Splunk, Hadoop for augmenting the warehouse, identified use cases and led Big Data Analytics solution development for Customer Insights and Customer Engagement teams.
- Design and deployment of clustered HPC monitoring systems, including a dedicated monitoring cluster.
- Developed and documented best practices, provided HDFS support and maintenance, and set up new Hadoop users.
- Responsible for administration of new and existing Hadoop infrastructure.
- Took on DBA responsibilities including data modeling, design and implementation, software installation and configuration, database backup and recovery, and database connectivity and security.
- Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
Environment: Hadoop, MapReduce, Cassandra, HDFS, Pig, Git, Jenkins, Kafka, Puppet, Ansible, Maven, Spark, YARN, HBase, Oozie, MapR, NoSQL, ETL, MySQL, Agile, Windows, UNIX Shell Scripting.
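As referenced in the monitoring and log-management bullets above, the following is a minimal sketch of a routine HDFS/YARN health check; the log path and report format are illustrative assumptions, not values from this engagement.

```bash
#!/usr/bin/env bash
# Hypothetical daily health-check sketch for an HDFS/YARN cluster.
# Paths (especially the NameNode log location) are assumptions for illustration.

# Overall HDFS capacity plus live/dead DataNodes
hdfs dfsadmin -report | head -n 20

# File system integrity (missing or corrupt blocks)
hdfs fsck / -blocks -locations > /tmp/fsck_$(date +%F).log

# Currently running YARN applications
yarn application -list -appStates RUNNING

# Scan recent NameNode log entries for errors (log path varies by distribution)
grep -iE "error|fatal" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -n 50
```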
Hadoop Administrator
Confidential, San Jose, CA
Responsibilities:
- Installed and configured Hadoop and Ecosystem components in Cloudera and Hortonworks environments. Configured Hadoop, Hive and Pig on Amazon EC2 servers.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
- Created Hadoop ecosystem environments (Hadoop, Hive, Pig, Oozie, Hue, HBase/Cassandra, Flume) using both automated toolsets and manual processes.
- As part of the Data Infrastructure team, was an integral part of managing 24x7 Hadoop infrastructure; the environment was tens of petabytes in size.
- Managed mission-critical Hadoop cluster and Kafka at production scale, especially Cloudera distribution.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Developed machine-learning capability via Apache Mahout.
- Installed and configured a CDH 5.0.0 cluster using Cloudera Manager.
- Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller.
- Installed and configured the Hortonworks HDP 1.3.2 and Cloudera CDH4 distributions.
- Led development of a terabyte-scale network packet analysis capability.
- Led a research effort to tightly integrate Hadoop and HPC systems.
- Purchased, deployed, and administered a 70-node Hadoop cluster, and administered two smaller clusters.
- Maintained, supported, and upgraded Hadoop clusters.
- Monitored jobs, queues, and HDFS capacity.
- Set up the security model for the Proactive Monitor tool in QA and PROD in the data warehouse environment.
- Balanced, commissioned, and decommissioned cluster nodes (see the decommissioning sketch following this role).
- Applied security (Ranger/OpenLDAP) linked with Active Directory and/or LDAP.
- Enabled users to view job progress via web interface.
- Onboarded users to Hadoop: configuration, access control, disk quotas, permissions, etc.
- Addressed issues and applied upgrades and security patches.
- Commissioned/decommissioned nodes and performed backup and restore.
- Applied "rolling" cluster node upgrades in a Production-level environment.
- Assembled newly purchased hardware into racks with switches; assigned IP addresses, configured firewalls, enabled/disabled ports, VPN, etc.
- Worked with virtualization team to provision / manage HDP cluster components.
Environment: Hive, Pig, HBase, Zookeeper, Sqoop, ETL, Azure, AWS, Ansible, Data Warehousing, Impala, Ambari 2.0, Linux CentOS, MongoDB, MapR, Hortonworks 2.3, Puppet, Kafka, Cassandra, Ganglia, Agile/Scrum.
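As referenced in the node commissioning/decommissioning bullets above, the following is a minimal, hypothetical sketch of decommissioning a DataNode and rebalancing; the exclude-file path and hostname are placeholders and must match the cluster's dfs.hosts.exclude setting.

```bash
#!/usr/bin/env bash
# Hypothetical DataNode decommission sketch. The exclude-file path must match
# the dfs.hosts.exclude property in hdfs-site.xml; the hostname is a placeholder.

# 1. Add the node to the HDFS exclude file
echo "worker-node-17.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read the host include/exclude lists
hdfs dfsadmin -refreshNodes

# 3. Watch decommissioning progress until the node reports "Decommissioned"
hdfs dfsadmin -report | grep -A 3 "worker-node-17"

# 4. After the node is removed, rebalance data across the remaining DataNodes
#    (threshold = maximum % utilization difference between nodes)
hdfs balancer -threshold 10
```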
Hadoop Administrator
Confidential, Phoenix, AZ
Responsibilities:
- Performed Hadoop installation and configuration of multiple nodes in AWS EC2 using the Hortonworks platform.
- Maintained and Monitored Hadoop and Linux Servers.
- Performed major and minor upgrades on Hortonworks Hadoop.
- Created VMs in Oracle Virtual Manager.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
- Set up and optimized standalone, pseudo-distributed, and fully distributed clusters.
- Implemented NameNode metadata backup using NFS.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, and SSH key-based (passwordless) login.
- Built Hadoop-based enterprise big data platforms, coding in Python and handling DevOps with Chef and Ansible.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems on the created partitions.
- Optimized the full-text search function by connecting MongoDB and Elasticsearch.
- Utilized AWS framework for content storage and Elasticsearch for document search.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Helped develop MapReduce programs and define job flows.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the data platform.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Managed and reviewed Hadoop log files.
- Supported/Troubleshooted MapReduce programs running on the cluster.
- Loaded data from Linux/UNIX file system into HDFS.
- Installed and Configured Hive and wrote Hive UDFs.
- Created tables, loaded data, and wrote queries in Hive.
- Monitored the cluster using Ambari and optimized the system based on job performance criteria.
- Managed cluster through performance tuning and enhancements.
- Good knowledge of cluster maintenance, including commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, slot configuration, YARN tuning, Hue high availability, and load balancing of both Hue and HiveServer2.
- Hands-on experience with different HBase copying mechanisms such as export/import, snapshots, and CopyTable (see the snapshot/export sketch following this role).
Environment: Hadoop, Hive, MapReduce, Kafka, MapR, Amazon Web Services (AWS), Ansible, NoSQL, HDFS, Java, UNIX, Red Hat, and CentOS.
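As referenced in the HBase copying-mechanisms bullet above, the following is a minimal, hypothetical sketch of backing up a table via snapshot and ExportSnapshot; the table, snapshot, and backup-cluster names are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical HBase table backup via snapshot + ExportSnapshot.
# Table name, snapshot name, and backup cluster URI are placeholders.

# Take a snapshot of the table (a cheap, metadata-level operation)
hbase shell <<'EOF'
snapshot 'claims_table', 'claims_table_snap'
list_snapshots
EOF

# Copy the snapshot (metadata + HFiles) to a backup cluster's HBase root dir
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot claims_table_snap \
  -copy-to hdfs://backup-nn:8020/hbase \
  -mappers 8
```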
Linux Administrator
Confidential
Responsibilities:
- Patched RHEL 5 and Solaris 8, 9, and 10 servers for the EMC PowerPath upgrade for VMAX migration.
- Configured LVM (Logical Volume Manager) to manage volume groups, logical and physical partitions, and to import new physical volumes (see the LVM sketch at the end of this role).
- Documented the standard procedure for installation and deployment of the VMAX migration and Logical Volume Manager.
- Installed, configured, supported, and implemented security for the following services: DHCP, SSH, SCP.
- Configuration and administration of NFS and Samba in Linux and Solaris.
- Maintained and monitored all of the company's servers on a day-to-day basis: operating system and application patch levels, disk space and memory usage, and user activity.
- Performed user administration on Sun Solaris, RHEL, and HP-UX systems, including management and archiving.
- Installed the HP OpenView monitoring tool on more than 300 servers.
- Attended calls related to customer queries and complaints and offered solutions.
- Worked with monitoring tools such as Nagios and HP OpenView.
- Creation of VMs, cloning and migrations of the VMs on VMware vSphere 4.0.
- Worked with the DBA team on database performance and network-related issues on Linux/Unix servers, and with vendors on hardware-related issues.
- Expanded file systems on the fly using Solaris Volume Manager (SVM) on Solaris boxes.
- Managed and upgraded UNIX server services such as BIND DNS.
- Configuration and administration of Web (Apache), DHCP and FTP Servers in Linux and Solaris servers.
- Supported the backup environments running VERITAS NetBackup 6.5.
- Responsible for setting cron jobs on the servers.
- Decommissioned old servers and kept track of decommissioned and new servers using an inventory list.
- Handled problems and requirements per tickets created in Request Tracker.
- Participated in an on-call rotation to provide 24x7 technical support.
- Configured and troubleshot LAN and TCP/IP issues.
Environment: Red Hat Enterprise Linux 4.x/5.x, Sun Solaris 8, 9, 10, VERITAS Volume Manager, Oracle 11g, Samba, Oracle RAC/ASM, EMC PowerPath, Dell PowerEdge 6650, HP ProLiant DL 385/585/580, Sun Fire V440, Sun Blade X6250/X6270.
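As referenced in the LVM bullet above, the following is a minimal, hypothetical sketch of the LVM workflow; device names, volume group and logical volume names, sizes, file system type, and mount points are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical LVM setup on a Linux server. All names, sizes, and paths
# are placeholders for illustration only.

pvcreate /dev/sdb1                  # initialize the new disk partition as a physical volume
vgcreate datavg /dev/sdb1           # create a volume group on it
lvcreate -L 50G -n applv datavg     # carve out a 50 GB logical volume
mkfs.ext4 /dev/datavg/applv         # build a file system on the LV (ext3 on older releases)
mkdir -p /app/data
mount /dev/datavg/applv /app/data   # mount the new file system
# Persist the mount across reboots
echo "/dev/datavg/applv /app/data ext4 defaults 0 0" >> /etc/fstab
```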