Sr. Hadoop Administrator Resume
Boston, MA
SUMMARY:
- Around 8 years of experience in IT, with over 4 years of hands-on experience as a Hadoop Administrator.
- Hands-on experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
- Hands-on experience with Big Data technologies/frameworks such as Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume, and Oozie.
- Proficiency with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Designed and implemented database software migration procedures and guidelines.
- Performed administrative tasks on Hadoop Clusters using Cloudera/HortonWorks.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Implemented DB2/LUW replication, federation, and partitioning (DPF).
- Areas of expertise and accomplishment include database installation/upgrade and backup/recovery.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera distributions (CDH3, CDH4, and YARN-based CDH 5.x).
- Hands-on experience configuring Hadoop clusters in an enterprise environment, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
- Strong experience with data warehouse tooling, including ETL tools such as Ab Initio and Informatica, BI tools such as Cognos, MicroStrategy, and Tableau, relational database systems such as Oracle (PL/SQL), UNIX shell scripting, and AWS EMR instances.
- Good knowledge and experience in tuning the performance of Hadoop clusters.
- Worked on setting up NameNode High Availability for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal performance of the cluster.
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Experience in dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem.
- Imported data from various data sources, performed transformations using Hive and Pig, and loaded data into HBase.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems and vice versa (see the sketch at the end of this list).
- Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming for data analysis.
- Supported technical team for automation, installation and configuration tasks.
- Analyzed the client's existing Hadoop infrastructure to understand performance bottlenecks and performed tuning accordingly.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Extensively involved in Test Plan re-design, Test Case re-Creation, Test Automation and Test Execution of web and client server applications as per change requests.
- Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
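Illustrative Sqoop import/export commands of the kind referenced above (a minimal sketch only; connection strings, credentials, table names, and paths are placeholders):

    # Import a table from a relational database into HDFS (hypothetical connection details)
    sqoop import \
      --connect jdbc:oracle:thin:@db.example.com:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /data/staging/customers \
      --num-mappers 4

    # Export processed results from HDFS back to a relational table
    sqoop export \
      --connect jdbc:mysql://db.example.com/reporting \
      --username etl_user -P \
      --table customer_summary \
      --export-dir /data/output/customer_summary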
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka.
Hadoop Distributions: Cloudera Distribution of Hadoop (CDH), Hortonworks (HDP).
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows Server 2003.
Application Servers: WebLogic, WebSphere.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, TestNG, JUnit.
Databases: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Incident Management, Release Management, Change Management
Office Tools: MS Outlook, MS Word, MS Excel, MS PowerPoint.
WORK EXPERIENCE:
Confidential, Boston, MA
Sr. Hadoop Administrator
Responsibilities:
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Installed, configured, and maintained Apache Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Expertise with NoSQL databases such as HBase, Cassandra, DynamoDB (AWS), and MongoDB.
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Created POC to store Server Log data into Cassandra to identify System Alert Metrics and Implemented Cassandra connector for Spark in Java.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Involved in migrating ETL processes from Oracle to Hive to enable easier data manipulation.
- Extracted data from Teradata into HDFS using Sqoop.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Installed and configured Hadoop (MapReduce, HDFS), developed multiple MapReduce jobs in Java for data cleaning, and upgraded Cloudera from version 5.5 to 6.0.
- Automated repetitive tasks, deployed critical applications, and managed change on several servers using Puppet.
- Performed cluster backups using DistCp, Cloudera Manager BDR, and parallel ingestion (see the sketch after this list).
- Troubleshot and rectified platform and network issues using Splunk / Wireshark.
- Extracted files from the Cassandra database using Sqoop, placed them in HDFS, and processed them.
- Used Cloudera Enterprise Navigator for Hadoop audit files and data lineage.
- Created tables, secondary indexes, join indexes and views in Teradata development Environment for testing.
- Experience with methodologies and tools such as Agile, Scrum, TestNG, JUnit, and test-driven development; worked as a Hadoop Administrator on clusters with the Hortonworks distribution.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra and slots configuration.
- Created an HDInsight cluster in Azure (a Microsoft-specific tool) as part of the deployment and performed component unit testing using the Azure emulator.
- Led Big Data Hadoop/YARN operations and managed an offshore team.
- Provided support on Kerberos-related issues and coordinated Hadoop installations/upgrades and patch installations in the environment.
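A minimal sketch of the DistCp-based backups mentioned above (cluster hostnames and paths are placeholders; Cloudera Manager BDR replication itself is scheduled through the BDR UI rather than from the command line):

    # Copy a directory tree from the production cluster to the backup cluster,
    # preserving permissions and only transferring files that changed
    hadoop distcp -update -p \
      hdfs://prod-nn01:8020/data/warehouse \
      hdfs://dr-nn01:8020/backups/warehouse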
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Azure, Ambari 2.0, Linux CentOS, Splunk, MongoDB, Teradata, Puppet, Kafka, Cassandra, Ganglia, Cloudera Manager, Agile/Scrum.
Confidential, Irving, TX
Hadoop Admin / Kafka
Responsibilities:
- Managed mission-critical Hadoop cluster and Kafka at production scale, especially Cloudera distribution.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Involved in capacity planning, with reference to the growing data size and the existing cluster size.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, NoSQL databases, Flume, Oozie, and Sqoop.
- Experience in designing, implementing, and maintaining high-performing Big Data Hadoop clusters and integrating them with existing infrastructure.
- Used NoSQL databases such as Cassandra and MongoDB; designed table architecture and developed the DAO layer.
- Deployed the application and tested it on WebSphere Application Servers.
- Configured SSL for Ambari, Ranger, Hive and Knox.
- Experience in methodologies such as Agile, Scrum, and Test driven development.
- Created principals for new users in Kerberos; implemented and maintained the Kerberos-secured cluster and integrated it with Active Directory (AD) (see the sketch after this list).
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Creating event processing data pipelines and handling messaging services using Apache Kafka.
- Involved in migrating java test framework to python flask.
- Shell scripting for Linux/Unix Systems Administration and related tasks. Point of Contact for Vendor escalation.
- Monitored and analyzed MapReduce jobs, looking out for potential issues and addressing them.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Moved data from Oracle, Teradata, and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data.
- Configured various property files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh based on the job requirements.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Good knowledge of implementing NameNode Federation and High Availability of the NameNode and Hadoop cluster using ZooKeeper and Quorum Journal Manager.
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Hands-On experience in setting up ACL (Access Control Lists) to secure access to the HDFS file system.
- Analyzed escalated incidents within the Azure SQL database.
- Captured the data logs from web server into HDFS using Flume & Splunk for analysis.
- Fine-tuned the Hadoop cluster by setting the proper number of map and reduce slots for the TaskTrackers.
- Experience in tuning the heap size to avoid any disk spills and to avoid OOM issues.
- Familiar with job scheduling using Fair Scheduler so that CPU time is well distributed amongst all the jobs.
- Experience managing users and permissions on the cluster, using different authentication methods.
- Involved in regular Hadoop Cluster maintenance such as updating system packages.
- Experience in managing and analyzing Hadoop log files to troubleshoot issues.
- Good knowledge in NoSQL databases, like HBase, MongoDB, etc.
- Worked on the Hortonworks Hadoop distribution, which managed services such as HDFS and MapReduce2.
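Representative MIT Kerberos commands for the principal and keytab work described above (a sketch only; the realm, host, and keytab path are placeholders, and the Active Directory integration is configured separately):

    # Create a service principal with a random key
    kadmin.local -q "addprinc -randkey hdfs/worker01.example.com@EXAMPLE.COM"

    # Export the principal's key to a keytab for the Hadoop service to use
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/worker01.example.com@EXAMPLE.COM"

    # Verify the keytab contents
    klist -kt /etc/security/keytabs/hdfs.service.keytab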
Environment: Hadoop, YARN, Hive, HBase, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, Java, Jenkins, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.
Confidential, Greenville, SC
Cloudera Manager / Hortonworks
Responsibilities:
- Installed and configured a CDH 5.0.0 cluster using Cloudera Manager.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Installed and configured the Hortonworks distribution HDP 1.3.2 and Cloudera CDH4.
- Developed scripts for benchmarking with TeraSort/TeraGen (see the sketch after this list); worked on commissioning and decommissioning of data nodes.
- Set up Ranger and Knox across all clusters.
- Upgraded Hortonworks distribution HDP 1.3.2 to HDP 2.2
- The Cassandra database was used to transform queries to Hadoop HDFS.
- Installed and set up Hadoop clusters for development and production environments using Cloudera CDH3, CDH4, Apache Tomcat, and Hortonworks Ambari on Python, Red Hat, and Windows.
- Used Kafka as the central data backbone, allowing a single cluster to serve a large organization.
- Managing and reviewing Hadoop log files and debugging failed jobs.
- Tuned the cluster by Commissioning and decommissioning the Data Nodes.
- Supported cluster maintenance, Backup and recovery for production cluster.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Knowledge of supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; worked on data processing on an AWS EC2 cluster and fine-tuned Hive jobs for better performance.
- Automated all the jobs for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
- Collected and aggregated large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
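Illustrative TeraGen/TeraSort benchmarking commands of the kind scripted above (row counts and output paths are placeholders, and the examples jar location varies by distribution):

    # Generate synthetic input data (each row is 100 bytes)
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
      teragen 1000000000 /benchmarks/teragen

    # Sort the generated data; elapsed time is the benchmark result
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
      terasort /benchmarks/teragen /benchmarks/terasort

    # Validate that the output is globally sorted
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
      teravalidate /benchmarks/terasort /benchmarks/teravalidate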
Environment: Hadoop, HDFS, MapReduce, Hive, Hortonworks, Pig, Kafka, Oozie, Sqoop, Nagios, Cloudera Manager, MySQL, NoSQL, MongoDB, Java.
Confidential, Wilmington, DE
Hadoop Administrator
Responsibilities:
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Installed and configured Hadoop and ecosystem components in Cloudera and Hortonworks environments; configured Hadoop, Hive, and Pig on Amazon EC2 servers.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
- Configured MySQL Database to store Hive metadata.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data (see the sketch after this list).
- Worked with Linux systems and MySQL database on a regular basis.
- Supported MapReduce programs that ran on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Monitored cluster job performance and was involved in capacity planning.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
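A minimal Hadoop streaming invocation of the kind referenced above (the mapper/reducer scripts and paths are hypothetical placeholders):

    # Run a streaming job that uses scripts as the mapper and reducer
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
      -input /data/raw/textlogs \
      -output /data/processed/wordcount \
      -mapper mapper.py \
      -reducer reducer.py \
      -file mapper.py \
      -file reducer.py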
Environment: HDFS, Hive, Pig, sentry, Kerberos, LDAP, YARN, Cloudera Manager, and Ambari.
Confidential
Linux Administrator
Responsibilities:
- Installation, configuration, upgrade, and administration of Windows, Sun Solaris, and Red Hat Linux.
- Linux and Solaris installation, administration, and maintenance.
- User account management, password management, quota setup, and support.
- Worked on Linux Kick-start OS integration, DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH, and LDAP integration.
- Network traffic control, IPsec, QoS, VLAN, proxy, and RADIUS integration on Cisco hardware via Red Hat Linux software.
- Installation and configuration of MySQL on Windows Server nodes.
- Responsible for configuring and managing Squid server in Linux and Windows.
- Configuration and Administration of NIS environment.
- Package and Patch management on Linux servers.
- Worked with Logical Volume Manager (LVM) to create file systems per user and database requirements (see the sketch after this list).
- Data migration at Host level using Red Hat LVM, Solaris LVM, and Veritas Volume Manager.
- Expertise in establishing and documenting the procedures to ensure data integrity including system fail-over and backup/recovery in AIX operating system.
- Managed 100+ UNIX servers running RHEL and HP-UX on Oracle, HP, and Dell servers, including blade centers.
- Solaris Disk Mirroring (SVM), ZONE installation and configuration
- Escalating issues accordingly, managing team efficiently to achieve desired goals.
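A minimal LVM sequence for the file-system creation described above (device names, volume-group and logical-volume names, sizes, and mount points are placeholders):

    # Initialize the disk for LVM, then create a volume group and a logical volume
    pvcreate /dev/sdb
    vgcreate vg_data /dev/sdb
    lvcreate -L 50G -n lv_oracle vg_data

    # Create a file system on the logical volume and mount it
    mkfs.ext4 /dev/vg_data/lv_oracle
    mkdir -p /u01/oradata
    mount /dev/vg_data/lv_oracle /u01/oradata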
Environment: Linux, TCP/IP, LVM, RAID, Networking, Security, user management, MySQL.
Confidential
Linux Administrator
Responsibilities:
- Installed and deployed RPM Packages.
- Storage management using JBOD, RAID Levels 0, 1, Logical Volumes, Volume Groups and Partitioning.
- Analyzed the performance of the Linux system to identify memory, disk I/O, and network problems.
- Performed reorganization of disk partitions, file systems, hard disk addition, and memory upgrade.
- Administration of Red Hat 4.x and 5.x, including installation, testing, tuning, upgrading, and loading patches, and troubleshooting both physical and virtual server issues.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
- Logs & Resource Monitoring via Script on Linux Server.
- Maintained and monitored Local area network and Hardware Support and Virtualization on RHEL server (Through Xen & KVM Server).
- Administration of VMware virtual Linux servers and resizing of LVM disk volumes as required (see the sketch after this list).
- Respond to all Linux systems problems 24x7 as a part of on call rotation and resolving them on a timely basis.
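An illustrative online LVM resize of the kind mentioned above (volume names and sizes are placeholders; resize2fs applies to ext3/ext4 file systems):

    # Grow the logical volume by 10 GB after the virtual disk has been expanded
    lvextend -L +10G /dev/vg_app/lv_logs

    # Grow the file system to use the new space
    resize2fs /dev/vg_app/lv_logs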
Environment: Linux, TCP/IP, LVM, RAID, Networking, Security, user management.