Senior Hadoop Administrator Resume
Atlanta, GA
SUMMARY
- 9 years of IT experience, including 5 years as a Hadoop Administrator and 4 years as a Linux Administrator working with Oracle Big Data Appliance.
- Experience in Hadoop administration activities such as installation, configuration, and management of clusters in Cloudera (CDH) and Hortonworks (HDP) distributions using Cloudera Manager and Ambari.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, Sqoop, Pig, Oozie, Zookeeper, Spark, Solr, Hue, Flume, Accumulo, Storm, Kafka, and YARN.
- Experience in performance tuning of YARN, Spark, and Hive.
- Experience in Configuring Apache Solr memory for production system stability and performance.
- Experience importing and exporting data between HDFS and relational database management systems using Sqoop and troubleshooting related issues (a brief command sketch follows this summary).
- Experience in monitoring and configuring Oracle Exadata Database Machine.
- Experience performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- Experience in administrating Oracle Big Data Appliance to support (CDH) operations.
- Experience in developing MapReduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
- Experience in SQL Server database administration and OLTP applications.
- Knowledge of SQL Server performance tuning, backup and recovery methods.
- Good understanding of NameNode HA architecture.
- Experience in designing, developing, and providing ongoing support for data warehouse environments.
- Experience in monitoring cluster health using Ambari, Nagios, Ganglia, and cron jobs.
- Cluster maintenance, including commissioning and decommissioning of DataNodes.
- Good understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNodes, and MapReduce concepts.
- Proficient in using SQL, ETL, data warehouse solutions, and databases in business environments with large-scale, complex datasets.
- Experience configuring security in Hadoop using the Kerberos/NTLM protocols.
- Implemented security controls using Kerberos principals, ACLs, and data encryption with dm-crypt to protect entire Hadoop clusters.
- Experience with directory services such as LDAP and directory services databases.
- Expertise in setting up SQL Server security for Active Directory and non-Active Directory environments using security extensions.
- Experience as a Business Intelligence, Systems Analyst with large, complex data sources.
- Assisted development team in identifying the root cause of slow performing jobs / queries.
- Expertise in installation, administration, patching, upgrades, configuration, performance tuning, and troubleshooting of Red Hat Linux, SUSE, CentOS, AIX, and Solaris.
- Experience scheduling recurring Hadoop jobs with Apache Oozie.
- Experience setting up encryption zones in Hadoop and working on data retention.
- Experience in Jumpstart, Kickstart, Infrastructure setup and Installation Methods for Linux.
- Experience in implementation and troubleshooting of clusters, JMS, and JDBC.
- Experience importing real-time data into Hadoop using Kafka and implementing Oozie jobs.
- Hands-on experience implementing Hadoop security solutions such as LDAP, Sentry, Ranger, and Kerberos to secure Hadoop clusters and data.
- Experience developing Java tools and utilities.
- Strong troubleshooting skills and understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
- Experience in administration of RDBMS databases such as MS SQL Server.
- Experience in Hadoop Distributed File System and Ecosystem (MapReduce, Pig, Hive, Sqoop, YARN and HBase).
- Planned, documented, and supported high availability, data replication, business continuity, failover, and fallback solutions.
- Knowledge of NoSQL databases such as HBase, Cassandra, MongoDB.
- Strong analytical, diagnostics, troubleshooting skills to consistently deliver productive technological solutions.
- Provided 24/7 technical support to Production and development environments.
- Major strengths include familiarity with multiple software systems, the ability to quickly learn new technologies and adapt to new environments, and excellent interpersonal, technical, and communication skills.
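A minimal illustrative sketch of the Sqoop transfers described above; the connection string, credentials, table names, and HDFS directories are placeholders, not details from this resume:

    # Import a table from a relational database into HDFS (placeholder details).
    sqoop import \
      --connect jdbc:mysql://db.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Export aggregated results from HDFS back to the relational database.
    sqoop export \
      --connect jdbc:mysql://db.example.com:3306/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/curated/order_summary \
      --num-mappers 4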
TECHNICAL SKILLS
- Hadoop Ecosystem: MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, Storm, Kafka, Solr, Spark, Flume
- Programming Languages: Java (Core Java), C, C++, HTML
- Databases: MySQL, Oracle 8i/9i/10g/11g, Oracle Server X6-2, HBase, NoSQL
- Operating Systems: Linux (RHEL, Ubuntu), Open Solaris, AIX
- Scripting: Shell scripting, Bash scripting, HTML scripting, Python
- Web/Application Servers: Apache Tomcat, JBoss, Windows Server 2003/2008/2012
- Security: LDAP, Sentry, Ranger, Kerberos
- Cluster Management and Monitoring: Cloudera Manager, HDP Ambari, Hue
- Certification: Cloudera Certified Administrator for Apache Hadoop (CCA)
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Senior Hadoop Administrator
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Zookeeper, etc.) on Hortonworks HDP 2.5.0.
- Deploying, managing, and configuring HDP using Apache Ambari 2.4.2.
- Installed and worked on Hadoop clusters for different teams; supported 100+ users of the Hadoop platform, resolved their tickets and issues, and provided training and best-practice guidance to make Hadoop easier to use.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Configuring YARN capacity scheduler with Apache Ambari.
- Configuring predefined alerts and automating cluster operations using Apache Ambari.
- Managed files on HDFS via the CLI and the Ambari Files View; ensured the cluster was healthy and available using monitoring tools.
- Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move from a non-secured to a secured cluster.
- Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Worked with Talend to load data into Hive tables, perform ELT aggregations in Hive, and extract data from Hive.
- Responsible for handling service and component failures and resolving issues by analyzing and troubleshooting the Hadoop cluster.
- Manage and review Hadoop log files. Monitor the data streaming between web sources and HDFS.
- Managed Ambari administration and set up user alerts.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Resolved Hive Thrift and HBase issues after the HDP 2.4.0 upgrade.
- Worked extensively on Hive, Spark, Pig, Sqoop, and GemFire XD projects throughout the development lifecycle until they went into production.
- Managed cluster resources by implementing the Capacity Scheduler and creating queues (a configuration sketch follows this section).
- Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
- Set up Puppet, Kibana, Elasticsearch, Talend, and Red Hat infrastructure for data ingestion, processing, and storage.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Ambari.
- Performed many complex system analyses to improve ETL performance and identified highly critical batch jobs to prioritize.
- Implemented a Spark solution to enable real-time reports from Hadoop data; also actively involved in designing column families for various Hadoop clusters.
Environment: HDP 2.5.0, Ambari 2.4.2, Oracle 11g/10g, MySQL, Sqoop, Teradata, Hive, Oozie, Spark, ZooKeeper, Talend, MapReduce, Apache NiFi, Pig, Kerberos, RedHat 7.
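A rough sketch of defining Capacity Scheduler queues like those mentioned above; the queue names and percentages are illustrative, and on an Ambari-managed cluster these properties would normally be set through the Ambari Capacity Scheduler configuration rather than by hand:

    # Illustrative queue layout: root.default 30%, root.etl 50%, root.adhoc 20%.
    # These properties go into capacity-scheduler.xml (or the equivalent Ambari
    # "Capacity Scheduler" configuration section):
    #
    #   yarn.scheduler.capacity.root.queues           = default,etl,adhoc
    #   yarn.scheduler.capacity.root.default.capacity = 30
    #   yarn.scheduler.capacity.root.etl.capacity     = 50
    #   yarn.scheduler.capacity.root.adhoc.capacity   = 20

    # Reload queue definitions without restarting the ResourceManager.
    yarn rmadmin -refreshQueues

    # Verify the resulting queue list.
    mapred queue -list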
Confidential, Austin, TX
Senior Hadoop Administrator
Responsibilities:
- Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved their tickets and issues, and provided training and best-practice guidance to make Hadoop easier to use.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed Cloudera Manager on Oracle Big Data Appliance to support CDH operations.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Upgraded the Hadoop cluster from CDH 5.8 to CDH 5.9.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Created collections within Apache Solr and installed the Solr service through the Cloudera Manager installation wizard.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used Oracle Big Data Appliance for Hadoop and NoSQL processing and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Maintained and monitored database security, integrity, and access controls; provided audit trails to detect potential security violations.
- Installed Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager server, and enabled Kerberos using the wizard (a brief sketch of these steps follows this section).
- Monitored cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Installed the OS and administered the Hadoop stack with the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Designed, developed, and provided ongoing support for data warehouse environments.
- Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all Hadoop clusters.
- Worked on Hive for further analysis and for transforming files from different analytical formats to text files.
- Created Hive external tables, loaded the data into tables, and queried data using HQL.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Environment: MapReduce, Hive 0.13.1, PIG 0.16.0, Sqoop 1.4.6, Spark 2.1, Oozie 4.1.0, Flume, HBase 1.0, Cloudera Manager 5.9, Oracle Server X6, SQL Server, Solr, Zookeeper 3.4.8, Cloudera 5.8, Kerberos and RedHat 6.5.
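A hedged sketch of the Kerberos preparation steps referred to above; the realm, principal password, and paths are placeholders, and the exact procedure follows the Cloudera wizard documentation for the version in use:

    # Install the unlimited-strength JCE policy files into the cluster JDK
    # (only needed for older JDKs that ship limited-strength policies; paths are placeholders).
    cp local_policy.jar US_export_policy.jar "$JAVA_HOME/jre/lib/security/"

    # Create the principal Cloudera Manager uses to manage other principals
    # (principal name per Cloudera documentation; password is a placeholder).
    kadmin.local -q "addprinc -pw changeme cloudera-scm/admin@EXAMPLE.COM"

    # Sanity-check that the KDC issues tickets for the new principal.
    kinit cloudera-scm/admin@EXAMPLE.COM
    klist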
Confidential
Hadoop Administrator
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Zookeeper, etc.) on Hortonworks HDP 2.4.0.
- Deploying, managing, and configuring HDP using Apache Ambari 2.4.2.
- Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved their tickets and issues, and provided training and best-practice guidance to make Hadoop easier to use.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Configuring YARN capacity scheduler with Apache Ambari.
- Configuring predefined alerts and automating cluster operations using Apache Ambari.
- Managed files on HDFS via the CLI and the Ambari Files View; ensured the cluster was healthy and available using monitoring tools.
- Developed Hive user-defined functions in Python and wrote Hadoop MapReduce programs in Python.
- Improved mapper and reducer code using Python iterators and generators.
- Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move from a non-secured to a secured cluster.
- Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Responsible for handling service and component failures and resolving issues by analyzing and troubleshooting the Hadoop cluster.
- Manage and review Hadoop log files. Monitor the data streaming between web sources and HDFS.
- Worked with Oracle XQuery for Hadoop on Oracle Java HotSpot virtual machines.
- Managed Ambari administration and set up user alerts.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Resolved Hive Thrift and HBase issues after upgrading to HDP 2.4.0.
- Worked extensively on Hive, Spark, Pig, and Sqoop projects throughout the development lifecycle until they went into production.
- Managed cluster resources by implementing the Capacity Scheduler and creating queues.
- Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink (see the agent sketch after this section).
- Set up Puppet, Kibana, Elasticsearch, Tableau, and Red Hat infrastructure for data ingestion, processing, and storage.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Ambari.
- Implemented a Spark solution to enable real-time reports from Hadoop data; also actively involved in designing column families for various Hadoop clusters.
Environment: HDP 2.4.0, Ambari 2.4.2, Oracle 11g/10g, Oracle Big Data Appliance, MySQL, Sqoop, Hive, Oozie, Spark, Zookeeper, Oracle Big Data SQL, MapReduce, Pig, Kerberos, RedHat 6.5.
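A minimal sketch of one possible sandbox Flume-Kafka wiring like the one mentioned above (here a Kafka source feeding an HDFS sink); broker addresses, topic names, paths, and the agent name are illustrative, and the property names follow newer Flume releases:

    # Illustrative Flume agent definition: Kafka source -> memory channel -> HDFS sink.
    cat > kafka-agent.conf <<'EOF'
    a1.sources  = ksrc
    a1.channels = mem
    a1.sinks    = hdfs1

    a1.sources.ksrc.type = org.apache.flume.source.kafka.KafkaSource
    a1.sources.ksrc.kafka.bootstrap.servers = broker1:9092
    a1.sources.ksrc.kafka.topics = events
    a1.sources.ksrc.channels = mem

    a1.channels.mem.type = memory

    a1.sinks.hdfs1.type = hdfs
    a1.sinks.hdfs1.hdfs.path = /data/streams/events
    a1.sinks.hdfs1.channel = mem
    EOF

    # Start the agent in the sandbox (config directory path is a placeholder).
    flume-ng agent --name a1 --conf /etc/flume/conf --conf-file kafka-agent.conf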
Confidential, Atlanta, GA
Hadoop Administrator
Responsibilities:
- Installation, configuration, support, and management of Hadoop clusters using Apache and Cloudera (CDH4) distributions and on Amazon Web Services (AWS).
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked on Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Set up an Amazon Web Services (AWS) EC2 instance for the Cloudera Manager server (see the sketch at the end of this section).
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Updating the configuration on each host.
- Tested raw data and executed performance scripts.
- Configuring Cloudera Manager Agent heartbeat interval and timeouts.
- Shared responsibility for administration of Hadoop, Hive, and Pig.
- Implemented a CDH3 Hadoop cluster on Red Hat Enterprise Linux 6.4; assisted with performance tuning and monitoring.
- Monitoring Hadoop Cluster through Cloudera Manager and Implementing alerts based on Error messages. Providing reports to management on Cluster Usage Metrics.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Performed installation, upgrade, and configuration tasks for Impala on all machines in the cluster.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Deployed the Cloudera platform on AWS using Ansible playbooks.
- Used Ansible to create instances on Amazon Web Services (AWS).
- Supported code/design analysis, strategy development and project planning.
- Assisted with data capacity planning and node forecasting.
- Managing Amazon Web Services (AWS) infrastructure with automation and configuration.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Performed both major and minor upgrades to the existing CDH cluster.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
Environment: Hadoop, Hive, MapReduce, Cloudera, Amazon Web Services (AWS), Impala, Sqoop, NoSQL, UNIX, Red Hat Linux 6.4.
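A hedged sketch of launching the EC2 instance used for the Cloudera Manager server, as mentioned above; the AMI ID, key pair, security group, and instance type are placeholders:

    # Launch a single EC2 instance to host Cloudera Manager (all IDs are placeholders).
    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type m4.2xlarge \
      --key-name hadoop-admin-key \
      --security-group-ids sg-0123456789abcdef0 \
      --count 1 \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cloudera-manager}]'

    # Confirm the instance is running before installing Cloudera Manager on it.
    aws ec2 describe-instances \
      --filters "Name=tag:Name,Values=cloudera-manager" \
      --query 'Reservations[].Instances[].State.Name'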
Confidential
Hadoop Administrator
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Involved in design and ongoing operation of several Hadoop clusters.
- Implemented and operated on-premises Hadoop clusters from the hardware to the application layer including compute and storage.
- Implemented the Oracle Data Integrator application adapter for the Hadoop cluster.
- Configured and deployed the Hive metastore using MySQL and the Thrift server (see the sketch after this section).
- Designed custom deployment and configuration automation systems to allow for hands-off management of clusters via Cobbler, FUNC, and Puppet.
- Fully automated the configuration of firmware, the deployment of the operating system, and the configuration of the OS and applications resulting in a less than twenty-minute server deployment time.
- Prepared complete documentation, per the knowledge transfer, of the Phase II Talend job design and goals.
- Used Oracle Big Data Connectors such as the Oracle SQL Connector for HDFS and the Oracle Loader for Hadoop.
- Prepared documentation about the support and maintenance work to be followed in Talend.
- Deployed the company's first Hadoop cluster running Cloudera's CDH2 to a 44-node cluster storing 160TB and connecting via 1 GB Ethernet.
- Debugged and resolved major Cloudera Manager issues by working with the Cloudera team.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Involved in Cluster Maintenance and removal of nodes using Cloudera Manager.
- Collaborated with application development teams to provide operational support, platform expansion, and upgrades for Hadoop Infrastructure including upgrades to CDH3.
- Participated in Hadoop development Scrum.
- Installed and configured Cognos 8.4/10 and Talend ETL in single- and multi-server environments.
- Member of internal standards bodies addressing network and server strategic direction and architecture.
- Responsible for maintaining Linux platform build standards including all aspects of OS configuration and deployment.
- Submitted improvements to the Solaris standard build.
- Wrote documentation and mentored other System Administrators.
Environment: Apache Hadoop, Cloudera, Pig, Hive, Talend, MapReduce, Sqoop, UNIX, Cassandra, Java, Linux, Oracle 11gR2, UNIX shell scripting, Kerberos.
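A minimal sketch of a MySQL-backed Hive metastore setup like the one mentioned above; the database name, user, password, and host are placeholders, and the schema-initialization step depends on the Hive/CDH version in use:

    # Create the metastore database and a dedicated user (placeholders throughout).
    mysql -u root -p <<'EOF'
    CREATE DATABASE metastore;
    CREATE USER 'hive'@'%' IDENTIFIED BY 'hive_password';
    GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
    FLUSH PRIVILEGES;
    EOF

    # Point hive-site.xml at the MySQL database (values shown for reference):
    #   javax.jdo.option.ConnectionURL        = jdbc:mysql://metastore-host:3306/metastore
    #   javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
    #   javax.jdo.option.ConnectionUserName   = hive
    #   javax.jdo.option.ConnectionPassword   = hive_password

    # On newer Hive releases the metastore schema can be initialized with schematool.
    schematool -dbType mysql -initSchema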
Confidential
Linux/System Administrator
Responsibilities:
- Supported a large server and network infrastructure of Solaris, Linux, and Windows environment.
- Managed installation, configuration, upgrades, and patch systems running OS such as Red Hat, Fedora, CentOS, and Oracle Solaris.
- Utilized bash and ksh shell scripting to automate daily system administration tasks (an illustrative script follows this section).
- Researched to improve service performance and reliability through investigation and root cause analysis.
- Managed data backup of UNIX, Windows, Virtual servers, disk storage tier1 and tier2 backups.
- Copied important backup images to tape media and sent them offsite every week; performed quarterly offsite media audits and reports.
- Ensured that backup lifecycle policies and daily backup jobs were running and that failed jobs were fixed.
- Troubleshot technical issues related to tier 3 storage and Quantum tape libraries; reported and logged all media and drive errors. Worked with vendors to resolve hardware and software issues.
- Configured NDMP backups and troubleshot NDMP backup failures associated with storage.
- Maintained configuration and security of the UNIX/Linux operating systems within the enterprise computing environment; provided required evidence to support internal controls per SOX quarterly audit requirements.
- Monitored system activities and fine-tuned system parameters and configurations to optimize performance and ensure security of systems.
- Added servers to the domain and managed groups and users in Active Directory.
- Performed custom builds of Windows 2003 and Windows 2008 servers, including adding users, SAN and network configuration, installing application-related packages, and managing services.
- Responsible for maintaining development tools and utilities and for maintaining shell and Perl automation scripts.
- Worked with project manager and auditing teams to implement PCI compliance.
- Installed and configured Virtual I/O Server V1.3.01 with fix pack 8.1.
- Integrated WebLogic 10.x with Apache 2.x and deployed EAR and WAR files to WebLogic application servers.
- As a member of the team, monitored the VERITAS Cluster Server 4.1 in SAN Environment.
- Created new groups and tested them first in development and QA boxes, then implemented the same in production boxes.
- Created and maintained detailed procedural documentation regarding operating system installation and configuration of software packages, vendor contact information, etc.
Environment: Solaris 10, RHEL 5/4, Windows 2008/2003, Sun SPARC and Intel servers, VMware Infrastructure, Red Hat Enterprise Linux 4/5, Solaris 9/10, Sun E10K, E25K, E4500, SunFire V440/880, DMX 3 & DMX 4, SiteMinder, SonicMQ 7.0, VxFS 4.1, VxVM 4.1.
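An illustrative, hypothetical fragment of the kind of bash automation used for daily administration tasks referenced above; the threshold and alert address are placeholders:

    #!/bin/bash
    # Daily housekeeping: warn when any filesystem crosses a usage threshold.
    THRESHOLD=90
    ALERT_ADDR="sysadmin@example.com"

    df -hP | awk 'NR>1 {print $5, $6}' | while read -r usage mount; do
        pct=${usage%\%}
        if [ "$pct" -ge "$THRESHOLD" ]; then
            echo "WARNING: ${mount} is at ${usage} on $(hostname)" \
                | mail -s "Disk usage alert: $(hostname)" "$ALERT_ADDR"
        fi
    done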
Confidential
Linux System Administrator
Responsibilities:
- Installed, configured, and administered Red Hat Linux servers, providing support and regular upgrades using Kickstart-based network installation.
- Provided 24x7 System Administration support for Red Hat Linux 3.x, 4.x servers and resolved trouble tickets on shift rotation basis.
- Configured HP ProLiant, Dell PowerEdge R-series, Cisco UCS, and IBM p-series machines for production, staging, and test environments.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Configured Linux native device mapper multipathing (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, and 5.7.
- Used performance monitoring utilities such as iostat, vmstat, top, netstat, and sar (a sample wrapper script follows this section).
- Worked on support for AIX matrix subsystem device drivers.
- Worked with both physical and virtual computing, from the desktop to the data center, using SUSE Linux; built, installed, loaded, and configured boxes.
- Worked with the team members to create, execute, and implement the plans.
- Experienced in installation, configuration, and troubleshooting of Tivoli Storage Manager.
- Remediated failed backups and took manual incremental backups of failing servers.
- Upgraded TSM from 5.1.x to 5.3.x; worked on HMC configuration and management of the HMC console, including upgrades and micro-partitioning.
- Installed adapter cards and cables and configured them; worked on Integrated Virtual Ethernet and building VIO servers.
- Installed SSH keys for passwordless login so that SRM data such as processor utilization and disk utilization could be backed up to the server daily.
- Provided redundancy with HBA cards, EtherChannel configuration, and network devices.
- Coordinating with application and database team for troubleshooting the application.
- Coordinating with the SAN team for allocation of LUNs to increase file system space.
- Configuration and administration of Fibre Channel adapters and handling of the AIX side of the SAN.
Environment: Red Hat Linux (RHEL 3/4/5), Solaris 10, Logical Volume Manager, Sun & Veritas Cluster Server, VMware, Global File System, Red Hat Cluster Server
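A small illustrative wrapper around the monitoring utilities listed above; the output directory and sample counts are arbitrary:

    #!/bin/bash
    # Capture a quick performance snapshot with standard utilities.
    OUT=/var/tmp/perf-$(hostname)-$(date +%Y%m%d-%H%M)
    mkdir -p "$OUT"

    iostat -x 5 3   > "$OUT/iostat.txt"    # per-device I/O, 3 samples at 5s intervals
    vmstat 5 3      > "$OUT/vmstat.txt"    # memory, swap, and CPU summary
    sar -n DEV 5 3  > "$OUT/sar-net.txt"   # network interface throughput
    top -b -n 1     > "$OUT/top.txt"       # one batch-mode process snapshot
    netstat -s      > "$OUT/netstat.txt"   # protocol statistics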