Senior Hadoop Administrator Resume
Austin, TX
SUMMARY:
- Certified Hadoop Administrator with over 7 years of expertise in Hadoop, Big Data analytics and Linux, including architecture, design, installation, configuration and management of Apache Hadoop clusters and the MapR, Hortonworks and Cloudera Hadoop distributions.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, to handle data stored on a single platform at Confidential.
- Extensively worked on the ETL mappings, analysis and documentation of OLAP reports requirements. Solid understanding of OLAP concepts and challenges, especially with large data sets.
- Experience in integration of various data sources like Oracle, DB2, Sybase, SQL Server and MS Access, and non-relational sources like flat files, into a staging area.
- Experience in Data Analysis, Data Cleansing (Scrubbing), Data Validation and Verification, Data Conversion, Data Migrations and Data Mining.
- Familiar with the functionality of each Hadoop daemon, the interactions between them, resource utilization and dynamic tuning to keep the cluster available and efficient.
- Expertise in installation and configuration of Linux servers (Red Hat and Ubuntu), managing running jobs and scheduling Hadoop MapReduce jobs.
- Experience in managing Hadoop clusters with IBM BigInsights and Hortonworks Data Platform.
- Experience with bulk load tools such as DW Loader and moving data from PDW to the Hadoop archive.
- Experience in tuning large/complex SQL queries and managing alerts from PDW and Hadoop.
- Decommissioning and commissioning nodes on a running Hadoop cluster.
- Good knowledge of AWS, including managing EC2 instances and S3 storage.
- Experience in managing the Hadoop infrastructure with Cloudera Manager.
- Expertise in Kerberos and its interaction with Hadoop and LDAP.
- Experience in understanding and managing Hadoop Log Files.
- Experience in Adding and removing the nodes in Hadoop Cluster.
- Experience in Change Data Capture (CDC) data modeling approaches.
- Experience in extracting data from RDBMS into HDFS using Sqoop (a sketch of a typical import follows this list).
- Experience in collecting logs from log collectors into HDFS using Flume.
- Experience in setting up and managing the batch scheduler Oozie.
- Good understanding of NoSQL databases such as HBase, Neo4j and MongoDB.
- Experience in analyzing data in HDFS through MapReduce, Hive and Pig.
- Designed, implemented and reviewed features and enhancements to Cassandra.
- Deployed a Cassandra cluster in a cloud environment as per requirements.
- Hands-on expertise in administration and maintenance of Oracle Database 10g/11g/12c.
- Experience on UNIX commands and Shell Scripting.
- Experience in Python Scripting.
- Experience in statistics collection and table maintenance on MPP platforms.
- Experience in creating physical data models for data warehousing.
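A minimal sketch of the kind of Sqoop import referenced above; the connection string, credentials, schema/table and target directory are illustrative placeholders only.

    #!/usr/bin/env bash
    # Hypothetical Sqoop import: pull one Oracle table into HDFS as delimited text.
    # Host, service name, schema/table and HDFS paths are placeholders.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB \
      --username etl_user \
      --password-file hdfs:///user/etl_user/.sqoop.pwd \
      --table SALES.ORDERS \
      --target-dir /data/staging/orders \
      --num-mappers 4 \
      --fields-terminated-by ','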
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 2.4.1, HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Confidential, Flume, ZooKeeper, Spark, Cassandra, Storm, MongoDB, Hue, Impala, Whirr, Kafka, Mahout and Oozie
Programming Languages: Java, SQL, PL/SQL, Shell Scripting, Python, Perl
Frameworks: MVC, Spring, Hibernate.
Tools & Utilities: RMAN, Oracle Enterprise Manager (OEM), VMware, vi editor, SQL Developer
Databases: Oracle 9i/10g/11g, SQL Server, MySQL
Database Tools: TOAD, Chordiant CRM tool, Billing tool, Oracle Warehouse Builder (OWB)
Operating Systems: Linux, Unix, Windows, Mac, CentOS
Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, ETL
EXPERIENCE:
Confidential, Austin, TX
Senior Hadoop Administrator
Responsibilities:
- Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved tickets and issues they ran into, and provided training and best-practice updates to make Hadoop easier to use.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed Cloudera Manager on the Oracle Big Data Appliance to support CDH operations.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Upgraded the Hadoop cluster from CDH 5.8 to CDH 5.9.
- Worked on cluster installation, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
- Created collections within Apache Solr and installed the Solr service through the Cloudera Manager installation wizard.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Maintained and monitored database security, integrity and access controls; provided audit trails to detect potential security violations.
- Installed Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager Server and enabled Kerberos using the wizard.
- Monitored cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Installed the OS and administered the Hadoop stack on the CDH 5.9 Cloudera distribution (with Confidential), including configuration management, monitoring, debugging and performance tuning.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Scripting Hadoop package installation and configuration to support fully-automated deployments.
- Designed, developed and provided ongoing support for data warehouse environments.
- Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Performed maintenance, monitoring, deployments and upgrades across the infrastructure that supports all of our Hadoop clusters.
- Created Hive external tables, loaded data into them and queried the data using HQL (see the Hive sketch after this list).
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Worked on Hive to expose data for further analysis and to transform files from various analytical formats to text files.
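A minimal sketch of the external-table workflow mentioned above, assuming a comma-delimited dataset already staged in HDFS; the database, table, columns and HDFS location are illustrative placeholders.

    #!/usr/bin/env bash
    # Hypothetical Hive external table over data staged in HDFS, followed by a simple HQL query.
    hive -e "
    CREATE DATABASE IF NOT EXISTS staging;
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_logs (
      event_ts STRING,
      user_id  STRING,
      url      STRING,
      status   INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/staging/web_logs';

    SELECT status, COUNT(*) AS hits
    FROM staging.web_logs
    GROUP BY status;
    "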
Environment: MapReduce, Hive 0.13.1, PIG 0.16.0, Sqoop 1.4.6, Spark 2.1, Oozie 4.1.0, Flume, HBase 1.0, Cloudera Manager 5.9, Oracle Server X6, SQL Server, Solr, Zookeeper 3.4.8, Cloudera 5.8, Kerberos and RedHat 6.5.
Confidential, San Jose, CA
Senior Hadoop Administrator
Responsibilities:
- Architected, designed, installed, configured and managed Apache Hadoop, Hortonworks distribution.
- Set up automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
- Handle the installation and configuration of a Hadoop cluster.
- Build and maintain scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Handle the data exchange between HDFS and different Web Applications and databases using Flume and Sqoop.
- Monitor the data streaming between web sources and HDFS.
- Worked with Kerberos and its interaction with Hadoop and LDAP.
- Worked on Kafka, a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system.
- Involved in close monitoring and analysis of MapReduce job executions on the cluster at the task level.
- Commissioned and decommissioned Hadoop nodes and performed data re-balancing.
- Gave input to development on efficient utilization of resources such as memory and CPU, based on the running statistics of map and reduce tasks.
- Involved in building a software intermediary (API layer) that makes it possible for application programs to interact with each other and share data.
- Worked extensively with Amazon Web Services and created Amazon Elastic MapReduce clusters on both the 1.0.3 and 2.2 Hadoop versions.
- Worked with Kerberos, Active Directory/LDAP and Unix-based file systems.
- Presented demos to customers on how to use AWS and how it differs from traditional systems.
- Involved in implementing REST APIs that expose specific software functionality while protecting the rest of the application.
- Experienced with Nagios and wrote Nagios plugins to perform multiple server checks.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Set up Identity, Authentication, and Authorization.
- Maintained the cluster in a healthy and optimal working condition.
- Handled upgrades and patch updates.
- Worked with UNIX commands and shell scripting.
- Involved in Python Scripting.
- Core competencies in Java, HTTP, XML and JSON.
- Worked on Spark, a fast and general-purpose cluster computing system.
- Worked on Storm, a distributed real-time computation system that provides a set of general primitives for real-time processing.
- Balanced HDFS manually to decrease network utilization and increase job performance (see the rebalancing sketch after this list).
- Commissioned and decommissioned DataNodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, on the NameNode and the Secondary NameNode.
- Set up and managed NameNode High Availability and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Worked with GitHub, a web-based Git repository hosting service that offers the distributed revision control and source code management (SCM) functionality of Git along with its own added features.
- Discussed upgrades, process changes, special processing and feedback with other technical teams on a regular basis.
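A minimal sketch of the manual decommission and rebalance steps referenced above. The hostname, exclude-file path and threshold are illustrative; the exclude file must match whatever dfs.hosts.exclude points to in the cluster's hdfs-site.xml.

    #!/usr/bin/env bash
    # Hypothetical DataNode decommission and manual rebalance, run as the HDFS superuser.

    # 1. Mark the node for decommission and tell the NameNode to re-read its host lists.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # 2. Watch decommission progress and overall capacity/block status.
    hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"

    # 3. Rebalance the remaining DataNodes until disk usage is within 10% of the cluster average.
    hdfs balancer -threshold 10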
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
Confidential
Hadoop Administrator
Responsibilities:
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Worked with Amazon Redshift, a fully managed petabyte-scale data warehouse service designed for analytic workloads that connects to standard SQL-based clients and business intelligence tools.
- Involved in the architecture, design, installation, configuration and management of Apache Hadoop, Hortonworks Data Platform (HDP).
- Used Redshift's columnar storage technology and parallelized, distributed queries across multiple nodes to deliver fast query and I/O performance for virtually any size of dataset.
- Worked on Storm, a distributed real-time computation system that provides a set of general primitives for real-time processing.
- Experience in Hortonworks Data Platform (HDP) cluster installation and configuration.
- Experience with Kerberos, Active Directory/LDAP and Unix-based file systems.
- Load data from various data sources into HDFS using Flume.
- Worked on statistics collection and table maintenance on MPP platforms.
- Worked on the Cloudera stack to analyze data stored in HDFS.
- Worked extensively on Hive and Pig.
- Worked on Kafka, a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system.
- Worked on Spark, a fast and general-purpose cluster computing system.
- Wrote code in Python and shell scripts.
- Worked with source code management tools; proficient in Git, SVN and AccuRev.
- Involved in Test Driven Development and wrote the test cases in JUnit.
- Worked on large sets of structured, semi-structured and unstructured data.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Worked with bulk load tools such as DW Loader and moved data from PDW to the Hadoop archive.
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Involved in Change Data Capture (CDC) data modeling approaches.
- Coordinated with technical team for production deployment of software applications for maintenance.
- Read data from and wrote data to Cassandra.
- Provided operational support services relating to Hadoop infrastructure and application installation.
- Handled the imports and exports of data onto HDFS using Flume and Sqoop.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Participated in development and execution of system and disaster recovery processes.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Set up automated processes to send alerts in case of predefined system and application level issues.
- Set up automated processes to send notifications in case of any deviation from predefined resource utilization (a sketch of such an alert follows this list).
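A minimal sketch of the kind of utilization alert described above, assuming mailx is available on the node; the 80% threshold and recipient address are illustrative placeholders.

    #!/usr/bin/env bash
    # Hypothetical cron job: alert when overall HDFS usage crosses a threshold.
    THRESHOLD=80
    RECIPIENT="hadoop-ops@example.com"

    # The summary section of 'hdfs dfsadmin -report' contains a line like "DFS Used%: 63.27%".
    USED_PCT=$(hdfs dfsadmin -report 2>/dev/null | awk '/DFS Used%/ {print int($3); exit}')

    if [ -n "$USED_PCT" ] && [ "$USED_PCT" -ge "$THRESHOLD" ]; then
      echo "HDFS usage is ${USED_PCT}%, above the ${THRESHOLD}% threshold." \
        | mailx -s "HDFS capacity alert on $(hostname)" "$RECIPIENT"
    fi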
Environment: Red Hat Linux/CentOS 4, 5, 6, Logical Volume Manager, Hadoop, VMware ESX 5.1/5.5, Apache and Tomcat Web Server, Oracle 11g/12c, Oracle RAC 12c, HPSM, HPSA.
Confidential
Oracle DBA
Responsibilities:
- End-to-end involvement in installation, patching, cloning, upgrades, AutoConfig, AD Utilities, platform migration and NLS implementation.
- Applied patches to the 11i and R12 instance.
- Involved in setting up cron jobs for monitoring the database.
- Installed E-Business Suite R12 for the Dev/UAT/CRP1 instances.
- Installed single-node and multi-node configurations on Linux and Solaris.
- Involved in reconstruction of databases in the development environment.
- Involved in software installation, database creation and database decommissioning.
- Upgraded databases and applied patches.
- Managed the tablespaces (adding, resizing the tablespaces).
- Checked cron jobs, handled rescheduling and monitored processes.
- Created the users and gave necessary privileges as per the application request.
- Hands-on involvement in administration of RAC databases.
- Involved in resolving cluster failure issues in the RAC environment.
- Performed reorganization of tablespaces and refresh of databases.
- Improved database performance and reconstructed a critical database.
- Renamed users as per application user requests.
- Generated AWR and Statspack reports to diagnose database performance.
- Managed export, hot and RMAN backups (see the RMAN sketch after this list).
- Implemented backup/restore procedures in ARCHIVELOG mode
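A minimal sketch of a nightly RMAN hot backup of the kind referenced above, assuming the database runs in ARCHIVELOG mode; the ORACLE_HOME and ORACLE_SID values are placeholders.

    #!/usr/bin/env bash
    # Hypothetical RMAN hot backup: full backup plus archived redo logs,
    # then purge backups that fall outside the configured retention policy.
    export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
    export ORACLE_SID=ORCL
    export PATH=$ORACLE_HOME/bin:$PATH

    printf '%s\n' \
      "BACKUP DATABASE PLUS ARCHIVELOG;" \
      "DELETE NOPROMPT OBSOLETE;" \
      "EXIT;" | rman target /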
Environment: Linux, Solaris, HP-UX & AIX
Confidential
Linux Administrator
Responsibilities:
- Installed and upgraded OE & Red Hat Linux and Solaris 8 SPARC on servers such as HP DL380 G3, G4 and G5 & Dell PowerEdge servers.
- Involved in LDOMs; created sparse root and whole root zones, administered the zones for web, application and database servers and worked on SMF on Solaris 10.
- Worked in AWS Cloud Environment like EC2 & EBS.
- Implemented and administered VMware ESX 3.5 and 4.x for running Windows, CentOS, SUSE and Red Hat Linux servers on development and test servers.
- Installed and configured Apache on Linux and Solaris and configured Virtual hosts and applied SSL certificates.
- Implemented Jumpstart on Solaris and Kickstart for Red Hat environments.
- Worked with HP LVM and Red Hat LVM.
- Implemented P2P and P2V migrations.
- Involved in installing and configuring CentOS & SUSE 11 & 12 servers on HP x86 servers.
- Implemented HA using Red Hat Cluster and VERITAS Cluster Server 5.0 for the WebLogic agent.
- Managed DNS and NIS servers and troubleshot server issues.
- Troubleshoot application issues on Apache web servers and database servers running on Linux and Solaris.
- Involved in migrating Oracle and MySQL data using Double-Take products.
- Used Sun Volume Manager for Solaris and LVM on Linux & Solaris to create volumes with layouts like RAID 1, 5, 10, 51.
- Re-compiled Linux kernel to remove services and applications that are not required.
- Performed performance analysis using tools like prstat, mpstat, iostat, sar, vmstat, truss and DTrace.
- Worked on LDAP user accounts and configured LDAP on client machines.
- Upgraded ClearCase from 4.2 to 6.x running on Linux (CentOS & Red Hat).
- Worked on patch management tools like Sun Update Manager.
- Supported middleware servers running Apache, Tomcat and Java applications.
- Worked on day to day administration tasks and resolve tickets using Remedy.
- Used HP Service center and change management system for ticketing.
- Worked on administration of WebLogic 9 and JBoss 4.2.2 servers, including installation and deployments.
- Worked on F5 load balancers to load balance and reverse proxy WebLogic servers.
- Used shell scripting to automate regular tasks such as removing core files, backing up important files and transferring files among servers (see the sketch after this list).
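A minimal sketch of the kind of housekeeping automation mentioned above; the paths, retention window and backup host are illustrative placeholders.

    #!/usr/bin/env bash
    # Hypothetical housekeeping job: purge old core files, archive key configs, copy the archive off-host.
    APP_DIR=/opt/app
    BACKUP_HOST=backup01.example.com
    STAMP=$(date +%Y%m%d)

    # Remove core dumps older than 7 days under the application directory.
    find "$APP_DIR" -type f -name 'core.*' -mtime +7 -print -delete

    # Archive important configuration files.
    tar -czf "/tmp/etc-backup-${STAMP}.tar.gz" /etc/hosts /etc/fstab /etc/sysconfig 2>/dev/null

    # Copy the archive to the backup server.
    scp "/tmp/etc-backup-${STAMP}.tar.gz" "backup@${BACKUP_HOST}:/backups/$(hostname)/"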
Environment: Solaris 8/9/10, HP-UX, Linux & AIX servers, Veritas Volume Manager, web servers, LDAP directory, Active Directory, BEA WebLogic servers, SAN switches, Apache, Tomcat servers, WebSphere application server.