Hadoop Administrator Resume
Los Angeles, CA
SUMMARY
- Overall 8 years of working experience, including 4 years as a Hadoop Administrator and around 4 years in Linux administration roles.
- As a Hadoop Administrator, responsibilities include software installation, configuration, upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, cluster performance monitoring on a daily basis, and keeping clusters healthy on different Hadoop distributions (Hortonworks & Cloudera).
- Experience in installation, management and monitoring of Hadoop clusters using Apache Hadoop and Cloudera Manager.
- Optimized the configurations of MapReduce, Pig and Hive jobs for better performance.
- Advanced understanding of Hadoop architecture components such as HDFS and YARN.
- Strong experience configuring Hadoop ecosystem tools including Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark and Storm.
- Experience in designing, Installation, configuration, supporting and managing Hadoop Clusters using Apache, Hortonworks and Cloudera.
- Installed and configured the monitoring tools Munin and Nagios to monitor network bandwidth and hard drive status.
- Experience in Datameer as well as big data Hadoop. Experienced in NoSQL databases such as HBase and MongoDB; stored and managed data coming from users in a MongoDB database.
- Good experience in installation/upgrade of VMware. Automated server builds using SystemImager, PXE, Kickstart and Jumpstart.
- Good troubleshooting skills across all Hadoop stack components, ETL services, and the Hue and RStudio GUIs used by developers/business users for day-to-day activities.
- Experience setting up a 15-node cluster in an Ubuntu environment.
- Expert level understanding of the AWS cloud computing platform and related services.
- Experience in managing the Hadoop Infrastructure with Cloudera Manager and Ambari.
- Working experience on importing and exporting data into HDFS and Hive using Sqoop.
- Working experience on import and export of data between MySQL and HDFS using the ETL tool Sqoop (illustrative sketch at the end of this summary).
- Working experience with the ETL data integration tool Talend.
- Good knowledge of computer applications and scripting in Shell, Python, PowerShell and Groovy.
- Strong knowledge of Hadoop cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Experience in backup configuration and recovery from a NameNode failure.
- Experience in commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal cluster performance.
- Experienced with DevOps tools like Chef, Puppet, Ansible, Jenkins, Jira, Docker and Splunk.
- Involved in cluster maintenance, bug fixing, troubleshooting and monitoring, and followed proper backup & recovery strategies.
- In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode and MapReduce concepts.
- Management of security in Hadoop clusters using Kerberos, Ranger, Knox and ACLs.
- Excellent experience in Shell Scripting.
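A minimal sketch of the Sqoop-based import/export between MySQL and HDFS mentioned above; the connection string, tables and paths are illustrative placeholders, not actual project values.

    # Hypothetical example: import a MySQL table into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Export aggregated results back to MySQL
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/salesdb \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/summary/orders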
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Hive, Pig, HCatalog, Phoenix, Falcon, Sqoop, Zookeeper, NiFi, Mahout, Flume, Oozie, Avro, HBase, Cassandra, Storm.
Hadoop Distribution: Hortonworks and Cloudera
Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP
Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, Weblogic Server, WebSphere
Security: Kerberos
Reporting Tools: Cognos, Hyperion Analyzer, OBIEE & BI+
Analytic Tools: Elasticsearch, Logstash, Kibana (ELK)
Automation Tools: Puppet, Chef, Ansible
PROFESSIONAL EXPERIENCE
Confidential - Los Angeles, CA
Hadoop Administrator
Responsibilities:
- Hadoop installation and configuration of multiple nodes on the Hortonworks platform.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line.
- Involved in cluster maintenance as well as creation and removal of nodes using tools like Ambari, Cloudera Manager Enterprise and other tools.
- Handling the installation and configuration of a Hadoop cluster.
- Worked on setting up Apache NiFi and used NiFi to orchestrate data pipelines.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Analyzed data using RStudio.
- Created HDInsight clusters, NiFi VMs, Spark clusters, blob storage and data lakes on the Azure cloud.
- Involved in deploying an LLAP cluster, providing inputs and recommendations for Hive LLAP on HDInsight.
- Worked with the Unravel support team to build a script to auto-scale the HDInsight Spark cluster based on peak and non-peak hours.
- Involved in developer activities of installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator) and data protection (encryption at rest).
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS and verified its functioning through monitoring tools.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Provided inputs to development on efficient utilization of resources such as memory and CPU based on the running statistics of Map and Reduce tasks.
- Day-to-day operational support of our Hortonworks Hadoop clusters in lab and production, at multi-petabyte scale.
- Changed cluster configuration properties based on the volume of data being processed by the cluster.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Installed and configured Revolution R and RStudio Server and integrated with Hadoop Cluster.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (illustrative sketch after this list); excellent working knowledge of SQL and databases.
- Commissioned and decommissioned data nodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode HA to avoid single points of failure in large clusters.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing and feedback.
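A minimal sketch of the log-scan-and-alert automation described above; the log directory, error patterns and mail alias are assumptions, not actual project values.

    #!/bin/bash
    # Hypothetical sketch: flag predefined errors in recent Hadoop logs and mail an alert
    LOG_DIR=/var/log/hadoop/hdfs                      # placeholder log path
    PATTERNS='FATAL|OutOfMemoryError|Too many open files'
    ALERT_TO=hadoop-ops@example.com                   # placeholder alias

    HITS=$(find "$LOG_DIR" -name '*.log' -mmin -60 -exec grep -E -H "$PATTERNS" {} +)
    if [ -n "$HITS" ]; then
        echo "$HITS" | mail -s "Hadoop log alert on $(hostname)" "$ALERT_TO"
    fi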
Environment: Ubuntu 16.04.4 LTS, Azure HDInsight 3.5/3.6, HDP 2.2, Ambari, Hive, Hive LLAP, Tez, Kafka, HBase, Python, Zookeeper, LDAP, Jenkins, GitHub, Docker, Kubernetes, Kerberos, Knox, Ranger.
Confidential - Malvern, PA
Hadoop Administrator
Responsibilities:
- Currently working as a Hadoop administrator managing three multi-node Hortonworks HDP (2.6.0.3/2.4.2) clusters for the Dev, Pre-Prod and Prod environments, with 200+ nodes and an overall storage capacity of 5 PB.
- Day-to-day responsibilities include monitoring and troubleshooting incidents, resolving developer issues and Hadoop ecosystem runtime failures, enabling security policies, and managing data storage and compute resources.
- Responsible for cluster maintenance, cluster monitoring, troubleshooting, managing and reviewing log files, and providing 24x7 on-call support on a scheduled rotation.
- Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
- Installed Apache NiFi to make data ingestion from the Internet of Anything fast, easy and secure with Hortonworks DataFlow.
- Responsibilities include implementing change orders for creating HDFS folders, Hive databases/tables and HBase namespaces (illustrative sketch after this list), commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Implemented HDP upgrade from 2.4.2 to 2.6.0.3 version.
- Implemented high availability for the NameNode, ResourceManager, HBase, Hive and Knox services.
- Installed and configured new Hadoop components, including Atlas, Phoenix and Zeppelin, and upgraded the cluster with proper strategies.
- Diligently teaming with the infrastructure, network, database and application teams to guarantee high data quality and availability.
- Aligning with the systems engineering team to propose and help deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Analyzed the performance of the Linux systems to identify memory, disk I/O and network problems.
- Troubleshot issues with Hive, HBase, Pig and Spark/Scala scripts to isolate and fix them.
- Screened Hadoop cluster job performance and handled capacity planning.
- Periodically reviewed Hadoop-related logs, fixed errors, and prevented errors by analyzing warnings.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
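A minimal sketch of a typical change order from this role (an HDFS folder, a Hive database and an HBase namespace); the names, owner, quota and location are illustrative placeholders.

    # Hypothetical change-order sketch
    hdfs dfs -mkdir -p /data/projects/claims
    hdfs dfs -chown claims_svc:analytics /data/projects/claims
    hdfs dfsadmin -setSpaceQuota 10t /data/projects/claims

    hive -e "CREATE DATABASE IF NOT EXISTS claims_db LOCATION '/data/projects/claims/warehouse';"

    echo "create_namespace 'claims_ns'" | hbase shell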
Environment: Hortonworks HDP 2.6.0.3, HBase, Hive, Ambari 2.5.0.3, Linux, Azure Cloud.
Confidential - San Ramon, CA
Hadoop Administrator
Responsibilities:
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
- In depth understanding of Hadoop Architecture and various components such as HDFS, Name Node, Data Node, Resource Manager, Node Manager and YARN / Map Reduce programming paradigm.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers based on their usage.
- Extensively worked on commissioning and decommissioning of cluster nodes, file system integrity checks and maintaining cluster data replication.
- Responsible for Installing, setup and Configuring Apache Kafka and Apache Zookeeper.
- Responsible for efficient operations of multiple Cassandra clusters.
- Implemented a Python script that calculates cycle time from the REST API and fixes incorrect cycle-time data in the Oracle database.
- Involved in developing new workflow MapReduce jobs using the Oozie framework.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning, and slots configuration.
- Involved and experienced in Cassandra cluster connectivity and security.
- Very good understanding and knowledge of assigning the number of mappers and reducers in a MapReduce cluster.
- Set up HDFS quotas to enforce a fair share of computing resources.
- Strong knowledge of configuring and maintaining YARN schedulers (Fair and Capacity).
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions (illustrative sketch after this list).
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Experience in projects involving movement of data from other databases to Cassandra with basic knowledge of Cassandra Data Modeling.
- Used Kafka's explicit support for partitioning messages over servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
- Supported parallel data loads into Hadoop.
- Involved in setting up HBase which includes master and region server configuration, High availability configuration, performance tuning and administration.
- Created user accounts and provided access to the Hadoop cluster.
- Upgraded cluster from CDH 5.3 to CDH 5.7 and Cloudera manager from CM 5.3 to 5.7.
- Involved in loading data from UNIX file system to HDFS.
- Worked on ETL process and handled importing data from various data sources, performed transformations.
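A minimal sketch of the daemon health-check scripting described above; the daemon list, the 90% capacity threshold and the mail alias are assumptions.

    #!/bin/bash
    # Hypothetical health-check sketch for Hadoop daemons on a node
    ALERT_TO=hadoop-ops@example.com                   # placeholder alias

    for daemon in NameNode DataNode ResourceManager NodeManager; do
        if ! jps | grep -qw "$daemon"; then
            echo "$daemon is not running on $(hostname)" \
                | mail -s "Hadoop daemon alert: $daemon down" "$ALERT_TO"
        fi
    done

    # Warn when HDFS usage crosses a threshold (90% here, as an example)
    used=$(hdfs dfsadmin -report | awk '/DFS Used%/{print int($3); exit}')
    if [ "${used:-0}" -ge 90 ]; then
        echo "HDFS usage at ${used}% on the cluster" | mail -s "HDFS capacity warning" "$ALERT_TO"
    fi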
Environment: Hadoop, MapReduce, Shell Scripting, Spark, Pig, Hive, HDFS, YARN, Hue, Sentry, Oozie, Zookeeper, Impala, Solr, Kerberos, cluster health, Puppet, Ganglia, Nagios, Flume, Sqoop, Storm, Kafka, KMS
Confidential - San Jose, CA
Hadoop Administrator
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Extensively involved in installation and configuration of the Cloudera Distribution of Hadoop (CDH), NameNode, Secondary NameNode, Job Tracker, Task Trackers and Data Nodes.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs for data cleaning.
- Involved in setting up a Hadoop cluster across a network of 70 nodes.
- Experienced in loading data from UNIX local file system to HDFS.
- Developed data pipeline using Flume, Sqoop, Pig and Java map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked on monitoring of VMware virtual environments with ESXi 4 servers and Virtual Center. Automated tasks using shell scripting for doing diagnostics on failed disk drives.
- Configured the Global File System (GFS) and the Zettabyte File System (ZFS). Troubleshot production servers using the IPMI tool to connect over SOL.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Involved in the installation of CDH3 and up-gradation from CDH3 to CDH4.
- Responsible for developing data pipelines using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Used Hive and created Hive external/internal tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
- Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive external tables, loaded data into them, and queried the data using HQL (illustrative sketch after this list).
- Created Hive queries to compare raw data with EDW reference tables and performed aggregations.
- Wrote shell scripts for rolling day-to-day processes and automated them.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
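A minimal sketch of the Hive external table creation and HQL querying referenced above; the table name, schema and HDFS location are illustrative, not from an actual project.

    # Hypothetical Hive external table sketch, run from the shell
    hive -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_raw (
        ip     STRING,
        ts     STRING,
        url    STRING,
        status INT
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/raw/weblogs';

      SELECT status, COUNT(*) AS hits
      FROM weblogs_raw
      GROUP BY status;
    "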
Environment: Hadoop, HDFS, MapReduce, Impala, Sqoop, HBase, Hive, Flume, Oozie, Zookeeper, Solr, performance tuning, cluster health, monitoring, security, Shell Scripting, NoSQL (HBase/Cassandra), Cloudera Manager.
Confidential
Linux/Unix Systems Administrator
Responsibilities:
- Day-to-day administration of Sun Solaris and RHEL 4/5, including installation, upgrades, patch management and loading packages.
- Responsible for monitoring overall project and reporting status to stakeholders.
- Identified repeated production issues by analyzing production tickets after each release and strengthened the system testing process to prevent those issues from reaching production, enhancing customer satisfaction.
- Designed and coordinated creation of Manual Test cases according to requirement and executed them to verify the functionality of the application.
- Manually tested the various navigation steps and basic functionality of the Web based applications.
- Experience interpreting physical database models and understanding relational database concepts such as indexes, primary and foreign keys, and constraints using Oracle.
- Wrote, optimized and troubleshot dynamically created SQL within procedures.
- Creating database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
- Responsible for creating virtual environments for the rapid development.
- Responsible for handling tickets raised by end users, including package installation, login issues, access issues, and user management (adding, modifying, deleting and grouping users).
- Responsible for preventive maintenance of the servers on a monthly basis, configuration of RAID for the servers, and resource management using disk quotas.
- Responsible for change management release scheduled by service providers.
- Generated weekly and monthly reports on the tickets worked and sent them to management.
- Managing Systems operations with final accountability for smooth installation, networking, and operation, troubleshooting of hardware and software in Linux environment.
- Identified operational needs of various departments and developed customized software to enhance system productivity.
- Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
- Proactively detected computer security violations, collected evidence and presented results to management.
- Accomplished system/e-mail authentication using an enterprise LDAP database.
- Implemented a database-enabled intranet web site using Linux, Apache and a MySQL database backend.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers. Monitored system metrics and logs for any problems.
Environment: UNIX, Solaris, HP UX, Red Hat Linux, Windows, FTP, SFTP