Hadoop Administrator Resume
Oak Brook, IL
SUMMARY
- 8 years of experience in design, development and implementations of robust technology systems, with specialized expertise in Hadoop Administration and Linux Administration. Able to understand business and technical requirements quickly; Excellent communications skills and work ethics; Able to work independently.
- 4 years of experience in Hadoop Administration & Big Data Technologies and 4 years of experience into Linux administration.
- Experience with complete Software Design Lifecycle including design, development, testing and implementation of moderate to advanced complex systems.
- Hands on experience in installation, configuration, supporting and managing Hadoop Clusters using Hortonworks, Cloudera.
- Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Design Big Data solutions for traditional enterprise businesses.
- Backup configuration and Recovery from a Name Node failure.
- Excellent command in creating Backups & Recovery and Disaster recovery procedures and Implementing Backup and recovery strategies for off - line and on-line Backups.
- Involved in bench marking Hadoop/HBase cluster file systems various batch jobs and workloads.
- Making Hadoop cluster ready for development team working on POCs.
- Experience in minor and major upgrades of Hadoop and Hadoop eco system.
- Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands on experience in analyzing Log files for Hadoop and eco system services and finding root cause.
- Experience on Commissioning, Decommissioning, Balancing and Managing Nodes and tuning server for optimal performance of the cluster.
- As an admin involved in Cluster maintenance, trouble shooting, Monitoring and followed proper backup& Recovery strategies.
- Good Experience in setting up the Linux environments, Password less SSH, creating file systems, disabling firewalls, swappiness, Selinux and installing Java.
- Good Experience in Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions.
- Installing and configuring Hadoop eco system like pig, hive.
- Hands on experience in Installing, Configuring and managing the Hue and HCatalog.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Experience in importing and exporting the logs using Flume.
- Optimizing performance of Hbase/Hive/Pig jobs.
- Hands on experience in Zookeeper and ZKFC in managing and configuring in NameNode failure scenarios.
- Handsome experience in Linux admin activities on RHEL & Cent OS.
- Experience in deploying Hadoop 2.0 (YARN)
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Hands on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud environment - Amazon Web Services (AWS) -EC2 and on private cloud infrastructure - Open Stack cloud platform.
TECHNICAL SKILLS
Hadoop Framework: Hdfs, Map Reduce, Pig, Hive, Hbase, Sqoop, Zookeeper, Oozie, Hue, Hcatalog, Storm, Kafka, Spark, Key Value Store Indexer, and Flume.
NoSQL Databases: Hbase,Programming Language Java, HTML
Microsoft: MS Office, MS Project, MS Visio, MS Visual Studio 2003/ 2005/ 2008
Databases: MySQL, Oracle 8i/9i/10g, SQL Server, PL/SQL Developer.
Operating Systems: Linux, Cent OS, RHEL, …
Scripting: Shell Scripting, HTML Scripting, puppet
Programming: C, C++, Core Java, PL/SQL.
WEB Servers: Apache Tomcat, JBOSS and Apache Http web server
Cluster Management Tools: HDP Ambari, Cloudera Manager, Hue, SolrCloud.
IDE: Net Beans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential - Oak brook, IL
Responsibilities:
- Involved in installing, configuring and using Hadoop Ecosystems with different Hadoop Distributors (Hortonworks, Cloudera)
- Responsible in administrating and maintaining Hadoop cluster HDP 2.x
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage & review Hadoop log files.
- Configure and tune environment and batch jobs to ensure optimum performance and 99.99% availability.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Monitoring systems and services throughAmbari dashboard to make the clusters available for the business.
- Working on Ambari-alerts configuration for various components and managing the alerts.
- Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Manage to coordinate with Development, Network, Infrastructure, and other organizations necessary to get work done.
- Worked with big data developers and designers in troubleshootingmapreducejob failures and issues with Hive and Pig.
- Involved in implementing High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing zookeeper services.
- Worked on Performance tuning forkafka cluster ( failure and success metrics)
- Built real time pipeline for streaming data using Kafka Streaming.
- Performed data extractions from SQL server, HAWQ, Hive, X15 to support audit/expedited requests.
- Involved in creating Hive tables, loading with data and writing hive queries, which will run internally in map.
- Involved in development/implementation of redhatHadoop environment.
- Installation of OS on physical machines using UCS manager for Cisco servers.
- Creating Service profiles, service profile templates, binding to a template, associating the service profiles in UCS Manager.
- Opening Cisco TAC cases for troubleshooting the hardware issues.
- Worked with the applications team to install the operating systems, Hadoop updates, patches and version upgrades as required.
Environment: RedHatlinux 6.x, ESx 6.x, 5.x, Windows Server 2008, 2012, Hortonworks (HDP) 2.4, Ambari 2.2.1, Apache Hive 0.12.0, Apache zookeeper-3.4.5, HDFS, MapReduce, YARN, MySql, python, Hawq, Kafka 0.9, Spark 1.6, Cisco UCS. Frameflow, Automic.
Hadoop Administrator
Confidential - Minneapolis MN
Responsibilities:
- Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
- Experience in installing and configuring RHEL servers in Production, Test and Development environment and used them in building application and database servers.
- Deployed the Hadoop cluster in cloud environment with scalable nodes as per the business requirement.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Experience in commissioning and decommissioning nodes of Hadoop cluster.
- Imported the data from relational databases into HDFS using Sqoop.
- Performed administration, troubleshooting and maintenance of ETL and ELT processes.
- Managing and reviewing Hadoop log files and supporting MapReduce programs running on the cluster.
- Involved in creating Hive tables, loading data, and writing Hive queries
- Developed PIG and HIVE scripting for data processing on HDFS.
- Experience in managing the cluster resources by implementing fair and capacity scheduler.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Work with the data center planning groups, assisting with network capacity and high availability requirements.
- Performed software installation, upgrades/patches, performance tuning and troubleshooting of all the servers in the clusters.
Environment: - RedHat Linux 6.x, CDH 5.0.6 based on Apache Hadoop 2.3.0, Apache Hive-0.12.0, Apache Pig -0.12.0, Apache ZooKeeper -3.4.5, HDFS, MapReduce, YARN, MySQL, Cloudera Manager.
Hadoop Administrator
Confidential, Minneapolis
Responsibilities:
- Responsible for architecting Hadoop cluster.
- Involved in source system analysis, data analysis, data modeling to ETL (Extract, Transform and Load) and HiveQL
- Strong Experience in Installation and configuration of Hadoop ecosystem like Yarn, HBase, Flume, Hive, Pig, Sqoop.
- Expertise in Hadoop cluster task like Adding and Removing Nodes without any effect to running jobs and data.
- Manage and review Hadoop Log files.
- Load log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Worked extensively with Sqoop for importing data.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive.
- Mentored analyst and test team for writing Hive Queries.
- Extensively used Pig for data cleansing.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
- Developed Oozie Workflows for daily incremental loads, which gets data from Teradata and then imported into hive tables.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
Environment: RedHat Linux 6.x, Hive, Pig, oozie, sqoop, ZooKeeper, HDFS, MapReduce, YARN, Hbase, MySQL.
Hadoop Developer
Confidential, IA
Responsibilities:
- Worked on analyzing, writing Hadoop MapReduce jobs using JavaAPI, Pig and Hive. Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting. Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyze large and critical datasets using Cloudera, HDFS, Hbase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper, & Spark.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Used Pig to store the data into HBase.
- Creating Hive tables, dynamic partitions, buckets for sampling, and working on them using HiveQL.
- Used Pig to parse the data and Store in Avro format.
- Stored the data in tabular formats using Hive tables and Hive SerDes. Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked with NoSQL databases like Hbase in creating Hbase tables to load large sets of semi structured data e
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
- Installed Oozie workflow engine to run multiple Hive and pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.
Confidential, Marlborough- MA
Hadoop Administrator
Responsibilities:
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
- In depth understanding of Hadoop Architecture and various components such as HDFS, Name Node, Data Node, Resource Manager, Node Manager and YARN / Map Reduce programming paradigm.
- Monitoring Hadoop Cluster through Cloudera Manager and Implementing alerts based on Error messages. Providing reports to management on Cluster Usage Metrics and Charge Back customers on their Usage.
- Extensively worked on commissioning and decommissioning of cluster nodes, replacing failed disks, filesystem integrity checks and maintaining cluster data replication.
- Very good understanding and knowledge of assigning number of mappers and reducers to Map reduce cluster.
- Setting up HDFS Quotas to enforce the fair share of computing resources.
- Strong Knowledge in Configuring and maintaining YARN Schedulers (Fair and Capacity).
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Experience in setting up HBase cluster which includes master and region server configuration, High availability configuration, performance tuning and administration.
- Created user accounts and given users the access to the Hadoop cluster.
- Involved in loading data from UNIX file system to HDFS.
- Worked on ETL process and handled importing data from various data sources, performed transformations.
- Coordinate with QA team during testing phase.
- Provide application support to production support team.
- Implemented Hadoop stack and different bigdata analytic tools, migration from different databases to Hadoop.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios.
- Monitored workload, job performance and capacity planning.
Environment: Cloudera, HDFS, Hive, Sqoop, Zookeeper and HBase, Windows 2000/2003, Unix Linux, Java, HDFS, Map Reduce, Pig, Hive, HBase, Flume, Sqoop, Shell Scripting.
Linux Administrator
Confidential
Responsibilities:
- Installing and updating packages using YUM.
- Installing and maintaining the Linux servers.
- Created volume groups logical volumes and partitions on the Linux servers and mounted file systems and created partitions.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations which include networking and IPTable, resolving host names and SSHkeyless login.
- Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
- Automate administration tasks using scripting and Job Scheduling using CRON.
- Monitoring System Metrics and logs for any problems.
- Running cron-tab to back up data.
- Adding, removing, or updating user account information, resetting passwords, etc.
- Maintaining the MySQL server and Authentication to required users for databases.
- Support pre-production and production support teams in the analysis of critical services and assists with maintenance operations.
- Analyze existing automation scripts and tools for any missing efficiencies or gaps.
- Support internal and external teams in relation to information security initiatives
Environment: Redhat Linux, VMware, TCP/IP, Ubuntu, RAID, CRON, MySQL, Windows 2008 server.
