Hadoop Admin Resume
SUMMARY
- Over 6 years of experience as a Hadoop Admin/Developer and Linux Administrator, with good working knowledge of Big Data and Hadoop ecosystem technologies.
- Excellent experience with Hadoop architecture and its components such as NameNode, DataNode, MapReduce, and YARN (NodeManager, ResourceManager, ApplicationMaster), and with tools including Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and ZooKeeper for coordinating cluster resources.
- Implemented proofs of concept on the Hadoop stack and various big data analytics tools, including migration from relational databases (e.g., Oracle, MySQL) to Hadoop.
- Involved in design and development of technical specifications using the Hadoop ecosystem; administration, testing, change-control processes, and Hadoop administration activities such as installation, configuration, and maintenance of clusters.
- Hands-on experience in designing and implementing solutions using Hadoop, HDFS, YARN, Hive, Impala, Oozie, Sqoop, Zookeeper.
- Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, HCatalog, ZooKeeper) using Hortonworks Ambari.
- Theoretical knowledge of CDP (Data Center, Data Hub).
- Experience with the Hadoop security tools Kerberos and Ranger on the HDP 2.x and CDH 5.x stacks.
- Expertise in setting up, configuring, and monitoring Hadoop clusters using Cloudera CDH 5 and Apache Hadoop on Red Hat and Windows.
- Experienced in data ingestion projects to ingest data into a data lake from multiple source systems using Talend Big Data and Open Studio.
- Hands-on experience installing multi-node production Hadoop environments on Amazon AWS, using AWS services such as EMR, EC2, IAM, S3, and Data Pipeline.
- Experience importing and exporting data between relational databases and HDFS/Hive using Sqoop (a minimal Sqoop sketch follows this summary).
- Experience analyzing log files and failures, identifying root causes, and taking or recommending courses of action.
- Strong knowledge of configuring high availability for NameNode, Hive, and ResourceManager.
- Good experience in UNIX/Linux administration, along with SQL development for designing and implementing relational database models per business needs across different domains.
- Expertise in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- Experience in Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Expertise in benchmarking and in backup and disaster recovery of NameNode metadata and of important and sensitive data residing on the cluster.
- Knowledge of Cloudera architecture and components, including Director, Navigator, and Manager, in both cloud and on-premises environments, including securing services and users with Kerberos and LDAP/Active Directory.
- Configured rack awareness for quick availability and processing of data; experience designing and implementing secure Hadoop clusters using Kerberos.
- Experience copying files within a cluster or between clusters using the DistCp command-line utility.
- Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation, and other maintenance activities.
- Experience in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experience developing Oozie workflows and job controllers for job automation, including Hive automation and scheduling jobs in the Hue browser.
- Experience working with ITIL tools JIRA, SUCCEED, and ServiceNow for change management and support processes.
- Additional responsibilities include interacting with the offshore team daily, communicating requirements, delegating tasks to offshore/on-site team members, and reviewing their deliverables.
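Illustrative sketch (not from a specific engagement) of the kind of Sqoop import referenced above; the host, credentials, table, and paths are hypothetical placeholders:

```sh
# Hypothetical example: pull an orders table from MySQL into HDFS and a
# Hive table. Host, credentials, table name, and paths are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/sales/orders \
  --hive-import --hive-table orders_raw \
  --num-mappers 4
```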
TECHNICAL SKILLS
Hadoop / Big Data Ecosystems: HDFS, YARN, MapReduce, Hive, Sqoop, Impala, Hue, Oozie, Zookeeper, Apache Tez, Spark, HBase.
Operating Systems: Linux (Red Hat), Windows.
Databases: Oracle, SQL Server, MySQL.
Programming Language: SQL.
Cloud Computing Services: VMware, AWS.
Security: Kerberos, Active Directory, Sentry, Apache Ranger, TLS/SSL.
Data pipeline: Talend ETL tool and AWS Data Pipeline.
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Admin
Responsibilities:
- Part of the core Hadoop team, working on Hadoop infrastructure and playing a key role in supporting the Hadoop cluster.
- Utilizing components such as YARN, ZooKeeper, JournalNodes, Sqoop, Hive, Impala, Hue, and HBase.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Involved in installing and configuring Kerberos for authentication of users and Hadoop daemons, AD (LDAP) integration, and Sentry authorization.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using HBase and Hive.
- Worked with data consistency, data integrity, and real-time transactional data; strong collaborator and team player in an Agile environment with hands-on experience on Impala.
- Upgraded the Cloudera Manager and CDH stack from version 5.13.0 to 5.14.4.
- Designed and configured the cluster with the required services (Sentry, HiveServer2, Kerberos, HDFS, Hue, Hive, ZooKeeper).
- Configured Sentry for appropriate user permissions when accessing HiveServer2/Beeline.
- Designed and maintained the NameNode and DataNodes with appropriate processing capacity and disk space.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using the Cloudera distribution.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce.
- Used the Talend ETL tool for design, build, testing, data integration, and data management; Talend specializes in big data integration.
- Experience importing and exporting data between HDFS and RDBMS using Sqoop.
- Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect to running nodes and data.
- Configured Resource management in Hadoop through dynamic resource allocation.
- Experience with Cloudera Navigator and Unravel data for Auditing Hadoop access.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Installed Oozie workflow engine to run multiple Hive Jobs.
- Application development using RDBMS, and Linux shell scripting.
- Experience installing, configuring, and optimizing Cloudera Hadoop CDH 5.x in a multi-clustered environment.
- Set up and configured TLS/SSL on the Hadoop cluster at levels 1, 2, and 3 using keystores and truststores.
- Enabled TLS on the HDFS, YARN, and Hue services.
- Set up Kerberos KDC trust between multiple Hadoop clusters, and tested and validated adding peers and Backup and Disaster Recovery (BDR) job setup.
- Configured HDFS encryption on the Hadoop cluster using KMS/Key Trustee (an encryption-zone sketch follows this list).
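A minimal sketch of creating an HDFS encryption zone of the kind described above, assuming a KMS is already configured; the key name and directory path are hypothetical:

```sh
# Hypothetical example: create an encryption key in the configured KMS and
# an encryption zone on an empty HDFS directory. Key name and path are
# placeholders.
hadoop key create finance_key
hdfs dfs -mkdir -p /data/secure/finance
hdfs crypto -createZone -keyName finance_key -path /data/secure/finance
hdfs crypto -listZones   # verify the zone was created
```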
Confidential
Hadoop Admin
Responsibilities:
- Deploying and maintaining the Hadoop cluster, adding and removing nodes using Cloudera Manager, configuring NameNode high availability, and keeping track of all running Hadoop jobs.
- Monitoring usage to decide the size of the Hadoop cluster based on the data to be stored in HDFS.
- Working with the open-source Apache distribution, manually setting up all configuration files: core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.
- Handled data processing and incremental updates using Hive and processed the data in Hive tables.
- Develop and establish strong relationships with end users and technical resources throughout established projects.
- Working with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Extensively worked on ETL processes consisting of data analysis, data modeling, data cleansing, data transformations, integration, migration, data import/export, mapping, and conversion.
- Enabled TLS on the HDFS, YARN, and Hue services.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Experience writing UNIX shell scripts for purposes such as file validation, ETL process automation, and job scheduling using crontab (a crontab sketch follows this list).
- Involved in design, capacity planning, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
- Scheduled many jobs to automate database-related activities, including backups, database health monitoring, disk-space checks, and backup verification.
- Installed multi-node production Hadoop environments on Amazon AWS using services such as EMR, EC2, IAM, S3, and Data Pipeline.
- Used tools to export and import data from sources such as SQL Server databases, flat files, CSV, Excel, and other data sources supporting ODBC and OLE DB.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Implemented, managed, and administered the overall Hadoop infrastructure.
- Took care of the day-to-day running of Hadoop clusters.
- Imported and exported data between HDFS and RDBMS using Sqoop.
- Created a local YUM repository for installing and updating packages; configured and deployed the Hive metastore using MySQL and the Thrift server.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce.
- Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
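A minimal sketch of the crontab-based scheduling mentioned above; the script paths, schedules, and log locations are hypothetical placeholders:

```sh
# Hypothetical crontab entries: a nightly database backup script and a
# morning disk-space check, with output appended to logs for later
# verification. Paths and times are placeholders.
# m  h  dom mon dow  command
30   1  *   *   *    /opt/scripts/db_backup.sh   >> /var/log/db_backup.log 2>&1
0    6  *   *   *    /opt/scripts/disk_check.sh  >> /var/log/disk_check.log 2>&1
```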
Confidential
Hadoop Developer
Responsibilities:
- Working experience designing and implementing complete end-to-end Hadoop infrastructure including HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, and ZooKeeper.
- Deploying and managing the multi-node development and production Hadoop cluster with different Hadoop components (Hive, Pig, Sqoop, Oozie, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
- Designed, planned and delivered a proof of concept and business function/division based implementation of a Big Data roadmap and strategy project.
- Adding/installation of new components and removal of the same through Ambari.
- Created tables, loaded data, and wrote queries in Hive.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Monitored and managed comprehensive data security across the Hadoop platform using Apache Ranger.
- Involved in exporting the analyzed data to databases such as MySQL and Oracle using Sqoop, for visualization and report generation for the BI team.
- Used Apache Tez, coordinated by YARN in Hadoop, for building high-performance batch and interactive data processing applications.
- Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Created internal and external Hive tables as per requirements, defined with appropriate static and dynamic partitions for efficiency (a partitioning sketch follows this list).
- Transformed the data using Hive, Pig for BI team to perform visual analytics, according to the client requirement.
- Developed scripts and automated data management end to end, including sync-up between all the clusters.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
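A minimal sketch of the partitioned Hive tables mentioned above, run through the Hive CLI from a shell script; the table, columns, and HDFS location are hypothetical placeholders:

```sh
# Hypothetical example: an external Hive table partitioned by date, loaded
# with dynamic partitioning from a staging table. Names, columns, and the
# HDFS location are placeholders.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS orders_part (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC
LOCATION '/data/warehouse/orders_part';

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
SELECT order_id, amount, order_date FROM orders_staging;
"
```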
Confidential
Linux Administrator
Responsibilities:
- Installation, configuration, management, and maintenance of Red Hat Enterprise Linux 6 systems.
- Managed all internet applications including DNS, RADIUS, Apache, MySQL, and PHP; duties included taking frequent backups of data, creating new storage procedures, and scheduling backups.
- Adding & deleting Users, Groups & Others. Changing Groupers & ownership, implementing ACL’s.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Installation, Management, Configuration of LAN/WAN systems utilizing Cisco switches and routers.
- Package management using RPM & YUM. Disk Quota Management.
- Configuring and maintaining common Linux application services such as NFS, DHCP, BIND, SSH, HTTP, HTTPS, FTP, LDAP, SMTP, MySQL, and LAMP.
- Building & configuring Red Hat Linux systems over the network, implementing automated tasks through crontab, resolving tickets according to the priority basis.
- Performed backup management using tape drives and manual commands (tar, scp, rsync) for remote server storage (a backup-script sketch follows this list).
- Experience writing UNIX shell scripts for purposes such as file validation, ETL process automation, and job scheduling using crontab.
- Modernized security policy for Red Hat servers.
- Worked with Linux-friendly applications and troubleshot issues when they arose on the server.
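A minimal sketch of the tar/scp/rsync remote backup mentioned above; hostnames, paths, and the backup layout are hypothetical placeholders:

```sh
#!/bin/bash
# Hypothetical example: archive a directory, copy it to a remote backup
# server with scp, and mirror home directories with rsync. Hostnames and
# paths are placeholders.
set -euo pipefail

SRC=/etc
STAMP=$(date +%F)
ARCHIVE=/tmp/etc-backup-${STAMP}.tar.gz

tar -czf "$ARCHIVE" "$SRC"                                  # compressed archive
scp "$ARCHIVE" backup@backup01:/backups/etc/                # copy off-host
rsync -avz --delete /home/ backup@backup01:/backups/home/   # mirror home dirs
```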