Hadoop Administrator Resume
Detroit, MI
SUMMARY
- Over 9 years of professional IT experience, including the Big Data ecosystem and related technologies.
- 7+ years of dedicated experience in Hadoop administration and its components, including HDFS, Hive, Impala, HBase, Sqoop, Spark, MapReduce, Oozie, YARN, and ZooKeeper, on Linux using CDH.
- Strong knowledge and experience in supporting Linux environments.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Experienced in upgrading clusters from CDH 6.x to CDP 7.x.
- ETL processes using Talend, Sqoop, DistCp, Spark RDDs, SCP, and SFTP.
- Hands-on experience setting up fully distributed multi-node Hadoop clusters with Apache Hadoop and on AWS EC2 instances.
- Installing, configuring, and tuning ecosystem components: HDFS, Hive, MapReduce, Oozie, YARN, and ZooKeeper.
- Experience in designing and implementing secure Hadoop clusters using Kerberos, Apache Sentry/Ranger.
- Configuring TLS/SSL and Kerberos security on Hadoop services.
- Enabling High Availability for HDFS, YARN, Hive, and Impala services to improve resilience against failures.
- Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Hands-on experience in analyzing Log files for Hadoop and ecosystem services and finding the root cause.
- Experience in copying files within a cluster or between clusters using the DistCp command-line utility and the Cloudera Manager interface.
- Experience with HDFS data storage and supporting MapReduce jobs.
- Expertise in commissioning and decommissioning cluster nodes, backup configuration, and recovery from NameNode failures.
- Good working knowledge of importing and exporting data between databases such as MySQL and HDFS/Hive using Sqoop.
- Strong knowledge of YARN concepts and High Availability for Hadoop clusters.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Excellent oral and written communication and presentation skills.
- Effective analytical and problem-solving skills and the ability to learn and use new technologies and tools quickly.
- Available for on-call support 24/7.
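The log-analysis experience above usually comes down to filtering and summarizing service logs from the shell. A minimal sketch; the log file, component names, and messages below are invented for illustration:

```shell
# Minimal sketch: scan a service log for errors and summarize by component.
# The file path, log format, and messages are invented for this example.
cat > /tmp/sample-datanode.log <<'EOF'
2023-01-05 10:01:12 INFO DataNode: Block report sent
2023-01-05 10:02:30 ERROR DataXceiver: Connection reset by peer
2023-01-05 10:03:45 ERROR DataXceiver: Connection reset by peer
2023-01-05 10:04:10 WARN DataNode: Slow BlockReceiver write
EOF

# Count ERROR lines per emitting class to spot the noisiest failure source.
grep ' ERROR ' /tmp/sample-datanode.log | awk '{print $4}' | sort | uniq -c | sort -rn
```

The same pattern scales to real NameNode or DataNode logs; only the path and field positions change.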
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Hive, Impala, HBase, Oozie, Sqoop, Spark, Solr, NiFi, Hue, HCatalog, AWS, Chef, Data Modeling, Flume, ZooKeeper, Jira.
Languages and Technologies: Python, SQL, NoSQL.
Operating Systems: Linux, UNIX, Windows, macOS.
Databases: MySQL, Oracle, Teradata, PostgreSQL, DB2.
NoSQL Databases: HBase, Cassandra, MongoDB
Office Tools: MS Word, MS Excel, MS PowerPoint, MS Project
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential, Detroit, MI
Responsibilities:
- Involved in installation, configuration, deployment, maintenance, monitoring, and troubleshooting of CDH clusters in multiple environments, such as Development and Production.
- Extensively worked with Cloudera Distribution of Hadoop (CDH) 6.x.
- Extensively involved in cluster capacity planning, hardware planning, installation, performance tuning, monitoring, and troubleshooting of the Hadoop cluster.
- Upgraded the Cloudera cluster from CDH 6.3.4 to CDP 7.1.7.
- Implemented load balancing and High Availability for NameNode, Hive, and Impala proxies using Cloudera Manager.
- Installed Hue for GUI access to Hive and Oozie, and resolved issues reported by Hue users.
- Responsible for managing and scheduling jobs on a Hadoop cluster.
- Administered and optimized the Hadoop clusters, monitored Hadoop jobs, and worked with the development team to fix issues.
- Configured Flume and Talend for data transfer from web servers to the Hadoop cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Used Sqoop to import data from RDBMS to HDFS.
- Set up MySQL as the external database for Cloudera Manager and other Hadoop components.
- Implemented Kerberos Security Authentication protocol for all services in the Hadoop cluster.
- Implemented TLS for CDH Services and for Cloudera Manager.
- Worked on a user management tool for user creation, granting users permissions on various tables and databases, and managing group permissions.
- Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts and Kerberos principals.
- Deployed a Test Cluster leveraging the AWS cloud services such as EC2 and installed and configured all the Hadoop Ecosystem Services.
- Chose appropriate file formats for data in HDFS (text, Avro, Parquet) and compression codecs such as Snappy, bzip2, and LZO.
- Improved communication between teams, which increased the number of simultaneous projects and average billable hours.
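The Kerberos and Sqoop work in this role typically reduces to commands of the following shape. This is a sketch only; the hostnames, database, table, target directory, and principal are hypothetical:

```shell
# Command sketches (hypothetical names throughout).
# Authenticate to the Kerberized cluster before submitting jobs:
kinit etl_user@EXAMPLE.COM

# Import a MySQL table into HDFS with Sqoop:
sqoop import \
  --connect jdbc:mysql://db-host.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```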
Environment: Hadoop, AWS, HDFS, MapReduce, Spark, Hive, Impala, Sqoop, YARN, Kafka, HBase, Atlas, Ganglia, Solr, NiFi, Talend, Oozie, SQL scripting, Linux shell scripting, and Cloudera.
Hadoop Administrator
Confidential, Seattle, WA
Responsibilities:
- Installation, configuration, support, and maintenance of Hadoop clusters using Apache and Hortonworks distributions with YARN.
- Involved in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, and Troubleshooting.
- Installing and configuring the Hadoop ecosystem.
- Involved in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
- Loading log data directly into HDFS using Flume.
- Importing and exporting data into HDFS using Sqoop.
- Backup configuration and recovery from NameNode failures.
- Configured NameNode High Availability with the Quorum Journal Manager and shared edit logs.
- Managed Access Control Lists (ACLs) on HDFS.
- Configured rack awareness on HDP.
- Started and stopped Hadoop daemons such as the NameNode, Standby NameNode, DataNodes, ResourceManager, and NodeManagers.
- Configured Active Directory and Centrify and integrated them with Kerberos.
- Involved in copying files within a cluster or between clusters using the DistCp command-line utility.
- Implementing and managing the overall Hadoop infrastructure.
- Ensured the Hadoop cluster was running at all times.
- Involved in commissioning and decommissioning worker nodes such as DataNodes, HBase RegionServers, and NodeManagers.
- Involved in configuring the cluster's Capacity Scheduler.
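The rack-awareness setup mentioned above hinges on a topology script: Hadoop passes node addresses as arguments (via the script configured in net.topology.script.file.name) and expects one rack path per address on stdout. A minimal sketch of that mapping logic; the subnets and rack names are invented:

```shell
# Minimal sketch of the mapping logic inside an HDFS rack-awareness
# topology script. Subnets and rack names below are invented examples.
rack_for() {
  for node in "$@"; do
    case "$node" in
      10.0.1.*) echo "/dc1/rack1" ;;
      10.0.2.*) echo "/dc1/rack2" ;;
      *)        echo "/default-rack" ;;  # fallback for unknown hosts
    esac
  done
}

# Example: resolve two node IPs to their racks.
rack_for 10.0.1.17 10.0.2.8
```

In a real deployment this logic lives in an executable script on the NameNode host, and the NameNode uses the returned rack paths for replica placement.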
Environment: Hadoop 2.0, MapReduce, HDFS, Hive, Impala, ZooKeeper, Airflow, Hortonworks, NoSQL, Oracle, Red Hat Linux.
Hadoop Administrator
Confidential, Manhattan, NY
Responsibilities:
- Involved in the end-to-end Hadoop cluster setup process: installation, configuration, and monitoring of the Hadoop cluster.
- Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
- Responsible for cluster maintenance and the commissioning and decommissioning of DataNodes.
- Cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Responsible for Installation and configuration of Hive, HBase, and Sqoop on the Hadoop cluster.
- Configured various property files, such as core-site.xml, hdfs-site.xml, and mapred-site.xml, based on job requirements.
- Involved in loading data from the UNIX file system to HDFS, importing and exporting data with Sqoop, and managing and reviewing Hadoop log files.
- Responsible for data extraction and ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using HBase and Hive.
- Managed and reviewed Hadoop log files as part of administration and troubleshooting, communicating and escalating issues appropriately.
- Extracted meaningful data from dealer CSV, text, and mainframe files and generated Python pandas reports for data analysis.
- Performed data analysis, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
- Involved in Analyzing system failures, identifying root causes, and recommending courses of action. Documented the systems processes and procedures for future reference.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
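Property-file work of the kind described above involves XML fragments of the following shape. The property names are standard HDFS settings, but the values here are illustrative defaults, not taken from any specific cluster:

```xml
<!-- Illustrative hdfs-site.xml fragment; values are examples only. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>              <!-- number of block replicas -->
</property>
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>      <!-- 128 MB HDFS block size -->
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value> <!-- consulted when decommissioning DataNodes -->
</property>
```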
Environment: Hortonworks, Hadoop, HDFS, Hive, Impala, Sqoop, Flume, Kafka, Storm, UNIX, Cloudera Manager, ZooKeeper, HBase, Python, Spark, Apache, SQL, ETL.
SQL Server DBA
Confidential, Manhattan, NY
Responsibilities:
- Monitored servers to ensure high availability.
- Resolved requests raised through user support.
- Participated in the implementation of new releases into production.
- Facilitated application deployments and coordinated joint deployments.
- Met established SLAs (Service Level Agreements).
- Day-to-day administration of live SQL Servers.
- Designed and implemented Security policy for SQL Servers.
- Used Stored Procedures, DTS packages, and BCP for updating Servers.
- Designed and implemented a comprehensive backup plan and disaster recovery strategies.
- Implemented and Scheduled the Replication process for updating our parallel servers.
- Automated messaging services on server failures, for task completion and success.
- Execute procedures to accomplish short-term business goals.
- Deploying Administration tools like SQL DB Access, Performance Monitor, and Backup Utility.
- Deploying monitoring tools like SiteScope and Spotlight.
- Table partitioning to improve I/O access.
- Enforce Database Security by Managing user privileges and efficient resource management by assigning profiles to users.
- Used database-level triggers to enforce database consistency, integrity, and security.
- Tested and implemented procedures to detect poor database performance, collect the required information, and perform root cause analysis.
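Backup work of the kind described above was typically scripted against the server. A command sketch only; the database name and backup paths are hypothetical, and osql was the command-line client shipped with SQL Server 2000:

```shell
# Command sketch: full and transaction-log backups via osql (SQL Server 2000 era).
# Database name and paths are hypothetical.
osql -E -Q "BACKUP DATABASE Sales TO DISK = 'D:\Backup\Sales_full.bak' WITH INIT"
osql -E -Q "BACKUP LOG Sales TO DISK = 'D:\Backup\Sales_log.trn'"
```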
Environment: SQL Server 2000/7.0, IIS, Windows 2003/2000 Server, SQL DB Access, Performance Monitor, Backup Utility, MOM, MSE and SiteScope.
