
Hadoop Administrator Resume


Detroit, MI

SUMMARY

  • Over 9 years of professional IT experience, including work across the Big Data ecosystem and related technologies.
  • 7+ years of exclusive experience in Hadoop Administration and its components, including HDFS, Hive, Impala, HBase, Sqoop, Spark, MapReduce, Oozie, YARN, and Zookeeper, on Linux using CDH.
  • Strong knowledge and experience in supporting Linux environments.
  • Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, and Troubleshooting.
  • Experienced in upgrading CDH 6.x to CDP 7.x.
  • ETL processes using Talend, Sqoop, DistCp, Spark RDDs, SCP, and SFTP.
  • Hands-on experience setting up fully distributed multi-node Hadoop clusters with Apache Hadoop and AWS EC2 instances.
  • Installing, configuring, and tuning ecosystem components - HDFS, Hive, MapReduce, Oozie, YARN, and Zookeeper.
  • Experience in designing and implementing secure Hadoop clusters using Kerberos, Apache Sentry/Ranger.
  • Configuring SSL/TLS and Kerberos security on Hadoop services.
  • Enabling High Availability for HDFS, YARN, Hive, Impala, and other services to improve redundancy against failures.
  • Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • Hands-on experience in analyzing Log files for Hadoop and ecosystem services and finding the root cause.
  • Experience in copying files within a cluster or between clusters using the DistCp command-line utility and from the Cloudera Manager interface (see the DistCp sketch after this list).
  • Experience in HDFS data storage and in supporting MapReduce jobs.
  • Expertise in commissioning and decommissioning nodes in clusters, backup configuration, and recovery from a NameNode failure (see the decommissioning sketch after this list).
  • Good working knowledge of importing and exporting data between databases such as MySQL and HDFS/Hive using Sqoop.
  • Strong knowledge of YARN concepts and the High Availability of Hadoop clusters.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Excellent oral and written communication and presentation skills, analytical and problem-solving skills.
  • Effective problem-solving skills and ability to learn and use new technologies/tools quickly.
  • Available for on-call support 24/7.
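
A minimal DistCp sketch of the copy scenarios above; the NameNode URIs and paths are placeholders rather than actual cluster details:

    # Copy a directory between clusters; source and target NameNode
    # addresses below are hypothetical.
    hadoop distcp hdfs://nn-prod:8020/data/events hdfs://nn-dr:8020/data/events

    # Copy within one cluster, refreshing only files that changed.
    hadoop distcp -update /data/events /backup/events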
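
And a decommissioning sketch, assuming the exclude file configured via dfs.hosts.exclude (the path and hostname are illustrative):

    # Add the host to the HDFS exclude file, then ask the NameNode to
    # re-read its host lists; the DataNode drains its blocks and moves
    # to the Decommissioned state.
    echo "worker07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch progress until the node reports Decommissioned.
    hdfs dfsadmin -report | grep -A 2 worker07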

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Hive, Impala, HBase, Oozie, Sqoop, Spark, Solr, NiFi, Hue, HCatalog, AWS, Chef, Data Modeling, Flume & Zookeeper, Jira.

Languages and Technologies: Python, SQL, NoSQL.

Operating Systems: Linux, UNIX, Windows, macOS.

Databases: MySQL, Oracle, Teradata, PostgreSQL, DB2.

NoSQL Databases: HBase, Cassandra, MongoDB

Office Tools: MS Word, MS Excel, MS PowerPoint, MS Project

PROFESSIONAL EXPERIENCE

Hadoop Administrator

Confidential, Detroit, MI

Responsibilities:

  • Involved in the installation, configuration, deployment, maintenance, monitoring, and troubleshooting of CDH clusters in multiple environments such as Development and Production.
  • Extensively worked with Cloudera Distribution Hadoop (CDH) 6.x.
  • Extensively involved in cluster capacity planning, hardware planning, installation, performance tuning, cluster monitoring, and troubleshooting of the Hadoop cluster.
  • Solid experience upgrading the Cloudera cluster from CDH 6.3.4 to CDP 7.1.7.
  • Implemented load balancing and HA for the NameNode, Hive, and Impala proxies using Cloudera Manager.
  • Installed Hue for GUI access to Hive and Oozie, and resolved issues reported by users in Hue.
  • Responsible for managing and scheduling jobs on a Hadoop cluster.
  • Administered and optimized Hadoop clusters, monitored Hadoop jobs, and worked with the development team to fix issues.
  • Configured Flume and Talend for data transfer from web servers to the Hadoop cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Used Sqoop to import data from RDBMS into HDFS (see the Sqoop import sketch after this list).
  • Set up MySQL as an External backup database for Cloudera Manager and other Hadoop components.
  • Implemented Kerberos Security Authentication protocol for all services in the Hadoop cluster.
  • Implemented TLS for CDH Services and for Cloudera Manager.
  • Worked on the user management tool to create users, grant permissions to users on various tables and databases, and assign group permissions.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts and Kerberos principals (see the user-onboarding sketch after this list).
  • Deployed a Test Cluster leveraging the AWS cloud services such as EC2 and installed and configured all the Hadoop Ecosystem Services.
  • Choosing the right file formats in Hadoop file systems - Text, Avro, and Parquet - and compression techniques such as Snappy, bzip2, and LZO.
  • Improved communication between teams which led to an increase in the number of simultaneous projects and average billable hours.
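
A minimal Sqoop import sketch; the JDBC URL, credentials, and table name are placeholders:

    # Import a MySQL table into HDFS in parallel; -P prompts for the
    # password rather than exposing it on the command line.
    sqoop import \
      --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4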
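
And a user-onboarding sketch covering the Linux account and Kerberos principal; the realm, group, and admin principal are assumptions:

    # Create the local account and a matching Kerberos principal.
    useradd -m -G hadoop-users jdoe
    kadmin -p admin/admin@EXAMPLE.COM -q "addprinc jdoe@EXAMPLE.COM"

    # Provision the user's HDFS home directory as the hdfs superuser.
    sudo -u hdfs hdfs dfs -mkdir /user/jdoe
    sudo -u hdfs hdfs dfs -chown jdoe:hadoop-users /user/jdoe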

Environment: Hadoop, AWS, HDFS, MapReduce, Spark, Hive, Impala, Sqoop, Yarn, Kafka, HBase, Atlas, Ganglia, Solr, Nifi, Talend, Oozie, SQL scripting, Linux shell scripting, and Cloudera.

Hadoop Administrator

Confidential, Seattle, WA

Responsibilities:

  • Installation, configuration, support, and maintenance of Hadoop clusters using Apache and Hortonworks distributions with YARN.
  • Involved in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, and Troubleshooting.
  • Installing and configuring the Hadoop ecosystem.
  • Involved in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
  • Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
  • Loading log data directly into HDFS using Flume.
  • Importing and exporting data into HDFS using Sqoop.
  • Backup configuration and recovery from a NameNode failure.
  • Configured NameNode high availability with the Quorum Journal Manager and shared edit logs (see the HA configuration sketch after this list).
  • Involved in configuring Access Control Lists (ACLs) on HDFS.
  • Configured rack awareness on HDP (see the topology-script sketch after this list).
  • Started and stopped Hadoop daemons such as the NameNode, Standby NameNode, DataNode, ResourceManager, and NodeManager.
  • Configured AD and Centrify and integrated them with Kerberos.
  • Involved in copying files within a cluster or between clusters using the DistCp command-line utility.
  • Implemented and managed the overall Hadoop infrastructure.
  • Ensured the Hadoop cluster was up and running at all times.
  • Involved in commissioning and decommissioning slave nodes such as DataNodes, HBase RegionServers, and NodeManagers.
  • Involved in configuring the cluster's Capacity Scheduler.
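
An HA configuration sketch showing the key hdfs-site.xml properties for QJM-based NameNode HA; the nameservice ID, hostnames, and JournalNode quorum are placeholders:

    <!-- hdfs-site.xml: NameNode HA with the Quorum Journal Manager -->
    <property><name>dfs.nameservices</name><value>mycluster</value></property>
    <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
    </property>

Once both NameNodes are up, hdfs haadmin -getServiceState nn1 reports which one is active.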
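
And a topology-script sketch for rack awareness; core-site.xml points net.topology.script.file.name at a script like this, with illustrative subnets and rack names:

    #!/bin/bash
    # Hadoop calls this with one or more host/IP arguments and expects
    # one rack path per argument on stdout.
    for host in "$@"; do
      case "$host" in
        10.0.1.*) echo "/dc1/rack1" ;;
        10.0.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default/rack" ;;
      esac
    done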

Environment: Hadoop 2.0, MapReduce, HDFS, Hive, Impala, Zookeeper, Airflow, Hortonworks, NoSQL, Oracle, Red Hat Linux.

Hadoop Administrator

Confidential, Manhattan, NY

Responsibilities:

  • Involved in the end-to-end Hadoop cluster setup, covering installation, configuration, and monitoring of the Hadoop cluster.
  • Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
  • Responsible for cluster maintenance and for commissioning and decommissioning DataNodes.
  • Cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Installation of various Hadoop Ecosystems and Hadoop Daemons.
  • Responsible for Installation and configuration of Hive, HBase, and Sqoop on the Hadoop cluster.
  • Configured various property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements (see the configuration sketch after this list).
  • Involved in loading data from the UNIX file system to HDFS, importing and exporting data with Sqoop, and managing and reviewing Hadoop log files.
  • Responsible for data extraction and ingestion from different data sources into the Hadoop Data Lake by creating ETL pipelines using HBase and Hive.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • Extracted meaningful data from dealer CSV files, text files, and mainframe files and generated Python pandas reports for data analysis.
  • Performed data analysis, feature selection, and feature extraction using Apache Spark Machine Learning streaming libraries in Python.
  • Involved in Analyzing system failures, identifying root causes, and recommending courses of action. Documented the systems processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
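
A minimal configuration sketch of the property files mentioned above; the NameNode host and values are assumptions, not the project's actual settings:

    <!-- core-site.xml: default filesystem URI -->
    <property><name>fs.defaultFS</name><value>hdfs://nn01:8020</value></property>

    <!-- hdfs-site.xml: block replication factor -->
    <property><name>dfs.replication</name><value>3</value></property>

    <!-- mapred-site.xml: run MapReduce on YARN -->
    <property><name>mapreduce.framework.name</name><value>yarn</value></property>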

Environment: Hortonworks, Hadoop, HDFS, Hive, Impala, Sqoop, Flume, Kafka, Storm, UNIX, Cloudera Manager, Zookeeper, HBase, Python, Spark, Apache, SQL, ETL.

SQL Server DBA

Confidential, Manhattan, NY

Responsibilities:

  • Monitored the servers for high availability.
  • Resolved requests raised by user support.
  • Participated in the implementation of new releases into production.
  • Facilitated application deployments and joint-effort deployments.
  • Met the established SLAs (Service Level Agreements).
  • Performed day-to-day administration of live SQL Servers.
  • Designed and implemented Security policy for SQL Servers.
  • Used Stored Procedures, DTS packages, and BCP for updating servers (see the BCP sketch after this list).
  • Designed and implemented a comprehensive backup plan and disaster recovery strategies.
  • Implemented and Scheduled the Replication process for updating our parallel servers.
  • Automated messaging services on server failures, for task completion and success.
  • Execute procedures to accomplish short-term business goals.
  • Deploying Administration tools like SQL DB Access, Performance Monitor, and Backup Utility.
  • Deploying monitoring tools like SiteScope and Spotlight.
  • Table partitioning to improve I/O access.
  • Enforce Database Security by Managing user privileges and efficient resource management by assigning profiles to users.
  • Used database-level triggers to enforce database consistency, integrity, and security.
  • Tested and implemented procedures to detect poor database performance, collect required information, and perform root cause analysis.
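
A BCP sketch of the table-transfer flow mentioned above, run from a Windows command prompt; server, database, and path names are placeholders (-T uses a trusted connection, -c character format):

    rem Export a table to a flat file, then bulk-load it on another server.
    bcp Sales.dbo.Orders out C:\transfer\orders.dat -S PRODSQL01 -T -c
    bcp Sales.dbo.Orders in C:\transfer\orders.dat -S REPTSQL01 -T -c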

Environment: SQL Server 2000/7.0, IIS, Windows 2003/2000 Server, SQL DB Access, Performance Monitor, Backup Utility, MOM, MSE, and SiteScope.
