Hadoop Administrator Resume

White Plains, NY

SUMMARY

  • Around 7 years of professional IT experience, including 5 years working with the latest technologies and tools in the Big Data/Hadoop ecosystem. Hands-on experience with Hadoop architecture, tools and projects in the healthcare and banking domains.
  • Familiar with both the Cloudera CDH/CDP and Hortonworks HDP ecosystems.
  • Hands-on experience in multinode Hadoop cluster installation, configuration, performance tuning and management across various industry domains and use cases.
  • Participating in designing cluster layout architecture, documenting installation steps in Wiki, cross checking dependency with respective teams and Cloudera support and setting up deployment timeline depending on expectation.
  • Collaborating with the DBA team to set up external metadata databases using MySQL/Postgres for various Hadoop components and tools like Cloudera Manager, Apache Ambari, Reports Manager, Activity Monitor, Hive, Zookeeper, Sentry, HBase etc.
  • Configuring and managing HDFS, YARN, Hive/Beeline, Impala, Tez, Spark, MapReduce, Sqoop, HBase, Oozie, Zookeeper, Kerberos, Sentry, Ranger, Cloudera Manager, Hue etc.
  • Deploying SSL/TLS for encryption of data in transit, managing keystores, truststores and private/public/CA certificates, and integrating them with Cloudera Manager and other Hadoop components.
  • Deploying Kerberos for service and user authentication, creating and managing user keytabs using ktutil.
  • Involved in cluster benchmarking and performance tuning of services like YARN, Hive, Impala, Spark, Tez etc.
  • Managing Resource Pools/YARN Queue Manager for proper distribution of resources among services and users/groups.
  • Deploying NameNode HA and service-level HA for the YARN ResourceManager, the HiveServer2 and Metastore services of Hive, Impala and Hue using Cloudera Manager and HAProxy.
  • Deploying best practices for clusters and services like HDFS, YARN, Hive, Impala, and Spark based on project use cases.
  • Participating in node commissioning/decommissioning, HDFS quota and ACL (setfacl) deployment, cluster upgrades and patch deployment.
  • Deploying cluster in POC environment and testing upcoming use cases for future implementation.
  • Helping developers, testers and project teams access the cluster, resolving permission issues and working out proper connection strings for Hive, Impala and Spark using Python and shell commands.
  • Implementing projects with different versions of code released in Git/Bitbucket, troubleshooting implementation and environment issues, and reporting defects/limitations for future development.
  • Working with developers, testers and project teams in meetings to discuss project limitations, blockers and client expectations.
  • Using Kafka to consume real-time data, Spark to process the captured data and Hive to warehouse the processed data.
  • Scheduling jobs using Oozie, Airflow, Autosys etc. and collaborating with the ETL team to troubleshoot job failures.
  • Monitoring clusters, nodes, services and jobs health and issues using Cloudera Manager, HDFS/Linux commands and collaborating with Cloudera and respective teams to resolve/deploy fixes.
  • Experience in designing, configuring and managing backup and disaster recovery using data replication, snapshots and Cloudera BDR utilities.
  • Implemented role-based authorization for HDFS and Hive using Apache Sentry and Ranger.
  • Good knowledge of implementing and using cluster monitoring tools like Cloudera Manager.
  • Experienced in implementing and supporting auditing tools like Cloudera Navigator.
  • Participated in application on-boarding meetings with application owners and architects, and helped them identify and review the technology stack, use cases and estimated resource requirements.
  • Experience in analyzing log files, job failures and configuration issues, identifying root causes and taking or recommending courses of action.
  • Additional responsibilities include interacting with offshore team daily, communicating the requirement and delegating the tasks to offshore/on-site team members and reviewing their delivery.
  • Good experience in managing Linux environments and deploying best practices for a functional Hadoop ecosystem.
  • Knowledgeable in cloud environments like AWS and Azure and services like S3, EC2, VPN, HDInsight, NiFi etc.
  • Effective problem-solving skills and ability to learn and use new technologies/tools quickly.
  • Experienced in data ingestion projects that load data into a data lake from multiple source systems using Talend Open Studio.
  • Working knowledge of Bash shell scripting.

PROFESSIONAL EXPERIENCE

Hadoop Administrator

Confidential, White Plains, NY

Responsibilities:

  • Installed multinode Hadoop clusters in the development environment and participated in cluster deployment in the production environment with the help of the Cloudera team.
  • Deployed cluster in POC environment using new Cloudera CDP version.
  • Implemented use cases in CDP for identifying the best solutions / Proof of Concept leveraging Big Data & Advanced Analytics levers that meet and exceed the customer's business, functional and technical requirements.
  • Enabled NameNode HA using the Quorum Journal Manager (QJM) by setting up a standby NameNode, JournalNodes and ZKFC.
  • Deployed HA for ResourceManager, HiveServer2, Hive Metastore, Impala and Hue using Cloudera Manager and HAProxy in the development environment.
  • Ran Spark jobs to process raw data coming from multiple data sources and used Hive to warehouse the data in databases/tables for future use.
  • Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Supported teams in troubleshooting MapReduce jobs running on the cluster.
  • Day-to-day responsibilities included resolving access issues, deploying projects and code to various environments, providing access to new users, delivering quick fixes to reduce impact, and documenting the fixes to prevent future issues.
  • Involved in loading data from Linux file system to HDFS/Hive.
  • Involved in creating Hive tables, loading data, and writing Hive queries based on the requirements.
  • Configured various configuration files like core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.sh based on job requirements.
  • Stored unstructured data as key-value pairs in HBase on HDFS.
  • Designed and configured the cluster with the required services (Sentry, HiveServer2, Kerberos, HDFS, Hue, Hive, Zookeeper).
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Performed change request and Incident management process following the company standards.
  • Implemented partitioning, bucketing, indexing, and analyzing in Hive.
  • Configured MySQL as external metadata store for different Hadoop components and tools.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the Linux command line.
  • Demonstrated live proofs of concept to clients.
  • Provided on-call and on-demand support to different teams.

Environment: Cloudera Manager CDH 5.x, CDP, HDFS, YARN, MapReduce, Hive, Spark, Impala, Zookeeper, Oozie, HBase, Sqoop, Python, MySQL, RedHat Linux, Unix, Talend, AWS.

Hadoop/Big Data Consultant

Confidential, New York City, NY

Responsibilities:

  • Deployed a Hadoop cluster on the Cloudera distribution and installed ecosystem components: HDFS, YARN, Zookeeper, Hive, Impala, Sqoop, Sentry, Kerberos and Spark using Cloudera Manager.
  • Involved in Minor and Major Release work activities. Ensured our Hadoop clusters are built and tuned in the most optimal way to support the activities of our Big Data teams.
  • Implemented high availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing Zookeeper services.
  • Created resource pools and allocated cores, memory and usage depending on priority.
  • Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
  • Worked with Compute team in administering the server Hardware and operating system.
  • Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
  • Conducted Root Cause Analysis and resolved production problems and data issues.
  • Monitored and provided support for development and production clusters.
  • Performed minor and major upgrades of Cloudera distributed Hadoop.
  • Performed operational tasks such as creating databases and tables in Hive and moving data from one environment to another.
  • Installed Kerberos and created keytabs and principals for individual users.
  • Enabled high availability of the NameNode and JobTracker services.
  • Monitored resources such as memory and CPU utilization via charts in Cloudera Manager and Nagios.
  • Configured the Fair Scheduler and Capacity Scheduler to provide SLAs in a multi-tenant environment.
  • Set up queues and quotas for users in the Cloudera distribution of Hadoop.
  • Performed routine administration tasks including commissioning new nodes, decommissioning old hardware, migrating data, and handling disk failures and job failures.
  • Implemented Apache Sentry on HDFS.
  • Involved in creating Hive tables and loading data into them.

Environment: Cloudera CDH, Hive, HDFS, MapReduce, YARN, Impala, Spark, Zookeeper, Oozie, Shell Scripting, Sentry, Nagios, Linux/Unix, MySQL.

Hadoop Consultant

Confidential, St. Louis, MO

Responsibilities:

  • Configured and managed multinode Hadoop clusters based on Hortonworks HDP. Monitored service health using Apache Ambari and maintained service availability by troubleshooting service-related issues.
  • Created Ambari Views for Tez, Hive and HDFS.
  • Involved in loading and transforming large sets of structured, semi structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Worked with the MapR Converged Data Platform, which is built with data movement in mind.
  • Involved in exporting the analyzed data to databases such as MySQL and Oracle using Sqoop, for visualization and to generate reports for the BI team.
  • Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop and Hive jobs that extract the data in a timely manner.
  • Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC/JDBC connector.
  • Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
  • Transformed the data using Sqoop and Talend for the BI team to perform visual analytics, according to client requirements.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.

Environment: Hortonworks HDP 2.x and 3.x, HDFS, MapReduce, Tez, Sentry, Hive, Oozie, HBase, Shell Scripting, Oracle, Talend.

Linux Administrator

Confidential, New York City, NY

Responsibilities:

  • Installation and configuration of Linux for new build environment.
  • Created virtual servers and installed operating systems on guest servers.
  • Set up Preboot Execution Environment (PXE) boot and Kickstart on multiple servers for remote installation of Linux.
  • Monitored system activity, performance and resource utilization.
  • Configuring NFS, DNS.
  • Updated YUM repositories and Red Hat Package Manager (RPM) packages.
  • Created RPM packages using rpmbuild, verified the newly built packages and distributed them.
  • Configured distributed file systems, administered NFS servers and NFS clients, and edited auto-mount maps as per system/user requirements.
  • Scheduled cron jobs to back up data and troubleshot hardware/OS issues.
  • Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Experience with Linux internals, virtual machines, and open source tools/platforms.
  • Improved system performance by working with the development team to analyze, identify and resolve issues quickly.
  • Ensured data recoverability by implementing system and application level backups.
  • Performed various configurations, including networking and iptables, hostname resolution, and passwordless (key-based) SSH login.
  • Performed scheduled backup and necessary restoration.
  • Managed disk file systems, server performance, user creation, file access permissions and RAID configurations.
  • Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.

Environment: RedHat Linux/UNIX, MySQL, SQL, TCP/IP, DNS, Shell, Networking.
