Hadoop Administrator Resume
White Plains N, Y
SUMMARY
- Around 7 years of professional IT career including 5 years of latest technology/tools in Bigdata/Hadoop ecosystem. Hands on experienced in Hadoop architecture, tools and projects in healthcare as well as banking domain.
- Familiar in both Cloudera CDH/CDP and Hortonworks HDP ecosystem.
- Hands on experience in multimode Hadoop cluster installation, configuration, performance tuning and management depending on various industry domain and use cases.
- Participating in designing cluster layout architecture, documenting installation steps in Wiki, cross checking dependency with respective teams and Cloudera support and setting up deployment timeline depending on expectation.
- Collaborating with DBA team to set up external metadata database using MySQL/Postgres for various Hadoop components and tools like Cloudera manager, Apache Ambari, Report manager, activity monitor, Hive, Zookeeper, Sentry, HBase etc.
- Configuring and managing HDFS, YARN, Hive - Beeline, Impala, Tez, Spark, MapReduce, Sqoop, HBase, Oozie, Zookeeper, Kerberos, Sentry, Ranger, Cloudera Manager, Ranger, Oozie, Hue etc.
- Deploying SSL/TLS for data at transit encryption, managing Keystore,Truststore, private/public/CA certificates and integrating with Cloudera Manager and other Hadoop components.
- Deploying Kerberos for service and user authentication, creating and managing user Keytabs using Ktutil.
- Involved in cluster benchmarking and performance tuning of services like YARN, Hive, Impala, Spark, Tez etc.
- Managing Resource Pool/Yarn Queue Manager for proper distribution of resources among services and users/groups.
- Deploying Namenode HA and service level HA for resource manager of Yarn, Hiverserver2 and Metastore services of Hive, Impala and Hue using Cloudera Manager and Haproxy.
- Deploying best practices for clusters and services like HDFS, Yarn, Hive, Impala, and Spark based on project use cases.
- Participating in node Commission/Decommission, HDFS Quota and DFS Facl deployment, cluster upgrading and patch deployment.
- Deploying cluster in POC environment and testing upcoming use cases for future implementation.
- Helping developers, testers and project teams to access the Cluster, resolving permission issues figuring out proper connection strings for Hive, Impala, Spark using Python and Shell commands.
- Implementing projects with different versions of Code released in GIT/bitbucket, troubleshooting implementation issues, environmental issues and reporting defects/limitations for future development.
- Working with developers, testers and project teams via meetings to discuss about project limitations, blocker and client expectations.
- Using Kafka for consuming real-time data, using spark to process the captured data and warehousing the processed data using hive.
- Scheduling jobs using Oozie, Airflow, Autosys etc and collaborating with ETL team to troubleshot job failures.
- Monitoring clusters, nodes, services and jobs health and issues using Cloudera Manager, HDFS/Linux commands and collaborating with Cloudera and respective teams to resolve/deploy fixes.
- Experience on designing, configuring and managing backup and disaster recovery using Data Replication, Snapshots and Cloudera BDR utilities.
- Implemented Role based authorization for HDFS, HIVE using Apache Sentry and Ranger.
- Good Knowledge on Implementing and using Cluster monitoring tools like Cloudera Manager.
- Experienced in implementing and supporting auditing tools like Cloudera Navigator.
- Participated in the application on-boarding meetings along with Application owners, Architects and helps them to identify/review the technology stack, use case and estimation of resource requirements.
- Experience in analyzing log files, job failures, configuration issues, identifying root causes and taking/recommending course of actions.
- Additional responsibilities include interacting with offshore team daily, communicating the requirement and delegating the tasks to offshore/on-site team members and reviewing their delivery.
- Good Experience in managing Linux environments and deploying best practices for functional Hadoop ecosystem.
- Knowledgeable in cloud environments like AWS, Azure and services like S3, EC2, VPN, HDInsight, Nifi etc.
- Effective problem-solving skills and ability to learn and use new technologies/tools quickly.
- Experienced inData Ingestionprojects to inject data intoData Lakeusing multiple sources systems usingTalend and Open Studio.
- Workable scripting knowledge in Bash shell scripting.
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential, White Plains, N.Y
Responsibilities:
- Installed multinode Hadoop clusters in development environment, participated in cluster deployment in production environment with the help of Cloudera team.
- Deployed cluster in POC environment using new Cloudera CDP version.
- Implemented use cases in CDP for identifying the best solutions / Proof of Concept leveraging Big Data & Advanced Analytics levers that meet and exceed the customer's business, functional and technical requirements.
- Enabled Name-node HA using quorum journal manager (QJM) by setting up standby Namenode, journal node and ZKFC.
- Deployed HA for resource manager, hiveserver2, hive Metastore, Impala, Hue using Cloudera manager and Haproxy in development environment.
- Ran spark jobs to process raw data coming from multiple data sources and used Hive to warehouse the data in databases/tables for future use.
- Responsible for Cluster maintenance, Commissioning and Decommissioning Data Nodes, Cluster Monitoring, Troubleshooting, manage and review data backups, Manage & review Hadoop log files.
- Supported teams to troubleshoot Map Reduce jobs running on the cluster.
- Day to day responsibilities includes solving access issues, deployments, implementing projects/codes to various environments, providing access to the new user, providing instant solutions for reducing the impact and documenting the same and preventing future issues.
- Involved in loading data from Linux file system to HDFS/Hive.
- Involved in creating Hive tables, loading data, and writing hive queries depends on the requirements.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.xml based upon the job requirement.
- Store unstructured data as key value pair in HDFS using HBase.
- Design and Configure the Cluster with the services required (Sentry, Hive server2, Kerberos, HDFS, Hue, Hive, Zookeeper.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Performed change request and Incident management process following the company standards.
- Implemented partitioning, bucketing, indexing, and analyzing in HIVE.
- Configured MySQL as external metadata store for different Hadoop components and tools.
- Continuous monitoring and managing theHadoopcluster using Cloudera Manager and Linux command line.
- Demonstration of the Live Proof of Concept Demo to Clients.
- Supported different teams on call and on provisioned demand.
Environment: Cloudera Manager CDH 5.x, CDP,HDFS, YARN, MapReduce, Hive, Spark, Impala, Zookeeper, Oozie, HBase, Sqoop, Python, MySQL, RedHat Linux, Unix, Talend, AWS.
Hadoop/Big Data Consultant
Confidential, New York City, N.Y
Responsibilities:
- Deployed Hadoop cluster of Cloudera Distribution and installed ecosystem components: HDFS, YARN, Zookeeper Hive, Impala, Sqoop, Sentry, Kerberos and Spark using Cloudera Manager.
- Involved in Minor and Major Release work activities. Ensured our Hadoop clusters are built and tuned in the most optimal way to support the activities of our Big Data teams.
- Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
- Created Pools and allocated cores, Memory and usage depending up on the priority.
- Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
- Worked with Compute team in administering the server Hardware and operating system.
- Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
- Conducted Root Cause Analysis and resolved production problems and data issues.
- Monitored and provided support for development and production clusters.
- Performed minor and major upgrades of Cloudera distributed Hadoop.
- Perform Operational tasks such as creating databases, tables in hive and moving data from one environment to another.
- Installed Kerberos and creating key tab and principal for individual users.
- Enabled High availability of name node and Job tracker services.
- Monitoring of resources like memory, CPU utilization via charts on Cloudera Manager and Nagios.
- Experience configuring fair scheduler and capacity scheduler to provide SLAs in a multiple-tenant environment.
- Setting up queues and quotas for users in Cloudera Distributed Hadoop.
- Routine administration tasks include Commissioning new nodes, Decommissioning of old hardware, migration of data, disk failures, job failures, etc.
- Implemented Apache Sentry on HDFS.
- Involved in creating Hive tables, loading with data in Hive.
Environment: Cloudera CDH, Hive, HDFS, MapReduce, YARN, Impala, Spark, Hive, Zookeeper, Oozie, Shell Scripting, Sentry, Nagios, Linux/Unix, MySQL.
Hadoop Consultant
Confidential, St. Louis, MO
Responsibilities:
- Configured and managed multimode Hadoop clusters based on Hortonworks HDP. Monitoring service health using Apache Ambari and maintaining service availability by troubleshooting service related issues.
- Created Ambari Views for Tez, Hive and HDFS.
- Involved in loading and transforming large sets of structured, semi structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Involved in MapReduce Converged Data Platform, built with the idea of data movement in mind.
- Involved in exporting the analyzed data to the databases such as MySQL and Oracle use Sqoop for visualization and to generate reports for the BI team.
- Worked on Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, hive that extract the data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC/JDBC connector.
- The Hive tables created as per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
- Transformed the data using Sqoop, Talend for BI team to perform visual analytics, according to the client requirement.
- Implemented Fair schedulers on the Job Tracker to share the resources of the cluster of the Map Reduce jobs given by the users.
Environment: Hortonworks HDP 2.x and 3.x, HDFS, MapReduce, Tez, Sentry, Hive, Oozie, HBase, Shell Scripting, Oracle, Talend.
Linux Administrator
Confidential, New York City, NY
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Created Virtual server and installed operating system on Guest Servers.
- Installed Pre-Execution environment boot and Kick start method on multiple servers, remote installation of Linux using PXE boot.
- Monitoring the System activity, Performance, Resource utilization.
- Configuring NFS, DNS.
- Updating YUM Repository and Red hat Package Manager (RPM).
- Created RPM packages using RPMBUILD, verifying the new build packages and distributing the package.
- Configuring distributed file systems and administering NFS server and NFS clients and editing auto-mounting mapping as per system / user requirements.
- Running Crontab to back up data and troubleshooting Hardware/OS issues.
- Created volume groups logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system and application level backups.
- Performed various configurations which include networking and IPTables, resolving hostnames, SSH key less login.
- Performed scheduled backup and necessary restoration.
- Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
- Support pre-production and production support teams in the analysis of critical services and assists with maintenance operations.
Environment: RedHat Linux/UNIX, MYSQL, SQL, TCP/IP, DNS, Shell, Networking.
