Hadoop Administrator Resume White Plains, N.Y - Hire IT People

SUMMARY

Around 7 years of professional IT career including 5 years of latest technology/tools in Bigdata/Hadoop ecosystem. Hands on experienced in Hadoop architecture, tools and projects in healthcare as well as banking domain.
Familiar in both Cloudera CDH/CDP and Hortonworks HDP ecosystem.
Hands on experience in multimode Hadoop cluster installation, configuration, performance tuning and management depending on various industry domain and use cases.
Participating in designing cluster layout architecture, documenting installation steps in Wiki, cross checking dependency with respective teams and Cloudera support and setting up deployment timeline depending on expectation.
Collaborating with DBA team to set up external metadata database using MySQL/Postgres for various Hadoop components and tools like Cloudera manager, Apache Ambari, Report manager, activity monitor, Hive, Zookeeper, Sentry, HBase etc.
Configuring and managing HDFS, YARN, Hive - Beeline, Impala, Tez, Spark, MapReduce, Sqoop, HBase, Oozie, Zookeeper, Kerberos, Sentry, Ranger, Cloudera Manager, Ranger, Oozie, Hue etc.
Deploying SSL/TLS for data at transit encryption, managing Keystore,Truststore, private/public/CA certificates and integrating with Cloudera Manager and other Hadoop components.
Deploying Kerberos for service and user authentication, creating and managing user Keytabs using Ktutil.
Involved in cluster benchmarking and performance tuning of services like YARN, Hive, Impala, Spark, Tez etc.
Managing Resource Pool/Yarn Queue Manager for proper distribution of resources among services and users/groups.
Deploying Namenode HA and service level HA for resource manager of Yarn, Hiverserver2 and Metastore services of Hive, Impala and Hue using Cloudera Manager and Haproxy.
Deploying best practices for clusters and services like HDFS, Yarn, Hive, Impala, and Spark based on project use cases.
Participating in node Commission/Decommission, HDFS Quota and DFS Facl deployment, cluster upgrading and patch deployment.
Deploying cluster in POC environment and testing upcoming use cases for future implementation.
Helping developers, testers and project teams to access the Cluster, resolving permission issues figuring out proper connection strings for Hive, Impala, Spark using Python and Shell commands.
Implementing projects with different versions of Code released in GIT/bitbucket, troubleshooting implementation issues, environmental issues and reporting defects/limitations for future development.
Working with developers, testers and project teams via meetings to discuss about project limitations, blocker and client expectations.
Using Kafka for consuming real-time data, using spark to process the captured data and warehousing the processed data using hive.
Scheduling jobs using Oozie, Airflow, Autosys etc and collaborating with ETL team to troubleshot job failures.
Monitoring clusters, nodes, services and jobs health and issues using Cloudera Manager, HDFS/Linux commands and collaborating with Cloudera and respective teams to resolve/deploy fixes.
Experience on designing, configuring and managing backup and disaster recovery using Data Replication, Snapshots and Cloudera BDR utilities.
Implemented Role based authorization for HDFS, HIVE using Apache Sentry and Ranger.
Good Knowledge on Implementing and using Cluster monitoring tools like Cloudera Manager.
Experienced in implementing and supporting auditing tools like Cloudera Navigator.
Participated in the application on-boarding meetings along with Application owners, Architects and helps them to identify/review the technology stack, use case and estimation of resource requirements.
Experience in analyzing log files, job failures, configuration issues, identifying root causes and taking/recommending course of actions.
Additional responsibilities include interacting with offshore team daily, communicating the requirement and delegating the tasks to offshore/on-site team members and reviewing their delivery.
Good Experience in managing Linux environments and deploying best practices for functional Hadoop ecosystem.
Knowledgeable in cloud environments like AWS, Azure and services like S3, EC2, VPN, HDInsight, Nifi etc.
Effective problem-solving skills and ability to learn and use new technologies/tools quickly.
Experienced inData Ingestionprojects to inject data intoData Lakeusing multiple sources systems usingTalend and Open Studio.
Workable scripting knowledge in Bash shell scripting.

PROFESSIONAL EXPERIENCE

Hadoop Administrator

Confidential, White Plains, N.Y

Responsibilities:

Installed multinode Hadoop clusters in development environment, participated in cluster deployment in production environment with the help of Cloudera team.
Deployed cluster in POC environment using new Cloudera CDP version.
Implemented use cases in CDP for identifying the best solutions / Proof of Concept leveraging Big Data & Advanced Analytics levers that meet and exceed the customer's business, functional and technical requirements.
Enabled Name-node HA using quorum journal manager (QJM) by setting up standby Namenode, journal node and ZKFC.
Deployed HA for resource manager, hiveserver2, hive Metastore, Impala, Hue using Cloudera manager and Haproxy in development environment.
Ran spark jobs to process raw data coming from multiple data sources and used Hive to warehouse the data in databases/tables for future use.
Responsible for Cluster maintenance, Commissioning and Decommissioning Data Nodes, Cluster Monitoring, Troubleshooting, manage and review data backups, Manage & review Hadoop log files.
Supported teams to troubleshoot Map Reduce jobs running on the cluster.
Day to day responsibilities includes solving access issues, deployments, implementing projects/codes to various environments, providing access to the new user, providing instant solutions for reducing the impact and documenting the same and preventing future issues.
Involved in loading data from Linux file system to HDFS/Hive.
Involved in creating Hive tables, loading data, and writing hive queries depends on the requirements.
Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.xml based upon the job requirement.
Store unstructured data as key value pair in HDFS using HBase.
Design and Configure the Cluster with the services required (Sentry, Hive server2, Kerberos, HDFS, Hue, Hive, Zookeeper.
Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
Performed change request and Incident management process following the company standards.
Implemented partitioning, bucketing, indexing, and analyzing in HIVE.
Configured MySQL as external metadata store for different Hadoop components and tools.
Continuous monitoring and managing theHadoopcluster using Cloudera Manager and Linux command line.
Demonstration of the Live Proof of Concept Demo to Clients.
Supported different teams on call and on provisioned demand.

Environment: Cloudera Manager CDH 5.x, CDP,HDFS, YARN, MapReduce, Hive, Spark, Impala, Zookeeper, Oozie, HBase, Sqoop, Python, MySQL, RedHat Linux, Unix, Talend, AWS.

Hadoop/Big Data Consultant

Confidential, New York City, N.Y

Responsibilities:

Deployed Hadoop cluster of Cloudera Distribution and installed ecosystem components: HDFS, YARN, Zookeeper Hive, Impala, Sqoop, Sentry, Kerberos and Spark using Cloudera Manager.
Involved in Minor and Major Release work activities. Ensured our Hadoop clusters are built and tuned in the most optimal way to support the activities of our Big Data teams.
Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
Created Pools and allocated cores, Memory and usage depending up on the priority.
Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
Worked with Compute team in administering the server Hardware and operating system.
Proactively involved in ongoing Maintenance, Support and Improvements in Hadoop clusters.
Conducted Root Cause Analysis and resolved production problems and data issues.
Monitored and provided support for development and production clusters.
Performed minor and major upgrades of Cloudera distributed Hadoop.
Perform Operational tasks such as creating databases, tables in hive and moving data from one environment to another.
Installed Kerberos and creating key tab and principal for individual users.
Enabled High availability of name node and Job tracker services.
Monitoring of resources like memory, CPU utilization via charts on Cloudera Manager and Nagios.
Experience configuring fair scheduler and capacity scheduler to provide SLAs in a multiple-tenant environment.
Setting up queues and quotas for users in Cloudera Distributed Hadoop.
Routine administration tasks include Commissioning new nodes, Decommissioning of old hardware, migration of data, disk failures, job failures, etc.
Implemented Apache Sentry on HDFS.
Involved in creating Hive tables, loading with data in Hive.

Environment: Cloudera CDH, Hive, HDFS, MapReduce, YARN, Impala, Spark, Hive, Zookeeper, Oozie, Shell Scripting, Sentry, Nagios, Linux/Unix, MySQL.

Hadoop Consultant

Confidential, St. Louis, MO

Responsibilities:

Configured and managed multimode Hadoop clusters based on Hortonworks HDP. Monitoring service health using Apache Ambari and maintaining service availability by troubleshooting service related issues.
Created Ambari Views for Tez, Hive and HDFS.
Involved in loading and transforming large sets of structured, semi structured, and unstructured data from relational databases into HDFS using Sqoop imports.
Involved in MapReduce Converged Data Platform, built with the idea of data movement in mind.
Involved in exporting the analyzed data to the databases such as MySQL and Oracle use Sqoop for visualization and to generate reports for the BI team.
Worked on Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, hive that extract the data in a timely manner.
Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC/JDBC connector.
The Hive tables created as per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
Transformed the data using Sqoop, Talend for BI team to perform visual analytics, according to the client requirement.
Implemented Fair schedulers on the Job Tracker to share the resources of the cluster of the Map Reduce jobs given by the users.

Environment: Hortonworks HDP 2.x and 3.x, HDFS, MapReduce, Tez, Sentry, Hive, Oozie, HBase, Shell Scripting, Oracle, Talend.

Linux Administrator

Confidential, New York City, NY

Responsibilities:

Installation and configuration of Linux for new build environment.
Created Virtual server and installed operating system on Guest Servers.
Installed Pre-Execution environment boot and Kick start method on multiple servers, remote installation of Linux using PXE boot.
Monitoring the System activity, Performance, Resource utilization.
Configuring NFS, DNS.
Updating YUM Repository and Red hat Package Manager (RPM).
Created RPM packages using RPMBUILD, verifying the new build packages and distributing the package.
Configuring distributed file systems and administering NFS server and NFS clients and editing auto-mounting mapping as per system / user requirements.
Running Crontab to back up data and troubleshooting Hardware/OS issues.
Created volume groups logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
Deep understanding of monitoring and troubleshooting mission critical Linux machines.
Experience with Linux internals, virtual machines, and open source tools/platforms.
Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
Ensured data recoverability by implementing system and application level backups.
Performed various configurations which include networking and IPTables, resolving hostnames, SSH key less login.
Performed scheduled backup and necessary restoration.
Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
Support pre-production and production support teams in the analysis of critical services and assists with maintenance operations.

Environment: RedHat Linux/UNIX, MYSQL, SQL, TCP/IP, DNS, Shell, Networking.

We provide IT Staff Augmentation Services!

Hadoop Administrator Resume

White Plains N, Y

We'd love your feedback!

Resume Categories

Client Services

Job Seekers

Visa Sponsorship