
Hadoop Admin Resume


New York

SUMMARY

  • Over 8 years of experience as a Hadoop Administrator and Linux Administrator, with good working knowledge of Big Data and Hadoop ecosystem technologies.
  • Excellent experience in Hadoop architecture and various components such as NameNode, DataNode, MapReduce, and YARN (NodeManager, ResourceManager, ApplicationMaster), and tools including Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and Zookeeper for coordinating cluster resources.
  • Implemented proofs of concept on the Hadoop stack and various big data analytic tools, including migration from databases such as Oracle and MySQL to Hadoop.
  • Involved in the design and development of technical specifications using the Hadoop ecosystem, along with administration, testing, change control processes, and Hadoop administration activities such as installation, configuration, and maintenance of clusters.
  • Hands-on experience in designing and implementing solutions using Hadoop, HDFS, YARN, Hive, Impala, Oozie, Sqoop and Zookeeper.
  • Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, HCatalog, Zookeeper) using Hortonworks Ambari.
  • Theoretical knowledge of CDP (Data Center, Data Hub).
  • Experience with Hadoop security tools Kerberos, Ranger on HDP 2.x stack and CDH 5.x.
  • Expertise in setting up, configuring, and monitoring Hadoop clusters using Cloudera CDH5 and Apache Hadoop on Red Hat and Windows.
  • Experienced in data ingestion projects, ingesting data from multiple source systems into a data lake using Talend Big Data and Open Studio.
  • Hands-on experience installing multi-node production Hadoop environments on Amazon AWS using services such as EMR, EC2, IAM, S3, and Data Pipeline.
  • Experience in importing and exporting data between different databases and HDFS/Hive using Sqoop (a Sqoop command sketch follows this list).
  • Experience in analyzing log files to identify failures and their root causes, and taking or recommending a course of action.
  • Strong knowledge in configuring High Availability for NameNode, Hive, and ResourceManager.
  • Good experience in UNIX/Linux administration along with SQL development, designing and implementing relational database models per business needs in different domains.
  • Expertise in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal performance of the cluster.
  • Experience in Hadoop Cluster capacity planning, performance tuning, cluster monitoring, and Troubleshooting.
  • Expertise in benchmarking, performing backup and disaster recovery of Name Node metadata and important and sensitive data residing on the cluster.
  • Knowledge of Cloudera architecture and components, including Director, Navigator, and Manager, in both cloud and on-premise environments, including securing services and users with Kerberos and LDAP/Active Directory.
  • Rack awareness configuration for quick availability and processing of data. Experience in designing and implementing secure Hadoop clusters using Kerberos.
  • Experience in copying files within a cluster or between clusters using the DistCp command-line utility (see the DistCp sketch after this list).
  • Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Hands on experience in analyzing Log files for Hadoop and ecosystem services and finding the root cause.
  • Hands on experience in Unix/Linux environments, which included software installations/upgrades, shell scripting for job automation and other maintenance activities.
  • Experience in setting up SSH, SCP, and SFTP connectivity between UNIX hosts.
  • Experience in developing Oozie workflows and job controllers for job automation, including Hive automation and scheduling jobs in the Hue browser.
  • Experience in working with ITIL tools such as JIRA, SUCCEED, and ServiceNow for change management and support processes.
  • Additional responsibilities include interacting with the offshore team daily, communicating requirements, delegating tasks to offshore/on-site team members, and reviewing their deliverables.
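
A minimal Sqoop sketch of the import/export flow referenced above; the JDBC URL, credentials, table names, and HDFS paths are hypothetical placeholders:

    # Pull a MySQL table into HDFS (adding --hive-import would load it into a Hive table as well).
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Push processed results back to the RDBMS.
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/curated/orders_summary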
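
A DistCp sketch for the inter-cluster copy referenced above; the NameNode hosts and paths are placeholders:

    # -update copies only files that are missing or changed on the target cluster.
    hadoop distcp -update \
      hdfs://source-nn:8020/data/events \
      hdfs://target-nn:8020/data/events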

TECHNICAL SKILLS

Hadoop / Big Data Ecosystem: HDFS, YARN, MapReduce, Hive, Sqoop, Impala, Hue, Oozie, Zookeeper, Apache Tez, Spark (HBase, Kafka - theoretical knowledge).

Operating Systems: Linux (Redhat), Windows.

Databases: Oracle, SQL Server, MySQL.

Programming Language: SQL.

Cloud Computing Services: VMware, AWS.

Security: Kerberos, Active Directory, Sentry, Apache Ranger, TLS/SSL.

Data pipeline: Talend ETL tool and AWS Data Pipeline.

PROFESSIONAL EXPERIENCE

Confidential, New York

Hadoop Admin

Responsibilities:

  • Part of the core Hadoop team, working on Hadoop infrastructure and playing a key role in supporting the Hadoop cluster.
  • Utilizing components such as YARN, Zookeeper, JournalNodes, Sqoop, Hive, Impala, Hue, and HBase.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons, AD integration (LDAP), and Sentry authorization (a Kerberos principal sketch follows this list).
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Configured various property files like core-site.xml, hdfs-site.xml, and mapred-site.xml based upon the job requirement.
  • Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using HBase and Hive.
  • Worked with data consistency, data integrity, and real-time transactional data; strong collaborator and team player in an Agile environment with hands-on experience on Impala.
  • Upgraded Cloudera Manager and the CDH stack from 5.13.0 to 5.14.4.
  • Designed and configured the cluster with the required services (Sentry, HiveServer2, Kerberos, HDFS, Hue, Hive, Zookeeper).
  • Configured Sentry for appropriate user permissions when accessing HiveServer2/Beeline.
  • Designed and maintained the NameNode and DataNodes with appropriate processing capacity and disk space.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using the Cloudera distribution.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce.
  • Used the Talend ETL tool, which specializes in big data integration, for design, build, testing, data integration, and data management.
  • Experience in importing and exporting data using Sqoop between HDFS and RDBMS.
  • Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect on running nodes and data.
  • Configured Resource management in Hadoop through dynamic resource allocation.
  • Experience with Cloudera Navigator and Unravel data for Auditing Hadoop access.
  • Worked with application teams to install the operating system, Hadoop updates, patches, and version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive Jobs.
  • Application development using RDBMS, and Linux shell scripting.
  • Experience in installing, configuring, and optimizing Cloudera Hadoop CDH 5.x in a multi-clustered environment.
  • Set up and configured TLS/SSL on the Hadoop cluster at levels 1, 2, and 3 using keystores and truststores.
  • Enabled TLS on the HDFS, YARN, and Hue services.
  • Set up KDC Kerberos trust between multiple Hadoop clusters, validated adding peers, and tested Backup and Disaster Recovery (BDR) job setup.
  • Implemented HDFS encryption on the Hadoop cluster using KMS/Key Trustee (see the encryption zone sketch after this list).
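
A sketch of the Kerberos service principal and keytab setup referenced above; the realm, hostname, and keytab path are hypothetical placeholders:

    # Create a service principal for an HDFS daemon and export its keytab.
    kadmin.local -q "addprinc -randkey hdfs/worker01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/worker01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and obtain a ticket as that principal.
    klist -kt /etc/security/keytabs/hdfs.service.keytab
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/worker01.example.com@EXAMPLE.COM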
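
A sketch of the KMS-backed HDFS encryption mentioned in the last item; the key name and path are placeholders:

    # Create a key in the KMS, then mark an empty directory as an encryption zone.
    hadoop key create orders_key
    hdfs dfs -mkdir -p /secure/orders
    hdfs crypto -createZone -keyName orders_key -path /secure/orders

    # Confirm the zone was created.
    hdfs crypto -listZones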

Confidential, Hauppauge, NY

Hadoop Admin

Responsibilities:

  • Deployed and maintained the Hadoop cluster, added and removed nodes using Cloudera Manager, configured NameNode high availability, and kept track of all running Hadoop jobs.
  • Monitored usage to decide the size of the Hadoop cluster based on the data to be stored in HDFS.
  • Worked with the open source Apache distribution, where a Hadoop admin has to manually set up all the configurations: core-site, hdfs-site, yarn-site, and mapred-site.
  • Handled data processing and incremental updates using Hive and processed the data in Hive tables.
  • Develop and establish strong relationships with end users and technical resources throughout established projects.
  • Working with data delivery teams to set up new Hadoop users, which includes setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (a user onboarding sketch follows this list).
  • Extensively worked in ETL process consisting of Data Analysis, Data Modeling, Data Cleansing, Data Transformations, Integration, Migration, Data import/Export, Mapping, and Conversion.
  • Enabled TLS on the HDFS, YARN, and Hue services.
  • Exporting the data using Sqoop to RDBMS servers and processing that data for ETL operations.
  • Experience in writing UNIX shell scripts for purposes such as file validation, ETL process automation, and job scheduling using crontab (a crontab sketch follows this list).
  • Involved in designing, capacity planning, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
  • Scheduled many jobs to automate different database related activities including backup, monitoring database health, disk space, and backup verification.
  • Installed multi-node production Hadoop environments on Amazon AWS using services such as EMR, EC2, IAM, S3, and Data Pipeline.
  • Used tools to export and import data from sources such as SQL Server databases, flat files, CSV, Excel, and other data sources supporting ODBC and OLE DB.
  • Involved in loading data from LINUX and UNIX file system to HDFS.
  • Implemented, managed, and administered the overall Hadoop infrastructure.
  • Took care of the day-to-day running of Hadoop clusters.
  • Imported and exported data using Sqoop between HDFS and RDBMS.
  • Created a local YUM repository for installing and updating packages. Configured and deployed the Hive metastore using MySQL and the Thrift server.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce.
  • Involved in the regular Hadoop Cluster maintenance, such as patching security holes and updating system packages.
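
A sketch of the new-user onboarding flow described above; the username, realm, and quota value are hypothetical placeholders:

    # OS account and Kerberos principal for the new user.
    useradd -m analyst1
    kadmin.local -q "addprinc analyst1@EXAMPLE.COM"

    # HDFS home directory, ownership, and a space quota.
    hdfs dfs -mkdir -p /user/analyst1
    hdfs dfs -chown analyst1:analyst1 /user/analyst1
    hdfs dfsadmin -setSpaceQuota 500g /user/analyst1

    # Smoke-test HDFS access as the new user (kinit first on a Kerberized cluster).
    sudo -u analyst1 hdfs dfs -ls /user/analyst1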
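
A crontab sketch for the scheduled jobs mentioned above; the script and log paths are placeholders:

    # Edit the current user's crontab and add an entry like the one below.
    crontab -e

    # Run a nightly file-validation script at 2:30 AM and append its output to a log.
    30 2 * * * /opt/scripts/validate_etl.sh >> /var/log/etl/validate_etl.log 2>&1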

Confidential, Princeton, NJ

Hadoop Developer

Responsibilities:

  • Working experience in designing and implementing complete end-to-end Hadoop Infrastructure, including HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Oozie and Zookeeper.
  • Deploying and managing the multi-node development and production Hadoop cluster with different Hadoop components (Hive, Pig, Sqoop, Oozie, HCatalog, HBase, Zookeeper) using Hortonworks Ambari.
  • Designed, planned and delivered a proof of concept and business function/division based implementation of a Big Data roadmap and strategy project.
  • Adding/installation of new components and removal of the same through Ambari.
  • Created tables, loaded data, and wrote queries in Hive.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Monitor and manage comprehensive data security across the Hadoop platform using Apache Ranger.
  • Involved in exporting the analyzed data to databases such as MySQL and Oracle using Sqoop, and generated reports for visualization by the BI team.
  • Used Apache Tez for building high-performance batch and interactive data processing applications coordinated by YARN in Hadoop.
  • Worked on an Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
  • Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
  • The Hive tables created per requirement were internal or external tables defined with appropriate static and dynamic partitions for efficiency (see the partitioned-table sketch after this list).
  • Transformed the data using Hive, Pig for BI team to perform visual analytics according to the client requirement.
  • Developed scripts to automate data management end to end and kept data synced between all the clusters.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
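
A sketch of the partitioned external Hive table pattern referenced above, run through Beeline; the HiveServer2 URL, database, columns, and location are hypothetical placeholders:

    # Create an external, partitioned table over curated data in HDFS.
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
        order_id BIGINT,
        amount DOUBLE)
      PARTITIONED BY (order_date STRING)
      STORED AS ORC
      LOCATION '/data/curated/orders'"

    # Register a static partition for a specific day.
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
      ALTER TABLE sales.orders ADD IF NOT EXISTS PARTITION (order_date='2017-01-01')"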

Confidential, St. Louis, Missouri

Linux Administrator

Responsibilities:

  • Installation, configuration, management, and maintenance of Red Hat Linux 6 systems.
  • Managed internet applications including DNS, RADIUS, Apache, MySQL, and PHP. Duties included taking frequent backups of data, creating new storage procedures, and scheduling backups.
  • Added and deleted users and groups. Changed group and ownership settings and implemented ACLs.
  • Involved in loading data from LINUX and UNIX file system to HDFS.
  • Installation, Management, and Configuration of LAN/WAN systems utilizing Cisco switches and routers.
  • Package management using RPM & YUM. Disk Quota Management.
  • Configuring and maintaining common Linux application services such as NFS, DHCP, BIND, SSH, HTTP, HTTPS, FTP, LDAP, SMTP, MySQL, and LAMP.
  • Building & configuring Red Hat Linux systems over the network, implementing automated tasks through crontab, and resolving tickets according to the priority basis.
  • Performed backup management using tape drives and manual commands (tar, scp, rsync) for remote server storage (see the backup sketch after this list).
  • Experience in writing UNIX Shell scripts for various purposes like file validation, automation of ETL process and job scheduling using Crontab.
  • Modernized security policy for Red Hat servers.
  • Worked with Linux-friendly applications and troubleshot them when issues arose on the server.
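
A sketch of the manual backup commands mentioned above; hostnames and paths are placeholders:

    # Archive a configuration directory with a dated filename.
    tar -czf /backup/etc-$(date +%F).tar.gz /etc

    # Sync the local backup directory to a remote backup host over SSH.
    rsync -avz /backup/ backupadmin@backuphost:/srv/backups/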
