Hadoop Administrator Resume
Buffalo, NY
PROFESSIONAL SUMMARY:
- Successful IT professional with 7 years of experience, the last 5 of which have been devoted to the Big Data/Hadoop ecosystem.
- Professional experience working on diverse Big Data platforms such as Cloudera and Hortonworks.
- Industry-driven, hands-on experience with Hadoop components such as HDFS, YARN, MapReduce, Impala, Hive, Pig, HBase, Sqoop, Hue, Oozie, Spark, ZooKeeper, Cloudera Manager, Apache Ambari, Sentry, Apache Ranger, Solr and so on.
- Hands-on experience installing Hadoop clusters from scratch in different flavors and versions such as CDH 5.x, HDP 2.x and CDP 7.x, and on various Linux environments such as VMs, CentOS and Red Hat.
- Experience in capacity planning, validating hardware and software requirements, building and configuring small to medium-sized clusters, and managing and performance tuning Hadoop clusters for different use cases.
- Experience manually setting up all the Hadoop configuration files such as core-site.xml, hdfs-site.xml, hadoop-env.sh, mapred-site.xml and yarn-site.xml.
- Experience configuring NameNode High Availability (HA) with a standby NameNode, and service-level High Availability for Hive Metastore, HiveServer2 and the YARN ResourceManager.
- Solid knowledge of, and documentation for, installing a Hadoop cluster using Cloudera Manager, including online and offline HTTP repositories with tarballs and parcels.
- Installing Hadoop components during cluster installation or adding them separately afterwards on demand.
- Configuring, managing and tuning Hadoop components using CM and Ambari based on company use cases and environments such as development, implementation, testing, production or DR.
- Designing, configuring and managing backup and disaster recovery using data replication, snapshots and Cloudera BDR utilities.
- Experience configuring external databases to store metadata for Cloudera Manager, Reports Manager, Hue, Oozie, Hive Metastore and Sentry.
- Performance tuning of YARN, Hive and Spark based on the nature of jobs and resource availability.
- Worked on HDFS management using node commissioning/decommissioning, the balancer, quotas, hdfs commands, hdfs-site.xml and Cloudera Manager, and on user access control using HDFS FACLs, Sentry and Ranger (a short command sketch follows this list).
- Ensuring cluster security by deploying and managing authentication using LDAP and Kerberos; troubleshooting Kerberos-related access issues and creating and managing local keytabs for service accounts using the ktutil tool (a minimal keytab sketch follows this list).
- Installing and managing Python packages and their versions according to project and environment constraints, such as using the impyla package for Hive when a cluster is TLS/SSL enabled.
- Deploying data-in-transit encryption using TLS/SSL, creating and using private keys, public keys and CA certificates, and troubleshooting service/user access issues related to TLS/SSL.
- Implementing cluster-level high availability (HA) using active/standby NameNodes, the Quorum Journal Manager (JournalNodes, ZKFC) and ZooKeeper.
- Implementing service-level load balancing and high availability for services such as Hive, the YARN ResourceManager, Hive Metastore, Impala and Hue using CM and HAProxy.
- Monitoring service performance and triaging bottlenecks using Cloudera Manager and Ambari charts, dashboards and email alerts.
- Analyzing diagnostic bundles, service logs and project logs to identify root causes, and triaging issues with the respective teams to remove project blockers and ensure a better client experience.
- Documenting deployment steps and environment-specific defects/bugs, triaging them with Cloudera teams to keep the cluster stable and optimized, and updating the respective wiki pages.
- Responsible for onboarding applications onto the Hadoop infrastructure across Dev, Test and Production environments, ensuring required access assets are created.
- Performed production support for any problem through to acceptable resolution, including daytime, nighttime and weekend support when required.
- Experience in Implementing Rack Awareness for data locality optimization.
- Hands-on experience managing cluster resources using YARN components such as the ResourceManager, resource pools and schedulers (Capacity, FIFO, Fair and DRF).
- Participated in upgrading and patching CDH and CM across major and minor versions.
- Experienced in scheduling crontab jobs in Linux for cluster maintenance.
- Created Hive external tables, loaded data into them, and queried the data using HQL, Impala, Hue and third-party tools such as DbVisualizer, DBeaver and Oracle SQL Developer.
- Excellent experience importing and exporting data with Sqoop between HDFS/Hive and relational database systems.
- Developed data pipelines using Talend and Spark that ingest data from multiple sources and process it in Hive or Spark.
- Expertise in Big Data technologies such as the Spark ecosystem (RDDs, DataFrames and Datasets).
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Deployed workflows in Oozie, Airflow and Control-M, assisted BAs in running DAGs, and triaged DAG-related issues with the respective maintenance teams on a daily basis.
- Knowledgeable about cloud-based environments such as Amazon AWS, Microsoft Azure and Google GCP, and services such as S3, EC2, VPN, EMR, Databricks and HDInsight.
- Good knowledge of Object-Oriented Analysis and Design and Agile (Scrum) methodologies.
- Working knowledge of Python, shell scripting and Java for running and troubleshooting various project-related jobs.
- Knowledge of Git, Bitbucket, Artifactory, JIRA and Jenkins.
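The HDFS management work mentioned above (commissioning/decommissioning, balancing, quotas) typically reduces to a handful of routine commands. The sketch below is a minimal illustration only; the directory, quota size and threshold are placeholder values, not figures from any specific cluster.

    hdfs dfsadmin -report                              # overall DataNode health and capacity
    hdfs dfsadmin -setSpaceQuota 10t /data/project_x   # cap a project directory at 10 TB
    hdfs dfsadmin -refreshNodes                        # apply include/exclude file changes when (de)commissioning nodes
    hdfs balancer -threshold 10                        # rebalance until DataNodes are within 10% of average utilization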
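Likewise, creating a local keytab for a service account with ktutil follows a standard interactive flow. This is a minimal sketch; the principal, realm and keytab path are placeholders.

    ktutil
    ktutil:  addent -password -p svc_etl@EXAMPLE.COM -k 1 -e aes256-cts-hmac-sha1-96
    ktutil:  wkt /home/svc_etl/svc_etl.keytab
    ktutil:  quit
    # Verify the keytab entries and test authentication
    klist -kt /home/svc_etl/svc_etl.keytab
    kinit -kt /home/svc_etl/svc_etl.keytab svc_etl@EXAMPLE.COM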
WORK EXPERIENCE:
HADOOP ADMINISTRATOR
Confidential, Buffalo, NY
Responsibilities:
- Extensively involved in installation and configuration of the Cloudera Hadoop distribution and its components such as the NameNode, Standby NameNode, ResourceManager, NodeManagers and DataNodes.
- Responsible for performance testing and benchmarking of the cluster.
- Working with Hadoop Ecosystem components like HDFS Framework, HBase, Sqoop, Zookeeper, Oozie and Hive with Cloudera Hadoop distribution.
- Implemented NameNode HA in the test and dev environments to avoid a single point of failure.
- Managed users through authentication and controlled data access through authorization.
- Involved in Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from RDBMS into HDFS using Sqoop (see the import sketch at the end of this section).
- Involved in Cluster coordination services through Zookeeper and adding new nodes to an existing cluster.
- Used Spark on YARN, compared its performance results with MapReduce, and used HDFS to store the analyzed and processed data for scalability.
- Comfortable creating RDDs, converting them into DataFrames and Datasets, and saving the output in Hive.
- Ensured cluster security by deploying and managing authentication using LDAP and Kerberos; troubleshot Kerberos-related access issues and created and managed local keytabs for service accounts using the ktutil tool.
- Involved in balancing load across DataNodes and services, and tuning nodes for optimal cluster performance.
- Experienced in analyzing errors and configurations to find root causes and minimize future system failures.
- Configured and troubleshot job performance and carried out capacity planning.
- Supported application teams in choosing the right file formats in Hadoop, such as Text, Avro and Parquet, and compression techniques such as Snappy, bzip2 and LZO.
- Involved in supporting data analysis projects using Elastic MapReduce on Amazon Web Services.
- Experienced in installing a CDP cluster on Amazon Web Services, creating EC2 (Elastic Compute Cloud) cluster instances and setting up data buckets on S3 (Simple Storage Service).
- Worked on importing data from multiple data sources to S3/AWS.
Environment: Cloudera CDH 5.x, CDP 7.x, Apache YARN, MapReduce, HDFS, Impala, HBase, Hive, Hue, Sqoop, Oozie, Talend, ZooKeeper, Spark, Kerberos, Sentry, TLS/SSL, LDAP, Shell Scripting, Puppet, Jenkins, Agile, JIRA, MySQL, AWS (S3, EC2, IAM, EMR), Git, Bitbucket, NiFi, Kafka, Airflow.
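A representative Sqoop ingestion command of the kind referenced above might look like the following. This is only a sketch; the hostname, database, table and target directory are placeholders rather than values from this engagement.

    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --as-parquetfile \
      --num-mappers 4

The reverse direction uses sqoop export with --export-dir pointing at the HDFS data to be pushed back to the RDBMS.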
HADOOP ADMINISTRATOR
Confidential, Boston, MA
Responsibilities:
- Involved in Big Data requirements review meetings, partnered with business analysts to clarify specific scenarios, and participated in daily meetings to discuss development progress while helping keep meetings productive.
- Involved in installation and configuration of the Cloudera Hadoop distribution and its components such as the NameNode, Standby NameNode, Edge Node and DataNodes.
- Responsible for benchmarking and tuning HDFS, Hive, YARN, Impala and Spark for optimal performance.
- Experience working with LDAP users, configuring LDAP authentication and syncing those users to Kerberos.
- Authenticated new users and authorized them at the database, table and file level using Kerberos, Sentry and FACLs (see the ACL sketch near the end of this section).
- Extensively used Cloudera Manager to manage multiple clusters with terabytes of data.
- Responsible for building scalable Hadoop cluster environments.
- Experienced in Linux administration including user creation, disk addition and cluster management using the CLI.
- Authorized users on different database objects using Hue.
- Regularly commissioned and decommissioned nodes as disk failures occurred.
- Experienced in managing and reviewing Hadoop log files.
- Exported analytical data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Maintaining GitHub repositories for configuration management.
- Managing cluster coordination services through ZooKeeper.
- Securing cluster by implementing SSL/TLS, LDAP and Kerberos.
- Assisted in setting up the production cluster and a DR cluster for backup.
- Built high availability for the production cluster and designed automatic failover using the ZooKeeper Failover Controller (ZKFC) and Quorum Journal Manager.
- Involved in the development of Spark SQL for various data processing.
- Imported data from different sources such as HDFS and MySQL into Spark RDDs.
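File-level authorization with HDFS ACLs, as mentioned above, typically comes down to a few commands. This is a minimal sketch; the warehouse path, user and group names are placeholders, and it assumes dfs.namenode.acls.enabled is set to true on the cluster.

    # Grant an analyst read/execute access to a Hive warehouse directory
    hdfs dfs -setfacl -m user:bi_analyst:r-x /data/warehouse/sales.db
    # Default ACL so new files and subdirectories inherit group read access
    hdfs dfs -setfacl -m default:group:analysts:r-x /data/warehouse/sales.db
    # Review the effective ACL entries
    hdfs dfs -getfacl /data/warehouse/sales.db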
Environment: Cloudera CDH 5.x, Apache YARN, MapReduce, HDFS, Impala, HBase, Hive, Hue, Sqoop, Oozie, ZooKeeper, Spark, Kerberos, Sentry, TLS/SSL, LDAP, Shell Scripting.
BIGDATA CONSULTANT
Confidential, Boston, MA
Responsibilities:
- Deployment of Hadoop cluster in Hortonworks HDP environment.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
- Installed/Configured/Maintained Hadoop tools like Hive, Pig, Spark, Impala, Zookeeper, Hue and Sqoop using Apache Ambari.
- Involved in Upgrading the HDP components and Ambari.
- Enabled HA for Resource Manager, Name Node, Hive Metastore.
- Enabled load balancer for hive to distribute work load on all hive instances across the cluster.
- Provided guidance to users on re-writing their queries to improve performance and reduce cluster usage.
- Provided regular user and application support for highly complex issues involving multiple components such as Hive and MapReduce.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Handled data imports from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS.
- Used Sqoop to import and export data between RDBMS and HDFS.
- Implemented the Fair Scheduler on the ResourceManager to share cluster resources among the MapReduce jobs submitted by users (a minimal allocation-file sketch follows this section).
- Migrated data from cluster to cluster using SFTP.
- Scheduling Workflows using Oozie.
- Involved in ongoing maintenance, support and improvement in Hadoop cluster.
Environment: Hortonworks HDP, HDFS, MapReduce, YARN, Ambari, Pig, Hive, Sqoop, Oozie, ZooKeeper, Kerberos, Apache Ranger, JIRA.
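A Fair Scheduler setup of the kind described above is usually driven by an allocation file. The sketch below is a minimal example under assumptions: the file path, queue names and resource figures are hypothetical, and it presumes yarn.scheduler.fair.allocation.file points at this file and the ResourceManager is configured to use the Fair Scheduler.

    # Hypothetical allocation file; queue names and limits are placeholders.
    cat > /etc/hadoop/conf/fair-scheduler.xml <<'EOF'
    <?xml version="1.0"?>
    <allocations>
      <queue name="etl">
        <weight>2.0</weight>
        <minResources>20480 mb, 10 vcores</minResources>
        <schedulingPolicy>fair</schedulingPolicy>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>
    EOF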
LINUX ADMINISTRATOR
Confidential, New York City, NY
Responsibilities:
- Involved in configuration and troubleshooting of multipathing on Solaris and bonding on Linux servers.
- Configured DNS servers and clients, and involved in troubleshooting DNS issues.
- Worked with Red Hat Linux tools like RPM to install packages and patches on Red Hat Linux servers.
- System monitoring and log management on UNIX and Linux servers, including crash and swap management, password recovery and performance tuning.
- Responsible for managing the functionality and efficiency of Linux and UNIX based operating systems.
- Troubleshooting system and network problems to ensure security, diagnosing and solving hardware or software faults, and replacing parts as required.
- Performed network and system troubleshooting activities as part of day-to-day responsibilities.
- Monitored the availability and health of the infrastructure and applications across multiple data centers.
- Linux administration for Red Hat systems, covering system resources such as disk space, memory and CPU load (see the command sketch after this list).
- User and security administration: adding users to the system and password management.
- Working with clusters; adding multiple IP addresses to a server via virtual network interfaces to minimize network traffic (load-balancing and failover clusters).
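The resource checks and user administration described above rely on standard RHEL commands. The sketch below is a minimal illustration; the user name is a placeholder.

    # Quick health snapshot: disk space, memory/swap and load averages
    df -h
    free -m
    uptime
    # Add a user to the wheel group and force a password change at first login
    useradd -m -G wheel jdoe
    passwd jdoe
    chage -d 0 jdoe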