Hadoop Platform Administrator Resume
SUMMARY
- Certified Hadoop Administrator with over 5 years of professional IT experience, including strong experience in Hadoop administration using the Cloudera Enterprise distribution and Cloudera Manager, as well as Ambari and Hortonworks. Experience in installation, configuration, management, and deployment of Hadoop clusters, HDFS, MapReduce, Pig, Hive, Sqoop, Apache Solr, Oozie, HBase, and ZooKeeper.
- Strong command of the Linux operating system, used to troubleshoot complex issues in platform and hardware components.
- Hands-on experience with major components of the Hadoop ecosystem, including HDFS, the MapReduce framework, YARN, HBase, Hive, Pig, Sqoop, and ZooKeeper.
- Experience working in a DevOps environment with technologies such as Ansible, Git, Jenkins, AWS, and Maven.
- Experience in performance tuning and capacity planning with large data sets; used Apache Ambari to monitor the Hadoop ecosystem (YARN resource usage, memory allocations, etc.).
- Worked on the Hadoop stack, ETL tools such as Spark, Flume, and Sqoop, reporting tools such as Tableau, security with Kerberos, user provisioning with LDAP, and many other big data technologies across multiple use cases.
- Strong knowledge of YARN concepts and high-availability Hadoop clusters; worked with F5 load balancers to ensure end users get a steady environment.
- Expertise in Linux and Hadoop System Administration skills, networking and familiarity with open source configuration Management and deployment tools such as Ansible.
- Experience with enterprise scheduling software such as CA Workload Automation (CAWA) and BMC Control-M.
TECHNICAL SKILLS
- Cloudera Enterprise 5.8, 5.9, 5.13; Hortonworks HDP 2.6 and 3.1 (current version)
- YARN, MapReduce, Spark 1.6/2.2, Pig, Hive, Impala, Sqoop, Phoenix
- Apache Hue, Ambari (HDP, HDF), Cloudera Manager (5.9, 5.13)
- Resource Manager, MIT-Kerberos, LDAP.
- GitHub, Maven, Bamboo (deployment)
- Java, Python, HiveQL, SQL, Shell, scikit-learn (Python).
- SOAP, REST.
- Oracle 10g, DB2, MySQL, MS-SQL server, Amazon EC2.
- Apache NiFi, Streamsets
- JIRA, Eclipse/NetBeans, SQL Developer.
- AWS (EC2, S3), Microsoft Azure, UNIX, Windows XP / 7, Ubuntu (Linux), CentOS, RedHat Linux.
PROFESSIONAL EXPERIENCE
HADOOP PLATFORM ADMINISTRATOR
Confidential
Responsibilities:
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Spark, Impala, ZooKeeper, Hue, and Sqoop, on both Cloudera and Hortonworks.
- Involved in the major version upgrade from HDP 2.6 to 3.1 and upgraded Ambari from 2.6 to 2.7.
- Wrote shell scripts to report on and automate user quotas and to run smoke tests across all platforms (illustrative sketch after this list).
- Enabled HA for the ResourceManager and NameNode.
- Enabled Active Directory/LDAP for Ambari, Zeppelin, and Knox.
- Set up in-memory layers such as Spark (1.6 and 2.x) and Impala and maintained them, resolving out-of-memory issues and balancing load across daemons.
- Worked on Hive LLAP interactive reporting for Tableau.
- Worked on real time data processing with Spark.
- Provided guidance to users on re-writing their queries to improve performance and reduce cluster usage.
- Provided regular user and application support for highly complex issues involving multiple components such as Hive and Spark.
- Handled data imports from various sources, performed transformations using Hive and MapReduce, loaded the results into HDFS, and extracted data from MySQL and Oracle into HDFS.
- Wrote scripts to automate processes such as taking periodic backups and setting up user batch jobs.
- Implemented the Fair Scheduler on the ResourceManager to share cluster resources among the MapReduce jobs submitted by users (sample allocation file after this list).
- Migrated data across clusters using DistCp.
- Wrote scripts for disk monitoring and log compression.
- Involved in ongoing maintenance, support, and improvement of the Hadoop cluster.
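
A minimal sketch of the kind of quota-report and smoke-test script described above; the report location, probe path, and directory layout are illustrative assumptions, not details from the actual environment.

```bash
#!/usr/bin/env bash
# Report per-user HDFS quotas and run a basic read/write smoke test.
set -euo pipefail

REPORT=/tmp/hdfs_quota_report_$(date +%F).txt

# Name/space quota report for every user home directory.
for dir in $(hdfs dfs -ls /user | awk 'NR>1 {print $NF}'); do
    hdfs dfs -count -q -h "$dir"
done > "$REPORT"
echo "Quota report written to $REPORT"

# Smoke test: write a marker file, read it back, clean up.
probe="smoke_$$"
echo "smoke $(date)" > "/tmp/$probe"
hdfs dfs -put -f "/tmp/$probe" /tmp/
if hdfs dfs -cat "/tmp/$probe" > /dev/null; then
    echo "HDFS smoke test passed"
else
    echo "HDFS smoke test FAILED"
fi
hdfs dfs -rm -skipTrash "/tmp/$probe" > /dev/null
rm -f "/tmp/$probe"
```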
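And a sample Fair Scheduler allocation file of the sort the scheduler bullet refers to; the queue names, weights, and limits here are hypothetical, not the actual cluster's settings.

```bash
# Write a sample fair-scheduler.xml; YARN re-reads the file named by
# yarn.scheduler.fair.allocation.file without a restart. Requires
# yarn.resourcemanager.scheduler.class to be set to the FairScheduler.
cat > /etc/hadoop/conf/fair-scheduler.xml <<'EOF'
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical queues: an ETL queue with twice the weight of ad hoc. -->
  <queue name="etl">
    <weight>2.0</weight>
    <minResources>10000 mb, 10 vcores</minResources>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <maxRunningApps>20</maxRunningApps>
  </queue>
  <!-- Fall back to a per-group queue, creating it on first use. -->
  <queuePlacementPolicy>
    <rule name="specified"/>
    <rule name="primaryGroup" create="true"/>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
EOF
```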
HADOOP PLATFORM ADMINISTRATOR
Confidential, ATLANTA, GA
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure (LAB, DEV/SIT, PROD)
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users. This includes setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (onboarding sketch after this list).
- Experience installing, configuring, and administering a 400-node Hadoop cluster using CDH.
- Used Cloudera Manager for regular monitoring of node status, performance tuning, live reports, YARN memory usage, and so on.
- Deployed new hardware and software environments required for Hadoop and expanded memory and disks on nodes in the existing environments.
- Handled the data exchange between HDFS and various web applications and databases using Sqoop and Flume (sample commands after this list).
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with the StreamSets application team to install, configure, and test data workflows.
- Developed a script to recycle the StreamSets Data Collector per the data engineering team's requirements.
- Used Linux shell scripts to automate processes and implemented Impala for data analysis.
- Implemented batch processing of data sources using Apache Spark, including execution of Spark RDD transformations and actions per business analysis needs.
- Migrated Hive queries to Spark SQL to improve performance and built predictive analytics using Apache Spark's Scala APIs.
- Performance tuning, client/server connectivity, and database consistency checks using different utilities.
- Provided input to development teams on efficient utilization of resources such as memory and CPU, based on the running statistics of map and reduce tasks.
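
An illustrative user-onboarding sketch for the Kerberos bullet above; the realm, admin principal, keytab path, username, and HiveServer2 host are placeholders, not values from the actual environment.

```bash
#!/usr/bin/env bash
# Onboard a new Hadoop user: local account, HDFS home, Kerberos principal,
# then verify access as that user.
set -euo pipefail
NEWUSER=jdoe
REALM=EXAMPLE.COM
KEYTAB=/etc/security/keytabs/${NEWUSER}.keytab

useradd -m "$NEWUSER"
hdfs dfs -mkdir -p "/user/$NEWUSER"
hdfs dfs -chown "$NEWUSER:$NEWUSER" "/user/$NEWUSER"

# MIT Kerberos: create a principal with a random key and export a keytab.
kadmin -p admin/admin -q "addprinc -randkey ${NEWUSER}@${REALM}"
kadmin -p admin/admin -q "ktadd -k ${KEYTAB} ${NEWUSER}@${REALM}"
chown "$NEWUSER" "$KEYTAB"

# Verify the new user can authenticate and reach HDFS and Hive.
su - "$NEWUSER" -c "kinit -kt ${KEYTAB} ${NEWUSER}@${REALM} \
  && hdfs dfs -ls /user/${NEWUSER} \
  && beeline -u 'jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@${REALM}' \
       -e 'show databases;'"
```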
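And a hedged Sqoop example for the HDFS/database exchange bullet; the JDBC URL, tables, and HDFS paths are assumptions for illustration only.

```bash
# Import a MySQL table into HDFS in parallel (4 mappers), then export
# curated results back out to the database.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/curated/order_summary
```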
HADOOP STACK ENGINEER
Confidential
Responsibilities:
- Wrote shell scripts to monitor all the Hadoop environments and scheduled them through the BMC Control-M workload manager (health-check sketch after this list).
- Deployed new hardware and software environments required for Hadoop and expanded memory and disks on nodes in the existing environments.
- Handled the data exchange between HDFS and different web applications and databases using Sqoop and Flume.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Involved in Data Ingestion Process to Production cluster.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Managed configuration changes, adjusting cluster configuration properties based on the volume of data being processed.
- Working with data delivery teams to set up new Hadoop users, which also includes setting up Linux users and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Bulk imported data from various data sources into Hadoop and transformed data in flexible ways using Apache NiFi 0.2.1.
- Experience setting up NiFi with the Hortonworks Data Platform and developing test workflows for the data engineering team.
- Performing Linux systems administration on production and development servers (RedHat Linux, CentOS and other UNIX utilities).
- Job and user management using Capacity Scheduler.
- Installing Patches and packages on Unix/Linux Servers.
- Installed and configured the VMware vSphere client; handled virtual server creation and resource allocation.
- Performance tuning, client/server connectivity, and database consistency checks using different utilities.
- Provided input to development teams on efficient utilization of resources such as memory and CPU, based on the running statistics of map and reduce tasks.
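
A minimal sketch of the kind of health-check script a scheduler such as Control-M might run, per the first bullet in this list; the usage threshold, log path, and checks are illustrative assumptions. A non-zero exit code is what signals the scheduler to raise an alert.

```bash
#!/usr/bin/env bash
# Environment health check intended to run under an enterprise scheduler.
set -u
THRESHOLD=85
STATUS=0

# Flag any local filesystem above the usage threshold.
df -P | awk -v t="$THRESHOLD" \
    'NR>1 { gsub("%","",$5); if ($5+0 > t) print "WARN:", $6, "at", $5"%" }'

# Check overall HDFS health.
if ! hdfs fsck / 2>/dev/null | grep -q 'Status: HEALTHY'; then
    echo "ERROR: HDFS fsck reports problems"
    STATUS=1
fi

# Compress rolled Hadoop logs older than a week.
find /var/log/hadoop -name '*.log.*' ! -name '*.gz' -mtime +7 -exec gzip {} +

exit "$STATUS"
```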
SOFTWARE ENGINEER
Confidential
Responsibilities:
- Started with a POC on Cloudera Hadoop, converting one small, medium-complexity legacy system to Hadoop.
- Hands-on experience with Hortonworks services.
- Installation of various Hadoop ecosystem components and Hadoop daemons.
- Experienced in loading data from the UNIX file system to HDFS (example commands after this list).
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Integrated data from various sources into Hadoop and moved data from Hadoop to other databases using Sqoop import and export.
- Possess good Linux and Hadoop system administration skills, networking, shell scripting, and familiarity with open-source configuration management and deployment tools such as Chef.
- Used Cloudera Manager to pull metrics on various cluster features, such as the JVM and running map and reduce tasks.
- Backed up configuration and handled recovery from a NameNode failure (metadata backup sketch after this list).
- Experienced in managing and reviewing Hadoop log files.
- Created user accounts and granted users access to the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Performance tuning of Impala jobs and resource management in clusters.
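
A short example of loading data from the local UNIX file system into HDFS, as described in the bullet above; all paths are placeholders.

```bash
# Stage local CSV exports into an HDFS landing directory and confirm.
hdfs dfs -mkdir -p /data/incoming
hdfs dfs -put /var/exports/*.csv /data/incoming/
hdfs dfs -ls -h /data/incoming
```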
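And a hedged sketch of a NameNode metadata backup for the backup/recovery bullet; the backup directory is an assumption.

```bash
# Pull the latest fsimage from the NameNode to local disk as a backup.
BACKUP=/backup/namenode/$(date +%F)
mkdir -p "$BACKUP"

# Optionally force a fresh checkpoint first (requires safe mode):
#   hdfs dfsadmin -safemode enter
#   hdfs dfsadmin -saveNamespace
#   hdfs dfsadmin -safemode leave

hdfs dfsadmin -fetchImage "$BACKUP"
```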