Hadoop Admin Resume
O'Fallon, MO
SUMMARY
- 7+ years of experience in the IT industry, including 4+ years of proven experience in Hadoop Development and Administration using Cloudera and Hortonworks distributions.
- Hands-on experience in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera (CDH4, CDH5) and YARN distributions.
- Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node and MapReduce programming paradigm.
- Hands-on experience with major components of the Hadoop ecosystem including Hive, Sqoop and HBase, and knowledge of the MapReduce/HDFS framework.
- Solid background in UNIX and Linux Network Programming.
- Experience in deploying and managing the multi-node development, testing and production Hadoop cluster with different Hadoop components (HDFS, Hive, Hue, Impala, Oozie, Solr, Spark, Sqoop, YARN, ZooKeeper) using Cloudera Manager and Hortonworks Ambari.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Experience in benchmarking, performing backup and disaster recovery of Name Node metadata and important sensitive data residing on cluster.
- Experience in performing minor and major upgrades, commissioning and decommissioning of data nodes on Hadoop cluster.
- Experienced in installation, configuration, supporting and monitoring of a 200+ node Hadoop cluster using Cloudera Manager.
- Experience in designing and implementing HDFS access controls, directory and file permissions, and user authorization that facilitate stable, secure access for multiple users in a large multi-tenant cluster.
- Strong knowledge in configuring Name Node High Availability and Name Node Federation.
- Familiar with writing Oozie workflows and job controllers for automating shell, Hive and Sqoop jobs.
- As an admin, involved in cluster maintenance, capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Involved in Adding/removing new nodes to an existing Hadoop cluster.
- Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
- Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
- Scheduled all Hadoop/Hive jobs using Beeline (a scheduling sketch follows this summary).
- Configured rack awareness for quick availability and processing of data (a topology-script sketch follows this summary).
- Experience in understanding the security requirements for Hadoop and integrating with a Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service and managing keytabs with keytab tools.
- Actively participated in the daily Scrum calls, Sprint planning, Effort Estimation, Sprint review, Sprint Demo and Retrospective sessions.
- Experience in various software development life cycles such as Waterfall and Agile methodologies.
- Hands on experience in Core Java, Servlets, JSP, JDBC, Struts, Hibernate, Tomcat, Glassfish.
- Good working experience using the Eclipse and NetBeans IDEs.
- Effective problem-solving skills and outstanding interpersonal skills; able to work independently as well as within a team, driven to meet deadlines, and quick to learn and use new technologies.
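A minimal sketch of the Beeline-based job scheduling referenced above; the HiveServer2 URL, script paths and cron schedule are illustrative placeholders rather than details from an actual deployment:

    #!/bin/bash
    # run_daily_hive.sh - illustrative wrapper that runs a HiveQL script through Beeline
    beeline -u "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" \
            --hivevar run_date="$(date +%F)" \
            -f /opt/jobs/daily_aggregation.hql \
            >> /var/log/hadoop-jobs/daily_aggregation.log 2>&1

    # Example crontab entry running the wrapper at 02:00 every day:
    # 0 2 * * * /opt/jobs/run_daily_hive.sh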
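A minimal sketch of rack-aware configuration; the subnets, rack names and script path are illustrative placeholders:

    #!/bin/bash
    # /etc/hadoop/conf/rack-topology.sh - maps each host/IP argument to a rack
    for node in "$@"; do
      case "$node" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done

The script path is then set as net.topology.script.file.name in core-site.xml so the NameNode can resolve each DataNode's rack when placing block replicas.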
TECHNICAL SKILLS
Hadoop/Big Data Components: HDFS, Hue, MapReduce, Hive, Sqoop, Spark, Impala, Oozie, YARN, Flume, Kafka, Pig, ZooKeeper
NoSQL Databases: HBase, Cassandra
Programming Languages: Java, HTML
Databases: PostgreSQL, Derby, MySQL, SQL Server
Scripting Languages: Shell Scripting, Puppet
Frameworks: MVC, Spring, Struts, Hibernate
IDEs/Tools: NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office
Operating Systems: Linux (Red Hat, CentOS, Ubuntu), Windows, Mac
Web Servers: Apache Tomcat, JBoss and Apache HTTP Server
Cluster Management Tools: Cloudera Manager and HDP Ambari
Virtualization Technologies: VMware vSphere, Citrix XenServer
PROFESSIONAL EXPERIENCE
Confidential, O'Fallon, MO
Hadoop Admin
Responsibilities:
- Managed a 200+ node CDH 5.13.1 Hadoop cluster with 14 petabytes of data on RHEL.
- Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration and monitoring of the cluster.
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, and managing and reviewing Hadoop log files (a decommissioning sketch appears at the end of this role).
- Monitoring services, architecture design and implementation of Hadoop deployment, configuration management.
- Experienced in defining job flows with Oozie.
- Experienced in managing and reviewing Hadoop log files.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Installation and configuration of Sqoop, Flume and HBase.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop Clusters.
- Installed and configured Hue, Hive, Sqoop and Oozie on the Hadoop cluster.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons (a keytab-creation sketch appears at the end of this role).
- Involved in cluster migration and expansion
- Involved in Adding new nodes to an existing cluster.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Involved in migrating Oozie database to Postgres due to Derby database deadlock issues.
- Involved in upgrades such as CDH 5.11.1 to 5.12.1 and Spark 1.6 to Spark 2.0.
- Involved in upgrading Java 1.7 to Java 1.8.
- Involved in installing Kudu 1.4 in CDH 5.12.1.
- Administered Cloudera Navigator access for auditing and viewing data.
- Responsible for cluster availability and experienced in on-call support.
- Coordinated with Cloudera support team through support portal to sort out the critical issues during upgrades.
Environment: Hadoop, HDFS, Hive, Hue, ZooKeeper, Impala, Oozie, HBase, Sentry, Solr, Spark, Sqoop, YARN (MR2 included), Oracle 11g on Red Hat, Cloudera CDH.
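A minimal sketch of the data node decommissioning flow mentioned in this role; the hostname and exclude-file path are illustrative placeholders:

    # Add the node to the file referenced by dfs.hosts.exclude, then tell the NameNode to re-read it
    echo "datanode17.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch the node move through "Decommission in progress" to "Decommissioned"
    hdfs dfsadmin -report | grep -A 3 "datanode17.example.com"

    # Refresh the NodeManager include/exclude lists on the ResourceManager as well
    yarn rmadmin -refreshNodes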
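A minimal sketch of the Kerberos principal and keytab steps mentioned above, assuming an MIT KDC; the realm, host and paths are illustrative placeholders:

    # Create a service principal with a random key and export it to a keytab
    kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey hdfs/dn01.example.com@EXAMPLE.COM"
    kadmin -p admin/admin@EXAMPLE.COM -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/dn01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and test authentication with it
    klist -kt /etc/security/keytabs/hdfs.service.keytab
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/dn01.example.com@EXAMPLE.COM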
Confidential, Dearborn, MI
Hadoop Admin
Responsibilities:
- Managed a 100+ node HDP 2.2.4 cluster with 10 petabytes of data using Ambari 2.0 on CentOS 6.5.
- Installed and configured Hortonworks Ambari for easy management of existing Hadoop cluster.
- Responsible for the design and implementation of a multi-datacenter Hadoop environment intended to support the analysis of large amounts of unstructured data along with ETL processing.
- Coordinated with Hortonworks support team through support portal to sort out the critical issues during upgrades.
- Conducted root cause analysis (RCA) to identify data issues and resolve production problems.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Enabled Kerberos for Hadoop cluster authentication and integrated it with Active Directory for managing users and application groups.
- Developed Sqoop jobs to extract data from RDBMS databases, Oracle and Teradata (an import sketch appears at the end of this role).
- Loaded data from Teradata to HDFS using Teradata Hadoop connectors.
- Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities like Spark.
- Worked with big data developers, designers and scientists in troubleshooting MapReduce job failures and issues with Hive and Sqoop.
- Experienced in job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Worked on design and implementation, configuration, performance tuning of Hortonworks HDP 2.3 Cluster with High Availability and Ambari 2.2.
- Analyzed server logs for errors and exceptions; scheduled and monitored Jenkins job builds and their console outputs.
- Used Agile/scrum Environment and used Jenkins, GitHub for Continuous Integration and Deployment.
- Experienced in managing and reviewing Hadoop log files.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
- Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Experienced with HBase high availability, verified manually using failover tests.
- Created queues and allocated cluster resources to prioritize jobs (a queue-configuration sketch appears at the end of this role).
- Working experience in creating and maintaining MySQL databases, setting up users, and backing up cluster metadata databases with cron jobs.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Coordinated with technical teams for installation of Hadoop and related third-party applications on systems.
- Supported technical team members for automation, installation and configuration tasks.
- Assisted in the design, development and architecture of Hadoop and HBase systems.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Responsible for cluster maintenance, monitoring, troubleshooting, tuning, commissioning and decommissioning of nodes.
- Responsible for cluster availability and experienced in on-call support.
- Involved in analyzing system failures, identifying root causes and recommending courses of action; documented system processes and procedures for future reference.
Environment: Hortonworks, Ambari, Hive, Pig, Sqoop, ZooKeeper, HBase, Knox, Spark, YARN, MapReduce.
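A minimal sketch of the kind of Sqoop import used for the RDBMS extracts above; the connection string, credentials and table names are illustrative placeholders:

    sqoop import \
      --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCL \
      --username etl_user --password-file /user/etl/.oracle_pass \
      --table SALES.ORDERS \
      --split-by ORDER_ID -m 8 \
      --hive-import --hive-database staging --hive-table orders \
      --target-dir /user/etl/staging/orders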
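A minimal sketch of YARN queue allocation via the Capacity Scheduler; the queue names and capacity percentages are illustrative placeholders:

    <!-- capacity-scheduler.xml fragment (normally managed through Ambari on HDP) -->
    <property><name>yarn.scheduler.capacity.root.queues</name><value>etl,adhoc</value></property>
    <property><name>yarn.scheduler.capacity.root.etl.capacity</name><value>70</value></property>
    <property><name>yarn.scheduler.capacity.root.adhoc.capacity</name><value>30</value></property>

    # Apply the new queue definitions without restarting the ResourceManager:
    yarn rmadmin -refreshQueues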
Confidential, Oregon
Hadoop Admin
Responsibilities:
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed and mined useful, meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Implemented Apache Impala for data processing on top of Hive.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented nodes on a CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from the Linux file system to HDFS.
- Imported weblogs from the web servers into HDFS using Flume (an agent-configuration sketch appears at the end of this role).
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Responsible for managing data coming from different sources.
- Involved in loading data from UNIX file system to HDFS.
- Coordinated cluster services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, YARN, ZooKeeper, Impala, Cloudera Manager.
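A minimal sketch of a Flume agent for shipping web server logs into HDFS, as referenced above; the agent name, log path and HDFS path are illustrative placeholders:

    # /etc/flume-ng/conf/weblog-agent.conf (illustrative)
    agent1.sources  = weblog-src
    agent1.channels = mem-ch
    agent1.sinks    = hdfs-sink

    agent1.sources.weblog-src.type = exec
    agent1.sources.weblog-src.command = tail -F /var/log/httpd/access_log
    agent1.sources.weblog-src.channels = mem-ch

    agent1.channels.mem-ch.type = memory
    agent1.channels.mem-ch.capacity = 10000

    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.hdfs.path = /data/weblogs/%Y-%m-%d
    agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
    agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
    agent1.sinks.hdfs-sink.channel = mem-ch

    # Start the agent against that configuration:
    flume-ng agent --name agent1 --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/weblog-agent.conf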
Confidential, Columbus, Ohio
Hadoop Developer
Responsibilities:
- Developed several advanced MapReduce programs to process data files received.
- Developed Pig scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files into Hadoop.
- Used Sqoop to import data into HDFS from a MySQL database and vice versa.
- Developed Java programs to process huge JSON files received from the marketing team and convert them into a format standardized for the application.
- Installed, configured and deployed data node hosts for Hadoop Cluster deployment.
- Installed various Hadoop ecosystems and Hadoop Daemons.
- Managed commissioning & decommissioning of data nodes.
- Implemented optimization and performance testing and tuning of Hive and Pig.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Conducted knowledge transfer sessions on the developed applications for colleagues.
- Involved in installation and configuration of Tableau Server.
Environment: Apache Hadoop, Apache Cassandra, Hive, Sqoop, Solr, Tomcat, Eclipse Kepler, SVN repository, Linux, Putty, WinSCP.
Confidential
Developer
Responsibilities:
- Developed the JSP pages as part of UI.
- Performed validations using Validator Plug-in.
- Developed the Control Logic as part of Action Classes
Environment: Core Java, Struts, Servlets, JSP, Hibernate, NetBeans, Tomcat.