- 8+ years of IT industry experience in database management, Linux administration, MapReduce application development, and designing, building, and administering large-scale Hadoop production clusters.
- 4+ years of experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, HCatalog, Sqoop, ZooKeeper, and NoSQL.
- Solid background in Unix and Linux Network Programming
- Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, HCatalog, HBase, ZooKeeper) using Cloudera Manager and Hortonworks Ambari.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring them using Nagios and Ganglia.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- Experience in performing minor and major upgrades and in commissioning and decommissioning data nodes on Hadoop clusters.
- Experience in designing and implementing HDFS access controls, directory and file permissions, and user authorization to facilitate stable, secure access for multiple users in a large multi-tenant cluster.
- Strong knowledge in configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for job automation (shell, Hive, and Sqoop actions).
- Familiar with importing and exporting data using Sqoop from RDBMSs (MySQL, Oracle, Teradata), including the use of fast loaders and connectors.
- Experience in installing and administering a PXE server with Kickstart, and in setting up FTP, DHCP, and DNS servers and Logical Volume Management.
- Exposure to Maven/Ant and Git, along with shell scripting, for the build and deployment process.
- Experience in handling multiple relational databases: MySQL, SQL Server, and Oracle.
- Configured rack awareness for quick availability and processing of data.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs with keytab tools.
- Effective problem-solving skills and outstanding interpersonal skills. Able to work independently as well as within a team environment, driven to meet deadlines, and quick to learn and use new technologies.
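The Sqoop import/export experience above can be illustrated with a minimal single-table import into Hive. The connection string, credentials path, and table names below are hypothetical placeholders, not values from any actual engagement:

```shell
# Hypothetical connection details; imports one MySQL table into a Hive table.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --hive-import \
  --hive-table staging.orders \
  --num-mappers 4
```

The export direction is symmetric (`sqoop export --export-dir …` from an HDFS path back to an RDBMS table); this is a command sketch that requires a live cluster and database to run.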
Big Data Ecosystems: Hadoop, MapReduce, HDFS, YARN, Hive, Pig, Oozie, ZooKeeper, Sqoop, Storm
Operating Systems: Windows, Linux (Red Hat, CentOS, Ubuntu)
UNIX Repositories: Apache, Yum, RPM
Databases: MySQL, Oracle DB
Version Control: SVN, Git, Microsoft Visual SourceSafe, Changeman, Rational ClearCase
Cluster Management Tools: Cloudera Manager, Ambari, SiteScope, Nagios, Ganglia
Scripting Languages: Shell scripting
ETL Tools: Informatica, SSIS
Protocols: TCP/IP, HTTP, HTTPS, Telnet, FTP, LDAP
BI Reporting Tools: Tableau, MicroStrategy
Sr. Hadoop Administrator
- Managed a 150+ node CDH 5.8.2 cluster with 2 petabytes of data using Cloudera Manager 5.8.3 on Linux CentOS 6.5.
- Installed and configured Cloudera Manager for easy management of existing Hadoop cluster.
- Conducted root cause analysis (RCA) to find data issues and resolve production problems.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Implemented advanced procedures like text analytics and processing using in-memory computing capabilities such as Spark.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Automated the jobs that pull data from the FTP server into Hive tables using Oozie workflows; also wrote Pig scripts to run ETL jobs on data in HDFS.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Built a data pipeline using an XML parser and delivered the parsed XML data to consumers.
- Worked on migrating Hive actions to Spark SQL using DataFrames.
- Worked on Hive optimization techniques to improve the performance of long running jobs.
- Identified several bugs in CDH 5.8.2 and was the first to report those issues.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Used JIRA and ServiceNow to track issues on the big data platform.
- Experienced in managing and reviewing Hadoop log files.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
- Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Experience with HBase high availability, manually verified using failover tests.
- Created queues and allocated cluster resources to prioritize jobs.
- Experience in upgrading the cluster to newer versions of CDH 5.8.2 and CM 5.8.3.
- Maintained MySQL databases: created databases, set up users, and maintained backups of the cluster metadata databases with cron jobs.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Coordinated with technical teams for the installation of Hadoop and related third-party applications on systems.
- Supported technical team members for automation, installation and configuration tasks.
- Suggested improvement processes for all process automation scripts and tasks.
- Assisted in the design, development, and architecture of Hadoop and HBase systems.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Responsible for cluster maintenance, monitoring, troubleshooting, tuning, and commissioning and decommissioning of nodes.
- Responsible for cluster availability; experienced with on-call support.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
Environment: CDH 5.8.2, Hadoop 2.5.0, MapReduce 2.0 (YARN), HDFS, Hive 0.13, Hue 3.7.0, Pig 0.14.0, HBase, Spark, Scala, Jenkins, Sonar, C, RDBMS, Oracle 11g/10g, Oozie, Java (JDK 1.6), UNIX, Git, ZooKeeper, Gradle, Python, Tableau.
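The cron-driven metadata backups mentioned above can be sketched as a small retention script. The paths and retention count below are illustrative placeholders, assuming a local directory holds the metadata to archive:

```shell
#!/bin/sh
# Minimal sketch of a cron-driven metadata backup with retention.
# All paths and the retention count are placeholders, not production values.
set -e
SRC="${1:-/tmp/nn-meta-src}"      # e.g. the NameNode's dfs.namenode.name.dir/current
DEST="${2:-/tmp/nn-meta-backup}"  # where timestamped archives accumulate
KEEP="${3:-7}"                    # keep only the newest 7 archives
mkdir -p "$SRC" "$DEST"
STAMP=$(date +%Y%m%d%H%M%S)
# Archive the source directory under a timestamped name.
tar -czf "$DEST/meta-$STAMP.tar.gz" -C "$SRC" .
# Prune archives beyond the retention count, newest first.
ls -1t "$DEST"/meta-*.tar.gz | tail -n +"$((KEEP + 1))" | xargs -r rm -f
```

A crontab entry such as `0 2 * * * /usr/local/bin/backup_meta.sh /data/nn/current /backups/nn` (hypothetical paths) would run it nightly at 02:00 during non-business hours.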
Confidential, Atlanta, GA
- Involved in deploying a Hadoop cluster using Hortonworks Ambari HDP 2.2, integrated with SiteScope for monitoring and alerting.
- Launched and set up the Hadoop cluster on physical servers, including configuring the different Hadoop components.
- Created a local YUM repository for installing and updating packages.
- Responsible for building a system that ingests terabytes of data per day into Hadoop from a variety of data sources, providing high storage efficiency and an optimized layout for analytics.
- Developed data pipelines that ingest data from multiple data sources and process it.
- Used Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move the pivoted data into Hive or HBase tables.
- Implemented the Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for every service, and managing keytabs with keytab tools.
- Worked on SAS migration to Hadoop for fraud analytics and provided predictive analysis.
- Developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Configured Kerberos for authentication, Knox for perimeter security and Ranger for granular access in the cluster.
- Configured and installed several Hadoop clusters in both physical machines as well as the AWS cloud for POCs.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Extensively used Sqoop to move the data from relational databases to HDFS.
- Used Flume to move the data from web logs onto HDFS.
- Used Pig to apply transformations, validations, cleaning, and deduplication to raw data sources.
- Integrated schedulers Tidal and Control-M with the Hadoop clusters to schedule the jobs and dependencies on the cluster.
- Worked closely with the continuous integration team to set up tools like GitHub, Jenkins, and Nexus for scheduling automatic deployments of new or existing code.
- Actively monitored the Hadoop Cluster of 320 Nodes with Hortonworks distribution with HDP 2.4.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and SSH keyless login.
- Worked on a minor upgrade from HDP 2.2.2 to HDP 2.2.4.
- Upgraded the Hadoop cluster from HDP 2.2 to HDP 2.4 and HDP 2.4 to HDP 2.5
- Integrated BI tool Tableau to run visualizations over the data.
- Solved hardware-related issues; performed ticket assessment on a daily basis.
- Automated administration tasks through scripting and job scheduling using cron.
- Provided 24x7 on-call support as part of a scheduled rotation with other team members.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Ambari, Storm, AWS S3, EC2, Identity and Access Management (IAM), ZooKeeper, NiFi
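The Kerberos keytab workflow described above (creating principals, extracting and securing keytabs) looks roughly like the following MIT Kerberos commands. The realm, hostname, and keytab path are illustrative, and the commands assume a working KDC:

```shell
# On the KDC: create a service principal with a random key (illustrative names).
kadmin.local -q "addprinc -randkey hdfs/node01.example.com@EXAMPLE.COM"
# Extract the principal's key into a keytab file for the service to use.
kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/node01.example.com@EXAMPLE.COM"
# Verify the keytab contents, then lock down its ownership and permissions.
klist -kt /etc/security/keytabs/hdfs.service.keytab
chown hdfs:hadoop /etc/security/keytabs/hdfs.service.keytab
chmod 400 /etc/security/keytabs/hdfs.service.keytab
```

This is a command sketch that requires KDC access to run; each Hadoop service on each host gets its own principal and keytab in the same pattern.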
Confidential, Boise, ID
- Performed operating-system-level configuration, including DNS resolution, user accounts and file permissions, networking, and SSH passwordless login.
- Created LVM partitions on Linux Servers and mounted file systems on partitions.
- Used Nagios to monitor the daemons and the cluster status, using custom monitoring scripts.
- Imported and exported data between RDBMSs (Oracle, MySQL) and HDFS using Sqoop.
- Managed the day-to-day operations of the cluster for backup and support.
- Performed operating system installation and Hadoop version updates using deployment tools such as Chef and Puppet.
- Implemented Kerberos on cluster for authenticating all the services.
- Deployed NFS for Name Node Metadata backup.
- Implemented the Fair Scheduler to share cluster resources among MapReduce jobs.
- Configured Ganglia, including the gmond and gmetad daemons, which collect the metrics running on the distributed cluster and visualize them in real-time dynamic web pages to aid debugging and maintenance.
- Implemented Rack Topology on the Hadoop cluster.
- Performed regular commissioning and decommissioning of nodes depending on data volume.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
- Wrote custom shell scripts to automate redundant tasks on the cluster.
- Handled day-to-day user access and permissions; installed and maintained Linux servers.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method; performed remote installation of Linux using PXE boot.
- Responsible for maintaining RAID groups and LUN assignments as per agreed design documents. Performed all system administration tasks such as cron jobs, package installation, and patching.
- Made extensive use of LVM, creating volume groups and logical volumes.
- Performed RPM and YUM package installations, patching, and other server management.
- Configured Domain Name System (DNS) for hostname to IP resolution
- Troubleshot and fixed issues at the user, system, and network level using various tools and utilities. Scheduled backup jobs by implementing cron schedules during non-business hours.
Environment: Linux, HDFS, MapReduce, KDC, Nagios, Ganglia, Oozie, Sqoop, Ambari.
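The custom Nagios monitoring scripts mentioned above follow the standard plugin convention: print a one-line status and return 0/1/2 for OK/WARNING/CRITICAL. Below is a minimal sketch with illustrative thresholds; a real check would derive the usage figure from `df` or NameNode JMX output rather than an argument:

```shell
#!/bin/sh
# Sketch of a Nagios-style disk-usage check; thresholds are illustrative.
check_usage() {
  usage="$1"; warn=80; crit=90
  if [ "$usage" -ge "$crit" ]; then
    echo "CRITICAL - ${usage}% used"; return 2   # Nagios treats exit 2 as CRITICAL
  elif [ "$usage" -ge "$warn" ]; then
    echo "WARNING - ${usage}% used"; return 1    # exit 1 is WARNING
  else
    echo "OK - ${usage}% used"; return 0         # exit 0 is OK
  fi
}
# A real plugin would compute the argument, e.g. from `df -P /data`.
check_usage 72
```

Nagios (or NRPE on a remote node) invokes such a script on a schedule and raises alerts based purely on the exit code and status line.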
Linux/Database System Engineer
- Involved in all stages of Software Development Life Cycle (SDLC).
- Participated in gathering business requirements and converting them into detailed design documents.
- Collaborated with product owner, QA, and other developers to maintain the existing web applications.
- Extensively used core Java concepts like Collections, Exception Handling, Generics and Multithreading during development of business logic.
- Designed and implemented business logic with Spring framework.
- Used XML for data exchange and schemas (XSDs) for XML validation. Used XSLT for transformation of XML using XML parsing.
- Created continuous integration builds using Maven and Gradle.
- Wrote numerous test cases for unit testing of the code using the JUnit testing framework.
- Wrote SQL queries to manipulate data in the database.
- Used JIRA for project management, tracking and monitoring errors and fixed the errors.
- Used GIT for code repository and version control.
- Used Sonar to monitor test cases coverage ratio.
- Addressed code review comments, built with Jenkins, and supported code deployment into production; fixed post-production defects to make the code work as expected.
- Prepared design documents from the requirements and mapping of the requirements to the design i.e., preparing/updating of Requirement Traceability Matrix (RTM).
- Managed live defects (Vantive) and system test defects (using Quality Center).
- Creating and managing the defect reports.
- Prioritized current live defects and appealed to commit fixes for a release.
- Prepared deployment plans for production deployments.
- Reviewed the application design, unit test plan, integration test plan, and test results to ensure compliance with the client's ITUP process.
- Performed code reviews to ensure code meets coding standards and promoted the code to higher environments.
- Participated in technical design walkthroughs and test summary walkthroughs.
- Actively participated in team meetings.
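A typical build-and-deploy step from the Maven/Jenkins/Git workflow above might look like the commands below. The branch name and goals are illustrative, and in practice a Jenkins job runs them automatically on each commit:

```shell
# Illustrative CI build step: fetch latest code, run unit tests, publish artifact.
git fetch origin && git checkout master && git pull --ff-only
mvn clean verify          # compiles and runs the JUnit suite; any failure fails the build
mvn deploy -DskipTests    # publishes the built artifact to the configured Nexus repository
```

This is a command sketch that requires a checked-out Maven project and repository credentials to run; Sonar analysis and code-coverage checks would typically be wired into the same pipeline.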