- 8+ years of IT experience, including 6 years with the Hadoop ecosystem installing and configuring Hadoop ecosystem components in existing clusters.
- 3+ years of Unix/Linux system administration on Sun Solaris (9/10), HP-UX (10.x, 11.x), IBM AIX (5.x), and Linux (Red Hat 4/5/6).
- Expertise in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4, and CDH5) distributions and Hortonworks (HDP).
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- File system management and monitoring, HDFS support and maintenance.
- Hands-on experience with data extraction, transformation, loading, and visualization on the Hortonworks platform using HDFS, Hive, Sqoop, HBase, Oozie, Beeline, YARN, Impala, Spark, Scala, Vertica, Oracle, and MS SQL.
- Mastered major Hadoop distributions such as Hortonworks and Cloudera, worked with numerous open source projects, and prototyped various applications that use modern Big Data tools.
- In-depth knowledge of the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, Hue, Sqoop, Flume, Kafka, Spark, Oozie, NiFi, and Cassandra.
- Troubleshot read/write latency and timeout issues in Cassandra.
- Hands-on Experience as a Linux/Unix Sys Admin, Hadoop Platform Support (Hadoop Admin).
- Administration and troubleshooting Red Hat Cluster Suites.
- Experience with appropriate process improvement tools and techniques such as Six Sigma, Lean Sigma, and ITIL.
- Experience with installing, configuring and monitoring Apache Cassandra cluster.
- Experience working with MapR & Hortonworks Distribution of Hadoop.
- Supported WebSphere Application Server, WPS, and IBM HTTP/Apache web servers in Linux environments for various projects.
- Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Storm, Oozie, Impala, and HBase) and NoSQL administration.
- Expert knowledge of Linux (Red Hat, Oracle Linux).
- Extensive experience with NoSQL databases including HBase, Cassandra, and MongoDB.
- Good experience in analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Expertise in database performance tuning & data modeling.
- Experience in Operational Intelligence using Splunk.
- Expertise in working with databases such as Oracle, MS SQL Server, Postgres, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
- Involved in design and architecture of Enterprise grade technologies associated with Docker.
- Experience in performing backup, recovery, failover and DR practices on multiple platforms.
- Implemented Kerberos for authentication and Ranger and Sentry for authorization across all services in the Hadoop cluster.
- Experience with automation for provisioning system resources using Puppet.
- Scripting experience with Bash, Python, and Perl.
- Strong knowledge in configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for job automation (shell, Hive, and Sqoop actions).
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Experience in production support and application support by fixing bugs.
- Experience in designing and developing real time big data processing applications using Kafka, Storm, HBase, Hive, Spark.
Big Data Technologies: Hadoop, HDFS, Hive, Cassandra, Pig, Sqoop, Falcon, Flume, Zookeeper, Yarn, Mahout, Oozie, Avro, HBase, MapReduce, Storm, CDH 5.3, CDH 5.4.
Distributions: Cloudera, Hortonworks, MapR
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Testing: Capybara, WebDriver Testing Framework, RSpec, Cucumber, Junit, SVN
Server: WEBrick, Thin, Unicorn, Apache, AWS
Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8/10)
Databases & NoSQL: Oracle 11g/10g, DB2, SQL, MySQL, HBase, MongoDB, Cassandra
Scripting & security: Shell Scripting, HTML Scripting, Python, Kerberos, Docker
Security: Kerberos, Ranger, Sentry
Other tools: Redmine, Bugzilla, JIRA, Agile SCRUM, SDLC Waterfall.
Confidential, San Jose, CA
Sr. Hadoop Administrator
- Installed and configured Cloudera CDH 5 with Hadoop ecosystem components such as Hive, Oozie, Hue, Spark, Kafka, HBase, and Yarn.
- Automated all jobs that pull data from relational databases into Hive tables using Oozie workflows, and enabled email alerts on failures.
- Troubleshot issues, and managed and reviewed data backups and Hadoop log files.
- Set up Flume agents based on team requirements to bring in log files from different source systems.
- Worked with relational and NoSQL databases.
- Optimized the full-text search function by connecting MongoDB and Elasticsearch.
- Utilized AWS framework for content storage and Elasticsearch for document search.
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, Sqoop, Spark, Oozie and Impala with Cloudera distribution.
- Upgraded the Cloudera Hadoop ecosystem components in the cluster using the Cloudera distribution.
- Built an Apache Kafka multi-node cluster and used Kafka Manager to monitor multiple clusters.
- Imported the analyzed data into HDFS using Sqoop.
- Administered Splunk and created, tested, and deployed operational search strings.
- Experienced in LDAP integration with Hadoop and access provisioning for secured clusters.
- Installation and configuration of Linux for new build environment.
- Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
- Creating Pipeline between Cloudera (CDH) and AWS S3.
- Manage and review Hadoop log files and Log cases with Cloudera Manager.
- Familiarity with building lambda architecture using Apache Kafka, Spark servers, and environments.
- Installed Spark, Storm, and Kafka and configured them per requirements.
- Secured the Hadoop cluster from unauthorized access using Kerberos, LDAP integration, and TLS for data transfer between cluster nodes.
- Performed major and minor upgrades to the Hadoop cluster.
- Monitoring the System activity, Performance, Resource utilization.
- Configured High Availability on the NameNode for the Hadoop cluster as part of the disaster recovery roadmap.
- Served in a combined DevOps and Hadoop administration role: worked L3 issues, installed new components as requirements arose, automated tasks wherever possible, and implemented a CI/CD model.
- Configured Ganglia and Nagios to monitor the cluster and on-call with EOC for support.
- Configured Flume to efficiently collect, aggregate, and move large amounts of log data from many different sources into HDFS.
- Tested and performed enterprise-wide installation, configuration, and support for Hadoop using the MapR distribution.
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Setting up cluster and installing all the ecosystem components through MapR and manually through command line in Lab Cluster.
- Instrumental in building scalable distributed data solutions using Hadoop ecosystem.
- Implemented Oozie workflows for Map Reduce, Hive and Sqoop actions.
- Worked on setting up Apache NiFi and performing POC with NiFi in orchestrating a data pipeline.
Environment: Cloudera, CDH 4/5, Ambari 2.2, HDFS, MapReduce, NiFi, Python, Yarn, Hive, Pig, Zookeeper, Tez, MongoDB, MySQL, and CentOS 6.6.
Confidential, Columbus, OH
Sr. Hadoop/Hortonworks Admin
- Responsible for day-to-day activities including HDFS support and maintenance, cluster maintenance, commissioning and decommissioning of nodes, cluster monitoring and troubleshooting, managing and reviewing Hadoop log files, backup and restore, and capacity planning.
- Evaluate technical aspects of any change requests pertaining to the Cluster.
- Built automation frameworks for data ingestion, processing, and storage in Python and Scala with NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, including joins and some pre-aggregations, before storing the data in HDFS.
- Worked with Hadoop developers and operating system admins in designing scalable supportable infrastructure for Hadoop.
- Experience with encryption at rest using Key Trustee Server and with enabling encryption over the wire (TLS); also implemented Kafka security features using TLS.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive and Sqoop as well as system specific jobs.
- Installed and configured a multi-node, multi-datacenter, fully distributed Cassandra cluster.
- Designed a Solr (Cloudera Search) index pipeline using the Lily Indexer in both batch and service (near real-time) modes. The source of this index will be the MNsure Audit HBase environment.
- HDFS support and maintenance.
- Worked closely with the database team, network team, BI team and application teams to make sure that all the big data applications are highly available and performing as expected.
- Splunk implementation, planning, customization, integration with Application servers, big data and statistical and analytical modeling.
- Familiar with implementing Spark, Kerberos authorization/authentication, and LDAP, with an understanding of cluster security.
- Refreshed data between clusters as needed.
- Manage and review Hadoop log files.
- Set up Elastic Load Balancer to distribute traffic among multiple servers for high availability.
- File system management and monitoring.
- Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files on Hortonworks, MapR.
- Responsible for onboarding new users to the Hadoop cluster (creating a home directory for each user and providing access to the datasets).
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- Developed Storm topologies to read data from Kafka topics, populated staging tables, and stored the refined data in partitioned Hive tables in Google Cloud Storage.
- Used AWS (Amazon Web Services) EC2 for provisioning, such as creating new instances (VMs). Designed architecture based on the client’s requirements, including Hadoop, HBase, and Solr.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Implemented a Spark solution to generate reports from Cassandra data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Manually upgraded the cluster from HDP 2.2 to HDP 2.3 as part of software patches and upgrades.
- Used Kafka to build real-time data pipelines between clusters.
- Helped the business connect Hadoop with existing tools such as Tableau, Cognos, SSRS reporting, and SAS, and especially Informatica Big Data Edition, to simplify data processing.
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
Environment: Hadoop, Hortonworks, Hive, Python, Solr, MapReduce, Kafka, MapR, Amazon Web Services (AWS), Ansible, S3, NoSQL, HDFS, UNIX, Red Hat and CentOS.
Confidential, San Mateo, CA
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance and performed capacity planning using Ambari.
- Installed and configured Cloudera distributions on single-node clusters for POCs.
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing. Developed analytical components using Scala, Spark, and Spark Streaming. Implemented proofs of concept on the Hadoop and Spark stack and different big data analytic tools, using Spark SQL as an alternative to Impala.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyze data log files. Developed use cases and technical prototypes for implementing Pig, HDP, Hive, and HBase.
- Experience with MapR patching and upgrading the cluster with proper strategies.
- Installation and configuration of Solaris 9/10 and Red Hat Enterprise Linux 5/6 systems.
- Lead for WebSphere Portal 6.1.5/126.96.36.199 migrations on AIX environment and WebSphere Content Rules Engine setup on Windows 2003 server environment.
- Clustered the Application Server environment using horizontal clustering across multiple boxes to facilitate high availability, failover support and load balancing in a production environment.
- Installed a Kerberos-secured Kafka cluster without encryption on POC VMs and set up Kafka ACLs.
- Created data transformation tasks like BCP, BULK INSERT to import/export data from client.
- Involved in implementing security on the Cloudera Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Worked on SQL Server 2005 and 2008 R2 database administration.
- Managed the migration of SQL Server 2005 databases to SQL Server 2008.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Monitored multiple Hadoop cluster environments using Nagios. Monitored workload and job performance and performed capacity planning using MapR control systems.
- Worked on migrating data from Oracle to Cassandra databases.
- Led a DevOps initiative to create automated infrastructure and deployment capabilities for cloud- and Hadoop-based solutions.
- Used cloud computing platforms such as Google Cloud Platform.
- Expertise in collaborating with application teams to install the operating system and Hadoop updates, patches, version upgrades when required.
- Built and racked new server equipment. Set up, configured, and maintained Atlassian suite (Jira, Confluence, Fisheye) application servers.
- Installed the company’s first Puppet system and wrote configuration manifests.
- Resolved performance issues on the Cassandra cluster.
- Used Bash shell scripts to automate repetitive and bulk tasks. Responsible for spinning up one-off servers as needed, such as a backup DNS server, a central logging server, and a Dashing (metrics) server.
- Expertise and Interest include Administration, Database Design, Performance Analysis, and Production Support for Large (VLDB) and Complex Databases up to 2.5 Terabytes.
- Configured fair scheduler to provide service level agreements for multiple users of a cluster.
- Analyzed data with Hive and Pig.
- Involved in building servers using jumpstart and kickstart in Solaris and RHEL respectively.
Environment: Hadoop, CDH 3, Pig, HBase, Zookeeper, Sqoop, ETL, AWS, Ansible, Data warehousing, Impala, Ambari 2.0, Linux CentOS, MapR, Cloudera, Puppet, Kafka, Ganglia, Agile/Scrum.
- Worked on the ~350*2 and ~200*2 node production clusters, as well as various other environments such as DEV, QA, and data centers, which form the backend infrastructure for storing transactional data.
- Implemented capacity schedulers on the Job Tracker to share the resources of the cluster for the MapReduce jobs given by the users.
- Involved in Cassandra data modeling and building efficient data structures.
- Deployed and installed new servers and their appropriate services for various applications in Linux.
- Worked in setting up LDAP, DNS, DHCP Server along with effective group and System Level policies and roaming profile features by using Samba and NFS servers.
- Worked on change requests and was involved in the construction phase.
- Handled day-to-day user access and permissions, and installed and maintained Linux servers.
- Worked on automation for provisioning system resources using Puppet.
- Developed various Map Reduce applications to perform ETL workloads on terabytes of data.
- Developed and worked in the EA framework. Wrote test cases from functional specifications.
- Helped develop MapReduce programs and define job flows.
- Created a POC on Cloudera and recommended best practices for the CDH and HDF platforms and NiFi.
- Handled status calls to explain the progress of defects. Provided reports and performed cleanups requested by the client.
- Planned, scheduled and Implemented OS patches on Linux boxes as a part of proactive maintenance.
- Experience in installing, integrating, tuning, and troubleshooting Tomcat, WebSphere, and WebLogic application servers.
- Installation and configuration of MySQL on Red Hat Linux cluster nodes.
- Troubleshooting the system and end user issues.
- Responsible for configuring real time backup of web servers.
- Managed log files for troubleshooting and identifying probable errors.
- Responsible for reviewing all open tickets and resolving and closing existing tickets.
- Documented solutions for issues that had not been encountered previously.
- Managed archives of files and directories using tar, and compressed files using gzip and bzip2.
- Administered local and remote servers using SSH (secure shell) tool on daily basis.
- Provisioning, installing, configuring, monitoring, and maintaining HDFS, Yarn, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, Smartsense, Storm, Kafka.
- Setup and optimize Standalone Clusters, Pseudo-Distributed Clusters, and Distributed Clusters.
- Experience with BI reporting using AtScale OLAP for Big Data.
- Design of dimensions, fact tables, and cubes necessary for automated generation of business intelligence dashboard and report objects.
Environment: Hadoop HDFS, Cloudera, Yarn, HBase, Linux, Solr, Oozie, Flume, Splunk, EDB PostgreSQL, Tomcat Servers, RabbitMQ, EMC Isilon Server
- Installation, configuration, and maintenance of Red Hat Enterprise Linux 5.
- Installed Operating System and configured kernel parameters for Linux/Unix servers.
- Implemented Linux Wyse terminals on the production floor.
- Configured servers, IIS, and server-side client configuration.
- Coordinating with storage team and application team to resolve application issues.
- Created new ext4, NFS, and GFS2 file systems and checked them for data consistency.
- Performed housekeeping on Linux/Unix systems.
- Created, maintained, and troubleshot disk partitions, LVM, and file management.
- Good understanding of operating systems (Unix/Linux) and networks, with system administration experience.
- Worked on configuring TCP/IP, network interface, assigning static routes and hostnames.
- Created shell scripts for automating the daily maintenance and update processes in the production environment.
- Maintain multiple system tables to ensure optimum database performance & data integrity for O.R. applications.
- Installation of Windows Operating systems XP, Vista, Win7.
- Configure and troubleshoot Microsoft Exchange and Outlook user accounts.
- Configuration of routers, switches, Wi-Fi connection etc.
- Installed and troubleshot network printers, switches, and routers.
- Installation of Windows Operating systems XP, Vista, Windows 7, 2003 server, 2008 R2 server.
- Participate as system super user in multiple application implementations, system upgrades, and enhancements.
- Serve as department charge master analyst
- Work closely with revenue strategies to assure department chargeable supplies/implants are identified.
- Involved in installation and configuration of Linux/UNIX servers and workstations, including Red Hat distributions.
- Responsible for installation, configuration, and administration of Red Hat Linux and Solaris systems.
- Monitored and administered Sun Solaris servers and Red Hat Linux servers.
- Provided VPN connectivity support.
- Asset management and redeployment.
- Worked with the Puppet configuration management tool.
- Assisted with queries on internet services and Microsoft Outlook email configuration.
- Set up and maintained NFS, NIS, and TCP/IP networks; configured systems for TCP/IP networking with the existing LAN; and set up SSH and SCP between Sun systems and other Red Hat/UNIX hosts.
- Responsible for installation, configuration, and administration of UNIX/Sun Solaris servers.
- Applying system package installations and upgrades.
- Outlined, implemented, and managed WebLogic infrastructure.
- Configured, administered, and managed users and groups.
- Responsible for process management in Solaris operating environment.
Environment: RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), Windows 2000/2003, Unix, Linux, Java, RMAN, GoldenGate, Cassandra, EM Cloud Control, Toad