Service Reliability Engineer Resume
Stamford, Connecticut
PROFESSIONAL SUMMARY
- 7+ years of IT industry experience as an SRE, System Administrator, and Production Support engineer for applications on Red Hat Enterprise Linux and Windows environments.
- 5+ years of hands-on experience supporting production environments as a Service Reliability Engineer for corporate applications.
- Experience in DevOps environments using Puppet, Ansible, Jenkins, GitLab, and GitHub.
- Skillful in mirroring and striping, parity, new file systems, and mounting and unmounting file systems.
- Experience creating and administering virtual machines on VMware ESXi and VMware Infrastructure 3.
- Experience creating and managing user accounts, security, rights, disk space, and process monitoring on Solaris, Red Hat Linux, and Ubuntu.
- Skillful in installing and upgrading packages and patches, configuration management, version control, service packs, and reviewing connectivity issues related to security problems.
- Experience creating and maintaining user accounts, profiles, security, rights, disk space, and process monitoring.
- Experience in Package management using RPM, YUM, and UP2DATE in Red Hat Linux.
- Skillful in network administration, including deploying and troubleshooting DNS, LDAP, NIS, NFS, and DHCP.
- Hands-on experience with cloud computing platforms such as Amazon Web Services (AWS), Azure, and Google Cloud (GCP).
- Experience configuring and managing a Puppet master server, including creating and updating modules and pushing them to Puppet clients.
- Proficiency in writing automation scripts using Puppet, Ansible, Shell, and Chef for continuous deployment.
- Experience with configuration management tools such as Puppet and Ansible, integrated with CI/CD and version control systems.
- Strong proficiency in supporting production cloud environments (AWS and VMware) and managing cloud software deployments, with continuous integration using GitLab, GitHub, Jenkins, Bamboo, SVN, and Tomcat.
- Experience executing CI Jenkins build jobs for both Android and iOS application builds, using Git (Stash) as the source code repository for all projects and Artifactory as the release repository for all builds (ipa/apk).
- Experienced with version control tools such as Git and SVN, and integrated the build process with Jenkins.
- Hands-on experience using Maven as a build tool for building deployable artifacts from source code.
- Experience installing, configuring, and monitoring with Nagios, Dynatrace, Zenoss, Splunk, and Munin.
- Good knowledge of and experience using Elasticsearch, Kibana, CloudWatch, Nagios, Splunk, and Grafana for logging and monitoring.
- Experienced in setting up Amazon EC2 instances, VPCs, and security groups.
- Experience installing and configuring Apache/Tomcat/Java/MySQL.
- Production support of Apache and JBoss including installation, configuration, management and troubleshooting.
- Expertise in user administration, backup, startup, shutdown scripts, crontabs, and automation using Perl, shell scripting (bash, ksh) for RedHat Linux and AIX systems.
- Experience installing and configuring web hosting administration services: HTTP, FTP, SSH, and RSH.
- Excellent analytical, problem solving, communication and interpersonal skills.
TECHNICAL SKILLS
Operating Systems: Red Hat 4.x/5.x/6.x, Solaris 8/9/10/11, CentOS 4/5, SUSE Linux 10/11, AIX 6.1, Ubuntu, Fedora, Windows, VMware ESXi 4.0/5.0/5.1/5.5.
Hardware: Sun Fire 6800/4900/3800, Sun Fire 490/280, v880/v890, Sun Enterprise Servers E6500, E4500, E3500, Work Group Servers E450, E420, E250, E220, Intel Servers, IBM/HP/DELL Blade Servers
Networking Services: TCP/IP, NIS, NFS, FTP, DNS, DHCP, SSH, LAN, WAN, SMTP
Disk Management: Logical Volume Manager, Sun Volume Manager, Veritas Volume Manager 4.x, 5.x, 6.0.
Programming Languages: Bash Shell scripting, Python, Perl, C, C++, HTML, Matlab, C#, Java.
Third Party Software: VERITAS Volume Manager, VERITAS Cluster Service, EMC Storage, RAID Technologies, SSL, Chef, Ansible, Puppet, GitHub, Jenkins, Tomcat.
Backup Solutions: Veritas Netbackup 4.5, 5.0, 6.5
File Systems: Ext, Ext2, Ext3, Ext4, ZFS, UFS, VxFS.
Virtualization: VMware vSphere, vCenter Server, XenServer.
Web/Application Servers: WebSphere, WebLogic, Apache Tomcat, JBOSS, Amazon Web Server.
PROFESSIONAL EXPERIENCE
Confidential, Stamford, Connecticut
Service Reliability Engineer
Responsibilities:
- Served as a front-line technical service reliability operator accountable for handling critical customer issues arriving via ServiceNow support tickets and the support phone line.
- Analyzed and resolved incidents in ServiceNow, implementing workarounds to maintain business continuity and ensure minimum downtime of critical corporate applications.
- Supported, deployed, maintained, and troubleshot various .NET applications and web services hosted on various versions of IIS web servers, SQL Server, and Azure in the production environment.
- Performed end-to-end troubleshooting of application slowness and service interruptions by reviewing IIS logs, Event Viewer, and Debug.
- Maintained, created, enhanced, and revised operational support runbooks so that newly onboarded teammates can quickly identify and troubleshoot critical issues.
- Troubleshot and monitored third-party applications hosted on AWS, Rackspace, and Azure cloud platforms.
- Managed Azure Active Directory, fixed permission issues, and set up sync between on-premises Active Directory and Azure AD.
- Managed SSL certificates for all lower and production environments, and renewed and set up certificates for applications in IIS.
- Acted as point of contact for infrastructure teams (OS, Network, Onsite/Offshore support, etc.) for incident resolution.
- Managed load balancing using Local Traffic Manager (LTM) and enabled and disabled server nodes as per troubleshooting or testing requirements.
- Scheduled pre- and post-patching ActiveBatch WLA jobs and ensured applications were up and running as expected via smoke testing.
- Built scripts using Maven build tools in Jenkins to deploy J2EE applications to Application servers from one environment to other environments.
- Used Jenkins and Build Forge for continuous integration and deployment into WebSphere Application Server.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Automated manual jobs via ActiveBatch by invoking batch scripts on the servers where corporate applications are hosted.
- Deployed code updates into test and production environments and worked to roll environments forward.
- Ran queries on MongoDB and Oracle databases to check daily data ingestion and refresh for business use cases.
- Maintained GitLab repositories for developers and promoted a topic-branch workflow on the Azure DevOps platform.
- Set up continuous integration and formal builds using Jenkins and an Artifactory repository.
- Used Ansible and Ansible Tower as configuration management tools to automate repetitive tasks, quickly deploy critical applications, and proactively manage change.
- Implemented Ansible to manage all existing servers and automate the build/configuration of new servers.
- Developed an Ansible playbook for a cluster, implementing automated deployment and configuration.
- Configured Ansible to manage AWS environments and automate the build process for core AMIs used by all application deployments, including Auto Scaling and CloudFormation scripts.
- Configured alerts for corporate applications in SiteScope for application uptime and system level monitoring in Spectrum.
- Monitored System and Application Logs of servers using Splunk to detect Production issues.
- Monitored Splunk dashboards and Splunk alerts, and configured scheduled alerts based on internal customer requirements.
- Prepared and updated application disaster recovery plans and sent MIM (Major Incident Management) communications to stakeholders for critical, business-impacting major incidents.
- Provided root cause analysis for all assigned P1 issues and worked with development teams to fix them.
Environment: Windows, Splunk, ServiceNow, Jira, AWS, Azure, Jenkins, Ansible, SiteScope, Spectrum, ActiveBatch, iManage, Exari, Git, AppDynamics, Java, HTML, Shell, Python, CSS, PowerShell, MongoDB, SQL.
Confidential, Sunnyvale, California
Site Reliability Engineer/DevOps Engineer
Responsibilities:
- Led RHEL kernel upgrades and re-imaging of front-end web servers across the assigned properties. Additionally, assisted the production environment in configuring metrics during the monitoring system migration in GitHub.
- Performed, delivered, and managed applications and services, and participated in the on-call rotation.
- Wrote Python code using the Fabric library to run kernel upgrades with the latest packages in parallel.
- Installed and configured DHCP server, DNS server, and PXE boot for RHEL 6.x and Ubuntu 16.x.
- Ensured the health of production systems, investigated anomalous behavior and triaged outages, shepherded code changes from development to production, and developed and enhanced automation and monitoring tool configuration.
- Participated in a 24x7x365 on-call rotation, troubleshooting issues and serving as the escalation point for production engineers on unexpected operational issues.
- Successfully transitioned production deployment and on-call/triage responsibilities.
- Created a generic runbook for operations team ramp-up and quick identification of critical issues for escalation to SRE for further triage.
- Implemented standards for incident tracking in ServiceNow, documentation in a Confluence repo, ticketing in Jira, and server/cluster monitoring using Splunk.
- Managed users, sudo privileges, and firewall configuration according to the environment policy on RHEL 6.x.
- Installed and updated packages using central server repositories.
- Led BCP (Business Continuity Planning); successfully shifted traffic between data centers to mitigate impact during user-impacting incidents.
- Implemented standards for incident tracking in OpsGenie and ServiceNow, documentation in a GitHub repo, ticketing in Jira, and post-mortems for P1 and P2 incidents.
- Created and maintained Subversion repositories, branches, and tags and administered SVN.
- Coordinated and assisted developers with establishing and applying appropriate branching, labelling/naming conventions using SVN and Git.
- Administered Artifactory, including backups, upgrades to the latest Artifactory versions, and granting access to authorized users.
- Used Jenkins and Build Forge for continuous integration and deployment into WebSphere Application Server.
- Set up continuous integration and formal builds using Jenkins and an Artifactory repository.
- Used Ansible and Ansible Tower as configuration management tools to automate repetitive tasks, quickly deploy critical applications, and proactively manage change.
- Implemented Ansible to manage all existing servers and automate the build/configuration of new servers.
- Developed an Ansible playbook for a cluster, implementing automated deployment and configuration.
- Configured Ansible to manage AWS environments and automate the build process for core AMIs used by all application deployments, including Auto Scaling and CloudFormation scripts.
- Created deployment workflows in Jenkins that include pre-release, release, and post-release steps.
- Coordinated with the Splunk admin and system owners to onboard applications in Splunk and to test that logging capabilities are functional.
- Monitored System and Application Logs of servers using Splunk to detect Production issues.
Environment: RHEL 6.x, Splunk, Red Hat Cluster, AWS, Jenkins, GitHub, VERITAS Cluster Server 4.1, NFS, DNS, SAN/NAS, Shell, Bash, ServiceNow, OpsGenie
Confidential, Massachusetts
System Administrator
Responsibilities:
- Installed, configured, and administered UNIX servers, including selecting relevant tools and hardware to support the installation, upgrade, and update of Red Hat, Solaris, Ubuntu, and Fedora operating systems.
- Worked with RPM sources (patching and recompiling).
- Installed, reinstalled, upgraded, and removed packages using RPM and YUM.
- Obtained information about RPM packages including version, status, dependencies, integrity, and signatures.
- Built RHEL servers using Puppet.
- Configured NIS, NFS, DNS, DHCP, FTP, SFTP, Telnet, and RAID levels.
- Developed automation and deployment utilities using Ruby, Bash, C++, Java, and Python; used Puppet, Jenkins, and GitHub for continuous integration and Tomcat to deploy Java Servlets and JSPs.
- Implemented and maintained internal systems key to DevOps operations such as database servers, continuous integration, and QA/Test servers.
- Monitored and troubleshot backups and scheduled cron jobs.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Wrote templates for AWS infrastructure as code using Terraform to build staging and production environments.
- Built, tested, and deployed scalable, highly available OpenStack private cloud environments on both customer premises and Rackspace Cloud environments.
- Developed/Provisioned bare metal servers on OpenStack cloud using PXE boot through Terraform automation tool.
- Developed scripts for build, deployment, maintenance, and related tasks using Jenkins, Docker, Maven, Python, and Bash.
- Built environment provisioning solutions using Docker, Vagrant, and Red Hat Satellite.
- Implemented VMware ESX server to provide multiple virtual hardware platforms.
- Troubleshot ESX host issues with VMware. Used packet capture analysis to diagnose application issues.
- Created and managed development and continuous integration environments using VMware ESX, automated through Jenkins using PXE boot, Perl, and the VMware CLI.
- Used Jenkins pipelines to drive all microservices builds out to the Docker registry and deploy them to Kubernetes; created and managed Pods using Kubernetes.
- Built and maintained Docker container clusters managed by Kubernetes on GCP, using Linux, Bash, GitHub, and Docker.
- Utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test, and deploy.
- Implemented standards for incident tracking in ServiceNow, documentation in a GitHub repo, ticketing in Jira, and server/cluster monitoring using Splunk.
Environment: RHEL 5/6.x, Solaris 9/10/11, HP-UX, SUSE 10/11, VERITAS Volume Manager 3.x/4.x, VERITAS Storage Foundation 5, Red Hat Cluster, OpenStack, VERITAS Cluster Server 4.1, Tripwire, NFS, DNS, SAN/NAS.
