Sr. Site Reliability Engineer Resume
TN
SUMMARY
- Around 7 years of overall experience automating, configuring, and deploying instances in cloud environments and data centers, spanning DevOps, CI/CD pipelines, build and release management, AWS/Azure, and Linux/Windows administration.
- Years of experience administering IaaS and PaaS virtual machines and Web/Worker roles on Microsoft Azure Classic and Resource Manager, and troubleshooting issues on Azure VMs.
- Experience migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure backups, with good knowledge of Azure Fabric and microservices in Azure.
- In-depth experience authoring Azure Resource Manager templates, deploying to Azure using Agile methodology, and configuring the Azure Load Balancer to balance incoming traffic across virtual machines.
- Expert in various Azure services such as Compute (Web Roles, Worker Roles), Caching, Azure SQL, NoSQL, Storage and Network services, Azure Active Directory, API Management, Scheduling, Azure Autoscaling, PowerShell Automation, Azure Virtual Machines, Azure Search, Azure DNS, and Azure VPN Gateway.
- Experience migrating on-premises instances or Azure Classic instances to an Azure ARM subscription with Azure Site Recovery.
- Wrote Ansible playbooks with Python SSH as the wrapper to manage configurations of AWS nodes, and tested playbooks on AWS instances using Python.
- Expertise in using the AWS Serverless Application Repository to find an application, configure it by setting environment variables and parameter values, then deploy it to an AWS account and manage it from the AWS Management Console.
- Managed security groups on AWS with a focus on high availability, fault tolerance, and auto scaling using Terraform templates, along with Continuous Integration and Continuous Deployment using AWS Lambda and AWS CodePipeline.
- Experience monitoring server performance with tools like Nagios, Splunk, Dynatrace, Datadog, and New Relic; resolved network-related issues with manual commands and built a Splunk cluster environment with high-availability resources.
- Solid understanding and use of various iOS frameworks such as Core Location, Quartz Core, Security, Data Protection, Urban Airship, AFNetworking, Social Framework, and EventKit.
- Using Chef, deployed and configured Elasticsearch, Logstash and Kibana (ELK) for log analytics, full text search, application monitoring in integration with CloudWatch.
- Installed and configured Kubernetes for Orchestration of Docker Images and Cluster Container management on AWS using Kubernetes Operations (KOPS).
- Used Kubernetes to deploy, scale, load balance, and manage Docker containers in multiple namespaces, with good experience using Kubernetes Rancher for deployments.
- Experience managing clusters using Kubernetes, including creating pods, replication controllers, services, deployments, labels, and health checks.
- Experienced in production-grade Kubernetes that allows enterprises to reliably deploy and run containerized workloads across private and public clouds. Managed Kubernetes charts using Helm.
- Created reproducible builds of Kubernetes applications, managed Kubernetes manifest files, and managed releases of Helm packages.
- Experienced in installing, configuring, and managing an Ansible centralized server and creating playbooks to support various middleware application servers; configured Ansible Tower as a configuration management tool to automate repetitive tasks.
- Installed and administered Artifactory repository to deploy the artifacts generated by Maven and to store the dependent jars which are used during the build.
- Worked in an Agile development team to deliver an end to end continuous integration, continuous delivery product in an open source environment using tools like Chef and Jenkins.
- Extensively worked on Jenkins for Continuous Integration and Continuous Deployment (CI/CD) methodologies for End to End automation for all build and deployments.
- Extensive knowledge of writing Shell, Python, YAML/JSON, and Bash scripts throughout the entire DevOps life cycle.
- Identified root causes of application slowdowns and optimized code, network, or resource sizes.
- Troubleshot user-facing incidents faster by intelligently grouping high-volume application errors into a small number of issues with Error Tracking.
- Monitored performance and history of infrastructure with tools such as CloudWatch and Datadog.
- Worked on Linux Administration like RAID levels, Grub, Disk management, patch management, Networking, Scripting, Kickstart, LVM, CRON jobs, Performance monitoring, troubleshooting.
- Set up and managed the ELK (Elasticsearch, Logstash, and Kibana) stack to collect, search, and analyze log files across servers; monitored logs and created geo-mapping visualizations using Kibana in integration with AWS CloudWatch and Lambda.
- Experience working with the ELK (Elasticsearch, Logstash, Kibana) stack and Splunk to analyze and visualize large volumes of log data obtained from servers.
- Created module-driven AWS infrastructure with Terraform, and created infrastructure Git repositories for Terraform to launch the stacks.
- Hands-on experience writing Terraform API modules to manage infrastructure, for automatic creation of RDS instances, VPCs, Auto Scaling groups, load balancers, SQS, and S3 buckets.
TECHNICAL SKILLS
Cloud Services: Amazon Web Services (AWS), Azure, OpenStack, Pivotal Cloud Foundry (PCF), Google Cloud Platform (GCP)
Virtualization: VMware ESX/ESXI, Windows Hyper-V, vSphere 5.x, Datacenter Virtualization, Power VM, Virtual Box, Citrix Xen, KVM.
Operating Systems: Red Hat Linux 4/5/6/7, CentOS, Fedora, SUSE LINUX, UNIX, Windows servers 2003, 2008, 2008 R2, 2012, 2012R2, Windows 2000/2003/XP/vista/7/8/10, Ubuntu 12/13/14, Sun Solaris 8/9/10/11, HPUX 10.x/11.x
Automation/configuration Tools: Chef, Puppet, Docker, Vagrant, Ansible, Jenkins, Hudson, Bamboo, Kickstart, Jumpstart, Terraform, Kubernetes, ANT, Maven.
Web Servers: Apache Tomcat, JBoss 4.x/5.x, WebLogic (8/9/10), WebSphere, Apache 1.3.x, Apache 2.0.x, Nginx, IIS
Database Technologies: DB2, SQL Server, MySQL, RDS, NoSQL- MongoDB, Cassandra DB, DynamoDB
Scripting languages: Ruby, Python, Perl, HTML5, PHP, Bash/shell Scripting, PowerShell Scripting, YAML, JSON.
Networking/Protocol: FTP/SFTP, SMTP, TCP/IP, HTTP/HTTPS, DNS, DHCP, NFS, Cisco Routers, Juniper Routers
Version Control Tools: GIT, Bitbucket, SVN (Subversion), CVS
Monitoring Tools: Nagios, Splunk, Elasticsearch, Logstash and Kibana (ELK), CloudWatch, CloudTrail, Dynatrace
Volume Manager: VERITAS Volume manager, LVM
Application Servers: Web Logic Application Server 9.x, 10.x, Apache Tomcat 5.x/7.x, Red Hat JBOSS 4.22.GA, WebSphere 6.x/7.x/8.x
PROFESSIONAL EXPERIENCE
Confidential, TN
Sr. Site Reliability Engineer
Responsibilities:
- Set up Datadog monitoring across different servers and AWS services.
- Created Datadog dashboards for various applications and monitored real-time and historical metrics.
- Created system alerts using various Datadog tools and alerted application teams based on the escalation matrix.
- Automated Datadog integrations through Ansible scripts for QA, Regression, and Prod environments.
- Integrated CloudCheckr, Datadog, and Splunk dashboards with AWS accounts.
- Monitored performance and history of infrastructure with tools such as CloudWatch and Datadog.
- Visualized load times, frontend errors, and dependencies for both browser and mobile apps.
- Correlated user experience and application usage with business metrics.
- Identified root causes of application slowdowns and optimized code, network, or resource sizes.
- Troubleshot user-facing incidents faster by intelligently grouping high-volume application errors into a small number of issues with Error Tracking.
- Sliced and diced data to find errors affecting a specific environment, browser, OS, and more.
- Sent custom metrics to track application KPIs such as average customer basket size or request latency.
- Augmented user sessions with attributes such as code version and environment, and attached custom user actions to events like button clicks, taps, and swipes.
- Pivoted from user experience data to request traces and logs for complete context when triaging issues, and unified monitoring in a single platform for frontend and backend development teams.
- Used Datadog RUM, which complements Synthetics, to provide comprehensive insight into applications as experienced by users.
- Responsible for implementing monitoring solutions in Terraform.
- Created and maintained highly scalable and fault-tolerant multi-tier AWS and Azure environments spanning multiple availability zones using Terraform and CloudFormation.
- Wrote Terraform scripts from scratch for building Dev, Staging, Prod, and DR environments.
- Implemented Microservices in load balanced, highly available, fault tolerant Kubernetes infrastructure.
- Automated provisioning and repetitive tasks using Terraform, Docker containers, and service orchestration.
- Used Jenkins pipelines to drive all microservice builds out to the Docker registry and then deployed them to Kubernetes; created and managed pods using Kubernetes.
Environment: AWS, Kubernetes, Datadog, Terraform, Docker, Splunk, Jenkins, Nexus, GIT, Veritas, SVN, Ant, Bash/shell scripts, JIRA.
Confidential, TX
Sr. Cloud /DevOps Engineer
Responsibilities:
- Experience in Automating, Configuring, and deploying instances on AWS and Azure Cloud environments.
- Implemented Azure SQL Server for storing the data related to the recruitment and extensively worked on queries and stored procedures.
- Working knowledge of Azure Cloud IaaS and PaaS services, Azure SQL, Azure Storage, and Azure Services.
- Designed roles and groups using Azure Identity and Access Management (IAM).
- Set up CloudWatch monitoring of EC2 instances for CPU utilization, disk space, and custom metrics, provided alerts to developers, and used Nagios to monitor on-site hosts and servers.
- Expertise in configuring monitoring and alerting tools according to requirements, such as AWS CloudWatch, AWS CloudTrail, Dynatrace, Nagios, and Splunk Enterprise, for the VPN connections.
- Excellent knowledge of Azure compute services, Azure Web Apps, Data Factory and Blob Storage, Azure Networking, Identity and Access Management, Azure AD, and Multi-Factor Authentication.
- Managed Azure Infrastructure Azure Web Roles, Worker Roles, SQL Azure, Azure Storage, Azure AD Licenses, Office365. Virtual Machine Backup and Recover from a Recovery Services Vault using Azure PowerShell and Portal.
- Managed identity and access management for Azure subscriptions, Azure AD, Azure AD Application Proxy, Azure AD Connect, and Azure AD Pass-through Authentication.
- Created reproducible builds of the Kubernetes applications, managed Kubernetes manifest files and Managed releases of Helm packages.
- Developed microservices onboarding tools leveraging Python and Jenkins for easy creation and maintenance of build jobs and Kubernetes deploy and services.
- Delivered environment mapping in AWS that included Active Directory, LDAP, and AWS IAM roles for the AWS API Gateway platform. Executed microservices with AWS ECS Docker/Kubernetes for code building and packaging.
- Developed backup and recovery engine for VM backup/recovery using VMware vSphere APIs, Go-Lang programming language and RabbitMQ Message bus (communication interface).
- Worked on OpenStack Nova; set up monitoring on Kubernetes, with New Relic for application performance and Sumo Logic for log monitoring.
- Used Jenkins and pipelines to drive all microservice builds out to the Docker registry and then deployed them to Kubernetes; created and managed pods using Kubernetes.
- Deploying and maintaining production environment using AWS EC2 instances and elastic container services with Docker.
- Used Ansible to set up and tear down the ELK stack (Elasticsearch, Logstash, Kibana), troubleshot build issues with ELK, and worked toward solutions.
- Used Ansible Tower, which provides an easy-to-use dashboard and role-based access control, so that it's easier to allow individual teams access to use Ansible for their deployments.
- Used Ansible, Chef, Jenkins, and Git to implement Continuous Integration from scratch, optimized Continuous Integration using Jenkins, and troubleshot deployment build issues using the triggered logs.
- Used Ansible to manage Web applications, Environments configuration Files, Users, Mount points and Packages.
- Automated Datadog dashboards with the stack through Terraform scripts.
- Integrated Datadog into the Jenkins pipeline and automated the dashboards and alerts.
- Migrated infrastructure monitoring from Zabbix to Datadog.
- Set up a Continuous Delivery pipeline using Ansible playbooks, consisting primarily of Jenkins to run packages and various supporting software components such as Maven.
- Created artifact documents through the source code and internal deployment in Nexus repository. Implemented Disaster recovery project on AWS using various DevOps automation for CI/CD.
- Experienced in both framework and CloudFormation to automate AWS environment creation, along with deployment on AWS using build scripts (AWS CLI), and in automating solutions using Shell and Python.
- Proficiency in scripting for automation, & monitoring using Shell Bash, PowerShell, PHP, Java, Python, YAML, Ruby & Perl scripts.
- Managing Puppet with GIT, Distributing Puppet Manifests.
- Integrated Git with Jenkins to automate the code check-out process with the help of the Jenkins DSL plugin.
- Worked with AWS Cloud Formation Templates, terraform along with Ansible to render templates and Murano with Heat Orchestration templates in OpenStack Environment.
- Wrote templates for AWS infrastructure as code using Terraform to build staging and production environments.
Environment: AWS, Azure, Azure ARM, Chef, Puppet, Jenkins, Maven, ANT, Ruby, Shell, Python, WebLogic Server 11g, Load Balancers, WLST, Apache Tomcat 7.x, Docker, Virtualization, Configured plug-ins for Apache HTTP server 2.4, Nginx, LDAP, JDK1.7, XML, SVN, GitHub, CloudWatch, Splunk, Nagios, Terraform, Kubernetes, Ansible, Nexus.
Confidential, GA
Sr. DevOps Engineer
Responsibilities:
- Responsible for implementing AWS solutions and setting up the cloud infrastructure with different services such as EC2, S3, VPC, ELB, AMI, EBS, RDS, DynamoDB, Lambda, Auto Scaling, Route53, Subnets, NACLs, CloudFront, CloudFormation, CloudWatch, CloudTrail, SQS, and SNS.
- Managed the AWS Lambda Functions configuration information based on requirements and built lambda functions using Node.js, Python and Java.
- Worked on deploying AWS WAF (Web Application Firewall) as part of the CDN solution for the ALB (Application Load Balancer) that fronts the web server on EC2 instances.
- Used AWS Beanstalk for deploying and scaling web applications and services developed with Java, PHP, Node.js, Python, Ruby and Docker on familiar servers like Apache.
- Created a data pipeline for different events of mobile applications to filter and load consumer response data from Urban Airship in an AWS S3 bucket into Hive external tables in an HDFS location.
- Created several pods using the master and minion architecture of Kubernetes and developed microservice onboarding tools leveraging Python, allowing for easy creation and maintenance of build jobs and Kubernetes deployments and services.
- Worked on Docker and OpenShift to manage microservices for development, and served as the point team player on OpenShift for creating new projects and services for load balancing and adding them to routes to be accessible from outside.
- Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes. Deployed clusters on AWS with a Jenkins and Docker pipeline implementation.
- Installed and configured a private Docker Registry, authored Docker files to run apps in containerized environments and used Kubernetes to deploy scale, load balance and manage Docker containers with multiple namespace ids.
- Wrote Ansible playbooks with Python SSH as the wrapper to manage configurations of AWS nodes and tested playbooks on AWS instances using Python. Ran Ansible scripts to provision development servers.
- Used Ansible to set up and tear down the ELK stack (Elasticsearch, Logstash, Kibana), troubleshot build issues with ELK, and worked toward solutions. Used Ansible playbooks for provisioning instances on OpenStack.
- Configured Ansible control machine, Ansible Playbooks (written in YAML language), roles and modules.
- Installed and Configured the Nexus repository manager for sharing the artifacts within the company.
- Installed and configured a Jenkins master that served 34 different slaves supporting 8 different applications with various release life cycles and multiple CI pipelines set up on the project branches.
- Created multiple Perl, Python, PowerShell and Unix Shell Scripts for various applications level tasks.
- Deployed and configured Elasticsearch, Logstash, and Kibana (ELK) for log analytics. Configured the ELK stack in conjunction with AWS, using Logstash to output data to AWS S3.
- Experience in using Docker and setting up ELK with Docker and Docker-Compose. Actively involved in deployments on Docker using Kubernetes.
Environment: AWS, Kubernetes, Docker, Ansible, Python, Chef, Nagios, Splunk, Kickstart, Hudson, Jenkins, Nexus, GIT, Veritas, SVN, Ant, Bash/shell scripts, JIRA.
Confidential, MN
DEVOPS ENGINEER
Responsibilities:
- Involved in architecting, building, and maintaining highly available, secure, multi-zone AWS cloud infrastructure using Chef with AWS CloudFormation, and Jenkins for continuous integration.
- Maintained automated environment using Chef Recipes & Cookbooks within AWS and involved in Knife and Chef Bootstrap process, converted production support scripts to chef recipes and AWS server provisioning using chef recipes.
- Experience in Monitoring server performance with tools like Nagios, Splunk, Dynatrace, Datadog, New Relic and resolved network related issues with manual commands and built Splunk Cluster environment with High Availability resources.
- Automated Datadog Dashboards with the stack through Terraform Scripts and assisted internal users of Splunk in designing and maintaining production quality dashboards.
- Implemented a Continuous Delivery pipeline with Docker, Jenkins and GitHub. Responsible for installation & configuration of Jenkins to support various Java builds and Jenkins plugins to automate continuous builds and publishing Docker Images to the Nexus Repository.
- Implemented Docker-Maven plugin and Maven POM to build Docker Images for all microservices and later used Docker file to build the Docker Images from the java jar files.
- Managed Docker networking subsystem by using User-defined bridge networks, Host networks, Overlay networks, Macvlan networks and third-party network plugins.
- Implemented Chef recipes for automated Orchestration of Cassandra clusters and worked on upgrading Cassandra from old 2.x to 3.0 version.
- Wrote Chef cookbooks and recipes to provision several pre-production environments consisting of Cassandra database installations and several proprietary middleware installations.
- Used Chef to automate workflow and ensure all changes are tested and approved with the same rigor and speed and to ensure changes are only deployed once properly approved.
- Used Docker to configure Postgres Docker Image and Nexus Proxy Repository with SSL configuration for secure connections.
- Managed the Maven Repository using Nexus tool and used the same to share the snapshots and releases of internal projects.
- Implemented a new Docker container creation process in which each GitHub branch is started on Jenkins as the Continuous Integration server.
- Built Maven artifacts using Jenkins and deployed them into the Amazon cloud environment, adding monitoring metrics to CloudWatch along with the respective alarms.
- Worked with the Groovy Scripts in Jenkins to execute jobs for a Continuous Integration Pipeline with Ansible, Vagrant and Docker.
- Created Lambda functions for serverless provisioning using the Boto3 module of Python.
- Automated various day-to-day administration tasks by developing Bash, Ruby, JSON, Perl, PowerShell, and Python scripts.
- Used MAVEN build tool for the development of build artifacts on the source code (.WAR) and Installed and configured Nexus repository manager for storing the artifacts within the company using the Continuous Integration tool like Jenkins.
- Experienced in Branching, Merging, Tagging and maintaining the version across the environments using SCM tools like GIT and Subversion (SVN) on Linux platforms.
Environment: AWS, Kubernetes, Docker, Ansible, Python, Chef, Nagios, Splunk, Kickstart, Hudson, Jenkins, Nexus, GIT, Veritas, SVN, Ant, Bash/shell scripts, JIRA.
Confidential, MA
BUILD AND RELEASE ENGINEER
Responsibilities:
- Provisioned and maintained AWS resources using CloudFormation, Terraform, Ansible, and Python Boto3 modules.
- Wrote Playbooks in YAML to automate the entire deployment process as well as infrastructure admin tasks and Used Ansible for Continuous Delivery, Managed CI/CD process and delivered all application in rpms.
- Implemented Continuous Delivery (CD) framework using Jenkins, Maven, Docker, Bitbucket, Nexus in Linux environment for 70+ micro services in each environmental setup.
- Created additional Docker Slave Nodes for Jenkins using custom Docker Images and Worked on all major components of Docker like Docker Daemon, Hub, Images, Registry, Swarm etc.
- Developed Build and Deployment Scripts using build tools MAVEN in JENKINS to migrate from one environment to another environment.
- Integrated GIT with Jenkins using the GitHub plugin to automate the process of source code check-out by providing the URL and credentials of the GIT repository.
- Maintained Bitbucket Repositories which includes Jenkins and JIRA Integration, creating new repositories, enabling GIT to ignore, branching, merging, creating pull requests and the access control strategies from Bitbucket.
- Implementing, maintaining, and enhancing build processes using Maven, Ant, Apache Ivy, Gradle, Groovy, MS build, NANT and Nexus
- Automated Weekly releases with ANT/Maven scripting for Compiling Java Code, Debugging and Placing Builds into Maven Repository.
- Implemented a Git mirror for SVN repository, which enables users to use both SVN and Git.
- Created and updated Puppet manifests and modules, files, and packages stored in the GIT repository.
- Handled admin tasks in Linux OS such as server restarts, application installation, and setting up monitoring dashboards for app servers.
Environment: RedHat Enterprise Linux, Bamboo, Subversion, Perforce, Nagios, ANT, Python, Puppet, CentOS, Ubuntu, Kickstart, VMware, TCP/IP, NFS, DNS, SNMP, and DHCP.
Confidential
System Administrator
Responsibilities:
- Performed custom builds of Windows 2003 & Windows 2008 servers, including adding users, SAN and network configuration, installing application-related packages, and managing services.
- Involved in installation, configuration, upgrade & administration of IBM Series & Power5/Power6 servers on various levels of AIX operating system utilizing environment.
- Responsible for configuring real-time backup of web servers.
- Worked with remote system administration using tools like SSH, Telnet, & Rlogin.
- Migrated the entire Application from JBoss to Tomcat environment.
- Configuring the NFS servers, setting up servers in a network environment & configuring FTP, NTP, NIS servers, clients for various departments & clients.
- Involved with Kernel tuning, writing Shell scripts for system maintenance & file management.
- Managing HP-UX, Compaq & Linux workstations & servers.
- Responsible for the Database, Network operation with 80 servers.
Environment: Microsoft Windows 95, 98, 2000, XP, 2003, Linux, DNS, DHCP, TCP/IP, RIP, FTP, TFTP, Terminal Services, SNMP, SMTP, NFS, Oracle, Db2, JBoss.
Confidential
Linux Administrator
Responsibilities:
- Linux Administrator in a large team responsible for maintaining Linux operating systems such as RHEL, CentOS, Ubuntu, and SUSE 10/11.
- Updated and automated Release reports for Change Management. Created SVN configuration record for builds using derived objects generated during build audit process.
- Building & configuring RedHat Linux systems over the network, implementing automated tasks through Crontab, resolving tickets according to the priority basis.
- Expertise in UNIX shell scripting and python scripts used to automate day to day administrative tasks. Involved in writing Python, Perl, and Shell scripts for compilation and deployment process.
- Worked in an Agile / Scrum development team to deliver an end to end continuous integration and continuous deployment in SDLC.
- Responsible for configuring and maintaining Squid server in Linux. Deployed Java applications into Apache Tomcat Application Servers. Used Test driven approach for developing the application and Implemented the unit tests using Python Unit test framework.
- Provided 24/7 on-call support on Linux Production Servers. Responsible for maintaining security on RHEL.
Environment: Linux, Python, Ruby, RHEL, Nginx, Microsoft Windows, TCP/IP, Java, Oracle, Agile, WebLogic, MySQL, Subversion, Apache, JBoss, Shell Scripting, Bash Scripting, Python, PowerShell, Active Directory.