
Site Reliability Engineer Resume


  • 5+ years of total IT experience in production support, DevOps, or engineering, with a record of service improvements in infrastructure reliability processes. eCommerce website experience is a must.
  • Confidential is one of the world's leading information technology companies.
  • Through its Global Network Delivery Model™, Innovation Network, and Solution Accelerators, Confidential focuses on helping global organizations address their business challenges effectively.
  • Part of the Confidential, India's largest industrial conglomerate, Confidential has over 111,000 of the world's best-trained IT consultants in 50 countries.
  • The company generated consolidated revenues of US $5.7 billion for the fiscal year ended 31 March 2008 and is listed on the Confidential in India.


Site Reliability Engineer



  • Strong computer science education
  • Maniacal about site availability
  • Excellent troubleshooting skills with the ability to dive into all aspects of the stack to identify and fix problems
  • Large-scale, multi-data-center, cloud-hosted web administration
  • Linux systems administration
  • Cloud expertise: Google Cloud preferred; AWS or Pivotal is also acceptable
  • Cloud IaaS automation using scripting languages: Python and Unix shell
  • Web server technologies: HTTP, Nginx, Apache, Varnish, Tomcat
  • Full-stack website reliability engineering
  • Infrastructure configuration management: Git
  • Exposure to monitoring tools: Splunk, AppDynamics, Dynatrace
  • Basic to moderate application programming experience: Java
  • Application performance monitoring tools: Gomez, Dynatrace, Splunk, Stackdriver
  • CI/CD toolchain: Git, Jenkins, Google CDM, Docker, PCF
  • NoSQL database administration: Cassandra (C*)
  • Web load testing: JMeter, LoadRunner
  • Team player; enjoys working with developers
  • Calm under pressure


  • Site reliability for an eCommerce website
  • Engage with development teams; pair with developers when appropriate
  • Establish service SLAs
  • Cloud infrastructure for apps and services
  • 24x7x365 monitoring and incident response
  • Root Cause Analysis
  • Production deployment
  • Cloud Platform Development (at least 25% of time)
  • Monitor the health of services through automation
  • Investigate automated alerts for services
  • Restore production service
  • Determine the "blameless" root cause of service incidents
  • Enforce service SLAs
  • Reduce the percentage of manual tasks
  • Good written and oral communication, presentation, and interpersonal skills for working with teams across geographies
  • Able to work independently or as part of a team
  • Must have a sense of urgency and be highly available to coordinate support activities
  • Onshore/offshore team management and capacity planning
