Site Reliability Engineer Resume

SUMMARY

5+ years of IT experience in production support, DevOps, and engineering, with a track record of service improvements in infrastructure reliability processes. eCommerce website experience is a must.
Confidential is one of the world's leading information technology companies.
Through its Global Network Delivery Model™, Innovation Network, and Solution Accelerators, Confidential focuses on helping global organizations address their business challenges effectively.
Part of Confidential, India's largest industrial conglomerate, Confidential has over 111,000 of the world's best-trained IT consultants in 50 countries.
The company generated consolidated revenues of US $5.7 billion for the fiscal year ended 31 March 2008 and is listed on the Confidential in India.

PROFESSIONAL EXPERIENCE

Site Reliability Engineer

Confidential

Strong computer science education
Maniacal about site availability
Excellent troubleshooting skills with the ability to dive into all aspects of the stack to identify and fix problems
Large-scale, multi-data-center, cloud-hosted web administration
Linux Systems Administration
Cloud expertise - Google preferred; AWS or Pivotal also acceptable
Cloud IaaS automation using scripting languages - Python and Unix shell
Web server technologies - HTTP, Nginx, Apache, Varnish, Tomcat
Full stack website reliability engineering
Infrastructure configuration management - Git
Exposure to monitoring tools - Splunk, AppDynamics, Dynatrace
Basic to moderate application programming experience - Java
Application performance monitoring tools - Gomez, Dynatrace, Splunk, Stackdriver
CI/CD toolchain - Git, Jenkins, Google CDM, Docker, PCF
NoSQL database administration - Cassandra (C*)
Web load testing - JMeter, LoadRunner
Team player; enjoys working with developers
Calm under pressure

TECHNICAL SKILLS

Site Reliability for eCommerce website
Engage with Dev teams. Pair when appropriate
Establish Service SLAs
Cloud Infrastructure for apps/services
24x7x365 monitoring and incident response
Root Cause Analysis
Production deployment
Cloud Platform Development (at least 25% of time)
Monitor the health of a service through automation.
Investigate automated alerts for services.
Restore production service.
Determine the "blameless" root cause of service incidents.
Enforce service SLAs.
Reduce the percentage of manual tasks
Good written and oral communication, presentation, and interpersonal skills with teams across geographies.
Flexible to work independently or as part of a team.
Must have a sense of urgency and be highly available to coordinate support activities.
Onshore / Offshore team management, capacity planning