Site Reliability Engineer Resume
SUMMARY
- 5+ years of IT experience in production support, DevOps, and engineering, with a record of service improvements in infrastructure reliability processes, including eCommerce websites.
- Confidential is one of the world's leading information technology companies.
- Through its Global Network Delivery Model™, Innovation Network, and Solution Accelerators, Confidential focuses on helping global organizations address their business challenges effectively.
- A part of the Confidential, India's largest industrial conglomerate, Confidential has over 111,000 of the world's best trained IT consultants in 50 countries.
- The company generated consolidated revenues of US $5.7 billion for the fiscal year ended 31 March 2008 and is listed on the Confidential in India.
PROFESSIONAL EXPERIENCE
Site Reliability Engineer
Confidential
- Strong computer science education
- Maniacal about site availability
- Excellent troubleshooting skills with the ability to dive into all aspects of the stack to identify and fix problems
- Large-scale, multi-data-center, cloud-hosted web administration
- Linux Systems Administration
- Cloud expertise: Google preferred; AWS or Pivotal also acceptable
- Cloud IaaS automation using scripting languages: Python and Unix shell
- Web server technologies: HTTP, Nginx, Apache, Varnish, Tomcat
- Full stack website reliability engineering
- Infrastructure configuration management: Git
- Exposure to monitoring tools: Splunk, AppDynamics, Dynatrace
- Basic to moderate application programming experience: Java
- Application performance monitoring tools: Gomez, Dynatrace, Splunk, Stackdriver
- CI/CD toolchain: Git, Jenkins, Google CDM, Docker, PCF
- NoSQL database administration: Cassandra (C*)
- Web load testing: JMeter, LoadRunner
- Team player; enjoys working with developers
- Calm under pressure
TECHNICAL SKILLS
- Site Reliability for eCommerce website
- Engage with Dev teams. Pair when appropriate
- Establish Service SLAs
- Cloud Infrastructure for apps/services
- 24x7x365 monitoring and incident response
- Root Cause Analysis
- Production deployment
- Cloud Platform Development (at least 25% of time)
- Monitor the health of a service through automation.
- Investigate automated alerts for services.
- Restore production service.
- Determine the "blameless" root cause of service incidents.
- Enforce service SLAs.
- Reduce the percentage of manual tasks
- Good written and oral communication, presentation, and interpersonal skills with teams across geographies
- Able to work independently or as part of a team
- Sense of urgency and high availability to coordinate support activities
- Onshore / Offshore team management, capacity planning