Job seekers, please send resumes to resumes@hireitpeople.com
Mandatory Skills
- SRE, Observability Tools and Production Support
Desired Skills
- Knowledge of Java, Python, Go, Node.js, etc.
Any Certification (Mandatory)
- Certified in one or more observability tools such as Splunk, AppDynamics, Grafana, or Dynatrace.
Detailed JD (Roles and Responsibilities)
Skills
- SRE mindset in production support: Proactive issue identification using observability tools, and skill with a range of monitoring and observability tools to track system performance.
- Incident commander: Ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.
- Communication: Excellent communicator who can interact with leadership at Director/Sr. Director level and above.
Technical expertise
- Splunk (including Splunk APM and Splunk O11y), AppDynamics, Grafana, RedMetrics, ThousandEyes
- Knowledge of VMs, load balancers, firewalls, API gateways, databases, networking, Linux/Unix
- Knowledge of Containerization, Docker, Kubernetes, AWS, PCF, GCP
- ServiceNow (including AIOps and tools for self-healing and automated playbooks)
- APM, NMON, Wireshark usage and analysis
- Experience in UEM and synthetic monitoring tools
Responsibilities
- Production support activities including proactive identification of issues leveraging observability tools with the aim of reducing MTTD and MTTR
- Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs, correlating inputs from various dashboards and tools to drive resolution
- Flexibility to work in a 24x7 environment