OpenSRE is an open-source AI SRE platform that investigates production incidents autonomously. When an alert fires, OpenSRE's AI agents gather context from your observability stack, reason about root causes, and produce a detailed incident report — the way an experienced SRE would, but faster and around the clock.
OpenSRE is built for teams that are tired of manual, repetitive incident investigation.
When an alert fires, OpenSRE's planner agent breaks the investigation into subtasks that can run in parallel. Multiple investigation subagents then execute simultaneously, each querying a different data source, such as Prometheus metrics, Kubernetes pod status, application logs, or distributed traces. A synthesizer agent combines their findings, and a writeup agent produces a structured report.
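The plan, fan-out, synthesize, and write-up flow described above can be sketched with plain `asyncio`. This is only an illustration of the pattern, not OpenSRE's actual implementation: the function names (`plan`, `investigate`, `synthesize`, `write_report`) and the `Finding` type are hypothetical, and the subagent body is a stand-in for real queries.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    source: str
    summary: str


def plan(alert: str) -> list[str]:
    # Hypothetical planner: picks the data sources to query for this alert.
    return ["prometheus", "kubernetes", "logs", "traces"]


async def investigate(source: str, alert: str) -> Finding:
    # Hypothetical subagent: a real one would query the data source here.
    await asyncio.sleep(0)  # stand-in for network I/O
    return Finding(source=source, summary=f"checked {source} for '{alert}'")


def synthesize(findings: list[Finding]) -> str:
    # Hypothetical synthesizer: merges subagent findings into one summary.
    return "; ".join(f"{f.source}: {f.summary}" for f in findings)


def write_report(alert: str, synthesis: str) -> str:
    # Hypothetical writeup step: wraps the synthesis in a structured report.
    return f"Incident report for '{alert}'\nFindings: {synthesis}"


async def run_investigation(alert: str) -> str:
    subtasks = plan(alert)
    # Fan out: run all subagents concurrently; gather keeps the input order.
    findings = await asyncio.gather(*(investigate(s, alert) for s in subtasks))
    return write_report(alert, synthesize(list(findings)))
```

Calling `asyncio.run(run_investigation("HighErrorRate"))` returns the report as a string. Because the subagents are awaited together, total latency is bounded by the slowest data source rather than the sum of all of them.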
OpenSRE remembers past investigations. After every incident, it extracts key metadata — root cause, affected services, alert type, severity — and stores it in its episodic memory. When a similar incident occurs, OpenSRE retrieves relevant past episodes and uses them to guide the new investigation. This is how a senior SRE builds intuition over years of on-call experience, replicated in software.
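As a rough sketch of the store-and-retrieve idea, the example below keeps episodes in memory and ranks them with a toy overlap score. The `Episode` fields mirror the metadata listed above, but the class names and the scoring rule are assumptions made for illustration. How OpenSRE actually stores and matches episodes is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    root_cause: str
    affected_services: frozenset[str]
    alert_type: str
    severity: str


class EpisodicMemory:
    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def store(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def retrieve(self, alert_type: str, services: set[str], k: int = 3) -> list[Episode]:
        # Toy similarity (an assumption): 2 points for a matching alert type,
        # plus 1 point per affected service shared with the new incident.
        def score(ep: Episode) -> int:
            return 2 * (ep.alert_type == alert_type) + len(ep.affected_services & services)

        ranked = sorted(self._episodes, key=score, reverse=True)
        # Drop episodes with no overlap at all, then keep the top k.
        return [ep for ep in ranked if score(ep) > 0][:k]
```

A new investigation would call `retrieve` with the incoming alert type and the services involved, then pass the returned episodes to the agents as extra context.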
OpenSRE maintains a live graph of your service topology in Neo4j. It knows which services depend on which, tracks recent deployments, and can perform blast radius analysis — given a failing component, which services are affected? This context is automatically provided to investigation agents.
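Blast radius analysis amounts to a traversal over reversed dependency edges. The sketch below shows the idea on a plain Python dictionary rather than Neo4j. The topology and the `blast_radius` function are hypothetical and do not reflect OpenSRE's schema or queries.

```python
from collections import deque

# Hypothetical topology: each service maps to the services it depends on.
DEPENDS_ON = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "postgres"],
    "search": ["elasticsearch"],
    "payments": ["postgres"],
}


def blast_radius(failing: str, depends_on: dict[str, list[str]]) -> set[str]:
    # Invert the edges so we can walk from a component to its dependents.
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first search collects every transitively affected service.
    affected: set[str] = set()
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected
```

With this topology, a failing `postgres` affects `checkout`, `payments`, and `web`, while a failing `elasticsearch` affects only `search` and `web`.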
OpenSRE comes with 46 built-in investigation skills: checking Kubernetes pod health, querying Prometheus for anomalies, analyzing Grafana dashboards, reading Datadog traces, scanning Sentry errors, and more. Skills are loaded on demand based on the incident context.
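One way to picture on-demand loading is a registry that stores a loader per skill and only builds the skills relevant to the current incident. The sketch below is an assumption-laden illustration: `SkillRegistry`, the tag-matching rule, and the skill names are hypothetical, not OpenSRE's API.

```python
from typing import Callable

# A skill here is simply a function from an alert name to a text result.
Skill = Callable[[str], str]


class SkillRegistry:
    def __init__(self) -> None:
        self._loaders: dict[str, tuple[frozenset[str], Callable[[], Skill]]] = {}
        self._loaded: dict[str, Skill] = {}

    def register(self, name: str, tags: set[str], loader: Callable[[], Skill]) -> None:
        # Store only the loader; the skill itself is not built yet.
        self._loaders[name] = (frozenset(tags), loader)

    def skills_for(self, context_tags: set[str]) -> dict[str, Skill]:
        # Load (and cache) only skills whose tags overlap the incident context.
        selected: dict[str, Skill] = {}
        for name, (tags, loader) in self._loaders.items():
            if tags & context_tags:
                if name not in self._loaded:
                    self._loaded[name] = loader()
                selected[name] = self._loaded[name]
        return selected

    @property
    def loaded_names(self) -> set[str]:
        return set(self._loaded)
```

Skills that never match an incident's context are never constructed, which keeps startup cheap even with a large skill catalog.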
Works with: Prometheus, Grafana, Datadog, Elastic/ELK, Splunk, Jaeger, New Relic, Sentry, PagerDuty, Slack, GitHub, Confluence, and Kubernetes.
OpenSRE uses a graph-based agent orchestration system built on LangGraph. Alerts enter via Slack (through the Slack bot) or directly through the web console. The sre-agent processes investigations and streams results via Server-Sent Events.
Slack → slack-bot → sre-agent (LangGraph)
Web UI ────────────→    │
                    ┌───┴───┐
                    │   │   │
                Memory Skills KG
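To make the streaming step concrete, here is a minimal sketch of how investigation progress could be framed as Server-Sent Events. The wire format (an `event:` line, a `data:` line, then a blank line) follows the SSE standard. The event names, payload shapes, and function names are hypothetical and are not taken from the sre-agent.

```python
import json
from typing import Iterable, Iterator


def sse_frame(event: str, data: dict) -> str:
    # One SSE message: an "event:" line, a "data:" line, then a blank line.
    # json.dumps emits a single line by default, so it fits one data field.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_investigation(steps: Iterable[tuple[str, dict]]) -> Iterator[str]:
    # Yield frames one at a time so a client can render progress as it arrives.
    for event, data in steps:
        yield sse_frame(event, data)
```

A web framework would typically return this generator as the body of a response with the `text/event-stream` content type, and a browser client could consume it with `EventSource`.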
For a deep dive into the architecture, see Architecture.
OpenSRE is released under the Apache 2.0 license. Self-host it in your own infrastructure. Your data, your control.