What is OpenSRE?

OpenSRE is an open-source AI platform that investigates production incidents the way an experienced SRE would — but faster and around the clock.

The Problem

When a production incident hits, engineers scramble to figure out what went wrong. They check dashboards, grep through logs, trace requests, and piece together a timeline. This process is slow, stressful, and depends heavily on tribal knowledge.

How OpenSRE Helps

OpenSRE automates this investigation process. When an alert fires, OpenSRE's AI agents:

Gather context from your monitoring tools — Prometheus, Grafana, Datadog, Elastic, and more
Investigate systematically using 46 built-in investigation skills
Learn from past incidents through episodic memory, getting better over time
Map service dependencies via a knowledge graph powered by Neo4j
Produce a detailed report with root cause analysis, timeline, and remediation steps

Key Features

Episodic Memory

Unlike stateless AI tools, OpenSRE remembers past investigations. When a similar incident occurs, it recalls what worked before — the same way a senior engineer builds intuition over years of on-call experience.
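To make the idea of recalling similar incidents concrete, here is a minimal sketch. It does not show OpenSRE's actual memory implementation, which this page does not describe. The Episode class, the recall function, the signal tags, and the use of Jaccard overlap as the similarity measure are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One remembered investigation (hypothetical structure)."""
    summary: str
    signals: frozenset   # tags observed during the incident
    resolution: str


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recall(memory, signals, top_k=1, min_score=0.3):
    """Return up to top_k past episodes whose signals resemble the new ones."""
    scored = [(jaccard(ep.signals, signals), ep) for ep in memory]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for _, ep in scored[:top_k]]


# Illustrative data only; not taken from a real incident history.
MEMORY = [
    Episode(
        "Checkout latency spike",
        frozenset({"checkout", "latency", "db-connections"}),
        "Raised the DB connection pool limit",
    ),
    Episode(
        "Auth pods crash-looping",
        frozenset({"auth", "oomkilled", "crashloop"}),
        "Increased the memory limit",
    ),
]

new_alert = frozenset({"checkout", "latency", "timeouts"})
matches = recall(MEMORY, new_alert)
for episode in matches:
    print(episode.summary, "->", episode.resolution)
```

A real system would likely compare richer representations than tag sets, but the shape is the same: score past episodes against the new incident and surface the closest ones along with what resolved them.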

Knowledge Graph

OpenSRE maintains a live graph of your service topology. It knows which services depend on which, what changed recently, and how failures propagate through your system.
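As a rough illustration of how a dependency graph answers "what else breaks if this service fails", the sketch below walks a small in-memory topology. It is a stand-in, not OpenSRE's graph code and not a Neo4j query. The service names, the DEPENDS_ON mapping, and the impacted_services function are hypothetical.

```python
from collections import deque

# Hypothetical topology: each service maps to the services it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["auth", "db"],
    "auth": ["db"],
    "worker": ["queue", "db"],
    "db": [],
    "queue": [],
}


def impacted_services(failed, depends_on):
    """Return services that depend, directly or transitively, on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search over the inverted edges.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(impacted_services("db", DEPENDS_ON))
# prints ['api', 'auth', 'web', 'worker']
```

In a graph database the same question would typically be a variable-length path query. The traversal logic, following "depends on" edges backwards from the failing service, is what matters here.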

46 Investigation Skills

OpenSRE has a growing library of investigation skills, from checking Kubernetes pod status to analyzing Prometheus metrics to reading Sentry error traces. It selects which skills to run based on the incident context.
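One way to picture context-based skill selection is a registry in which each skill declares the alert labels it is relevant to. The sketch below is an assumption-laden illustration, not OpenSRE's real skill API. The skill decorator, the trigger labels, the alert shape, and the three skill functions are all hypothetical, and the functions only return descriptive strings instead of calling Kubernetes, Prometheus, or Sentry.

```python
# Hypothetical registry: skill name -> (trigger labels, function).
SKILLS = {}


def skill(name, triggers):
    """Register a function as an investigation skill (illustrative only)."""
    def register(func):
        SKILLS[name] = (frozenset(triggers), func)
        return func
    return register


@skill("k8s_pod_status", triggers={"kubernetes", "pod"})
def check_pod_status(alert):
    return f"would inspect pods for {alert['service']}"


@skill("prometheus_metrics", triggers={"latency", "error-rate"})
def analyze_metrics(alert):
    return f"would query metrics for {alert['service']}"


@skill("sentry_traces", triggers={"exception"})
def read_error_traces(alert):
    return f"would read error traces for {alert['service']}"


def select_skills(alert):
    """Pick every skill whose triggers overlap the alert's labels."""
    labels = set(alert["labels"])
    return sorted(
        name for name, (triggers, _) in SKILLS.items() if triggers & labels
    )


# Illustrative alert; the field names are assumptions.
alert = {"service": "checkout", "labels": ["kubernetes", "latency"]}
selected = select_skills(alert)
for name in selected:
    _, run = SKILLS[name]
    print(name, "->", run(alert))
```

Keyword matching is the simplest possible selector. An AI agent would presumably weigh much more context, but the registry pattern shows how a library of skills can grow without changing the selection code.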

Open Source

OpenSRE is open source under the Apache 2.0 license. You can self-host it on your own infrastructure, so your data stays with you.

Get started with OpenSRE →