Open Source · Apache 2.0

Your incidents deserve a better investigator, first responder, debugger, troubleshooter.

An AI SRE that autonomously investigates production incidents, remembers every past investigation, and maps your entire service topology.


See OpenSRE in action

Watch a full incident investigation in 60 seconds

Investigation skills: 46
Mean time to detect: <5m
Open source: 100%
Integrates with
Kubernetes
Prometheus
Grafana
Datadog
Elastic
Splunk
Jaeger
New Relic
Sentry
PagerDuty
Slack
GitHub
Confluence
The problem

SRE teams are stuck in a loop of manual, repetitive investigation.

01

Alert fatigue

Drowning in alerts at 3 AM, manually checking dashboards one by one. Your team spends 45 minutes triaging before investigation even begins.

02

Context switching

Jumping between Grafana, kubectl, Splunk, PagerDuty. Every tool is a tab, every tab is a context switch, every switch is time lost.

03

Tribal knowledge

When your senior SRE leaves, their investigation playbooks walk out the door. Each incident feels like starting from scratch.

Meet OpenSRE

OpenSRE investigates incidents, surfaces root causes, and learns from every investigation.

[Architecture diagram] Alert, Slack, Web UI → OpenSRE (Memory, Skills, Graph) → Root cause, Report, Topology

Production Systems: Kubernetes, AWS, GCP, Docker
Observability: Prometheus, Grafana, Datadog, Elastic
Knowledge: Confluence, Notion, PagerDuty
Code: GitHub, GitLab, Slack, Teams
How it works

From alert to root cause. In minutes, not hours.

Step 1: Alert fires

PagerDuty, Slack, or any webhook triggers the investigation. No human needed to start.

Step 2: Agent activates

The planner formulates hypotheses and dispatches specialized sub-agents in parallel.

Step 3: Deep investigation

Kubernetes, metrics, logs, and traces are queried simultaneously. Memory recalls similar past incidents.

Step 4: Root cause report

Findings are correlated, the root cause is identified, and a structured report is delivered.
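The four steps can be pictured as a small asynchronous pipeline. This is only a sketch under stated assumptions, not OpenSRE's actual code: `Finding`, `run_sub_agent`, `investigate`, and the alert fields are hypothetical names, and the sub-agents return canned text instead of querying real systems.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    source: str   # which sub-agent produced this
    detail: str   # what it observed


# Hypothetical sub-agent: a real one would query Kubernetes, metrics, logs, or traces.
async def run_sub_agent(source: str, alert: dict) -> Finding:
    await asyncio.sleep(0)  # stand-in for network I/O
    return Finding(source, f"checked {source} for {alert['service']}")


async def investigate(alert: dict) -> dict:
    # Step 2: the planner decides which sources to examine (a fixed list here).
    sources = ["kubernetes", "metrics", "logs", "traces"]
    # Step 3: query every source concurrently; gather keeps the input order.
    findings = await asyncio.gather(*(run_sub_agent(s, alert) for s in sources))
    # Step 4: assemble a structured report (correlation logic omitted).
    return {
        "alert": alert["title"],
        "findings": [f.detail for f in findings],
    }


# Step 1: an incoming alert payload (assumed shape) starts the investigation.
report = asyncio.run(investigate({"title": "High 5xx rate", "service": "checkout"}))
```

Running the sub-agents with `asyncio.gather` means the slowest data source, not the sum of all of them, bounds the investigation time.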

Capabilities

Built for real incidents.

Not a chatbot. A production-grade investigation engine with memory, knowledge, and autonomous reasoning.

Memory

Episodic memory

Every investigation is stored and matched against new incidents using multi-factor similarity. The agent recalls context from past incidents before investigating new ones, building institutional knowledge that never leaves.

Similarity-based retrieval · Auto-generated strategies · Cross-incident learning
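To make "multi-factor similarity" concrete, here is a minimal sketch of how past incidents might be ranked against a new one. The factors (service, alert name, symptom overlap), the weights, and the names `Incident`, `similarity`, and `recall` are all illustrative assumptions; the page does not say which factors OpenSRE uses.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    id: str
    service: str
    alert_name: str
    symptoms: set[str]


def similarity(a: Incident, b: Incident) -> float:
    """Weighted score in [0, 1]; factors and weights are illustrative assumptions."""
    same_service = 1.0 if a.service == b.service else 0.0
    same_alert = 1.0 if a.alert_name == b.alert_name else 0.0
    union = a.symptoms | b.symptoms
    # Jaccard overlap of observed symptoms.
    overlap = len(a.symptoms & b.symptoms) / len(union) if union else 0.0
    return 0.3 * same_service + 0.3 * same_alert + 0.4 * overlap


def recall(new: Incident, history: list[Incident], top_k: int = 3) -> list[Incident]:
    """Return the top_k past incidents most similar to the new one."""
    return sorted(history, key=lambda past: similarity(new, past), reverse=True)[:top_k]


history = [
    Incident("inc-1", "checkout", "HighErrorRate", {"5xx", "db-timeouts"}),
    Incident("inc-2", "search", "HighLatency", {"slow-queries"}),
]
new = Incident("inc-3", "checkout", "HighErrorRate", {"5xx", "pod-restarts"})
best = recall(new, history, top_k=1)[0]
```

Here `inc-1` ranks first because it shares the service, the alert name, and one symptom with the new incident.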
Graph

Knowledge graph

A Neo4j-powered service topology maps your entire infrastructure. Dependency traversal and blast radius analysis give the agent a systemic understanding, not just symptom checking.

Service dependency mapping · Blast radius analysis · Topology-aware investigation
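Blast radius analysis boils down to walking the dependency graph backwards from a failed service. The sketch below uses a plain in-memory dictionary rather than Neo4j, purely to illustrate the traversal; the topology and the `blast_radius` function are hypothetical.

```python
from collections import deque

# Hypothetical topology: each service maps to the services it depends on.
DEPENDS_ON = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "postgres"],
    "search": ["elasticsearch"],
    "payments": ["postgres"],
}


def blast_radius(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first search over the inverted graph
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


impacted = blast_radius("postgres", DEPENDS_ON)
```

With this example topology, a `postgres` failure reaches `checkout` and `payments` directly and `web` transitively, while `search` is unaffected.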
Skills

46 investigation skills

From Kubernetes debugging to Prometheus queries to runbook execution. Skills load progressively — the agent only loads what it needs, keeping context focused.

Progressive knowledge loading · k8s, metrics, logs, tracing · Custom skill authoring
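One way to picture progressive loading: the agent always sees a short catalog of skills, and the full text of a skill is fetched only when it is first needed. The `SkillRegistry` class and the skill names below are hypothetical, a sketch of the idea rather than OpenSRE's skill format.

```python
from typing import Callable


class SkillRegistry:
    """Keeps short descriptions up front and loads full skill text on first use."""

    def __init__(self) -> None:
        self._descriptions: dict[str, str] = {}
        self._loaders: dict[str, Callable[[], str]] = {}
        self._loaded: dict[str, str] = {}

    def register(self, name: str, description: str, loader: Callable[[], str]) -> None:
        self._descriptions[name] = description
        self._loaders[name] = loader

    def catalog(self) -> dict[str, str]:
        # Cheap summary the agent can always see.
        return dict(self._descriptions)

    def load(self, name: str) -> str:
        # Full instructions are fetched only when the agent asks for this skill.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded_names(self) -> list[str]:
        return sorted(self._loaded)


registry = SkillRegistry()
registry.register("k8s-debug", "Debug failing pods", lambda: "Full k8s debugging steps...")
registry.register("promql", "Query Prometheus", lambda: "Full PromQL guidance...")
registry.load("k8s-debug")  # only this skill's full text enters the context
```

After this run the catalog lists both skills, but only `k8s-debug` has been loaded in full.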
Agents

Multi-agent architecture

A planner agent coordinates specialized sub-agents that investigate in parallel. Like a senior SRE delegating to a team, but faster.

Parallel investigation · Planner + sub-agents · Real-time SSE streaming
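Real-time streaming of parallel work could look like the sketch below: each sub-agent result is formatted as a Server-Sent Events message the moment it completes. The `event:`/`data:` framing is the standard SSE wire format; the event name, payload shape, and the `sub_agent` and `stream_findings` functions are assumptions for illustration.

```python
import asyncio
import json


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events message: field lines ended by a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


# Hypothetical sub-agent; the delay simulates sources answering at different speeds.
async def sub_agent(name: str, delay: float) -> dict:
    await asyncio.sleep(delay)
    return {"agent": name, "status": "done"}


async def stream_findings() -> list[str]:
    tasks = [
        asyncio.create_task(sub_agent("logs", 0.05)),
        asyncio.create_task(sub_agent("metrics", 0.0)),
    ]
    events = []
    # Emit each finding as soon as its sub-agent finishes, not in submission order.
    for finished in asyncio.as_completed(tasks):
        events.append(sse_event("finding", await finished))
    return events


events = asyncio.run(stream_findings())
```

In a web server these strings would be written to a response with the `text/event-stream` content type instead of being collected in a list.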
Apache 2.0 Licensed

100% open source.

No vendor lock-in.

Self-host on your infrastructure. Extend with custom skills. Your data never leaves your network. Built by SREs, for SREs.