AI Triage (deep dive)

For the user-facing walkthrough, see Maintenance → AI Triage. This page covers the prompt itself, how we evaluate it, and the heuristic fallback.

Prompt structure

ClaudeTriageAssistant sends Claude a two-part prompt:

System: role, output format, categories + urgency levels with definitions, priority rules ("safety first"), format constraints (strict JSON, no prose before or after), and — since prompt-v3 — guidance for treating attached photos as primary evidence.
User: content blocks. When the tenant attached photos via the public report page, we prepend up to 3 base64 image blocks before the text block; otherwise we send just the raw text. Claude returns the same JSON-only response format either way.
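A minimal sketch of how the user content array could be assembled, images first and text last. The class UserContentSketch and the Photo record are hypothetical names for illustration, not taken from the codebase; the block shapes follow the Anthropic Messages API format for base64 images and text.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the real ClaudeTriageAssistant may build its request differently. */
final class UserContentSketch {
    static final int MAX_IMAGES = 3; // "up to 3" image blocks, per this page

    /** Illustrative type: a tenant photo as MIME type plus raw bytes. */
    record Photo(String mediaType, byte[] bytes) {}

    /** Builds the user message content: image blocks first, then the report text. */
    static List<Map<String, Object>> build(String reportText, List<Photo> photos) {
        List<Map<String, Object>> blocks = new ArrayList<>();
        for (Photo p : photos.subList(0, Math.min(MAX_IMAGES, photos.size()))) {
            blocks.add(Map.of(
                "type", "image",
                "source", Map.of(
                    "type", "base64",
                    "media_type", p.mediaType(),
                    "data", Base64.getEncoder().encodeToString(p.bytes()))));
        }
        // With no photos, this is the only block in the array.
        blocks.add(Map.of("type", "text", "text", reportText));
        return blocks;
    }
}
```

The resulting list would still need to be serialised to JSON as the content of a single user message.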

Key decisions in the system prompt:

Categories are closed-set. Claude can't invent a new one; must pick from the 11 supported.
Urgency maps to SLA. We explain EMERGENCY = 4h SLA, URGENT = 24h so Claude understands the real-world consequence of its choice.
Safety wins ties. If input is ambiguous between APPLIANCE and EMERGENCY_SAFETY (e.g. gas heater yellow flame), we instruct escalation to EMERGENCY_SAFETY.
Actions, not just classification. prompt-v2 added suggestedActions[] so the PM gets next steps, not just a label.
Confidence calibration. We ask Claude to self-score. In evals we find Haiku is slightly over-confident (0.9 when we'd mark 0.8); we haven't re-calibrated yet but threshold the UI at 0.7 conservatively.
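The closed-set and threshold decisions above can be sketched as a post-parse check. This is an illustration only: TriageResultCheckSketch and TriageResult are hypothetical names, the category set holds only the seven categories named on this page (the full set has 11), and treating an unknown category or a confidence below 0.7 as "needs human triage" is an assumption about how the UI applies the threshold.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical post-parse checks; not the actual ClaudeTriageAssistant code. */
final class TriageResultCheckSketch {
    // Only the categories named on this page; the remaining supported ones are not listed here.
    static final Set<String> KNOWN_CATEGORIES = Set.of(
        "EMERGENCY_SAFETY", "PLUMBING", "ELECTRICAL", "APPLIANCE", "SECURITY", "PEST", "OTHER");

    // SLA values the system prompt explains to Claude.
    static final Map<String, Duration> SLA = Map.of(
        "EMERGENCY", Duration.ofHours(4),
        "URGENT", Duration.ofHours(24));

    static final double UI_CONFIDENCE_THRESHOLD = 0.7;

    /** Illustrative subset of the JSON fields Claude returns. */
    record TriageResult(String category, String urgency, double confidence) {}

    static boolean isSupportedCategory(TriageResult r) {
        return KNOWN_CATEGORIES.contains(r.category());
    }

    /** Empty when this page does not state an SLA for the urgency level. */
    static Optional<Duration> slaFor(TriageResult r) {
        return Optional.ofNullable(SLA.get(r.urgency()));
    }

    /** Assumption: out-of-set categories and sub-threshold confidence both go to a human. */
    static boolean needsHumanTriage(TriageResult r) {
        return !isSupportedCategory(r) || r.confidence() < UI_CONFIDENCE_THRESHOLD;
    }
}
```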

Prompt version history

prompt-v1 (2026-03) — initial: category + urgency + confidence + one-sentence reasoning.
prompt-v2 (2026-04) — added suggestedActions[] array. Triage went from "classifier" to "triage advisor".
prompt-v3 (2026-05) — vision-aware. The user message can carry up to 3 image content blocks (base64 JPEG/PNG/WEBP, ≤ 8 MB each) the tenant uploaded on the public report page. Prompt teaches Claude to trust the visual evidence over a short text body and to lower confidence when the photo contradicts what the tenant wrote. HEIC/HEIF uploads are stored but excluded from the API call (Anthropic rejects them at the source-format check).
prompt-v4 (planned, see AI Roadmap) — adds sentimentScore + escalationFlag for tenants writing in ALL CAPS about a repeat issue.
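The prompt-v3 attachment rules (JPEG/PNG/WEBP only, at most 8 MB each, at most 3 images, HEIC/HEIF kept out of the API call) could be expressed as a filter like the one below. PhotoFilterSketch and Upload are hypothetical names, and two details are assumptions: that "8 MB" means 8 MiB, and that oversized files are skipped here instead of being rejected at upload time.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical filter deciding which stored uploads are sent to the API. */
final class PhotoFilterSketch {
    static final Set<String> SENDABLE_TYPES = Set.of("image/jpeg", "image/png", "image/webp");
    static final long MAX_BYTES = 8L * 1024 * 1024; // assumption: MB is read as MiB
    static final int MAX_IMAGES = 3;

    /** Illustrative type: MIME type and size of a stored upload. */
    record Upload(String mediaType, long sizeBytes) {}

    /** Keeps the first MAX_IMAGES uploads that have a sendable type and fit the size limit. */
    static List<Upload> selectForApi(List<Upload> uploads) {
        return uploads.stream()
            .filter(u -> SENDABLE_TYPES.contains(u.mediaType())) // drops HEIC/HEIF and anything else
            .filter(u -> u.sizeBytes() <= MAX_BYTES)
            .limit(MAX_IMAGES)
            .toList();
    }
}
```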

Heuristic fallback

HeuristicTriageAssistant is deterministic and costs nothing to run. Keyword rules fire top-down; first match wins. Abbreviated:

gas|carbon monoxide|smoke|fire            → EMERGENCY_SAFETY, EMERGENCY
burst|flood|water coming through ceiling  → PLUMBING, URGENT
no hot water|cold shower                  → PLUMBING, URGENT
power out|no electricity|sparks           → ELECTRICAL, URGENT
oven|fridge|washing machine|dishwasher    → APPLIANCE, NORMAL
lock|key|broken door|break.?in            → SECURITY, URGENT
pest|rat|mice|cockroach                   → PEST, NORMAL
…
(default)                                 → OTHER, NORMAL, confidence 0.4

Confidence for the fallback is capped at 0.5 so the UI always shows the "Human triage required" badge. We want a PM reading the request when the heuristic is doing the classification.
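A minimal sketch of the top-down, first-match-wins rule evaluation, using only the rules shown in the abbreviated table. HeuristicRulesSketch is a hypothetical class, not the real HeuristicTriageAssistant. Two assumptions: matched rules report the 0.5 cap as their confidence (this page only states the cap and the 0.4 default), and patterns are matched case-insensitively anywhere in the text.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical rule engine illustrating first-match-wins ordering. */
final class HeuristicRulesSketch {
    record Rule(Pattern pattern, String category, String urgency) {}
    record Result(String category, String urgency, double confidence) {}

    static final double CONFIDENCE_CAP = 0.5;     // keeps results under the UI threshold
    static final double DEFAULT_CONFIDENCE = 0.4; // used when no rule matches

    // Order matters: safety keywords are checked before everything else.
    static final List<Rule> RULES = List.of(
        rule("gas|carbon monoxide|smoke|fire", "EMERGENCY_SAFETY", "EMERGENCY"),
        rule("burst|flood|water coming through ceiling", "PLUMBING", "URGENT"),
        rule("no hot water|cold shower", "PLUMBING", "URGENT"),
        rule("power out|no electricity|sparks", "ELECTRICAL", "URGENT"),
        rule("oven|fridge|washing machine|dishwasher", "APPLIANCE", "NORMAL"),
        rule("lock|key|broken door|break.?in", "SECURITY", "URGENT"),
        rule("pest|rat|mice|cockroach", "PEST", "NORMAL"));

    static Rule rule(String regex, String category, String urgency) {
        return new Rule(Pattern.compile(regex, Pattern.CASE_INSENSITIVE), category, urgency);
    }

    /** Returns the first matching rule's label, or the OTHER/NORMAL default. */
    static Result classify(String text) {
        for (Rule r : RULES) {
            if (r.pattern().matcher(text).find()) {
                return new Result(r.category(), r.urgency(), CONFIDENCE_CAP);
            }
        }
        return new Result("OTHER", "NORMAL", DEFAULT_CONFIDENCE);
    }
}
```

With unanchored patterns like these, short keywords match inside longer words ("rat" inside "refrigerator", "key" inside "monkey"), so a real implementation would likely want word boundaries. The ordering is what makes "I smell gas near the oven" land on EMERGENCY_SAFETY and not APPLIANCE.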

Evaluation

We maintain a fixture set (ClaudeTriageAssistantTest) of ~40 labelled tenant reports, including:

Clear single-category (broken tap → PLUMBING URGENT)
Ambiguous multi-signal (gas heater making noise + tenant has headache → EMERGENCY_SAFETY EMERGENCY)
Long rambling prose with buried safety signal
Short one-liner
Non-native phrasing (tenant-written broken English)

We run the fixture set against every prompt change and look at:

Category accuracy (target ≥90%)
Urgency correctness within ±1 level (target 100%)
No EMERGENCY_SAFETY downgrade in presence of safety keywords (hard gate)
Confidence calibration (target: a lower Brier score than the previous prompt version)
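The three numeric metrics above could be computed along these lines. TriageEvalSketch and Outcome are hypothetical names, not part of ClaudeTriageAssistantTest. Urgency is represented as an integer ordinal so that "within ±1 level" is a simple difference; the actual urgency scale is an assumption here. The Brier score is taken over category correctness, i.e. the mean of (confidence - correct)^2 with correct being 1 or 0.

```java
import java.util.List;

/** Hypothetical metric helpers for a labelled fixture run. */
final class TriageEvalSketch {
    /** One fixture outcome; urgency values are ordinals on an assumed scale (higher = more urgent). */
    record Outcome(String expectedCategory, String predictedCategory,
                   int expectedUrgency, int predictedUrgency, double confidence) {
        boolean categoryCorrect() {
            return expectedCategory.equals(predictedCategory);
        }
    }

    /** Fraction of fixtures whose predicted category equals the label. */
    static double categoryAccuracy(List<Outcome> outcomes) {
        return outcomes.stream().filter(Outcome::categoryCorrect).count()
            / (double) outcomes.size();
    }

    /** Fraction of fixtures whose predicted urgency is at most one level off. */
    static double urgencyWithinOne(List<Outcome> outcomes) {
        return outcomes.stream()
            .filter(o -> Math.abs(o.expectedUrgency() - o.predictedUrgency()) <= 1)
            .count() / (double) outcomes.size();
    }

    /** Mean squared gap between stated confidence and actual correctness; lower is better. */
    static double brierScore(List<Outcome> outcomes) {
        return outcomes.stream()
            .mapToDouble(o -> {
                double actual = o.categoryCorrect() ? 1.0 : 0.0;
                return Math.pow(o.confidence() - actual, 2);
            })
            .average()
            .orElse(Double.NaN);
    }
}
```

The EMERGENCY_SAFETY hard gate is a pass/fail check per fixture and is not covered by this sketch.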

Anthropic API integration

AnthropicClient is a thin wrapper around java.net.http.HttpClient. We deliberately use no SDK, which keeps the dependency surface small and the behaviour debuggable.
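For orientation, a request to the Messages API built with java.net.http might look like the sketch below. AnthropicRequestSketch is a hypothetical class, not the real AnthropicClient. The endpoint and the x-api-key and anthropic-version headers are the ones the public Anthropic API uses; applying the 8-second timeout from this page as a per-request timeout (as opposed to a connect timeout) is an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical request builder; the real AnthropicClient may differ. */
final class AnthropicRequestSketch {
    static final Duration TIMEOUT = Duration.ofSeconds(8); // value from this page's config

    /** Builds a POST carrying an already-serialised JSON body. */
    static HttpRequest build(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
            .uri(URI.create("https://api.anthropic.com/v1/messages"))
            .timeout(TIMEOUT)
            .header("x-api-key", apiKey)
            .header("anthropic-version", "2023-06-01")
            .header("content-type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

The JSON body would carry the model, the max output tokens, the system prompt, and the user message; the API key would come from the ANTHROPIC_API_KEY environment variable.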

Key config:

Model: claude-haiku-4-5
Max output tokens: 600
Timeout: 8 seconds
API key: ANTHROPIC_API_KEY env var
Circuit breaker: 3 consecutive failures → 5-minute open state
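The circuit breaker rule (3 consecutive failures, then 5 minutes open) can be sketched as follows. CircuitBreakerSketch is a hypothetical class. This page does not say what happens when the open period ends, so closing fully and resetting the counter is an assumption (a half-open probe would be another option), as is routing to the heuristic fallback while the breaker is open. The clock is passed in so the behaviour is easy to test.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical breaker: opens after 3 consecutive failures, stays open for 5 minutes. */
final class CircuitBreakerSketch {
    static final int FAILURE_THRESHOLD = 3;
    static final Duration OPEN_DURATION = Duration.ofMinutes(5);

    private int consecutiveFailures = 0;
    private Instant openedAt = null; // null means the breaker is closed

    /** True when a call to Claude may be attempted at the given time. */
    synchronized boolean allowRequest(Instant now) {
        if (openedAt == null) {
            return true;
        }
        if (Duration.between(openedAt, now).compareTo(OPEN_DURATION) >= 0) {
            // Assumption: once the open period has elapsed, close and start counting afresh.
            openedAt = null;
            consecutiveFailures = 0;
            return true;
        }
        return false;
    }

    /** Any success breaks the run of consecutive failures. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    /** Opens the breaker once the failure run reaches the threshold. */
    synchronized void recordFailure(Instant now) {
        consecutiveFailures++;
        if (consecutiveFailures >= FAILURE_THRESHOLD) {
            openedAt = now;
        }
    }
}
```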
