Streaming Responses
MindRoom streams agent responses to Matrix by progressively editing a single message. Instead of waiting for the full response, users see text appear in real time as the model generates it.
How It Works
- Agent starts generating a response.
- MindRoom sends an initial message with the first chunk of text plus an in-progress marker (⋯).
- As more text arrives, MindRoom edits the same message with the accumulated content.
- When the response is complete, the final edit removes the ⋯ marker.
User sends message
       │
       ▼
┌──────────────┐   presence check
│ Agent starts │ ──────────────────▶ Is user online?
│  generating  │                            │
└──────┬───────┘                    ┌───────┴───────┐
       │                           Yes              No
       ▼                            │               │
 Stream chunks                      ▼               ▼
   via edits                    Streaming     Single message
 with ⋯ marker                (progressive      (sent when
       │                         edits)         complete)
       ▼
   Final edit
  (⋯ removed)
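The steps above can be sketched as a small loop. This is a minimal illustration, not MindRoom's implementation: `stream_to_message`, `send`, and `edit` are hypothetical names standing in for the Matrix client calls, and throttling is left out.

```python
from typing import Callable, Iterable, Optional

MARKER = "⋯"  # in-progress marker appended while text is still arriving


def stream_to_message(
    chunks: Iterable[str],
    send: Callable[[str], str],
    edit: Callable[[str, str], None],
) -> str:
    """Send the first chunk as a new message, then edit it as text accumulates.

    `send` returns an event ID and `edit` takes (event_id, new_body).
    Both are hypothetical callbacks, not real MindRoom APIs.
    """
    accumulated = ""
    event_id: Optional[str] = None
    for chunk in chunks:
        accumulated += chunk
        body = f"{accumulated} {MARKER}"
        if event_id is None:
            event_id = send(body)  # initial message with the marker
        else:
            edit(event_id, body)  # progressive edit of the same message
    if event_id is not None:
        edit(event_id, accumulated)  # final edit drops the marker
    return accumulated
```

Every intermediate body carries the marker, and only the last edit shows the bare text.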
Configuration
Streaming is enabled by default.
Disable it globally in config.yaml by setting enable_streaming: false under defaults.
enable_streaming is a global-only setting under defaults and cannot be overridden per agent.
Tune the streaming edit cadence globally under defaults.streaming:
defaults:
  enable_streaming: true
  streaming:
    update_interval: 5.0         # Default: 5.0 steady-state seconds between edits
    min_update_interval: 0.5     # Default: 0.5 fast-start seconds between early edits
    interval_ramp_seconds: 15.0  # Default: 15.0; set 0 to disable ramping
These timing settings are global-only. Agents inherit them from defaults and cannot override them individually.
Presence-Based Streaming
Even when streaming is enabled, MindRoom only streams to users who are currently online.
This is checked via should_use_streaming(), which queries the Matrix presence API.
| Presence State | Streaming Used? |
|---|---|
| online | Yes |
| unavailable | Yes |
| offline | No (single message sent when complete) |
If the presence check fails, MindRoom defaults to non-streaming (safer, fewer API calls). When no requester user ID is available, MindRoom defaults to streaming.
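The decision rules above can be summarized in a short function. This is a sketch under stated assumptions: `decide_streaming` and the `get_presence` callback are hypothetical and do not reflect the actual signature of should_use_streaming().

```python
from typing import Callable, Optional

# Presence states for which streaming is used, per the table above.
STREAMING_PRESENCE_STATES = {"online", "unavailable"}


def decide_streaming(
    enabled: bool,
    requester_id: Optional[str],
    get_presence: Callable[[str], str],
) -> bool:
    """Return True if the response should be streamed via edits.

    `get_presence` is a hypothetical callback that returns a Matrix
    presence string for a user ID and may raise on failure.
    """
    if not enabled:
        return False  # streaming disabled globally
    if requester_id is None:
        return True  # no requester known: default to streaming
    try:
        presence = get_presence(requester_id)
    except Exception:
        return False  # presence check failed: fall back to a single message
    return presence in STREAMING_PRESENCE_STATES
```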
In-Progress Marker
While a response is being generated, the message ends with ⋯ followed by zero to two dots that cycle as edits arrive.
This gives users a visual indicator that the agent is still working.
Hello! I can help you with that ⋯
Hello! I can help you with that ⋯.
Hello! I can help you with that ⋯..
Hello! I can help you with that ⋯
If no text has arrived yet, a Thinking... placeholder is shown with the marker.
The marker is removed on the final edit.
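One way to produce the cycling suffix shown above is to derive the dot count from the number of edits sent so far. This is an illustrative sketch: `render_in_progress` is a hypothetical helper, and the exact spacing around the placeholder is an assumption.

```python
MARKER = "⋯"
PLACEHOLDER = "Thinking..."  # shown when no text has arrived yet


def render_in_progress(text: str, edit_count: int) -> str:
    """Build the body for an in-progress edit.

    The marker is followed by zero, one, or two dots, cycling with
    the edit count (0, 1, 2, 0, ...).
    """
    dots = "." * (edit_count % 3)
    visible = text if text else PLACEHOLDER
    return f"{visible} {MARKER}{dots}"


def render_final(text: str) -> str:
    """The final edit carries the text without the marker."""
    return text
```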
Throttling
MindRoom throttles edits to avoid overwhelming the Matrix homeserver:
- Time-based: defaults.streaming.update_interval sets the steady-state interval between edits (default: 5 seconds).
- Character-based: An edit is also triggered when enough new characters have accumulated. The character threshold ramps from 48 characters (fast start) to 240 characters (steady-state) over the ramp-up period.
- Ramp-up: defaults.streaming.min_update_interval and defaults.streaming.interval_ramp_seconds control how quickly the time-based interval ramps from a fast start to the steady-state value. By default it ramps from 0.5s to 5s over 15 seconds. Setting interval_ramp_seconds: 0 disables the ramp and uses the steady-state interval immediately.
- Shared ramp window: The same ramp window also controls the built-in character threshold ramp from 48 characters (fast start) to 240 characters (steady-state).
- Minimum interval: A hard floor (0.35s) prevents edit spam even when character thresholds are met.
Tool Calls During Streaming
When an agent calls tools during a streamed response, MindRoom shows inline markers of the form 🔧 tool_name [N] in the message text.
The number in brackets ([N]) is a 1-indexed counter per message.
Each marker maps to io.mindroom.tool_trace.events[N-1] in the message metadata.
When show_tool_calls is disabled for an entity, tool markers are omitted from the message text and tool-trace metadata is not attached.
The agent still shows typing activity during hidden tool calls.
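The 1-indexed mapping from markers to trace events can be illustrated as follows. This is a sketch only: `resolve_markers` is a hypothetical helper, the marker pattern assumes the 🔧 tool_name [N] form, and the shape of each event dict is invented for the example.

```python
import re

# Matches inline markers such as "🔧 web_search [1]".
TOOL_MARKER = re.compile(r"🔧 (\S+) \[(\d+)\]")


def resolve_markers(body: str, events: list) -> list:
    """Pair each inline marker with its trace event.

    Marker [N] maps to events[N - 1]; markers without a matching
    event are skipped.
    """
    resolved = []
    for match in TOOL_MARKER.finditer(body):
        name, n = match.group(1), int(match.group(2))
        if 1 <= n <= len(events):
            resolved.append((name, events[n - 1]))
    return resolved
```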
Cancellation and Errors
Users can cancel an in-progress response by reacting with 🛑 on the message being generated (see Stop Button). When cancelled, the streamed message is finalized with:
If an error occurs during streaming, the message is finalized with:
Large Streamed Messages
If a streamed response exceeds the Matrix event size limit (55KB for new messages, 27KB for edits), the large message system automatically uploads a JSON sidecar and includes a preview in the event body. See Matrix Integration — Large Messages for details.
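The two limits mean that a body which fits in a new message may still be too large for an edit. A rough sketch of that check, assuming 1 KB = 1000 bytes and measuring only the UTF-8 body rather than the whole event (both are assumptions; `needs_sidecar` is a hypothetical name):

```python
NEW_MESSAGE_LIMIT_BYTES = 55_000  # assumed reading of "55KB"
EDIT_LIMIT_BYTES = 27_000         # assumed reading of "27KB"


def needs_sidecar(body: str, is_edit: bool) -> bool:
    """Return True if the body exceeds the limit for this kind of event."""
    limit = EDIT_LIMIT_BYTES if is_edit else NEW_MESSAGE_LIMIT_BYTES
    return len(body.encode("utf-8")) > limit
```

Because streaming relies on edits, a long streamed response reaches the smaller limit first.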
Visibility Toggles
Two global defaults control what users see during streaming:
defaults:
  show_tool_calls: true   # Default: true; show inline tool markers and tool-trace metadata
  show_stop_button: true  # Default: true; add 🛑 reaction for cancellation
When show_tool_calls is false, inline tool markers (🔧 tool_name [N]) are omitted from the message text and io.mindroom.tool_trace metadata is not attached.
The agent still shows typing activity during hidden tool calls.
show_tool_calls can also be overridden per agent in the agent config.
When show_stop_button is false, the 🛑 reaction is not added to in-progress messages.
Streaming itself still works — only the cancellation affordance is removed.
show_stop_button is a global-only setting under defaults.
enable_streaming is also global-only and cannot be overridden per agent.
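The override rules above can be expressed as a small resolver. This is a sketch, not MindRoom's config loader: `resolve_visibility` is a hypothetical function, and the configs are modeled as plain dicts.

```python
from typing import Optional


def resolve_visibility(defaults: dict, agent: Optional[dict] = None) -> dict:
    """Resolve effective settings for one agent.

    Only show_tool_calls honors a per-agent override; show_stop_button
    and enable_streaming are read from defaults alone.
    """
    agent = agent or {}
    return {
        "show_tool_calls": agent.get(
            "show_tool_calls", defaults.get("show_tool_calls", True)
        ),
        "show_stop_button": defaults.get("show_stop_button", True),
        "enable_streaming": defaults.get("enable_streaming", True),
    }
```

An agent-level show_stop_button or enable_streaming value is ignored in this sketch, mirroring the global-only rule.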
Room Mode
When an agent operates in thread_mode: room (see Thread Mode Resolution), streaming skips all thread relations and sends plain room messages.
This is used for bridges and mobile clients that don't support Matrix threads.
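In terms of event content, room mode amounts to leaving out the thread relation. The sketch below uses the Matrix `m.thread` relation type; `build_content` is a hypothetical helper, and "thread" as the non-room mode value is an assumption made for illustration.

```python
from typing import Optional


def build_content(body: str, thread_mode: str, thread_root: Optional[str]) -> dict:
    """Build message content, adding a thread relation unless in room mode."""
    content = {"msgtype": "m.text", "body": body}
    if thread_mode != "room" and thread_root is not None:
        # Threaded mode: relate the message to the thread root event.
        content["m.relates_to"] = {
            "rel_type": "m.thread",
            "event_id": thread_root,
        }
    return content
```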
Replacement Streaming
MindRoom also supports a ReplacementStreamingResponse variant where each chunk replaces the entire message content instead of appending to it.
This is used for structured live rendering where the full document is rebuilt on each tick.
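The difference between the two modes comes down to how each chunk updates the message content. The classes below are illustrative stand-ins, not the actual ReplacementStreamingResponse or its appending counterpart.

```python
class AppendingStream:
    """Each chunk is appended to the accumulated content."""

    def __init__(self) -> None:
        self.content = ""

    def update(self, chunk: str) -> str:
        self.content += chunk
        return self.content


class ReplacingStream:
    """Each chunk is a full document that replaces the previous content."""

    def __init__(self) -> None:
        self.content = ""

    def update(self, chunk: str) -> str:
        self.content = chunk
        return self.content
```

With the replacing variant, the producer is responsible for sending a complete rendering on every tick.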