LLM Agent
Overview
The Copilot agent (src/qdash/api/lib/copilot_agent.py) uses the OpenAI SDK directly (not Pydantic AI) to interact with LLMs. It supports two API paths:
- OpenAI Responses API (default) -- Used for OpenAI models (e.g., gpt-4.1). Supports tool calling, structured JSON output, and multimodal input.
- Chat Completions API (fallback) -- Used for Ollama models that don't support the Responses API. No tool calling support.
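To make the split concrete, here is a minimal sketch of driving both paths through the OpenAI Python SDK; the call_model wrapper and use_responses_api flag are illustrative, not names from copilot_agent.py:

```python
# Minimal sketch, not the actual copilot_agent.py code.
from openai import AsyncOpenAI

client = AsyncOpenAI()  # for Ollama, construct with base_url pointing at its OpenAI-compatible endpoint

async def call_model(model: str, system: str, user: str,
                     use_responses_api: bool = True) -> str:
    if use_responses_api:
        # Responses API: tool calling, structured JSON output, multimodal input
        response = await client.responses.create(
            model=model,
            instructions=system,
            input=user,
        )
        return response.output_text
    # Chat Completions fallback (Ollama models): no tool calling
    response = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content or ""
```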
Tool Definitions
The agent has access to 17 tools defined in AGENT_TOOLS:
| Tool | Description |
|---|---|
| get_qubit_params | Get current calibrated parameters for a qubit |
| get_latest_task_result | Get the latest result for a specific calibration task |
| get_task_history | Get recent historical results for a calibration task |
| get_parameter_timeseries | Get time series data for a parameter on a single qubit |
| execute_python_analysis | Execute Python code in a sandboxed environment (data store auto-injected) |
| get_chip_summary | Get summary of all qubits on a chip with statistics (stored tool) |
| get_coupling_params | Get calibrated parameters for coupling resonators |
| get_execution_history | Get recent execution history for a chip |
| compare_qubits | Compare parameters across multiple qubits |
| get_chip_topology | Get chip topology information |
| search_task_results | Search task result history with flexible filters |
| get_calibration_notes | Get calibration notes for a chip |
| get_parameter_lineage | Get version history of a calibration parameter |
| get_provenance_lineage_graph | Get provenance lineage graph for a parameter |
| generate_chip_heatmap | Generate chip-wide heatmap for a qubit metric |
| get_chip_parameter_timeseries | Batch timeseries for all qubits on a chip (stored tool) |
| list_available_parameters | List available output parameter names |
Tool executors are built by CopilotDataService.build_tool_executors(), mapping each tool name to a Python callable that queries MongoDB or invokes the sandbox. UI display labels are defined in ai_labels.py.
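For orientation, a single AGENT_TOOLS entry and its executor binding plausibly look like the sketch below (Responses API function-tool format); the parameter schema and the fetch_qubit_document helper are assumptions, not code from the repository:

```python
# Hypothetical AGENT_TOOLS entry in the Responses API function-tool format.
GET_QUBIT_PARAMS = {
    "type": "function",
    "name": "get_qubit_params",
    "description": "Get current calibrated parameters for a qubit",
    "parameters": {
        "type": "object",
        "properties": {
            "qid": {"type": "string", "description": "Qubit ID, e.g. 'Q00'"},
        },
        "required": ["qid"],
    },
}

# build_tool_executors() produces a name -> callable mapping along these lines;
# fetch_qubit_document stands in for the real MongoDB query.
def build_tool_executors(fetch_qubit_document) -> dict:
    return {
        "get_qubit_params": lambda qid: fetch_qubit_document(qid),
        # ... one entry per tool in AGENT_TOOLS
    }
```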
Tool Call Loop
The agent implements a multi-round tool-calling loop (max MAX_TOOL_ROUNDS = 10 iterations):
```
Build system prompt + input
│
▼
Apply tool executor wrappers:
_wrap_rate_limited_executors (throttle per-qubit timeseries)
_wrap_tool_executors (data store + chart interception)
│
▼
Call OpenAI Responses API
(with tools if tool_executors provided)
│
▼
┌─────────────────────┐
│ Response has │
│ function_call items? │──No──▶ Extract output_text ──▶ Return
└────────┬────────────┘
│ Yes
▼
For each function_call:
1. Fire on_tool_call callback (for SSE progress)
2. Look up executor by name
3. Parse arguments from JSON
4. Execute tool (through wrappers)
5. Append function_call_output to input
│
▼
Rebuild input (preserve ALL output items
including reasoning items to avoid 400 errors)
│
▼
Fire on_status("thinking") callback
│
▼
Call Responses API again ──▶ Loop back to check
(up to MAX_TOOL_ROUNDS)
│
▼
Inject collected charts as blocks in response
```

Key implementation detail: when feeding tool results back, all model output items (reasoning, function_call, message) must be preserved in the input. Omitting reasoning items causes a 400 error from the OpenAI API.
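A condensed sketch of this loop, assuming the OpenAI Python SDK's Responses API and the names used in the diagram (run_tool_loop itself is illustrative):

```python
import json

MAX_TOOL_ROUNDS = 10

async def run_tool_loop(client, model, tools, tool_executors,
                        input_items, on_tool_call=None, on_status=None):
    for _ in range(MAX_TOOL_ROUNDS):
        response = await client.responses.create(
            model=model, input=input_items, tools=tools,
        )
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response.output_text  # no tool calls left: final answer
        # Preserve ALL output items (reasoning, function_call, message);
        # dropping reasoning items triggers the 400 error noted above.
        input_items += response.output
        for call in calls:
            args = json.loads(call.arguments)
            if on_tool_call:
                await on_tool_call(call.name, args)  # SSE progress event
            result = tool_executors[call.name](**args)
            input_items.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": json.dumps(result),
            })
        if on_status:
            await on_status("thinking")  # next LLM round
    return None  # round budget exhausted
```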
Data Store Pattern
Large-data tools (get_chip_parameter_timeseries, get_chip_summary) use a data store to avoid sending full datasets to the LLM:
- Tool executes and returns full data
- _wrap_tool_executors stores the result in data_store[key]
- LLM receives only a compact summary (_build_llm_summary) with schema info and a data_key
- When the LLM calls execute_python_analysis, the sandbox receives data_store as the data variable
- LLM-generated code accesses full data via data["t1"], data["chip_summary"], etc.
This eliminates token double-consumption (LLM no longer echoes back large datasets as context_data) while preserving full data precision for sandbox analysis. See Tool Result Compression for details.
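A sketch of that interception, with the key naming and the summarize_schema stub standing in for _build_llm_summary (assumptions, not the actual wrapper code):

```python
LARGE_DATA_TOOLS = {"get_chip_parameter_timeseries", "get_chip_summary"}

def summarize_schema(result) -> str:
    # Stand-in for _build_llm_summary: describe shape, never contents
    size = len(result) if hasattr(result, "__len__") else 1
    return f"{type(result).__name__} with {size} top-level entries"

def wrap_with_data_store(name, executor, data_store):
    def wrapped(**kwargs):
        result = executor(**kwargs)
        if name not in LARGE_DATA_TOOLS:
            return result
        key = kwargs.get("parameter", name)  # e.g. "t1" or "chip_summary"
        data_store[key] = result             # full data stays server-side
        # Only a compact schema summary plus the data_key reaches the LLM
        return {"data_key": key, "summary": summarize_schema(result)}
    return wrapped
```

When the LLM later calls execute_python_analysis, the sandbox exposes data_store as data, so generated code reads data["t1"] directly.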
Response Format
The agent uses two response schemas:
Blocks Schema (primary, used for OpenAI)
```
{
  "blocks": [
    {"type": "text", "content": "Markdown text here", "chart": null},
    {"type": "chart", "content": null, "chart": {"data": [...], "layout": {...}}}
  ],
  "assessment": "good" | "warning" | "bad" | null
}
```
- blocks is an ordered array of content blocks, each either text or chart
- chart blocks contain Plotly.js specs with data (traces) and layout
- assessment provides an overall quality judgment (nullable for informational responses)
- Schema is passed with strict: False to allow flexible chart objects
Legacy Schema (used for Ollama fallback)
```
{
  "summary": "One-line summary",
  "assessment": "good" | "warning" | "bad",
  "explanation": "Detailed analysis",
  "potential_issues": ["issue1", "issue2"],
  "recommendations": ["action1", "action2"]
}
```
Legacy responses are automatically converted to blocks format via _legacy_to_blocks().
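A plausible shape for that conversion, assuming the legacy fields map onto a single text block (field handling in the real _legacy_to_blocks() may differ):

```python
def legacy_to_blocks(legacy: dict) -> dict:
    # Flatten the legacy fields into one ordered markdown text block
    parts = [legacy.get("summary", ""), legacy.get("explanation", "")]
    if legacy.get("potential_issues"):
        parts.append("Potential issues:\n" +
                     "\n".join(f"- {i}" for i in legacy["potential_issues"]))
    if legacy.get("recommendations"):
        parts.append("Recommendations:\n" +
                     "\n".join(f"- {r}" for r in legacy["recommendations"]))
    return {
        "blocks": [{
            "type": "text",
            "content": "\n\n".join(p for p in parts if p),
            "chart": None,
        }],
        "assessment": legacy.get("assessment"),
    }
```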
System Prompt Construction
The system prompt is assembled from multiple parts depending on the mode:
Analysis Mode (_build_system_prompt)
```
SYSTEM_PROMPT_BASE        # Role definition + capabilities
+ Language instruction    # Response/thinking language from config
+ Task knowledge prompt   # From TaskKnowledge.to_prompt()
+ Scoring thresholds      # Per-metric good/excellent/bad ranges
+ Qubit context           # Current parameters for target qubit
+ Experiment results      # Metric values, R², output/run parameters
+ Historical results      # Recent runs for trend context
+ Neighbor qubit params   # Adjacent qubit data (if configured)
+ Coupling params         # Coupling data (if configured)
+ CHART_SYSTEM_PROMPT     # Response format instructions with examples
```
Chat Mode (_build_chat_system_prompt)
```
CHAT_SYSTEM_PROMPT        # Role + tool usage instructions
+ Language instruction    # Response/thinking language
+ Scoring thresholds      # Per-metric ranges
+ Chip/qubit context      # Current chip_id, optional qid + params
+ CHART_SYSTEM_PROMPT     # Response format instructions
```
The chat system prompt includes detailed instructions for tool usage, including how to normalize qubit IDs and which parameter names to use for get_parameter_timeseries.
Callbacks
The agent supports two async callback hooks for real-time progress reporting:
on_tool_call(name: str, args: dict) -> None
Fired when the model emits a function call, before execution. Used by the SSE streaming layer to send tool progress events to the frontend.
on_status(status: str) -> None
Fired when the agent enters a new processing phase (e.g., "thinking" when calling the LLM). Used to update the status indicator in the UI.
Both callbacks are optional (None by default) and are only used by the streaming endpoints.
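A minimal example of wiring both callbacks into an SSE relay (the queue-based pattern and event names are assumptions, not the actual streaming endpoint code):

```python
import asyncio
import json

events: asyncio.Queue = asyncio.Queue()

async def on_tool_call(name: str, args: dict) -> None:
    # forwarded to the frontend as a tool-progress SSE event
    payload = json.dumps({"name": name, "args": args})
    await events.put(f"event: tool_call\ndata: {payload}\n\n")

async def on_status(status: str) -> None:
    # drives the UI status indicator (e.g. "thinking")
    payload = json.dumps({"status": status})
    await events.put(f"event: status\ndata: {payload}\n\n")
```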