Advisor: Give Any Model a Lifeline to a Smarter One
Kenny Rogers
Add openrouter:advisor to your tools array and your model can ask a stronger model for help mid-generation. When the executor hits a hard decision, gets stuck, or wants a sanity check before finishing, it calls the advisor with a prompt. The advisor thinks, returns guidance as the tool result, and the executor keeps going with better information.
Try it in the chatroom or read the docs for the full API reference.
{
  "model": "openai/gpt-4o-mini",
  "messages": [{ "role": "user", "content": "Design a rate limiter for a distributed API gateway." }],
  "tools": [
    {
      "type": "openrouter:advisor",
      "parameters": { "model": "anthropic/claude-fable-5" }
    }
  ]
}
67x price gap, selective consultation
Claude Fable 5 costs $10 per million input tokens. GPT-4o Mini costs $0.15 per million input tokens. That’s roughly a 67x spread.
Most requests don’t need frontier-level reasoning. A mid-tier model handles the bulk of a workload without issue. But the 10-20% that involves architectural decisions, ambiguous edge cases, or multi-step reasoning chains is where cheaper models stumble.
The advisor tool covers that gap selectively. Your fast model runs the show. When it hits something genuinely hard, it calls for help. You pay frontier prices only for the moments that need frontier thinking.
In an agentic coding session with 50 tool calls, maybe 2-3 are advisor consultations. The rest run at mini prices. You’ve sanded down your per-session cost while keeping the quality ceiling high.
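To put rough numbers on that session, here is a back-of-the-envelope sketch. The prices are the ones quoted in this post; the token counts per call (4,000 in, 500 out) are made-up assumptions for illustration, and real sessions will vary a lot as context grows.

```python
# Prices in USD per million tokens, as quoted in this post.
EXECUTOR_PRICE = {"input": 0.15, "output": 0.60}   # GPT-4o Mini
ADVISOR_PRICE = {"input": 10.00, "output": 50.00}  # Claude Fable 5


def call_cost(price, input_tokens, output_tokens):
    """Cost in USD of one call at the given per-million-token prices."""
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def session_cost(executor_calls, advisor_calls, input_tokens=4_000, output_tokens=500):
    """Cost of a session with some executor steps plus a few advisor consultations.

    Assumption: every call, executor or advisor, uses the same token counts.
    """
    executor = executor_calls * call_cost(EXECUTOR_PRICE, input_tokens, output_tokens)
    advisor = advisor_calls * call_cost(ADVISOR_PRICE, input_tokens, output_tokens)
    return executor + advisor


def all_frontier_cost(calls, input_tokens=4_000, output_tokens=500):
    """Cost of running every step on the advisor model instead."""
    return calls * call_cost(ADVISOR_PRICE, input_tokens, output_tokens)


if __name__ == "__main__":
    print(f"mixed:        ${session_cost(50, 3):.3f}")
    print(f"all-frontier: ${all_frontier_cost(50):.3f}")
```

Under those assumptions the mixed session comes to about $0.24, against about $3.25 for running all 50 steps on the frontier model. Change the token counts and the ratio moves, but the shape of the saving stays the same.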
Server-side execution, one tool call
The advisor runs server-side during generation. Your model calls it like any other tool: pass a prompt describing what it needs help with, get back the advisor’s text as the tool result. The model then writes the final answer itself, informed by the advice. The advisor is a consultant, not a ghostwriter.
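From the client side, that flow is a single request. The sketch below builds the request body shown at the top of this post and posts it to the Chat Completions endpoint using only the standard library. The helper names `build_request` and `send` are ours, not part of any SDK, and the sketch assumes a standard bearer-token API key.

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(executor_model, advisor_model, user_prompt):
    """Build a request body with one advisor attached to the executor."""
    return {
        "model": executor_model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [
            {"type": "openrouter:advisor", "parameters": {"model": advisor_model}}
        ],
    }


def send(body, api_key):
    """POST the body and return the parsed JSON response (makes a network call)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_request(
        "openai/gpt-4o-mini",
        "anthropic/claude-fable-5",
        "Design a rate limiter for a distributed API gateway.",
    )
    # Printing only; call send(body, api_key) to actually run the request.
    print(json.dumps(body, indent=2))
```

Because the consultation happens server-side, the client does not need a tool-execution loop for the advisor: the response that comes back is already the executor's final answer.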
Four things worth knowing:
- Any model can be the advisor. Pin it in the tool config with parameters.model, or let the executor pick per-call. Use ~anthropic/claude-fable-latest to always resolve to the newest Fable.
- The advisor gets its own tools. Give it openrouter:web_search and it’ll ground its advice in fresh sources before responding. It runs as a sub-agent with its own tool loop, then returns just the final guidance.
- Recursion is blocked. The advisor can’t call itself. A depth header and self-reference check prevent unbounded nesting, and consultations are capped per request to bound cost.
- The advisor remembers. Replay the conversation transcript in a follow-up request (with the advisor tool calls and results included) and each advisor reconstructs its prior consultations, so a follow-up question builds on what the advisor already said. Memory is per advisor (your security reviewer and your architect each keep their own thread) and works across Chat Completions, Responses, and Anthropic Messages. Full details.
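The memory point mostly comes down to not trimming the transcript. A minimal sketch, assuming a Chat Completions-style message list: the helper `build_follow_up` is hypothetical, and the tool-call message shapes in the example are illustrative placeholders rather than the documented wire format. The only thing the sketch relies on is that whatever messages came back get replayed verbatim.

```python
def build_follow_up(previous_messages, returned_messages, new_question):
    """Return the messages list for a follow-up request.

    Everything from the earlier turn is carried forward unchanged, including
    any advisor tool calls and their results, so the advisor can reconstruct
    its prior consultations.
    """
    messages = list(previous_messages)   # copy, so the caller's list is untouched
    messages.extend(returned_messages)   # verbatim: do not filter out tool messages
    messages.append({"role": "user", "content": new_question})
    return messages


# Illustrative placeholder shapes only; the real advisor entries may differ.
first_turn = [{"role": "user", "content": "Review this auth flow."}]
returned = [
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "Advice: rotate the refresh token."},
    {"role": "assistant", "content": "Here is the reviewed flow."},
]

follow_up = build_follow_up(first_turn, returned, "What about session fixation?")
print(len(follow_up))  # 5 messages: 1 original + 3 returned + 1 new question
```

A common mistake in chat clients is to keep only the user and final assistant messages between turns. Under the behavior described above, that would drop the advisor's thread along with the tool messages.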
Named advisors
For complex workflows, you can configure a roster of specialists. Add one openrouter:advisor entry per advisor, each with its own name, model, instructions, and tool set:
{
  "tools": [
    {
      "type": "openrouter:advisor",
      "parameters": {
        "name": "security-reviewer",
        "model": "anthropic/claude-fable-5",
        "instructions": "You are a security engineer. Find vulnerabilities."
      }
    },
    {
      "type": "openrouter:advisor",
      "parameters": {
        "name": "architect",
        "model": "openai/gpt-5.5",
        "instructions": "You are a systems architect. Prioritize simplicity and scalability."
      }
    }
  ]
}
The executor sees a distinct tool for each advisor and calls whichever fits the task with just a prompt. An auth flow review routes to Claude Fable with the security persona; architecture questions go to GPT-5.5. Names can use letters, digits, spaces, underscores, and dashes (“Lead Architect” works), and must be unique across entries. One entry can omit name to act as the default advisor.
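If you assemble the roster programmatically, a client-side pre-check can catch naming mistakes before the request goes out. This is a sketch of the rules as stated above, not the server's actual validation: it assumes ASCII letters, case-sensitive uniqueness, and at most one unnamed entry, and the helper `validate_advisors` is hypothetical.

```python
import re

# Letters, digits, spaces, underscores, and dashes (ASCII-only is an assumption).
NAME_PATTERN = re.compile(r"[A-Za-z0-9 _-]+")


def validate_advisors(tools):
    """Return a list of problems found in the advisor entries (empty if none)."""
    problems = []
    seen = set()
    unnamed = 0
    for tool in tools:
        if tool.get("type") != "openrouter:advisor":
            continue  # other tool types are not checked here
        name = tool.get("parameters", {}).get("name")
        if name is None:
            unnamed += 1  # an entry without a name acts as the default advisor
            continue
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid characters in name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    if unnamed > 1:
        problems.append("more than one entry omits name")
    return problems


roster = [
    {"type": "openrouter:advisor", "parameters": {"name": "security-reviewer", "model": "anthropic/claude-fable-5"}},
    {"type": "openrouter:advisor", "parameters": {"name": "Lead Architect", "model": "openai/gpt-5.5"}},
    {"type": "openrouter:advisor", "parameters": {"model": "anthropic/claude-fable-5"}},
]
print(validate_advisors(roster))  # []
```

The server remains the source of truth. A check like this just fails faster and with a clearer message than a rejected request.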
Advice can also stream. Set "stream": true on an advisor entry and you get the advice incrementally as the advisor writes it. In the Responses API that means response.output_text.delta events while the advice is in flight; the completed output item still carries the full text, so consumers that ignore deltas see no difference. (Chat Completions ignores the flag, and Messages-API streaming is a fast-follow.)
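On the consuming side, handling those deltas can be as small as the sketch below. It assumes each streamed event has already been parsed into a dict with `type` and `delta` keys, which mirrors the Responses streaming event format but is an assumption here. It also makes no attempt to tell advisor deltas apart from the executor's own output deltas, since this post does not say how they are distinguished. The `collect_text` helper is hypothetical.

```python
def collect_text(events, on_delta=None):
    """Concatenate the text carried by response.output_text.delta events.

    `events` is any iterable of already-parsed event dicts. If `on_delta` is
    given, it is called with each chunk as it arrives (for live display).
    """
    parts = []
    for event in events:
        if event.get("type") != "response.output_text.delta":
            continue  # ignore every other event type
        chunk = event.get("delta", "")
        if on_delta is not None:
            on_delta(chunk)
        parts.append(chunk)
    return "".join(parts)


# Hand-written sample events for illustration; not captured from a real stream.
sample_events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Use a token "},
    {"type": "response.output_text.delta", "delta": "bucket per client."},
    {"type": "response.completed"},
]

print(collect_text(sample_events))  # Use a token bucket per client.
```

A consumer that skips the deltas entirely loses nothing, because the completed output item carries the full text. The callback is only there for showing progress while the advisor is still writing.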
Billing
Advisor tokens bill at the advisor model’s rates, separate from the executor. If your executor is GPT-4o Mini ($0.15/$0.60 per M input/output tokens) and the advisor is Claude Fable 5 ($10/$50 per M input/output tokens), each model’s tokens bill at that model’s own rates. Both show up on your activity page.
Get started
One line in your tools array:
{ "type": "openrouter:advisor", "parameters": { "model": "anthropic/claude-fable-5" } }
The model decides when to use it. Most requests won’t trigger a consultation; the ones that do will be better for it. Read the full docs for parameters, named advisors, sub-agent tools, and more.