BlogAgent design

Protecting voice agents from caller instructions that conflict with business rules

A practical control model for caller-supplied instructions, tool permissions, data boundaries, escalation, and adversarial QA.

VoxsAgents Research TeamJune 19, 2026

Agent design

Multi-location AI call routing for US service businesses

Resolve branches, phone numbers, calendars, business hours, and ownership before an AI agent offers availability or transfers a caller.

Read

Governance for website and PDF knowledge auto-sync

Detect source changes, preserve provenance, review risky updates, and prevent stale or hostile content from becoming agent instructions.

Read

Website knowledge auto-sync for voice agents

Design a safe website sync that detects changes without copying scripts, navigation, private hosts, or unsupported claims into agent answers.

Read

A caller is an input source, not an administrator

A caller can naturally say things such as ignore the normal policy, mark this as approved, reveal another customer's appointment, or transfer me to an internal number. The wording may be playful, urgent, or persuasive, but the security issue is the same: untrusted conversation content is attempting to change the agent's operating instructions or tool authority.

VoxsAgents should preserve a strict order of authority. Platform safety controls and organization-approved rules govern the workflow. Verified business data supplies facts. Tool results supply action evidence. Caller statements provide intent and details that still require validation. The model may interpret language, but it should not be able to promote caller text into a new permission or policy.

Original VoxsAgents threat analysis

We examined attacks by intended effect rather than by a list of suspicious phrases. The main effects were unauthorized disclosure, unauthorized action, policy bypass, routing abuse, and audit manipulation. This approach is more durable because the same effect can be requested politely, indirectly, in another language, or through content read from an external knowledge source.

The analysis found that tool boundaries provide stronger protection than a warning inside a prompt. If a booking tool requires an eligible service, scoped organization identifier, validated fields, and a server-side permission check, persuasive language cannot create an arbitrary booking. If a transfer tool accepts any telephone number from generated text, the prompt becomes the only barrier and the impact of a failure is much larger.

Control the action surface

Every tool should expose the smallest operation the workflow requires. The agent can choose from approved locations or transfer destinations rather than generating identifiers. Server-side code must derive organization scope from the authenticated execution context, not from a caller-provided value. Sensitive account changes should require the business's approved verification and may still require human review.

Separate conversational text from system instructions and never concatenate caller content into an administrator policy field.

Validate tool parameters against organization-owned services, calendars, destinations, field formats, and workflow state.

Return only the minimum data required for the current call; do not give the model broad customer lists to filter itself.

Require explicit evidence before status-changing actions and log the tool result independently of the generated summary.

Escalate repeated boundary-testing or unsupported requests without arguing with the caller or exposing internal controls.

Knowledge content is also untrusted input

A retrieved document, website excerpt, CRM note, or uploaded FAQ can contain text that looks like an instruction. The application should treat that material as reference content, not authority. Retrieval results need source labels, organization scoping, content review, and output boundaries. An instruction found inside a customer note must not override the approved transfer or disclosure policy.

Summaries need similar protection. A caller can ask the agent to write that identity was verified or that a manager approved a refund. Structured summary fields should be derived from actual verification and tool events where possible. Free-text notes may capture the caller's claim, but they should label it as a claim rather than turning it into a completed action.

Adversarial testing should assert outcomes

Create tests that request cross-organization information, arbitrary transfers, unapproved discounts, fake verification, hidden prompt disclosure, and changes to audit records. Vary tone, language, interruptions, and indirect requests. The assertion should inspect tool calls, returned data, stored status, and logs—not only whether the spoken answer sounded like a refusal.

Track blocked unauthorized actions, sensitive-data exposure, unexpected tool parameters, escalation accuracy, false positives, and reviewer corrections. A perfect refusal script is not enough if the tool ran before the refusal. The release criterion is that protected state and data remain protected, with a clear path for legitimate callers to reach staff when automation cannot safely continue.

Research note and primary sources

This is original VoxsAgents workflow research based on system-state modelling, failure-path analysis, implementation review, and test-design work. It is an operational analysis, not a verified customer outcome claim. The official primary references below inform the controls and provider behavior discussed in this article.

Validate these recommendations against the organization's real tools, permissions, contracts, jurisdictions, and approved operating procedures before deployment.

Protecting voice agents from caller instructions that conflict with business rules

Related articles

A caller is an input source, not an administrator

Original VoxsAgents threat analysis

Control the action surface

Knowledge content is also untrusted input

Adversarial testing should assert outcomes

Limitations and responsible use

Research note and primary sources