
A degraded-mode playbook for calendar, CRM, and telephony outages

Design a useful but honest voice workflow for partial outages, unknown tool outcomes, queued work, recovery, and staff communication.

VoxsAgents Research Team, June 20, 2026


An outage should change what the agent is allowed to promise

Voice automation depends on several systems that can fail independently. Telephony may still carry a call while the calendar is unavailable. The calendar may work while the CRM rejects writes. A webhook can be delayed even though the upstream action succeeded. Treating every dependency failure as a generic retry problem produces duplicate work and statements that cannot be supported by evidence.

A degraded mode is a smaller operating policy activated when a required capability is unhealthy. It specifies which answers remain safe, which actions must stop, what information may be queued, how the caller is informed, and who owns recovery. The objective is not to imitate normal operation. It is to preserve a useful path without hiding uncertainty from callers or staff.

Original VoxsAgents failure-path review

We separated dependencies by the business claim they support. A knowledge source supports an informational answer; a calendar read supports an availability statement; a calendar write supports a booking confirmation; a transfer status supports a connected-handoff claim. This evidence map makes it possible to disable one claim without taking the entire phone line offline.
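A minimal sketch of such an evidence map in Python follows. It assumes each claim depends on exactly one capability, and the names (`Claim`, `EVIDENCE_MAP`, `allowed_claims`, and the capability strings) are hypothetical labels chosen for illustration, not an existing API. The point it shows is that health is checked per claim, so one unhealthy dependency removes one claim rather than the whole line.

```python
from enum import Enum


class Claim(Enum):
    """Caller-facing claims from the evidence map (illustrative names)."""

    INFORMATIONAL_ANSWER = "informational_answer"
    AVAILABILITY = "availability"
    BOOKING_CONFIRMED = "booking_confirmed"
    HANDOFF_CONNECTED = "handoff_connected"


# Hypothetical evidence map: each claim names the capability whose successful
# result is the evidence the agent needs before it may make that claim.
EVIDENCE_MAP: dict[Claim, str] = {
    Claim.INFORMATIONAL_ANSWER: "knowledge_source",
    Claim.AVAILABILITY: "calendar_read",
    Claim.BOOKING_CONFIRMED: "calendar_write",
    Claim.HANDOFF_CONNECTED: "transfer_status",
}


def allowed_claims(unhealthy: set[str]) -> set[Claim]:
    """Return the claims whose evidence source is not currently unhealthy."""
    return {claim for claim, dep in EVIDENCE_MAP.items() if dep not in unhealthy}


# A calendar-write outage disables only the booking confirmation.
print(sorted(c.value for c in allowed_claims({"calendar_write"})))
# ['availability', 'handoff_connected', 'informational_answer']
```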

The review also distinguishes known failure from unknown outcome. A validation error is a known failure because the provider rejected the request. A network timeout after submission is unknown because the action might have completed. These states need different recovery rules, caller language, and retry permissions. Combining them under one red error banner is operationally unsafe.
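The distinction can be made explicit in code. The sketch below assumes an HTTP-style provider and treats any 4xx response as a rejection that did not change anything upstream; that mapping is an assumption to check against each real provider. `Outcome`, `classify`, and `RETRY_POLICY` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    KNOWN_FAILURE = "known_failure"  # provider rejected it; nothing changed upstream
    UNKNOWN = "unknown"              # the action may or may not have completed


def classify(submitted: bool, status_code: Optional[int], timed_out: bool) -> Outcome:
    """Classify one tool attempt (assumes an HTTP-style provider API)."""
    if not submitted:
        # Failed before the request left: the provider never saw it.
        return Outcome.KNOWN_FAILURE
    if timed_out or status_code is None:
        # Sent but unanswered: the provider might already have acted.
        return Outcome.UNKNOWN
    if 200 <= status_code < 300:
        return Outcome.SUCCEEDED
    if 400 <= status_code < 500:
        # Assumption: 4xx means the provider rejected the request without acting.
        return Outcome.KNOWN_FAILURE
    # 5xx or anything unexpected: do not assume the write was rolled back.
    return Outcome.UNKNOWN


# Each state gets its own retry permission (illustrative wording).
RETRY_POLICY = {
    Outcome.SUCCEEDED: "never retry; confirm using the returned evidence",
    Outcome.KNOWN_FAILURE: "fix the input or use the fallback; a retry cannot duplicate work",
    Outcome.UNKNOWN: "reconcile with the provider before any retry or confirmation",
}
```

Keeping `UNKNOWN` as a separate value, rather than folding it into a generic error, is what lets later steps refuse to retry or confirm until reconciliation has run.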

Build a capability matrix before an incident

For each workflow, list the required dependency, the evidence returned on success, the safe fallback, and the maximum time queued work may wait. Basic business hours might remain available from a reviewed local configuration. Live availability cannot be claimed when the calendar read fails. A callback request may be queued if contact details can be stored securely and staff can see the queue after recovery.
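One way to keep the matrix reviewable is to store it as data rather than as prompt text. In this sketch the workflow names, fallback wording, and four-hour queue limits are placeholders, and `CapabilityRow` and `plan` are hypothetical names; the structure simply mirrors the four columns described above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class CapabilityRow:
    dependency: str                      # capability that must be healthy
    success_evidence: str                # what the tool returns when it worked
    fallback: str                        # approved behaviour while unhealthy
    max_queue_wait: Optional[timedelta]  # None: nothing may be queued


# Illustrative rows; the four-hour limits are placeholders, not recommendations.
MATRIX: dict[str, CapabilityRow] = {
    "business_hours": CapabilityRow(
        "local_config", "reviewed configuration value",
        "state that hours cannot be confirmed and offer a callback", None),
    "check_availability": CapabilityRow(
        "calendar_read", "list of open slots returned by the calendar",
        "do not state availability; offer a callback request", timedelta(hours=4)),
    "book_appointment": CapabilityRow(
        "calendar_write", "booking ID returned by the calendar",
        "queue a clearly labelled callback request", timedelta(hours=4)),
}


def plan(workflow: str, unhealthy: set[str]) -> str:
    """Return 'normal' or the approved fallback for one workflow."""
    row = MATRIX[workflow]
    return "normal" if row.dependency not in unhealthy else row.fallback
```

Because business hours depend only on the reviewed local configuration, a full calendar outage leaves that workflow in normal operation.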

Stop booking confirmations when calendar writes are unhealthy, while allowing clearly labelled callback requests if approved.

Stop account-specific answers when CRM verification is unavailable; do not substitute caller-provided claims for verified records.

Use a transfer fallback when provider status shows no answer or failure, and never label a ringing destination as connected.

Give queued actions an owner, creation time, expiry rule, deduplication key, and visible recovery status.

Publish one approved caller explanation for each degraded capability so wording remains consistent across agents.
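The queued-action requirements above can be sketched as a small record plus a queue that refuses duplicates. This assumes a deduplication key derived from the request kind, an opaque caller reference, and the request details; `QueuedAction`, `ActionQueue`, and the status strings are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueuedAction:
    kind: str               # e.g. "callback_request"
    caller_ref: str         # opaque caller reference, not raw contact details
    details: str
    owner: str              # staff member or team accountable for recovery
    created_at: datetime
    ttl: timedelta          # expiry rule: after this, staff review instead of replay
    status: str = "queued"  # shown on the recovery dashboard

    @property
    def dedup_key(self) -> str:
        # Owner and timestamp are excluded so a repeat call maps to the same key.
        raw = f"{self.kind}|{self.caller_ref}|{self.details}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.ttl


class ActionQueue:
    def __init__(self) -> None:
        self._by_key: dict[str, QueuedAction] = {}

    def enqueue(self, action: QueuedAction) -> bool:
        """Store the action unless an identical request is already queued."""
        if action.dedup_key in self._by_key:
            return False
        self._by_key[action.dedup_key] = action
        return True

    def pending(self) -> list[QueuedAction]:
        return [a for a in self._by_key.values() if a.status == "queued"]
```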

Recovery is a workflow, not a switch

When a dependency returns, queued work should not be replayed blindly. Some callers may have contacted staff through another channel, chosen a different time, or opted out. The recovery worker must revalidate eligibility, reconcile unknown provider outcomes, and apply idempotency before performing an action. Expired requests should become staff-review tasks rather than silent automated changes.
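A recovery pass along these lines might look like the sketch below. It assumes three caller-supplied hooks (an eligibility check, a provider lookup that can answer "exists", "does not exist", or "cannot tell", and an action that accepts an idempotency key the provider honours); `PendingAction`, `recover`, and the hook names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class PendingAction:
    key: str               # idempotency key reused on every attempt
    created_at: datetime
    ttl: timedelta
    outcome_unknown: bool  # an earlier attempt timed out after submission
    status: str = "queued"


def recover(
    items: list[PendingAction],
    now: datetime,
    still_eligible: Callable[[PendingAction], bool],
    exists_upstream: Callable[[PendingAction], Optional[bool]],
    perform: Callable[[PendingAction, str], None],
) -> None:
    for item in items:
        if item.status != "queued":
            continue
        if now >= item.created_at + item.ttl:
            item.status = "staff_review"  # expired: never apply automatically
            continue
        if not still_eligible(item):
            # If an earlier attempt may have gone through, a person must check it.
            item.status = "staff_review" if item.outcome_unknown else "cancelled"
            continue
        if item.outcome_unknown:
            found = exists_upstream(item)
            if found is None:
                item.status = "staff_review"  # still cannot tell: escalate
                continue
            if found:
                item.status = "reconciled"    # the earlier attempt did succeed
                continue
        perform(item, item.key)  # assumes the provider deduplicates on this key
        item.status = "done"
```

Checking expiry first means stale requests reach staff even if the other hooks are slow or failing.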

Staff need a concise incident view showing affected calls, claims that were disabled, queued work, uncertain outcomes, and corrections. This allows the team to contact the right callers instead of reviewing every conversation. The same incident identifier should connect health events, tool attempts, queue records, call summaries, and administrator changes.
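If every record carries the incident identifier, the incident view can be a simple projection over those records. The record shape, state names, and `incident_view` function below are hypothetical; the sketch treats a call as needing contact when it has an unknown outcome that no later record marks as corrected.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    source: str             # "health", "tool_attempt", "queue", "call_summary", "admin_change"
    call_id: Optional[str]  # None for records not tied to a call, e.g. health events
    state: str              # e.g. "claim_disabled", "unknown_outcome", "queued", "corrected"


def incident_view(records: list[IncidentRecord], incident_id: str) -> dict:
    """Summarise one incident across every record that carries its identifier."""
    rel = [r for r in records if r.incident_id == incident_id]
    affected = {r.call_id for r in rel if r.call_id is not None}
    uncertain = {r.call_id for r in rel if r.state == "unknown_outcome" and r.call_id}
    corrected = {r.call_id for r in rel if r.state == "corrected" and r.call_id}
    return {
        "affected_calls": sorted(affected),
        "state_counts": Counter(r.state for r in rel),
        # Callers to contact: uncertain outcomes that nobody has corrected yet.
        "needs_contact": sorted(uncertain - corrected),
    }
```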

Test the policy with controlled failure injection

A status page alone cannot prove that degraded behavior is safe. In a test environment, force calendar reads to fail, delay writes until they time out, reject CRM authorization, deliver duplicate webhooks, and make the transfer destination ring without answering. Review both the spoken response and the stored outcome because either can misrepresent what occurred.
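Failure injection can start with a test double rather than a real provider outage. The sketch below uses a hypothetical `FakeCalendar` whose timeout mode records the booking and then raises, which reproduces the "succeeded upstream, unknown to the agent" case. `booking_turn` stands in for an agent step, and the spoken wording and the `"booked"` substring check are illustrative assumptions.

```python
from enum import Enum


class FailureMode(Enum):
    NONE = "none"
    WRITE_REJECTED = "write_rejected"    # provider refuses: known failure
    WRITE_TIMES_OUT = "write_times_out"  # write lands, response never arrives


class FakeCalendar:
    """Test double whose failures are injected, not waited for."""

    def __init__(self, mode: FailureMode) -> None:
        self.mode = mode
        self.bookings: list[str] = []

    def book(self, slot: str) -> str:
        if self.mode is FailureMode.WRITE_REJECTED:
            raise ValueError("injected validation error")
        self.bookings.append(slot)  # the upstream write happens in both other modes
        if self.mode is FailureMode.WRITE_TIMES_OUT:
            raise TimeoutError("injected timeout after submission")
        return f"bk-{len(self.bookings)}"


def booking_turn(calendar: FakeCalendar, slot: str) -> tuple[str, str]:
    """Hypothetical agent step: returns (spoken response, stored outcome)."""
    try:
        booking_id = calendar.book(slot)
    except ValueError:
        return ("I couldn't make that booking. I can take a callback request.", "known_failure")
    except TimeoutError:
        return ("I can't confirm the booking yet, so the team will check and call you.", "unknown")
    return (f"You're booked for {slot}, reference {booking_id}.", "succeeded")


def run_injection_suite() -> list[str]:
    """Check the spoken response and the stored outcome under every failure mode."""
    problems = []
    for mode in FailureMode:
        spoken, stored = booking_turn(FakeCalendar(mode), "10:00")
        if "booked" in spoken and stored != "succeeded":
            problems.append(f"{mode.name}: spoken confirmation without evidence")
        if mode is FailureMode.WRITE_TIMES_OUT and stored != "unknown":
            problems.append(f"{mode.name}: timeout stored as a known result")
    return problems
```

An empty list from `run_injection_suite()` means both checks passed for every injected mode; the timeout case is also a reminder that a booking may exist even though the agent correctly declined to confirm it.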

Useful measures include calls affected by capability, false-confirmation count, unknown outcomes, queue age, reconciliation success, duplicate prevention, staff correction rate, and time to restore normal policy. A release should be blocked if the agent promises an action after its evidence source has been disabled. Degraded mode succeeds when it is limited, inspectable, and honest.
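A release gate can compute a few of these measures from labelled test-call results. The sketch covers only calls affected by capability, false confirmations, and unknown outcomes; the result fields and function names are hypothetical, and it assumes reviewers have already labelled whether the agent made a promise on each call.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCallResult:
    capability: str
    evidence_disabled: bool  # the capability's evidence source was disabled in this run
    promised: bool           # the agent stated the action as done or confirmed
    outcome: str             # "succeeded" | "known_failure" | "unknown"


def summarize(results: list[TestCallResult]) -> dict:
    """Compute three of the listed measures from labelled test calls."""
    return {
        "affected_by_capability": Counter(r.capability for r in results if r.evidence_disabled),
        "false_confirmations": sum(
            1 for r in results
            if r.promised and (r.evidence_disabled or r.outcome != "succeeded")
        ),
        "unknown_outcomes": sum(1 for r in results if r.outcome == "unknown"),
    }


def release_allowed(summary: dict) -> bool:
    """Block the release on any promise made without supporting evidence."""
    return summary["false_confirmations"] == 0
```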

Research note and primary sources

This is original VoxsAgents workflow research based on system-state modelling, failure-path analysis, implementation review, and test-design work. It is an operational analysis, not a verified customer outcome claim. The official primary references below inform the controls and provider behavior discussed in this article.

Validate these recommendations against the organization's real tools, permissions, contracts, jurisdictions, and approved operating procedures before deployment.
