Most mid-tier firms have run scenario testing. The question that matters in 2026 is not whether the exercise took place, but whether the evidence produced would satisfy a PRA supervisor asking to see it.

Those are different questions, and the gap between them is the most common finding in FourthLine's diagnostic work across insurance, banking, and investment management. Firms invest meaningful time in annual exercises, then document the outcome in a way that does not meet the regulatory evidencing standard. The exercise was real. The evidence record is not sufficient. From a supervisor's perspective, the regulatory exposure is the same as if the exercise had never been run.

This article sets out what PRA SS1/21 requires from scenario testing evidence, explains the difference between documentation that describes an exercise and evidence that demonstrates a resilience position, and gives practitioners a clear picture of what a compliant, supervisory-grade evidence pack contains. It is written for Heads of Operational Resilience and CROs who are preparing or reviewing their current-year testing cycle.

What SS1/21 Actually Requires

PRA SS1/21 Chapter 7 establishes the annual scenario testing obligation for firms within its scope. The requirement is not simply to conduct a test. It is to design, execute, and document a scenario-based assessment of the firm's ability to remain within its impact tolerances under severe but plausible disruption conditions, and to produce evidence that can demonstrate that position to a supervisor.

Three elements of that obligation carry specific evidencing implications that are frequently under-appreciated.

The first is scenario selection. PRA SS1/21 requires scenarios to be severe but plausible and calibrated to the firm's specific risk profile. The supervisory standard is not met by generic scenarios lifted from industry guidance and applied without modification to the firm's actual dependency structure. A scenario must reflect the specific ways in which this firm's Important Business Services (IBS) could fail: the technology systems they depend on, the third parties that support them, the people without whom recovery is impractical, and the conditions under which multiple dependencies might fail simultaneously. If the scenario selection cannot be defended as genuinely tailored to the firm's IBS architecture, the evidence produced by the exercise is weakened regardless of its quality in other respects.

The second is the evidence trail from test design to board sign-off. The PRA's supervisory expectation, confirmed through engagement with firms across the sector, is that the complete chain of evidence is preserved: why the scenario was selected, how the test was designed, who participated, what was tested, what the outcomes were per IBS, what findings arose, what lessons were drawn, and what the board was told and when. Gaps in that chain, and in particular the absence of board reporting that reflects what the testing actually found, are a primary supervisory concern.

The third is the expectation that genuine testing produces findings. A scenario testing programme that produces only clean outcomes, where every IBS recovers within tolerance and no material weaknesses are identified, will attract supervisory scrutiny. The PRA understands that genuine stress testing finds gaps. A clean outcome from a well-designed test is credible. A clean outcome from every test in a multi-year programme is not. Testing designed to confirm resilience rather than interrogate it is not compliant with the SS1/21 intent, and experienced supervisors recognise the difference in the design of the scenario itself.

What a Supervisory-Grade Evidence Pack Contains

The following represents the minimum evidencing standard for a single annual scenario test under PRA SS1/21. Each element is required. Missing elements create a gap that a supervisor will identify and probe.

Scenario selection rationale. A written document explaining why this specific scenario was selected for this test cycle. The rationale must reference the firm's current IBS architecture, the dependency analysis that informs the scenario choice, and any regulatory guidance or supervisory expectation that informed the selection. The rationale should be version-controlled and date-stamped. It is not sufficient to state that the scenario is severe but plausible without demonstrating why it is both things for this firm in this year.

Test design document. The full test design, including the scenario narrative, the disruption parameters, the IBS and dependencies in scope, the test format (tabletop, live exercise, structured walkthrough), the participants by role and name, the date and duration, and the facilitator. This document establishes the methodology and is the baseline against which the execution record is assessed. It should be produced before the test and not amended to reflect what actually happened.

Execution record. A contemporaneous account of the test as it was run: the sequence of events or injects, the decisions taken by participants at each stage, the point at which each IBS was assessed against its impact tolerance, and any deviations from the test design. The execution record is the document that proves the test was substantive rather than procedural. An execution record that simply confirms the scenario ran to plan without capturing the actual decision-making and recovery activities of participants provides limited evidential value.

Findings register. A record of every finding from the test, regardless of severity: gaps in recovery capability, dependencies that were not understood before the test, timing failures against tolerance, resource constraints that were exposed under the scenario conditions, and communication failures between teams or with third parties. Findings should be rated by severity and assigned an owner and a remediation timeline. The absence of a findings register, or a findings register that records only low-severity items from a scenario designed to be severe, requires explanation.

Impact tolerance outcome statement. A formal statement per IBS of whether the firm remained within its impact tolerance under the scenario conditions. This is the core regulatory output of the test and must be explicit: the tolerance was met, the tolerance was not met, or the test was insufficiently severe to reach the tolerance boundary. Where a tolerance was not met, the statement must be accompanied by a remediation action and timeline. Where the test did not reach the tolerance boundary, the rationale for why the scenario nonetheless constitutes a valid severe but plausible test must be provided.

Lessons learned register. Distinct from the findings register. Where findings record what went wrong, the lessons learned register records what the firm has understood from the test about its resilience position that was not known before. These are structural observations that inform future scenario design, dependency mapping updates, impact tolerance reviews, and BCM plan revisions. The lessons learned register demonstrates that the firm is using testing to improve its resilience position, not merely to evidence compliance.

Remediation tracker. A consolidated register of all findings and lessons requiring action, with named owners, target completion dates, and status updates. The remediation tracker carries forward between test cycles: findings from the previous year that remain open should be visible in the current cycle record, demonstrating the firm's ongoing management of identified weaknesses. A tracker that is created fresh for each test cycle without reference to prior findings is an evidencing gap.

Board reporting record. Documentation that the board received a summary of the test outcomes, including any tolerance breaches or material findings, and that the board acknowledged the position and any associated remediation. Under PRA SS1/21 Chapter 8, the self-assessment must reflect the current testing position, which means the board's engagement with scenario testing outcomes is part of the self-assessment governance trail. A test that was conducted but not reported to the board at the level required by SS1/21 leaves the evidence record incomplete.
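
For firms that hold their evidence index in a structured form, the eight elements above can be sketched as a simple completeness check. The Python below is a minimal illustration under stated assumptions, not a regulatory template: the element identifiers, the EvidencePack and OutcomeStatement types, the pack_gaps function, the document references, and the "Payments" service are all hypothetical names introduced for this example. It flags any element that is absent and any per-IBS outcome statement that lacks the remediation or rationale it should carry.

```python
from dataclasses import dataclass, field
from enum import Enum

# The eight pack elements named in this article (identifiers are illustrative).
REQUIRED_ELEMENTS = (
    "scenario_selection_rationale",
    "test_design_document",
    "execution_record",
    "findings_register",
    "impact_tolerance_outcome_statement",
    "lessons_learned_register",
    "remediation_tracker",
    "board_reporting_record",
)


class Outcome(Enum):
    MET = "tolerance met"
    NOT_MET = "tolerance not met"
    BOUNDARY_NOT_REACHED = "test did not reach the tolerance boundary"


@dataclass
class OutcomeStatement:
    ibs: str
    outcome: Outcome
    remediation_action: str = ""  # expected when the tolerance was not met
    remediation_due: str = ""     # expected when the tolerance was not met
    validity_rationale: str = ""  # expected when the boundary was not reached


@dataclass
class EvidencePack:
    documents: dict = field(default_factory=dict)  # element name -> document reference
    outcomes: list = field(default_factory=list)   # one OutcomeStatement per IBS


def pack_gaps(pack):
    """Return a list of human-readable gaps; an empty list means none were detected."""
    gaps = [
        f"missing element: {name}"
        for name in REQUIRED_ELEMENTS
        if not pack.documents.get(name)
    ]
    if not pack.outcomes:
        gaps.append("no per-IBS outcome statements")
    for s in pack.outcomes:
        if s.outcome is Outcome.NOT_MET and not (s.remediation_action and s.remediation_due):
            gaps.append(f"{s.ibs}: tolerance not met but no remediation action and timeline")
        if s.outcome is Outcome.BOUNDARY_NOT_REACHED and not s.validity_rationale:
            gaps.append(f"{s.ibs}: boundary not reached but no rationale for scenario validity")
    return gaps


# Hypothetical pack: no board reporting record, and a breach with no remediation attached.
pack = EvidencePack(
    documents={n: f"{n}_v1.pdf" for n in REQUIRED_ELEMENTS if n != "board_reporting_record"},
    outcomes=[OutcomeStatement("Payments", Outcome.NOT_MET)],
)
for gap in pack_gaps(pack):
    print(gap)
```

Run against the hypothetical pack, the check reports two gaps: the missing board reporting record and the unremediated tolerance breach. A check of this kind only confirms that each element exists and is internally consistent. It says nothing about whether the content would withstand supervisory challenge, which remains a matter of judgement.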

The Most Common Gaps in Practice

Across FourthLine's assessment work, three gaps appear in scenario testing evidence records more consistently than any others.

The first is the missing scenario rationale. Firms conduct the test and document the outcome but do not retain a clear, dated record of why that scenario was chosen. When a supervisor asks why the firm tested a technology outage scenario in Q1 rather than a third-party failure scenario, the absence of a written rationale makes the selection look arbitrary rather than risk-informed.

The second is the execution record that narrates rather than evidences. Many firms produce a post-exercise report that describes what happened in the exercise in general terms, confirms that recovery actions were taken, and records the outcome. This is not an execution record in the regulatory sense. The execution record must capture what specific decisions were made, by whom, and at what point in the scenario timeline, and whether those decisions resulted in recovery within tolerance. Narrative summaries cannot substitute for contemporaneous documentation.

The third is the absence of a live remediation tracker that connects prior findings to current cycle evidence. A firm that has been running annual testing for three years should be able to show, in its current year evidence pack, the status of findings from each prior cycle. If every annual pack is produced as a self-contained document with no reference to prior findings, it suggests the programme is being run for compliance purposes rather than for continuous improvement. Supervisors notice this pattern.
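
The carry-forward expectation in the third gap lends itself to a simple reconciliation between cycles. The sketch below assumes each tracker is a list of findings with a stable identifier and a status. The Finding type, the dropped_findings function, the "open" and "closed" status values, and the finding IDs and owners are hypothetical and chosen only for illustration. It lists findings that were still open in the prior cycle's tracker but do not appear anywhere in the current one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str
    cycle: int    # year of the test cycle that raised the finding
    owner: str
    status: str   # "open" or "closed" (hypothetical status values)


def dropped_findings(prior_tracker, current_tracker):
    """IDs of findings open in the prior tracker that are absent from the current tracker."""
    current_ids = {f.finding_id for f in current_tracker}
    return sorted(
        f.finding_id
        for f in prior_tracker
        if f.status == "open" and f.finding_id not in current_ids
    )


# Hypothetical trackers for two consecutive cycles.
prior = [
    Finding("F-2024-01", 2024, "Head of IT", "open"),
    Finding("F-2024-02", 2024, "COO", "closed"),
    Finding("F-2024-03", 2024, "Head of Procurement", "open"),
]
current = [
    Finding("F-2024-01", 2024, "Head of IT", "open"),
    Finding("F-2025-01", 2025, "COO", "open"),
]

print(dropped_findings(prior, current))
```

In this example F-2024-03 was open at the end of the prior cycle and has disappeared from the current tracker without a recorded closure, which is the pattern described above. A finding that was closed in the prior cycle, such as F-2024-02, is not flagged.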

Using This Standard to Assess Your Current Position

A practical way to assess whether your current scenario testing evidence meets the standard described above is to ask: if a PRA supervisor asked to see the complete evidence record for our most recent test, what would they find, and what questions would it leave unanswered?

The questions that most commonly arise from inadequate evidence packs are: why was this scenario selected; what did the test actually find; did the board receive an honest account of the outcomes; and what has been done about the gaps the test identified.

If any of those questions cannot be answered directly from the evidence record in front of you, there is a gap that warrants attention before the next supervisory interaction.

FourthLine's Annual Resilience Retainer includes scenario testing design, facilitation, and evidence pack production as a core quarterly deliverable, produced to the standard described in this article. For firms that want to assess their current evidence position independently before committing to an ongoing programme, the Diagnostic Assessment provides a structured review of testing quality against the SS1/21 evidencing standard, with a clear findings report and remediation roadmap.