Ricardo Rei
Maya D’Eon
Catarina Botelho

Footnotes

  1. Red-teaming evaluation conducted over four weeks: 360 full conversations (approximately 6,000 conversational turns) were evaluated. Annotations were completed by licensed clinical psychologists following Sword Health's Red Teaming Protocol. The general-purpose model was tested under identical conditions with full production system prompts. Full methodology is available at swordhealth.com/research.

  2. 88% of Sword Mind members (n=238) achieved clinically meaningful improvement on the Patient Global Impression of Change (PGIC) scale. Source: Validation Institute, 2026-2027 Validation Report: Sword Health, April 2026, validationinstitute.com.

  3. MindEval is Sword Health's open-source framework for evaluating the clinical competence of large language models in mental health care, developed alongside PhD-level licensed clinical psychologists. MindGuard is Sword's AI safety framework, built to help ensure that conversations remain clinically appropriate and are escalated correctly. Both have been published and are available at swordhealth.com/research/mindeval-benchmarking-llms-mental-health-support.

Portugal 2020, Norte 2020, European Union, Plano de Recuperação e Resiliência (Recovery and Resilience Plan), República Portuguesa, Next Generation EU