Modern AI systems are increasingly, and perhaps alarmingly, exceeding human performance in domains such as competition mathematics and coding (UK AISI, 2025). AI agents can now independently complete software engineering tasks that would require hours of complex reasoning from humans. As AI capability and agency grow, designing reliable mechanisms to align AI systems, ensuring they act consistently with human values even when unmonitored, becomes ever more urgent. Yet AI alignment remains poorly understood.