The cognitive science of AI alignment

Abstract

Modern AI systems are increasingly, and perhaps alarmingly, exceeding human performance in domains such as competition mathematics and coding (UK AISI, 2025). AI agents can now independently implement software engineering artifacts that would take humans hours of complex reasoning. As AI capability and agency increase, it grows ever more urgent to design reliable mechanisms for aligning AI systems, that is, for ensuring they act consistently with human values even when unmonitored. Yet AI alignment remains poorly understood.

Publication
Proceedings of the 48th Annual Meeting of the Cognitive Science Society