ยท 9 days agoยท AI Alignment Forum
Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability
Code: github.com/ElleNajt/controllability tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instruction compared to controlling their response (the non-thinking, user-facing output). Their CoT controllability conditions require the modelsโ t