Anthropic | Interpretability: Understanding How AI Models Think
2025-08-15 19:00:00 - AI News

What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically?
00:00 - Introduction
01:37 - The biology of AI models
06:43 - Scientific methods to open the black box
10:35 - Some surprising features inside Claude's mind
20:39 - Can we trust what a model claims it's thinking?
25:17 - Why do AI models hallucinate?
34:15 - AI models planning ahead
38:30 - Why interpretability matters
53:35 - The future of interpretability