Anthropic | Interpretability: Understanding How AI Models Think
2025-08-15 19:00:00 - AI News

What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically?
00:00 - Introduction
01:37 - The biology of AI models
06:43 - Scientific methods to open the black box
10:35 - Some surprising features inside Claude's mind
20:39 - Can we trust what a model claims it's thinking?
25:17 - Why do AI models hallucinate?
34:15 - AI models planning ahead
38:30 - Why interpretability matters
53:35 - The future of interpretability