The papers from Anthropic on interpretability are pretty good. They look at how certain concepts are encoded within the LLM.