Submissions from lesswrong.com

		Logits as a new monitor for evaluation awareness (lesswrong.com)
		2 points by aranguri 15 hours ago | past | discuss
		Running an Air Purifier on Batteries (lesswrong.com)
		2 points by mhb 1 day ago | past | discuss
		Babble and Prune (lesswrong.com)
		4 points by Ariarule 1 day ago | past | discuss
		There are only four skills: design, technical, management and physical (lesswrong.com)
		3 points by surprisetalk 1 day ago | past | discuss
		Where does the race to automate AI research end? (lesswrong.com)
		1 point by joozio 2 days ago | past | discuss
		Taking the Training Wheels Off: Aligning LLMs Without Personas (lesswrong.com)
		4 points by joozio 3 days ago | past | 1 comment
		I hired 5 people to sit behind me and make me productive for a month (2023) (lesswrong.com)
		6 points by LorenDB 3 days ago | past | 1 comment
		Why AI safety researchers should consider a contract research manager position (lesswrong.com)
		4 points by joozio 4 days ago | past | discuss
		How far behind are open models? (lesswrong.com)
		5 points by vesteny77 4 days ago | past | 1 comment
		Probabilistic, Reformative Justice (lesswrong.com)
		9 points by mdurana 5 days ago | past | discuss
		AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles (lesswrong.com)
		2 points by yurivish 5 days ago | past | 1 comment
		Mnemonic portraits for 19,023 human genes (lesswrong.com)
		1 point by brinedew 7 days ago | past | discuss
		How far behind are open models? (lesswrong.com)
		11 points by alecco 7 days ago | past | 5 comments
		A Year Late, Claude Beats Pokémon (lesswrong.com)
		1 point by szatkus 8 days ago | past | discuss
		Many portions of Magnifica Humanitas appear to be AI-written (lesswrong.com)
		3 points by dev_hugepages 8 days ago | past | 1 comment
		Claude, Author of the Humanitas (lesswrong.com)
		1 point by doener 9 days ago | past | discuss
		Overview and Comments on Pope Leo's Magnifica Humanitas on AI (lesswrong.com)
		2 points by mnicky 9 days ago | past | 1 comment
		Claude, Author of the Humanitas (lesswrong.com)
		2 points by cubefox 9 days ago | past | 1 comment
		Judging AGI Output (2020) (lesswrong.com)
		2 points by merelydev 9 days ago | past | discuss
		Chinese Room re-visited: How LLM's have real but different understanding of word (lesswrong.com)
		3 points by stevefan1999 9 days ago | past | 1 comment
		Cognitive Security as an AI Safety Cause Area (lesswrong.com)
		2 points by joozio 10 days ago | past | discuss
		Implications of Predicting the Next Token (lesswrong.com)
		2 points by cubefox 10 days ago | past | 1 comment
		Models finding vulnerabilities is not the primary source of cybersecurity risk (lesswrong.com)
		2 points by alentodorov 18 days ago | past
		A Year Late, Claude Beats Pokémon (lesswrong.com)
		2 points by sambellll 18 days ago | past
		Engineering a Safer World: Risk Modelling – and Safety Engineering? – For AI Lo (lesswrong.com)
		2 points by joozio 18 days ago | past
		Simulacra Levels and Their Interactions (lesswrong.com)
		1 point by epestr 18 days ago | past
		A relatively brief explanation of Boltzmann Brains (lesswrong.com)
		1 point by joozio 18 days ago | past
		The Iliad Intensive Course Materials (lesswrong.com)
		1 point by pykello 19 days ago | past
		Predicting Rare LLM Failures with 30× Fewer Rollouts (lesswrong.com)
		2 points by aranguri 22 days ago | past
		The Anti-Singularity – LessWrong (lesswrong.com)
		2 points by kiyanwang 24 days ago | past | 1 comment