Hacker News
Logits as a new monitor for evaluation awareness (lesswrong.com)
2 points by aranguri 15 hours ago | past | discuss
Running an Air Purifier on Batteries (lesswrong.com)
2 points by mhb 1 day ago | past | discuss
Babble and Prune (lesswrong.com)
4 points by Ariarule 1 day ago | past | discuss
There are only four skills: design, technical, management and physical (lesswrong.com)
3 points by surprisetalk 1 day ago | past | discuss
Where does the race to automate AI research end? (lesswrong.com)
1 point by joozio 2 days ago | past | discuss
Taking the Training Wheels Off: Aligning LLMs Without Personas (lesswrong.com)
4 points by joozio 3 days ago | past | 1 comment
I hired 5 people to sit behind me and make me productive for a month (2023) (lesswrong.com)
6 points by LorenDB 3 days ago | past | 1 comment
Why AI safety researchers should consider a contract research manager position (lesswrong.com)
4 points by joozio 4 days ago | past | discuss
How far behind are open models? (lesswrong.com)
5 points by vesteny77 4 days ago | past | 1 comment
Probabilistic, Reformative Justice (lesswrong.com)
9 points by mdurana 5 days ago | past | discuss
AI Researchers, Ask Yourself These 6 Questions to Strengthen Your Moral Muscles (lesswrong.com)
2 points by yurivish 5 days ago | past | 1 comment
Mnemonic portraits for 19,023 human genes (lesswrong.com)
1 point by brinedew 7 days ago | past | discuss
How far behind are open models? (lesswrong.com)
11 points by alecco 7 days ago | past | 5 comments
A Year Late, Claude Beats Pokémon (lesswrong.com)
1 point by szatkus 8 days ago | past | discuss
Many portions of Magnifica Humanitas appear to be AI-written (lesswrong.com)
3 points by dev_hugepages 8 days ago | past | 1 comment
Claude, Author of the Humanitas (lesswrong.com)
1 point by doener 9 days ago | past | discuss
Overview and Comments on Pope Leo's Magnifica Humanitas on AI (lesswrong.com)
2 points by mnicky 9 days ago | past | 1 comment
Claude, Author of the Humanitas (lesswrong.com)
2 points by cubefox 9 days ago | past | 1 comment
Judging AGI Output (2020) (lesswrong.com)
2 points by merelydev 9 days ago | past | discuss
Chinese Room re-visited: How LLM's have real but different understanding of word (lesswrong.com)
3 points by stevefan1999 9 days ago | past | 1 comment
Cognitive Security as an AI Safety Cause Area (lesswrong.com)
2 points by joozio 10 days ago | past | discuss
Implications of Predicting the Next Token (lesswrong.com)
2 points by cubefox 10 days ago | past | 1 comment
Models finding vulnerabilities is not the primary source of cybersecurity risk (lesswrong.com)
2 points by alentodorov 18 days ago | past
A Year Late, Claude Beats Pokémon (lesswrong.com)
2 points by sambellll 18 days ago | past
Engineering a Safer World: Risk Modelling – and Safety Engineering? – For AI Lo (lesswrong.com)
2 points by joozio 18 days ago | past
Simulacra Levels and Their Interactions (lesswrong.com)
1 point by epestr 18 days ago | past
A relatively brief explanation of Boltzmann Brains (lesswrong.com)
1 point by joozio 18 days ago | past
The Iliad Intensive Course Materials (lesswrong.com)
1 point by pykello 19 days ago | past
Predicting Rare LLM Failures with 30× Fewer Rollouts (lesswrong.com)
2 points by aranguri 22 days ago | past
The Anti-Singularity – LessWrong (lesswrong.com)
2 points by kiyanwang 24 days ago | past | 1 comment
