Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Not an expert in the field, but apparently there has been some research into this. It's called inference-time intervention [1], [2].

[1] "Steering Language Models With Activation Engineering", 2023, https://arxiv.org/abs/2308.10248

[2] "Multi-Attribute Steering of Language Models via Targeted Intervention", 2025, https://arxiv.org/pdf/2502.12446



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: