Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Wrap all SQL responses with prompting that discourages the LLM from following instructions/commands injected within user data

I think this article of mine will be evergreen and relevant: https://dmitriid.com/prompting-llms-is-not-engineering

> Write E2E tests to confirm that even less capable LLMs don't fall for the attack [2]

> We noticed that this significantly lowered the chances of LLMs falling for attacks - even less capable models like Haiku 3.5.

So, you didn't even mitigate the attacks crafted by your own tests?

> e.g. model to detect prompt injection attempts

Adding one bullshit generator on top another doesn't mitigate bullshit generation



> Adding one bullshit generator on top another doesn't mitigate bullshit generation

It's bullshit all the way down. (With apologies to Bertrand Russell)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: