While it's useful not to bother when you know it's unlikely to give good results, it also feels a bit like a cop-out to suggest that the user shouldn't be asking it certain (unspecified) things in the first place. If this is the only solution, we should just crowdsource the topics or types of questions it gets wrong more than 50% of the time, so not everyone has to reinvent the wheel.
I’m making a tool to analyse financial transactions for accountants and identify things like misallocated expenses. Initially I was getting an LLM to try to analyse hundreds of transactions in one go. It was correct roughly 40-50% of the time, was inconsistent, and hallucinated frequently.
I changed the approach to simple yes/no questions, analysing each transaction individually. Now it is correct about 85% of the time and very consistent.
Same model, essentially the same question, but a different way of asking it.
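To make that concrete, here's roughly what the per-transaction version might look like as a sketch. It assumes an OpenAI-style chat API; the model name, prompt wording, and transaction fields are illustrative stand-ins, not my actual implementation:

```python
# Sketch of the per-transaction yes/no approach (illustrative, not the real tool).
# Assumes the OpenAI Python SDK; model name, prompt, and fields are placeholders.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()

@dataclass
class Transaction:
    date: str
    description: str
    amount: float
    account: str

def looks_misallocated(tx: Transaction) -> bool:
    """Ask one narrow yes/no question about a single transaction."""
    prompt = (
        "Answer with exactly 'yes' or 'no'.\n"
        f"Transaction: {tx.date} | {tx.description} | {tx.amount} | coded to '{tx.account}'.\n"
        "Is this transaction likely allocated to the wrong expense account?"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,         # keep answers consistent across runs
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

# Loop over transactions one at a time instead of sending hundreds in one prompt.
transactions = [
    Transaction("2024-03-01", "Uber to airport", 54.20, "Office supplies"),
    Transaction("2024-03-02", "Printer paper", 18.99, "Office supplies"),
]
flagged = [tx for tx in transactions if looks_misallocated(tx)]
```

The key change is that each call carries one small, verifiable question, so there's far less room for the model to drift or invent transactions that weren't in the input.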
Oftentimes I ask simple factual questions that I don't know the answer to. This is something it should excel at, yet it usually fails, at least on the first try. I guess I subconsciously skip questions that are extremely easy to google (if you ignore the worst AI in existence) or can be found by opening the [insert keyword] Wikipedia article. You don't need AI for those.
Amusingly enough, my rule of thumb for whether an LLM is likely to be able to answer a question is "could somebody who just read the relevant Wikipedia page answer this?"
I can't. That's my single biggest frustration about using LLMs: so much of what they can and cannot do comes down to intuition you need to build up over time, and I can't figure out how to express that intuition in a way that can quickly transfer to other people.
The most recent example I have was not in English. It was a question about translating a slang word between two non-English languages. It failed miserably (just made up some complete nonsense). Google had no trouble finding relevant pages or images for that word (without any extra context), so the word was fairly distinctive and not that obscure. Disclaimer: I'm not using any extra prompts like "don't make shit up and just tell me you don't know".
The most recent technical one I can remember (and now would be a good time to have the actual prompt) was when I asked whether MySQL has a way to run UPDATE without waiting for locks, basically ignoring rows that are locked. It (Sonnet 4, IIRC) answered "of course" and gave me an invalid query of the form `UPDATE ... SKIP LOCKED;`
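For reference, as far as I know `SKIP LOCKED` in MySQL 8.0+ is only valid on locking reads (`SELECT ... FOR UPDATE` / `FOR SHARE`), not on `UPDATE` itself, which is why that query doesn't parse. The closest working pattern is a two-step transaction; a rough sketch, with made-up table and column names and mysql-connector-python as the driver:

```python
# Sketch of the usual workaround: lock-and-skip on a SELECT, then update by key.
# Assumes MySQL 8.0+ and mysql-connector-python; schema/credentials are made up.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="appdb"
)
cur = conn.cursor()

conn.start_transaction()
# SKIP LOCKED is only allowed on locking reads: grab rows nobody else holds...
cur.execute(
    "SELECT id FROM jobs WHERE status = 'pending' FOR UPDATE SKIP LOCKED"
)
ids = [row[0] for row in cur.fetchall()]

# ...then UPDATE just those rows by primary key within the same transaction.
if ids:
    placeholders = ",".join(["%s"] * len(ids))
    cur.execute(
        f"UPDATE jobs SET status = 'processing' WHERE id IN ({placeholders})",
        ids,
    )
conn.commit()
```

Whether that really counts as "UPDATE without waiting for a lock" is debatable, but it's presumably the feature the model mangled into a single invalid statement.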
I can't imagine what damage this does if people are using it for questions they don't/can't verify. Programming is relatively safe in this regard.
But as I noted in my other reply, there will be a bias on my side, as I probably disregard questions that I know how to easily find answers to. That's not something I'd applaud AI for.
I find this the most surprising. I have yet to cross the 50% threshold from bullshit to possible truth, in any kind of topic I use LLMs for.