If a trillion-parameter model can't handle your taxes, that to me says more about the tax code than the AI code.
People who paste undisclosed AI slop in forums deserve their own place in hell, no argument there. But what are some good examples of simple tax questions where current models are dangerously wrong? If it's not a private forum, can you post any links to those questions?
So, a super-basic one I saw recently, in relation to Irish tax. In Ireland, ETFs are taxed differently to normal stocks (most ETFs available here are accumulating: they internally re-invest dividends; this is uncommon for US ETFs for tax reasons). Normal stocks have gains taxed under the capital gains regime (33% on gains when you sell). ETFs are different: they're taxed at 40% on gains when you sell, and they are subject to 'deemed disposal': every 8 years, you are taxed _as if you had sold and re-bought_. The ostensible reason for this is to offset the benefit of untaxed compounding of dividends.
Anyway, the magic robot 'knew' all that. Where it slipped up was in actually _working_ with it. Someone asked for a comparison of taxation on a 20-year investment in individual stocks vs ETFs, assuming re-investment of dividends and the same overall growth rate. The machine happily generated a comparison showing individual stocks doing massively better... On closer inspection, it was comparing 20 years of growth for the individual stocks against only 8 years for the ETFs. (It also got the marginal income tax rate wrong.)
But the nonsense it spat out _looked_ authoritative at first glance, and it took a couple of replies before anyone pointed out that it was completely wrong. The problem isn't that the machine doesn't know the rules; insofar as it 'knows' anything, it knows the rules. But it certainly can't reliably apply them.
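For what it's worth, the correct comparison is only a few lines of arithmetic. Here's a rough Python sketch using the rates above; it's deliberately crude (it ignores income tax on the stock dividends, the small annual CGT exemption, and any credit for deemed-disposal tax against the final bill, and it assumes the deemed-disposal tax is paid out of the fund itself), but at least it runs both regimes over the same 20 years:

    # Simplified comparison: same 20-year horizon for both regimes.
    # Rates as above (33% CGT on stocks, 40% exit tax on ETFs); ignores
    # dividend income tax, the annual CGT exemption, and offsets of
    # deemed-disposal tax against the final liability.

    def stocks_net(principal, rate, years, cgt=0.33):
        value = principal * (1 + rate) ** years
        return value - cgt * (value - principal)

    def etf_net(principal, rate, years, exit_tax=0.40, dd_interval=8):
        value = basis = principal
        for year in range(1, years + 1):
            value *= 1 + rate
            # Deemed disposal: taxed as if sold and re-bought, with the
            # tax paid out of the fund and the basis stepped up.
            if year % dd_interval == 0 and year < years:
                value -= exit_tax * (value - basis)
                basis = value
        return value - exit_tax * (value - basis)

    print(stocks_net(100_000, 0.07, 20))  # ~292k after tax
    print(etf_net(100_000, 0.07, 20))     # ~243k over the same 20 years

Under these crude assumptions, stocks still come out ahead, but nowhere near as dramatically as in a comparison that gives the ETF only 8 of the 20 years.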
(I'd post a link, but they deleted it after it was pointed out that it was nonsense.)
Interesting, thanks. That doesn't seem like an entirely simple question, but it does demonstrate that the model is still not great at recognizing when it is out of its league and should either hedge its answer, refuse altogether, or delegate to an appropriate external tool.
This failure seems similar to a case that someone brought up earlier (https://news.ycombinator.com/item?id=43466531). While better at computation than expected, the transformer model ultimately overestimates its own ability, running afoul of the Dunning-Kruger effect much as humans tend to.
Replying here due to rate-limiting:
One interesting thing is that when one model fails spectacularly like that, its competitors often do not. If you were to cut/paste the same prompt and feed it to o1-pro, Claude 3.7, and Gemini 2.5, it's possible that they would all get it wrong (after all, I doubt they saw a lot of Irish tax law during training). But if they do, they will very likely make different errors.
Unfortunately it doesn't sound like that experiment can be run now, but I've run similar tests often enough to convince me that wrong answers and faulty reasoning are more likely model-specific shortcomings than technology-specific ones.
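The harness for that kind of test is trivial, by the way. A rough Python sketch (the model IDs are illustrative, you'd swap in whatever each provider currently offers, plus the exact prompt that tripped up the original model):

    import os
    import anthropic
    import google.generativeai as genai
    from openai import OpenAI

    PROMPT = "..."  # paste the exact prompt that tripped up the original model

    def ask_openai(prompt):
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o", messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def ask_anthropic(prompt):
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
        msg = client.messages.create(
            model="claude-3-7-sonnet-20250219",
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def ask_gemini(prompt):
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        return genai.GenerativeModel("gemini-2.5-pro").generate_content(prompt).text

    for name, ask in [("openai", ask_openai),
                      ("anthropic", ask_anthropic),
                      ("gemini", ask_gemini)]:
        print(f"--- {name} ---\n{ask(PROMPT)}\n")

Diffing the three answers side by side is usually the quickest way to see whether a failure is shared or idiosyncratic.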
That's why I get triggered when people speak authoritatively on here about what AI models "can't do" or "will never be able to do." These people have, almost without exception, been proven dead wrong in the past, but that never seems to bother them.
The thing is, it's the sort of mistake that's hard to imagine a human making. Many humans might have trouble with the compounding at all, but the 20-year/8-year confusion just wouldn't happen. And I think it is on the simple side of tax questions (in particular, all the _rules_ involved are simple, well-defined, and involve no ambiguity or opinion; you certainly can't say that of all tax rules). Tax gets _complicated_.
This reminds me of the early days of Google, when people who knew how to phrase a query got dramatically better results than those who basically just entered what they were looking for as if asking a human.
And indeed, phrasing your prompts is important here too, but I mean something more: with a bit of an understanding of how it works and how it differs from a human, you can avoid getting sucked in by most of these gaps in its abilities while benefitting from what it's good at. I would ask it the question about the capital gains rules (and would probably verify the response with a link I'd ask it to provide), but I definitely wouldn't expect it to correctly produce a comparison like that. (I might still ask, but would expect to have to check its work.)