Classic problem. It was (and maybe still is) also a problem with UK speed cameras, IIRC. I think it was Top Gear that figured out that if you drive at something like 500 mph, the camera won't register you speeding.
ISTM that's just an a priori feeling. The value (or lack thereof) of the product depends entirely on how accurately it predicts human survey responses, which you can't know without looking at the data.
I think a median rating of 5 stars makes sense for a taxi service. You want the rating system to be able to express varying levels of bad. There are many levels of bad outcomes for a car ride. There are not many ways to improve on the median experience.
I am not a vision expert, but simply matching the resolution of your display to the resolution of your sensor wouldn't necessarily produce a clear image, unless the pixels are aligned to the sensing elements.
By analogy, if you resize a 1025x1025 image to 1024x1024, it's usually going to look bad.
The `find_item` example uses List for an argument. To my thinking, this indicates that the function is intended to mutate the argument. I don't think that was the author's intention, though, so I would prefer to use Sequence in this situation (or possibly Iterable, if we only need to traverse the sequence once in order).
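A minimal sketch of the distinction, assuming a `find_item` along these lines (the article's actual signature may differ):

```python
from typing import Iterable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Sequence is a read-only protocol (__getitem__, __len__, __contains__, ...),
# so annotating the parameter this way signals that find_item never mutates
# its argument, and callers may pass a list, a tuple, or any other sequence.
def find_item(items: Sequence[T], target: T) -> Optional[int]:
    for index, item in enumerate(items):
        if item == target:
            return index
    return None

# If we only ever make one ordered pass and never index or take len(),
# Iterable[T] would be an even looser (more caller-friendly) bound.
def contains_item(items: Iterable[T], target: T) -> bool:
    return any(item == target for item in items)
```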
> It seems to me all we need here is a measure of confidence for the result averaged over the entire answer. Low confidence is a guess/hallucination.
Even if the model knows the exact answer to the question, there may be many distinct ways of phrasing the answer. This would also lead to low confidence in any particular phrasing.
That should be okay, though: 10 good answers will still report the score of the best one chosen. I think the GPTs are using beam search, which projects out a "beam" (it looks more like a tree to me) of probable answers, each of which has a score of accumulated token probabilities, and then just picks the highest.
In this case, it doesn't matter how wide the beam is or how many possible answers there are; the score is still the accumulated token probabilities of the best branch.
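Roughly what I mean by accumulated token probabilities, as a toy sketch (real decoders work in log space and add things like length penalties; the candidate answers and their per-token probabilities below are made up for illustration):

```python
import math

def sequence_logprob(token_probs):
    """Sum of log-probabilities of the tokens along one branch of the beam."""
    return sum(math.log(p) for p in token_probs)

# Each candidate answer carries the probabilities the model assigned to the
# tokens it actually emitted along that branch.
candidates = {
    "Paris is the capital of France.": [0.9, 0.8, 0.95, 0.9, 0.85],
    "I believe the capital is Paris.": [0.5, 0.4, 0.6, 0.7, 0.8],
}

# The reported confidence is just the score of the best branch, no matter
# how many other plausible branches the beam (or tree) explored.
best = max(candidates, key=lambda answer: sequence_logprob(candidates[answer]))
print(best, sequence_logprob(candidates[best]))
```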
However, others have noted in the thread that RLHF might hurt this approach severely, for example by scoring polite responses highly regardless of whether the answer is false. Then you'd have to access the pre-RLHF model to get any idea of its true likelihoods.
Ah, interesting, that does begin to explain how this might be more difficult than it initially appears. Could there be some way to define... proximity of different possible responses, and sum the confidence over all the nearby possibilities?
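Purely hypothetically, one way to get at that: sample a handful of answers, merge the ones that look like paraphrases, and sum the probability mass within each group. The `similar` function and the sampled answers below are crude stand-ins (a real version would use embeddings or entailment, and probabilities actually reported by the model):

```python
def similar(a: str, b: str) -> bool:
    # Crude word-overlap proxy for "these are phrasings of the same answer".
    wa = set(a.lower().replace(".", "").split())
    wb = set(b.lower().replace(".", "").split())
    return len(wa & wb) / len(wa | wb) > 0.5

# (answer, probability the model assigned to that exact phrasing)
samples = [
    ("The capital of France is Paris.", 0.30),
    ("Paris is the capital of France.", 0.25),
    ("The capital is Paris.",           0.15),
    ("It is Lyon.",                     0.05),
]

groups = []  # (representative phrasing, pooled probability)
for answer, prob in samples:
    for i, (rep, total) in enumerate(groups):
        if similar(answer, rep):
            groups[i] = (rep, total + prob)
            break
    else:
        groups.append((answer, prob))

# An answer's confidence is now the pooled mass of its whole cluster,
# not the mass of any single phrasing.
print(max(groups, key=lambda g: g[1]))
```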
> Like any good mystery, the fun was in the build up and as we move towards a resolution there's a bitter sweet aspect to the slightly mundane reality of it 'merely' being an emergent property of large networks.
I would disagree with this description. An "emergent property of large networks" would be something that just appears when you wire together a large network.
To get intelligent behavior, it's not sufficient to wire together a large neural network. You also need to use an optimizer to train it on a large data set.
I think "Dial F For Frankenstein" makes your point clear. It is absurd to think the telephone network would just emergently become conscious when there are enough phones it connects together.
This is just a guess, but I don't think there's such a deep lesson here; language models and image models have simply been developed by mostly-different groups of researchers who chose different tradeoffs. In an alternate history it may very well have gone the other way around.
I would disagree. We have image generation with a variety of architectures. Diffusion models aside, it still takes far fewer parameters to build state-of-the-art image generators with transformers (e.g. Parti).
Simplifying a bit, mapping (which is essentially the main goal of image generators and especially transformer generators) is just less complex than prediction.
I'm not sold on the concept of AI-hard or AI-complete problems.
For example, the Wikipedia article on AI-completeness mentions Bongard problems and Autonomous driving as examples of problems that might be AI-complete.
OK, so if I have an AI that drives autonomously, is there some known querying strategy I can use to make it solve Bongard problems? Can a Bongard-problem-solving AI be made, by some known procedure, to drive a car?
Without such reductions, at least the analogy to NP-hardness is incomplete. I believe these reductions are precisely what makes NP-hardness such a useful concept; even though we still haven't proven that any of these problems are objectively "hard," we are still able to show that if one of them is hard, then the others are as well!
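For a concrete (non-AI) picture of what such a reduction buys you: a graph on n vertices has a vertex cover of size k exactly when the remaining n - k vertices form an independent set, so a solver for one problem answers the other with a single call. A toy sketch, with a brute-force solver standing in for the "hard" oracle:

```python
from itertools import combinations

def has_independent_set(vertices, edges, k):
    """Brute-force stand-in for an independent-set solver (the oracle)."""
    return any(
        all(not (u in chosen and v in chosen) for u, v in edges)
        for chosen in map(set, combinations(vertices, k))
    )

def has_vertex_cover(vertices, edges, k):
    # Reduction: C is a vertex cover of size k  <=>  V \ C is an
    # independent set of size n - k, so one oracle call suffices.
    return has_independent_set(vertices, edges, len(vertices) - k)

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d")]
print(has_vertex_cover(V, E, 2))  # True: {b, c} touches every edge
```

As far as I know, nothing like this has been exhibited between, say, an autonomous-driving AI and a Bongard-problem solver.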
This seems not quite right to me. A Turing machine may always halt, but the time (and tape) it uses depends on its input size. The input can be arbitrarily large, so there's no finite bound on the state space.
My point, which I admit isn't a very deep one, is that you can make an FSM that simulates a TM that only uses a finite portion of its tape. Yes, for some machines you will always be able to construct an input that exceeds what a particular FSM can handle. When that happens, however, you can make a bigger FSM that simulates a larger (but still finite) tape.
This is analogous to putting more memory in your computer when you have a problem that doesn't fit.
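As a toy illustration of the counting argument (the machine and numbers here are made up): a TM confined to a tape of n cells has only |Q| * n * |Gamma|^n configurations, and each configuration can serve as one state of an explicit FSM; if an input needs more tape, you rebuild with a larger n, just like adding RAM.

```python
from itertools import product

def bounded_tm_configurations(control_states, tape_alphabet, tape_length):
    """Enumerate every configuration (state, head position, tape contents)
    of a TM confined to `tape_length` cells. Each configuration becomes one
    state of an explicit FSM, so the total is finite: |Q| * n * |Gamma|**n."""
    return [
        (q, head, tape)
        for q in control_states
        for head in range(tape_length)
        for tape in product(tape_alphabet, repeat=tape_length)
    ]

# A toy machine with 3 control states and a binary tape of 8 cells:
fsm_states = bounded_tm_configurations({"q0", "q1", "halt"}, "01", 8)
print(len(fsm_states))  # 3 * 8 * 2**8 = 6144 FSM states
```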