I like this prompt for testing LLMs as the problem is easy to reason about but likely doesn't come up a lot in training data:
> I'm playing assetto corsa competizione, and I need you to tell me how many liters of fuel to take in a race. The qualifying time was 2:04.317, the race is 20 minutes long, and the car uses 2.73 liters per lap.
GPT-3.5 gave me a right-ish answer of 24.848 liters, but it did not realize that the lap in progress when the timer runs out still has to be completed. GPT-4 gave me 28-29 liters as the answer, recognizing that a partial lap needs to be added due to race rules, and that it's good to have 1-2 liters of safety buffer.
I prompted Bard today and the three drafts gave three different answers: 18.28, 82.5, and 327.6 liters. All of these were wildly wrong in different ways.
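The rule GPT-4 applied can be sketched in a few lines of Python: convert the qualifying time to seconds, round the lap count up (because the lap in progress when the clock hits zero still has to be finished), then add an optional buffer. This is a minimal sketch, not anything the game itself exposes. The function names are hypothetical, and it assumes race laps are run at qualifying pace. If race laps are slower, the rounded-up lap count can only stay the same or drop, so the estimate errs on the safe side.

```python
import math


def parse_lap_time(lap_time: str) -> float:
    """Convert a lap time like '2:04.317' (m:ss.sss) into seconds."""
    minutes, seconds = lap_time.split(":")
    return int(minutes) * 60 + float(seconds)


def fuel_for_timed_race(lap_time: str, race_minutes: float,
                        liters_per_lap: float, buffer_liters: float = 0.0) -> float:
    """Estimate liters of fuel for a timed race (hypothetical helper).

    Assumes every race lap takes as long as the given (qualifying) lap and
    that the lap in progress when time expires must still be completed,
    so the fractional lap count is rounded up to a whole lap.
    """
    lap_seconds = parse_lap_time(lap_time)
    laps = math.ceil(race_minutes * 60 / lap_seconds)
    return laps * liters_per_lap + buffer_liters


# 1200 s / 124.317 s is about 9.65 laps, rounded up to 10 laps.
print(round(fuel_for_timed_race("2:04.317", 20, 2.73), 2))
# Same race with an illustrative 1.5 L safety buffer on top.
print(round(fuel_for_timed_race("2:04.317", 20, 2.73, 1.5), 2))
```

The first call gives 27.3 L. Adding a buffer in the 1-2 L range lands roughly where GPT-4's 28-29 L answer did.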
Interesting. Regenerating replies does seem to help, for whatever reason. I've noticed that the first reply after loading a model seems more likely to be inaccurate or hallucinated.
Regenerating the reply of a GPT4xAlpaca 30B model after its initial answer of 243 L, I got some variation of the following on each of the next 5 rerolls:
"Based on your qualifying time of 2:04.317 and the length of the race being 20 minutes, we can calculate the number of laps needed for the entire race. Assuming each lap takes approximately 2 minutes (based on the average speed), there will be 20 / 2 = 10 laps during the race.
Using the information provided about the car using 2.73 liters per lap, we can determine the total amount of fuel required for the race. Therefore, it would be advisable to carry at least 10 * 2.73 = 27.3 liters of fuel for this race."
Sure thing! If the race is 20 minutes and each lap takes 2m04s, that means there will be roughly 9.7 laps until the race is over, and you round that up to 10 since partial laps must be finished. You need 2.73 liters per lap, so the 10 laps will use 27.3 liters total. GPT-4 is correct in suggesting a small safety buffer above that in case fuel usage is higher than expected.
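The same arithmetic with the exact qualifying time instead of the rounded 2m04s, still assuming race laps match qualifying pace:

$$
\begin{aligned}
t_{\text{lap}} &= 2 \cdot 60 + 4.317 = 124.317\ \text{s} \\
N &= \left\lceil \frac{20 \cdot 60\ \text{s}}{124.317\ \text{s}} \right\rceil = \lceil 9.65 \rceil = 10\ \text{laps} \\
V &= N \cdot 2.73\ \text{L} = 27.3\ \text{L}
\end{aligned}
$$

Rounding the lap time to 2m04s doesn't change the outcome: both give 10 laps.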
People try stuff like this because it's precisely the kind of problem that AI would be useful for. If one of these models turned out to be really good at it, it would signify that they're now useful for a whole class of problems.
Besides, GPT-4 did solve this question perfectly. I like that rather than just involving math, there’s also some real-life knowledge needed to give a practical answer.