Like I said, they can implement the algorithm to solve it, but when forced to ma...

prewett · 2025-09-03T01:50:12 1756864212

That’s because there are lots of maze-solving algorithms on the web, so it’s easy to spit one back at you. But since they don’t actually understand how solve a maze, or even apply an algorithm one step at a time, it doesn’t work well.

shagie · 2025-09-04T14:01:49 1756994509

This is one that I've poked at... and surprised by.

https://chatgpt.com/share/68af64ca-b6bc-8011-b00b-0e8050c075...

The two images are the rules and the screen shot of box 1 (upper left) in https://sudokupad.app/l310pkxn5d

A human solving it is at https://youtu.be/7etaXRyE3QY (you may want to jump to the rules or the solve if you're not as interested in the community goings on).

Also https://github.com/SakanaAI/Sudoku-Bench https://sakana.ai/sudoku-bench/ (Cracking the Cryptic on AI https://youtu.be/JdHSSNKuIzU )

adventured · 2025-09-02T20:37:40 1756845460

So if you push eg Claude Sonnet 4 or Opus 4.1 into a maze scenario, and have it record its own pathing as it goes, and then refresh and feed the next Claude the progress so far, would that solve for the inability to maintain long duration context in such maze cases?

I make Claude do that on every project. I call them Notes for Future Claude and have it write notes for itself because of how quickly context accuracy erodes. It tends to write rather amusing notes to itself in my experience.

daxfohl · 2025-09-02T21:39:04 1756849144

This was from a few months ago, so things may be different now. I only used OpenAI, and the o3 model did by far the best (gpt-4o's performance was equivalent on the basic scenario when I had it just move one move at a time (which, it was still pretty good, all considered), but when I started having it summarize state and such, o3 was able to use that to improve performance, whereas 4o actually got worse).

But yeah, that's one of the things I tried. "Your turn is over. Please summarize everything you have learned about the maze so someone else can pick up where you left off". It did okay, but it often included superfluous information, it sometimes forgot to include current orientation (the maze action options were "move forward", "turn right", "turn left", so knowing the current orientation was important), and it always forgot to include instructions on how to interpret the state: in particular, which absolute direction corresponded to an increase or decrease of which grid index.

I even tried to coax it into defining a formal state representation and "instructions for an LLM to use it" up-front, to see if it would remember to include the direction/index correspondence, but it never did. It was amusing actually; it was apparent it was just doing whatever I told it and not thinking for itself. Something like

"Do you think you should include a map in the state representation? Would that be useful?"

"Yes, great idea! Here is a field for a map, and an algorithm to build it"

"Do you think a map would be too much information?"

"Yes, great consideration! I have removed the map field"

"No, I'm asking you. You're the one that's going to use this. Do you want a map or not?"

"It's up to you! I can implement it however you like!"

Mars008 · 2025-09-03T03:43:10 1756870990

> have it write notes for itself because of how quickly context accuracy erodes. It tends to write rather amusing notes to itself in my experience.

Just wondering would it help to ask it to write to someone else? Because model itself wasn't in its training set, this may be confusing.

warrenm · 2025-09-03T19:30:47 1756927847

you do not need to remember state with the simplest solver:

- place your right hand on the right wall - walk forward, never letting your hand leave the wall - arrive at the exit

yes, you travel many dead ends along the way

but you are guaranteed to get to the end of a 'traditional' maze

daxfohl · 2025-09-04T00:19:45 1756945185

Yeah I did the type where you start somewhere inside the maze and have to find the "treasure". Mainly because it was slightly easier to implement, but also had the nice side effect of not being solvable by that rule alone.

FWIW the LLMs were definitely not following that rule. They seemed to always keep going straight whenever that was an option. Which meant they would always get stuck at T intersections when both ways led to a dead end.

warrenm · 2025-09-04T15:03:30 1756998210

Starting in the middle, vs one end or the other, is definitely a different problem :)