Article makes some important points - especially re visualisation and cache effects.
Some quibbles or glossovers:
> A recursive algorithm is a depth first search. Any loop that explores candidates/neighbors without sorting the candidates is a BFS.
Not sure what to say about this. It's either wrong or I'm missing the intuition. Both DFS and BFS can either be written iteratively or recursively; the real difference is whether you pop your next candidate from the top or the bottom of the stack. Equivalently, whether you use a stack (FILO) or a queue (FIFO). That said, I took maths rather than CS so verify this for yourself.
> It is simply the best foundation for any kind of informed search (not just for 2d grids!)
A* is useful in pathfinding if you have some notion of easily-computed "distance" to your desired target and you're only running a few queries for any given graph. If you plan to run many queries on a mostly-static graph (like a road network) then you could be better off applying a pre-processing algorithm like contraction heirarchies. If you're optimising but don't know what target you're aiming for (e.g. Traveling Salesman) then using some other local search heuristic like 2-opt may be better.
> The major difference between these [A* and BFS] is BFS explores all adjacent nodes, while A* prioritizes exploring nodes that are closer to the destination.
This is definitely a difference. But it glosses over the biggest difference, which is that A* is a dynamic algorithm. That allows you to terminate early with confidence that you have found the shortest path. With BFS you may not be certain until you've searched the whole graph, which could be vast.
The intuition there is that people tend to write algorithms recursively when they easily map to interactions with the top of the stack, since that's easier to express in most languages than bringing in an external stack to think about. Hence, if you see something recursive in the wild it likely looks more like DFS than BFS. As you point out, that isn't a hard rule.
Indeed, BFS, DFS, and A* are the same algorithm just with a different data structure to track unexplored nodes. BFS uses a FIFO queue, DFS uses a LIFO stack, and A* uses a priority queue (generally a heap)
A* is also importantly different for what the ordering of the priority queue is. Dijkstra's algorithm is just BFS with a priority queue, but the lack of heuristic in the comparison key keeps it clearly breadth-first.
No you don't have to search whole graph with BFS (it might happen if source and target is on opposite corners), first time you reach a node you know 100% that it is the shortest path. That's one of basic invariants allowing BFS to produce correct result, you can terminate early when all targets are reached. A difference between A* and BFS is that instead of searching shortest path between two points BFS finds shortest path from single point to ALL points in graph. A* makes a tradeof of speeding up individual query by answering to weaker question. When problem allows it replacing thousands of A* calls by single BFS or Dijkstra call can give a major speedup. Another important difference is that BFS only works on graphs where all edges have same length, A* supports mixed edge lengths. They are not interchangeable, just like finding minimum element in a list isn't replacement for sorting the list.
This is wrong or at least misleading in a couple of ways:
A* is not limited to point-to-point, it also handles point-to-set-of-points just fine (you just might have to reverse the direction you're thinking in). Although this is most often used for "we only need a path to one of them", it's still faster even if you need a path to all of them due to merging. Visiting all points (as BFS does) on the graph is rarely what you actually want (though admittedly, if you're able to throw away the grid it becomes more likely; like the article, I do find people use grids far too often).
BFS works just fine with variable edge weights, you just have to use a priority queue instead of a plain queue for the next set of nodes to be visited.
BFS with priority queue isn't BFS it's called Dijkstra's algorithm.
Are you sure it works correctly for shortest path to all points in set not just "we need one of them" case? When running in reverse I would expect closest point to be priotized first, which would potentially mark all the area around target as visited thus blocking paths from other points in set from reaching it. It's equivalent to introducing one more point in graph and connecting it to all the points in set. Unless it works for one to all in set it's a weaker result than what Dijkstra or BFS answers. If your problem doesn't need it, don't use it. But that's my point use the weakest algorithm which directly answers required query, and be aware that building more powerful algorithm by repeatedly calling more specialized but weaker one won't always be optimal.
A* can be tweaked to solve one-to-many shortest paths. You need to be careful about how you aggregate the heuristic function over all the targets [0]. Also, the performance improvements vs Dijkstra would practically disappear in many scenarios, for example if your source is "in the middle" of all the targets. In that case there's no benefit to biasing your search towards any particular target: you might as well simply expand outwards as you do in ordinary Dijkstra.
For road network routing (which is more my thing) you are orders of magnitude better off pre-processing the graph with contraction heirarchies [1] and then using a specialised algorithm such as RPhast [2] at query time.
> BFS works just fine with variable edge weights, you just have to use a priority queue instead of a plain queue for the next set of nodes to be visited.
But usually you don't call it BFS if it's using a priority queue?
> first time you reach a node you know 100% that it is the shortest path
As you note later in your comment, this is only true in the case of uniform edge weights. In general you can come up with easy counterexamples e.g. a very long direct edge from source to target vs lots of short edges which sum to a smaller amount.
If you have mixed edges you don't use BFS you use Dijkstra's algorithm preferably with a heap and the example you said is still not a problem. The short path made of many edges would be pulled out of heap first ensuring first time visit is still the shortest. If you run actual BFS on non uniform edges, for something like a->b a->c b->d c->d d->e it won't ecesseryn even produce the shortest path at all, only way i can imagine it working is to drop the requirement of not visiting single node twice for it to produce shortest path. At that point it's not BFS at all, and not only you would have to visit whole graph you would have to visit whole graph multiple times, worst case expontially if you have something like braching chain of paths.
We are in agreement. None of this contradicts my original points. You can use BFS on a general weighted graph but you would have to search the entire graph to be sure of the shortest path. That's why no one does it.
Some quibbles or glossovers:
> A recursive algorithm is a depth first search. Any loop that explores candidates/neighbors without sorting the candidates is a BFS.
Not sure what to say about this. It's either wrong or I'm missing the intuition. Both DFS and BFS can either be written iteratively or recursively; the real difference is whether you pop your next candidate from the top or the bottom of the stack. Equivalently, whether you use a stack (FILO) or a queue (FIFO). That said, I took maths rather than CS so verify this for yourself.
> It is simply the best foundation for any kind of informed search (not just for 2d grids!)
A* is useful in pathfinding if you have some notion of easily-computed "distance" to your desired target and you're only running a few queries for any given graph. If you plan to run many queries on a mostly-static graph (like a road network) then you could be better off applying a pre-processing algorithm like contraction heirarchies. If you're optimising but don't know what target you're aiming for (e.g. Traveling Salesman) then using some other local search heuristic like 2-opt may be better.
> The major difference between these [A* and BFS] is BFS explores all adjacent nodes, while A* prioritizes exploring nodes that are closer to the destination.
This is definitely a difference. But it glosses over the biggest difference, which is that A* is a dynamic algorithm. That allows you to terminate early with confidence that you have found the shortest path. With BFS you may not be certain until you've searched the whole graph, which could be vast.