This is lovely. I’ve had a vague notion that I wanted something like this… but never quite solidified my understanding of what is necessary. This is a super minimal-maximal example. Bravo.
This is one great question I've wrestled with over the years.
Right now, all three programs would result in a different hash. I decided that the scrapyard publication process should do as little "magic" as possible.
As the tooling evolves, we can provide a canonicalizer scrap that renames variables and simplifies the code. I think eventually we would make this the default process to encourage more reuse.
You might want to look into the way Unison does things: [1], they've got a lot of similar ideas and IIRC the way their hash system works avoids that ambiguity.
I didn't read it in-depth yet, but I think I had a previous implementation that worked very similarly to theirs.
The problem with variable names is that they actually matter for some of the collaborative editor features I hope to implement in the future. There are certain assumptions I can't make quite yet, and so am keeping my options open :)
Have you thought about using type aliases or higher order types of some sort for that instead of relying on the variable name?
I could define that a familyname and a givenname are strings, and then a function argument of type familyname would be differentiated from a function argument of type givenname, but still the argument names themselves could be dropped from the hash.
A-If a program is a thing that produces a result for a given input, these programs are the same.
B-You can’t generally know if two programs are the same without solving halting problems, reducing polynomial algos to linear time or having infinite time and space computing resources, etc.
C-The best you can do is ask if two compilers produce the same output, or a linting/refactoring/canonicalization/formatting operation results in same source.
D-If a program is a thing that interactive debuggers use to create explorations next to source code while computing, these programs are different.
I agree with your approach of starting with these programs as being different, and allowing custom scraps to become a higher layer hash resolver. The reason is that compilers change over time, and it is only under the lens of a compiler(or linter, etc) that comparisons can be made. To bake a holy compiler in today would be arbitrary and quickly outdated.
As far as I know, nobody ever proved that some programs for which halting cannot be determined are smaller in size than the number of atoms in the universe. Or in other words, that the theorem applies to practical programs. It's purely theoretical.
> I'd be impressed if you gave me a program for which it provably cannot be proved that it halts. Or a bound on its size.
For every program that halts, it's obviously possible to prove that it halts. The point is that there is no algorithm for deciding if an arbitrary given program halts.
Since we're considering practicality, let's change the statement to: “There is no algorithm for deciding if an arbitrary given program with size less than 1000 bytes halts which can be implemented without solving several famous open mathematical problems”. I think my program pretty clearly shows that this is true.
Do equivalent programs produce the same hash?
Ie: a = 1 + 1 Return a
Vs
a = 2 Return a
What if local internal variable names change?
Ie
b = 2 Return b