Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is lovely. I’ve had a vague notion that I wanted something like this… but never quite solidified my understanding of what is necessary. This is a super minimal-maximal example. Bravo.

Do equivalent programs produce the same hash?

Ie: a = 1 + 1 Return a

Vs

a = 2 Return a

What if local internal variable names change?

Ie

b = 2 Return b



This is one great question I've wrestled with over the years.

Right now, all three programs would result in a different hash. I decided that the scrapyard publication process should do as little "magic" as possible.

As the tooling evolves, we can provide a canonicalizer scrap that renames variables and simplifies the code. I think eventually we would make this the default process to encourage more reuse.


You might want to look into the way Unison does things: [1], they've got a lot of similar ideas and IIRC the way their hash system works avoids that ambiguity.

[1]: https://www.unison-lang.org/learn/the-big-idea/


Wow, that's a great guide!

I didn't read it in-depth yet, but I think I had a previous implementation that worked very similarly to theirs.

The problem with variable names is that they actually matter for some of the collaborative editor features I hope to implement in the future. There are certain assumptions I can't make quite yet, and so am keeping my options open :)


Have you thought about using type aliases or higher order types of some sort for that instead of relying on the variable name?

I could define that a familyname and a givenname are strings, and then a function argument of type familyname would be differentiated from a function argument of type givenname, but still the argument names themselves could be dropped from the hash.


Thinking a bit more:

A-If a program is a thing that produces a result for a given input, these programs are the same.

B-You can’t generally know if two programs are the same without solving halting problems, reducing polynomial algos to linear time or having infinite time and space computing resources, etc.

C-The best you can do is ask if two compilers produce the same output, or a linting/refactoring/canonicalization/formatting operation results in same source.

D-If a program is a thing that interactive debuggers use to create explorations next to source code while computing, these programs are different.

I agree with your approach of starting with these programs as being different, and allowing custom scraps to become a higher layer hash resolver. The reason is that compilers change over time, and it is only under the lens of a compiler(or linter, etc) that comparisons can be made. To bake a holy compiler in today would be arbitrary and quickly outdated.


Due to the halting problem, it's not possible to tell in general if two programs are equivalent.


As far as I know, nobody ever proved that some programs for which halting cannot be determined are smaller in size than the number of atoms in the universe. Or in other words, that the theorem applies to practical programs. It's purely theoretical.


Here is a Python program which is definitely smaller than the number of atoms in the universe. Can you tell me whether it halts?

  n = 1
  while sum(d for d in range(1,n) if n % d == 0) != n:
      n += 2


Oh this is a good one. It halts if there’s an odd perfect number, right?


Yup.


If I can't then that's no proof that the halting theorem is true for practical programs.


If it's so hard even for such a simple program to determine whether it halts, why do you expect it to not be a problem for much more complex programs?


It absolutely is a problem. But don't say that there is no solution to it unless you have a proof.

I'd be impressed if you gave me a program for which it provably cannot be proved that (edit: if) it halts. Or a bound on its size.


> I'd be impressed if you gave me a program for which it provably cannot be proved that it halts. Or a bound on its size.

For every program that halts, it's obviously possible to prove that it halts. The point is that there is no algorithm for deciding if an arbitrary given program halts.


> there is no algorithm for deciding if an arbitrary given program halts.

But can you prove that statement if I change it into:

there is no algorithm for deciding if an arbitrary given program with size less than N halts; for some suitable definition of size.


Since we're considering practicality, let's change the statement to: “There is no algorithm for deciding if an arbitrary given program with size less than 1000 bytes halts which can be implemented without solving several famous open mathematical problems”. I think my program pretty clearly shows that this is true.


[never mind, misread the conditiom]


Why? This gives sum([]) != 1, which is true. Have you actually tried running the program?




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: