
This is a wider problem in general and an odd bit of history: scientific papers pre-date the internet, and as such the reference system pre-dates computing. The DX-DOI system is a substantial improvement, but it's not the convention or expectation, and IMO it's also insufficient.

Realistically, all papers should just include a list of hashes which can be resolved exactly back to the reference material, with the conventional citation acting as a backstop to confirm that the hash, the author's intent, and the target all match.
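
To make the idea concrete, here is a minimal sketch of what a hash-backed reference could look like. SHA-256 and the names `HashedCitation`, `cite`, and `verify` are my own assumptions for illustration, not an existing standard or anything publishers actually do:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HashedCitation:
    """Hypothetical record: a conventional citation plus a content hash."""
    citation: str  # human-readable reference, kept as the backstop
    sha256: str    # hex digest of the exact bytes that were cited


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def cite(citation: str, paper_bytes: bytes) -> HashedCitation:
    """Build a reference from the exact file the author read."""
    return HashedCitation(citation=citation, sha256=sha256_hex(paper_bytes))


def verify(ref: HashedCitation, candidate_bytes: bytes) -> bool:
    """Check that a retrieved file is byte-identical to what was cited."""
    return sha256_hex(candidate_bytes) == ref.sha256
```

A reader who fetches a copy of the paper would call `verify(ref, downloaded_bytes)`. Any change to the file, even a single byte, makes the check fail, which is exactly the byte-match guarantee described above.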

But that runs directly into the problem of how the whole system is commercialized (although at least requiring publishers to hold proofs they have a particular byte-match for a paper on file would be a start).

Basically we have a system which is still formatted around the limitations of physical stacks of A4/Letter size paper, but a scientific corpus which really is too broad to only exist in that format.



The issue with hashing is that it's really tricky to do that with mixed media. Do you hash the text? The figures? What if it's a PDF scan of a different resolution? I think it's a cool idea, but you'd have to figure out how to handle other media, metadata, revisions, etc.
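
A small sketch of why this is tricky, under my own assumptions: `byte_hash` hashes the raw file, while `text_hash` hashes extracted text after a made-up normalization (Unicode NFKC plus collapsed whitespace). Neither function is a standard, and neither touches figures or scans:

```python
import hashlib
import unicodedata


def byte_hash(data: bytes) -> str:
    """Hash the raw bytes: any re-encode, re-scan, or metadata change alters it."""
    return hashlib.sha256(data).hexdigest()


def text_hash(text: str) -> str:
    """Hash extracted text after a hypothetical normalization step."""
    normalized = unicodedata.normalize("NFKC", text)
    normalized = " ".join(normalized.split())  # collapse all whitespace runs
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same sentence as extracted from two differently laid-out copies.
copy_a = "Results are shown in Figure 1.\n"
copy_b = "Results  are shown in\nFigure 1."

same_bytes = byte_hash(copy_a.encode("utf-8")) == byte_hash(copy_b.encode("utf-8"))
same_text = text_hash(copy_a) == text_hash(copy_b)
```

Here `same_bytes` is false and `same_text` is true, so a text-level hash survives reflowing. It still says nothing about whether the figures match, and a scan at a different resolution would need OCR before any text hash could be computed at all.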



