What's the point of WACZ? It appears to wrap a number of WARC files into a single zip, enabling Range requests to specific WARC files so it can be served by a passive file server. But why is that needed?
It's huge for being able to replay big WARC files in a browser without having to download the whole thing. (e.g. try loading a 700 MB WARC from IPFS to visit one page within it; it's too slow to work as-is)
It's used extensively by the Browsertrix/Webrecorder.io projects (whose team pioneered the WACZ format) and a few other projects.
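To make the "don't download the whole thing" point concrete, here is a rough sketch using Python's standard `zipfile` module on an in-memory archive. The member names and the `CountingReader` helper are made up for illustration, and this is not how any particular WACZ replay tool is implemented. It only shows that a zip reader can find one member through the directory at the end of the file and read just those bytes; over HTTP, each seek-and-read would become a Range request.

```python
import io
import zipfile

# Build a small stand-in archive in memory. A real WACZ holds WARC files plus
# index and metadata entries; these member names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("archive/big.warc", b"A" * 100_000)
    zf.writestr("archive/small.warc", b"WARC/1.1 small record")


class CountingReader(io.BytesIO):
    """File-like object that counts how many bytes are actually read."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


reader = CountingReader(buf.getvalue())
with zipfile.ZipFile(reader) as zf:
    # zipfile reads the directory at the end of the archive, then seeks
    # straight to the requested member instead of scanning the whole file.
    payload = zf.read("archive/small.warc")

total = len(buf.getvalue())
print(f"read {reader.bytes_read} of {total} bytes to fetch one member")
```

The large member is never touched, which is the property that lets a passive file server (or IPFS gateway) serve a single page out of a big archive.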
>> "Hell, at this point, GPT-3 is probably a better approach to knowledge processing than trying to piece together something actionable from a half baked information graph born of old programmers' utopian fever dreams."
Greatest thing I've read on HN. As a librarian and developer, can confirm. At least in most cases...(slipping back into fever dream)....
I would propose a potential user as someone interested in some of the meta considerations and patterns of statistical reasoning, aka machine learning. There is a vast number of particulars behind how the second hand on my watch operates (e.g. vibrating quartz, digital), but I can use that mostly reliable device to investigate higher-level phenomena, like calculating the distance of planets by timing their movement. This library opens a direct line to these algorithms such that one might intuit, and apply, their high-level behavior; just as I could not time planets if consumed with the fidelity and reliability of resonating quartz, it would slow my ability to explore this kind of reasoning if I were concerned with the minutiae.
That said, all points taken. If this sparks interest in someone, as it stands, it would be on them to dig into all the considerations you've outlined.
I love it. Pasted in the column headers to `iris.data` from the Iris website. Voila, up and running per the instructions on GitHub. For prototyping / exploring ideas, for someone who is a layman in the syntax but conceptually familiar, what a boon.
Are there queries that SPARQL can perform over a triplestore that cannot be done with SQL over normalized data? Perhaps not.
But data normalization to that end is a moving target, while a bag of subject-predicate-object statements is quite doable. This, I believe, is a uniquely powerful characteristic of linked data / graph query languages and protocols.
To that end, agree with the comment above that GraphQL is mighty exciting.
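For anyone who hasn't played with triples, a minimal sketch of the "bag of statements" idea in plain Python. The facts, the `?variable` convention, and the `match`/`query` helpers are all invented for illustration; this is not SPARQL or any real triplestore API, just the pattern-matching idea behind a basic graph pattern.

```python
# A "bag" of subject-predicate-object statements; the facts are invented.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "library"),
    ("carol", "worksAt", "library"),
}


def match(pattern, store):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(patterns, store):
    """Join several patterns on their shared variables."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bound in results:
            # Substitute variables that earlier patterns already bound.
            concrete = tuple(bound.get(term, term) for term in pattern)
            for new in match(concrete, store):
                next_results.append({**bound, **new})
        results = next_results
    return results


# Who works at the same place as alice?
rows = query([("alice", "worksAt", "?place"), ("?who", "worksAt", "?place")], triples)
colleagues = sorted(row["?who"] for row in rows)
print(colleagues)  # ['alice', 'carol']
```

Adding a new kind of fact is just `triples.add(("bob", "speaks", "french"))`, with no schema migration, which is the "moving target" point above.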
+1 Insightful. In fact, there's research toward showing the two are equivalent in possibility space of what can be represented/queried (https://arxiv.org/abs/1102.1889)
But yes, linked data and graphs are super powerful once the data is triplified. Suddenly you have an abstraction above the contents of your data into the 'shape' of your data.
SPARQL and RDF aren't going away, but they're the academic thing that I and others are trying to make useful. GraphQL is scratching the surface, but it's super exciting that it's scratching at all, imo.
GraphQL, though, is a bit of a lie nomenclature-wise. As I've experienced it, it's got nothing much to do with graphs, at least not in the sense that SPARQL deals with triples that form a graph. In this department I am really interested in TinkerPop [0].
I would love, some day, to spend some more time with triple stores, RDF and semantic technologies.
You might really enjoy datomic (www.datomic.com). Everything is stored as entity attribute value time and you query with a dialect of datalog. You can check out www.learndatalogtoday.org to get a flavor.
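A toy sketch of the entity-attribute-value-time idea, in plain Python. To be clear, this is not Datomic's API or data model in detail: the `Datom` tuple, the integer transaction numbers, and the `value_as_of` helper are assumptions for illustration, and it ignores retractions and multi-valued attributes.

```python
from typing import NamedTuple


class Datom(NamedTuple):
    entity: int
    attribute: str
    value: object
    tx: int  # logical transaction time; plain integers here for simplicity


# An append-only log of facts; the data is invented.
log = [
    Datom(1, "person/name", "Ada", 100),
    Datom(1, "person/city", "London", 100),
    Datom(1, "person/city", "Paris", 200),
]


def value_as_of(log, entity, attribute, tx):
    """Return the latest value asserted for (entity, attribute) at or before tx."""
    candidates = [
        d for d in log
        if d.entity == entity and d.attribute == attribute and d.tx <= tx
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.tx).value


city_then = value_as_of(log, 1, "person/city", 150)
city_now = value_as_of(log, 1, "person/city", 250)
print(city_then, city_now)  # London Paris
```

Because nothing is overwritten, asking "what did we know at time 150?" is just a filter on the fourth column.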
I've got nothing against Datomic, but can't help but think learndatalogtoday is outright false advertising, trying to capture "Datalog" as an SEO term for a proprietary graph database which has nothing to do with Datalog/Prolog.
The point of Datalog is that it's a subset of Prolog syntax, implying that engines can be reasonably exchanged for one another. But this is only possible with real Datalog, or SPARQL for that matter.
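For readers who haven't seen it, the classic ancestor program shows what "a subset of Prolog syntax" looks like; the two rules in the comment below are valid in both. The Python around them is only a sketch of naive bottom-up evaluation over invented facts, not how any specific engine works.

```python
# The program, in syntax shared by Datalog and Prolog:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
parent = {("ann", "bob"), ("bob", "cid"), ("cid", "dee")}


def ancestors(parent):
    """Naive bottom-up evaluation: apply the rules until no new facts appear."""
    ancestor = set(parent)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent(X, Y) with ancestor(Y, Z) on Y.
        derived = {
            (x, z)
            for (x, y) in parent
            for (y2, z) in ancestor
            if y == y2
        }
        if derived <= ancestor:
            return ancestor  # fixpoint reached
        ancestor |= derived


result = ancestors(parent)
print(sorted(result))
```

Since Datalog has no function symbols, the set of derivable facts is finite and this loop always terminates, which is part of what makes engines interchangeable in principle.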
Let the record show - this is how people will "bookmark". It's bringing in marked-up data from the page, effectively treating websites as little interesting nuggets of data. We don't save links to aggregators; we save links to articles, nuggets. The UI is clunky, but it'll get better. Hierarchy is toast, and doesn't scale; welcome to your bag-of-visited-memories-websites-past.