This book is just brilliant. It was actually originally written in 1978. For me, it's in the same league as "The Mythical Man-Month".
Right now, I am wrestling with exactly the same issues: designing data structures for representing real-world entities, discovering the roles they play (in a financial decision in my case), gathering data about the entities (depending on their roles), confirming the relationships between them, and validating attributes about them, based on primary data with differing degrees of reliability.
I've been through various iterations, painfully getting better at representing "what is true and what don't we know yet".
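The commenter doesn't show their design. One hypothetical way to keep "what is true" apart from "what we don't know yet" is to store each attribute as a set of sourced claims with a reliability score rather than as a single value. The sketch below assumes made-up names (`Claim`, `best_value`, a 0.0 to 1.0 `reliability` scale, a 0.5 cut-off) and made-up sample data. It is illustrative only, not the commenter's or the book's method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One hypothetical assertion about an entity, tagged with its source."""
    entity: str
    attribute: str
    value: str
    source: str
    reliability: float  # assumed scale: 0.0 (hearsay) to 1.0 (verified primary record)


def best_value(claims: list[Claim], entity: str, attribute: str,
               min_reliability: float = 0.5) -> str | None:
    """Most reliable value for entity.attribute, or None if nothing clears the bar.

    None means "we don't know yet"; it is never silently replaced by a guess.
    """
    candidates = [
        c for c in claims
        if c.entity == entity and c.attribute == attribute
        and c.reliability >= min_reliability
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.reliability).value


# Made-up sample data: two conflicting revenue figures and one weak rumour.
claims = [
    Claim("acme", "annual_revenue", "12M", "press release", 0.4),
    Claim("acme", "annual_revenue", "11.3M", "audited accounts", 0.95),
    Claim("acme", "parent_company", "Globex", "forum post", 0.2),
]
print(best_value(claims, "acme", "annual_revenue"))  # 11.3M
print(best_value(claims, "acme", "parent_company"))  # None
```

Keeping the low-reliability claims instead of overwriting them means a later, better source can change the answer without losing the history of what was believed and why.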
Reading "Data and Reality" just now was like a distillation of my endeavour, all in one place. Plus lots of things I haven't properly considered yet.
I can see why many people find it dull: if you haven't struggled with representing these distinctions yourself, it may well sound like pointless nitpicking and overcomplicating straightforward situations.
Coming back a day later, now that I've finished "Data and Reality", to add that the very last pages of the book are "About Bill Kent".
In particular, it gives a reference to Bill's web site http://www.bkent.net, which includes a list of his personal and academic [1] writing over 35 years.
I was actually astonished to find his web site still running, 17 years after his death in 2005. How many of our own digital artefacts will have such longevity?
I tried reading this book a few years ago but it was too dull. My eyes would keep scanning the words while my mind drifted off to think about something more interesting, and then I would constantly have to go back, re-read the paragraph my eyes had just passed over, and pay attention this time.
I just dug it out again. The start of Chapter 11, over 190 pages into it, may be hinting at the issue:
"Thus far we have been largely critical, and negative. We have identified problems without really suggesting solutions. Can we identify an appropriate set of elementary concepts that will on the one hand serve as a general base for modeling information (in our limited use of that term), and on the other hand be an appropriate base for computerized implementations? Let us try."
I haven't read the book but this description doesn't do it justice in that it makes it sound boring!
Since it is stressed that the book is philosophical, I have to say that the most paradigm-shifting ideas about modeling reality I've encountered so far come out of second-order cybernetics, and especially the work of Ranulph Glanville. His "Black B∞x" series is hard to find online but highly recommended.
On the other hand, that article made me want to read that book. It addresses problems that I've encountered in real life, and it doesn't seem to sell a "miracle solution".
Would you mind expanding a bit more on how second-order cybernetics was paradigm-shifting to you?
The genius of natural language is that it communicates your entropy along with your information. We can make statements that describe unknown things in terms of other unknown things, and still get use out of them. "Whoever killed him owns white gloves" doesn't let you fill in the blanks, but it certainly conveys information in the Shannon sense of reducing the size of the set of possibilities. Surprisingly, as the review points out, our computer languages lack this. They don't let us store data with anything other than absolute specificity, and so they run into cumbersome challenges whenever we have to merge "the butler" entity with "the murderer" entity.
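As a rough illustration of the point about unknown entities, here is a minimal sketch, assuming a hypothetical `Entity` record that can be described before it is identified, and a hypothetical `merge` that later records that two descriptions denote the same thing (keeping the facts from both). This is one possible approach, not something the review or the book prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical record for something we can describe but not yet identify."""
    description: str
    facts: set[str] = field(default_factory=set)
    merged_into: Entity | None = None

    def resolve(self) -> Entity:
        """Follow merge links to the record that now stands for this entity."""
        entity = self
        while entity.merged_into is not None:
            entity = entity.merged_into
        return entity


def merge(a: Entity, b: Entity) -> Entity:
    """Record that two descriptions turned out to denote the same thing.

    Nothing is thrown away: the surviving record keeps the facts from both.
    """
    ra, rb = a.resolve(), b.resolve()
    if ra is rb:
        return ra
    ra.facts |= rb.facts
    ra.description = f"{ra.description} / {rb.description}"
    rb.merged_into = ra
    return ra


# "Whoever killed him owns white gloves": useful before we know who it is.
murderer = Entity("whoever killed him", {"owns white gloves"})
butler = Entity("the butler", {"works at the manor"})

# Later evidence: the butler did it.
merge(murderer, butler)
print(butler.resolve().description)    # whoever killed him / the butler
print(sorted(butler.resolve().facts))  # ['owns white gloves', 'works at the manor']
```

The merge link (rather than deleting one record) means anything that still points at "the butler" continues to find the combined entity.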
I've been curious about that book ever since Rich Hickey recommended it, but couldn't easily get my hands on a copy and kind of forgot about it. This was a great review; I'm going to give it another try.
Yeah, when I read this review, it sounded like something Rich Hickey would advocate for! Basically, static schemas and types are brittle: they frequently clash with "reality".
> It’s just a simple numbers game: there are more obscure books than popular books, so there are more obscure good books than popular good books, so the more books you read the more likely your favorite is an obscure one. In other words, if someone says they love fantasy novels and also their favorite novel is Harry Potter, odds are they don’t actually read that much fantasy.
I hate these kinds of arguments. It assumes that every book has an equally likely chance of being good. Popularity and quality are generally correlated.
If you have read a lot of fantasy, there's a good chance that your favorite is a book that was popular at some time. My favorite, for example, is John Crowley's Little, Big. It's not obscure! It won critical acclaim and wide readership in 1981.
Most popular books become obscure books, given enough time. (Reading Les Miserables, I had to consult the footnotes repeatedly for Victor Hugo's allusions to then-popular novels and novelists, almost none of which remain popular today.) If you're a serious reader the chances are diminished that your favorite book is something that has just recently been written. But it's also likelier than random chance that your favorite book was once popular, because as you note there's generally a positive correlation between quality and popularity.
> It assumes that every book has an equally likely chance of being good.
No, it doesn't. It only has to assume that

P(good|obscure) > P(popular) / (1 - P(popular))

Or, more practically, since P(popular) is small, P(good|obscure) only has to be a hair more than P(popular).
Let's say all popular books are good, so
P(good|popular) = 1.
Then we'll say 1/1000 books are popular.
This means P(popular|good) = P(obscure|good) (i.e. the number of good books that are popular equals the number of good books that are obscure) when P(good|obscure) = 1/999. That holds when P(good|popular) = 1, the highest value it can take. If P(good|popular) is lower, the constraint is relaxed, so 1/999 is an upper bound on how large P(good|obscure) needs to be.
So, knowing nothing about the rate of goodness among popular books, if we assume there is a huge number of obscure books and a book is even slightly more likely to be obscure and good than it is to be popular, then we can conclude that there are more good obscure books than good popular ones.
This constraint is much less demanding than assuming all books are equally likely to be good.
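The arithmetic above can be checked with exact fractions. This sketch uses hypothetical helper names and a made-up catalogue of one million books; the break-even rule it encodes is P(good|obscure) = P(good|popular) · P(popular) / (1 − P(popular)), which reduces to the formula above when P(good|popular) = 1.

```python
from fractions import Fraction


def good_book_counts(n_books, p_popular, p_good_given_popular, p_good_given_obscure):
    """Expected numbers of (good popular, good obscure) books."""
    popular = n_books * p_popular
    obscure = n_books * (1 - p_popular)
    return popular * p_good_given_popular, obscure * p_good_given_obscure


def break_even(p_popular, p_good_given_popular=1):
    """P(good|obscure) at which good obscure books exactly match good popular ones."""
    return p_good_given_popular * p_popular / (1 - p_popular)


p = Fraction(1, 1000)                     # 1 in 1,000 books is popular
print(break_even(p))                      # 1/999

# Above break-even: 1 in 500 obscure books is good.
popular_good, obscure_good = good_book_counts(1_000_000, p, 1, Fraction(1, 500))
print(popular_good, obscure_good)         # 1000 1998
```

Even with every popular book counted as good, a goodness rate of 1 in 500 among obscure books is enough to make good obscure books outnumber good popular ones roughly two to one.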
Both may be true at the same time: “being good may correlate with being popular” and “the more you read the more likely your favorite book is obscure”
The math is the same as for hot/crazy/marry plots: plot popularity against goodness (random dots with a positive correlation: more popular, more good on average). Define which books may be favorites, e.g. any book above the goodness line G has a chance of becoming a favorite, and define a threshold for obscurity, e.g. popularity below level P. Then consider what happens to the number of obscure books above G as you read more: the more you read, the more of them you encounter, and the more of them can become your favorite.
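The scatter-plot argument can be simulated directly. This is a sketch under stated assumptions, not the commenter's model: goodness and popularity are assumed to be standard-normal scores with correlation `rho`, "popular" is assumed to mean the top 5% by popularity, and the G line is assumed to sit at the top 5% by goodness.

```python
import math
import random


def count_good_books(n=100_000, rho=0.5, top_fraction=0.05, seed=1):
    """Return (good popular, good obscure) counts for one random 'scatter plot'.

    Assumptions: goodness and popularity are standard-normal scores with
    correlation rho; the top top_fraction by popularity are "popular"; the
    top top_fraction by goodness are above the G line (favorite material).
    """
    rng = random.Random(seed)
    books = []
    for _ in range(n):
        goodness = rng.gauss(0, 1)
        # Mix in independent noise so corr(goodness, popularity) == rho.
        popularity = rho * goodness + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        books.append((goodness, popularity))

    k = max(1, round(n * top_fraction))
    g_line = sorted(g for g, _ in books)[-k]  # k-th highest goodness
    p_line = sorted(p for _, p in books)[-k]  # k-th highest popularity

    good_popular = sum(1 for g, p in books if g >= g_line and p >= p_line)
    good_obscure = sum(1 for g, p in books if g >= g_line and p < p_line)
    return good_popular, good_obscure


print(count_good_books())          # positive correlation
print(count_good_books(rho=0.0))   # no correlation, for comparison
```

Correlation raises the share of good books that are popular compared with the uncorrelated case, but under these assumptions most books above the G line are still obscure, so both statements can hold at once.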
If you model it with a single threshold, the math is the same as for «the better at programming competitions, the worse at “some other metric for coders” among Google hires»: imagine you sum two metrics and hire only those whose sum is above a threshold, and the two positive traits end up inversely related among the people you hired.
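This selection effect is often called Berkson's paradox (or collider bias). A minimal simulation, assuming two independent standard-normal skill scores and a hypothetical hiring rule of "sum of the two above 1.5":

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def hiring_correlations(n=50_000, threshold=1.5, seed=7):
    """Correlation of two independent skills overall vs. among those hired.

    Assumption: both skills are independent standard-normal scores, and a
    candidate is hired only if the sum of the two exceeds `threshold`.
    """
    rng = random.Random(seed)
    candidates = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    hired = [(a, b) for a, b in candidates if a + b > threshold]
    overall = pearson(*zip(*candidates))
    among_hired = pearson(*zip(*hired))
    return overall, among_hired


overall, among_hired = hiring_correlations()
print(f"everyone: {overall:+.2f}, hired: {among_hired:+.2f}")
```

Across all candidates the correlation comes out close to zero, while among the hired it is clearly negative: selecting on the sum alone is enough to make the two traits look like a trade-off.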
I was given this to read by my boss 2 decades ago. It is excellent, thought provoking and different to anything else you've probably read.
I haven't read the third, updated edition, with changes by a different author. I've heard mixed reviews of that one. I'd be interested in opinions on it from anyone who's read both it and a prior edition.
I often recommend this book to developers who are just starting to seriously use databases and learning how to better model information; it's one of the best introductions, imo. I wasn't aware of the shenanigans with the newer editions. I read the same PDF that they linked.
I strongly recommend not reading the 2012 edition. Steve took out about half of Bill's original writing and replaced it with advertisements for his own book. I wrote a bit near the end about how you can find the 2nd edition.
> It’s just a simple numbers game: there are more obscure books than popular books, so there are more obscure good books than popular good books, so the more books you read the more likely your favorite is an obscure one.
There is an unstated dubious assumption that popularity is independent from goodness here. Not that I disagree with the conclusion.
Popularity and quality don't have to be independent for this to be true—it's enough that they're not too heavily correlated, or at least not heavily correlated past some threshold. That does not seem dubious at all!
So a big thank you to the original poster, Tomte.