
Of course, I don't mean storing literal XML, but I think it's important to support generic markup in databases. It's not that hard, after all; something like these five tables:

    Markup (markupId, string)
    Ns (nsId, uri)
    QName (qNameId, nsId, localName)
    Elt (markupId, eltId, qNameId, 
      start, length /* string indices */, 
      level, index /* or other way to represent hierarchy */ )
    Attr (eltId, qNameId, value)
would be enough to store and fully index any XML out there.
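To make that concrete, here is a rough sketch of the five tables in SQLite via Python. The column types, keys, and the choice to store only the text content in `Markup.string` are my assumptions, not part of the outline above, and `build_demo` and `element_texts` are made-up helper names. It stores the fragment `<p>Hello, <b>world</b></p>` and reads each element's text back through its string indices.

```python
import sqlite3

# Hypothetical SQLite rendering of the five tables; types and keys are assumptions.
SCHEMA = """
CREATE TABLE Markup (markupId INTEGER PRIMARY KEY, string TEXT NOT NULL);
CREATE TABLE Ns     (nsId INTEGER PRIMARY KEY, uri TEXT NOT NULL UNIQUE);
CREATE TABLE QName  (qNameId INTEGER PRIMARY KEY, nsId INTEGER REFERENCES Ns,
                     localName TEXT NOT NULL);
CREATE TABLE Elt    (markupId INTEGER REFERENCES Markup, eltId INTEGER PRIMARY KEY,
                     qNameId INTEGER REFERENCES QName,
                     start INTEGER, length INTEGER,    -- indices into Markup.string
                     level INTEGER, "index" INTEGER);  -- one way to encode hierarchy
CREATE TABLE Attr   (eltId INTEGER REFERENCES Elt, qNameId INTEGER REFERENCES QName,
                     value TEXT);
"""

def build_demo():
    """Store the text of <p>Hello, <b>world</b></p> plus its two elements."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO Markup VALUES (1, 'Hello, world')")
    con.execute("INSERT INTO Ns VALUES (1, 'http://www.w3.org/1999/xhtml')")
    con.executemany("INSERT INTO QName VALUES (?, 1, ?)", [(1, "p"), (2, "b")])
    # Columns: markupId, eltId, qNameId, start, length, level, index
    con.executemany("INSERT INTO Elt VALUES (1, ?, ?, ?, ?, ?, 0)",
                    [(1, 1, 0, 12, 0), (2, 2, 7, 5, 1)])
    return con

def element_texts(con):
    """Return (localName, covered text) per element, outermost first."""
    rows = con.execute("""
        SELECT q.localName, substr(m.string, e.start + 1, e.length)
        FROM Elt e
        JOIN QName q USING (qNameId)
        JOIN Markup m USING (markupId)
        ORDER BY e.level, e."index"
    """)
    return rows.fetchall()

print(element_texts(build_demo()))
```

SQLite's `substr` is 1-based, hence the `start + 1`. A real version would need a less naive hierarchy encoding than `level` plus `index`.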

I think markup is an important separate data type. We have records, we have scalars, now we have a hybrid type, markup: a string (or maybe a byte/bit stream) + records linked to indexes in this string/stream and maybe forming a hierarchy.



A database should store the actual data, not some serialized format. You don't store the comma tokens from a CSV file in the databases either - you store data which can be serialized to CSV. The same information could be serialized as CSV, XML or JSON depending on the context.
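A tiny illustration of what I mean, with a made-up record and made-up helper names (`to_csv`, `to_json`): the stored data is just fields, and CSV or JSON are produced on the way out.

```python
import csv
import io
import json

# Hypothetical stored records: plain fields, no format-specific tokens.
rows = [{"first": "Donald", "last": "Knuth"}]

def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["first", "last"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_json(records):
    """Serialize the same records as JSON text."""
    return json.dumps(records)

print(to_csv(rows))
print(to_json(rows))
```

Neither the commas nor the braces ever live in the database; they only exist in the serialized output.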

Storing a serialized format as an opaque string or blob might make sense in particular cases.


A) The proposal is crazy. You probably don’t need to know if someone has a patronym or a kunya or whatever. Base what you store on what you need.

B) Storing this as normal data would be an intense pain in the ass to use, for no gain in performance or correctness. Use JSONB, this is what it’s for.


If you actually need to know such details then it should be stored in normalized form.

Say you have a visa database which must be compatible with multiple naming systems from different countries and where no information can be discarded for security or legal reasons. Then you store the data you need in normalized form. Why would you use weird hacks for essential information?

And if you don't need all that detail - then you just save the name as a string and discard the metadata.

I can't see how it would ever make sense to store a name in JSON format inside a relational database. Either the information is important or it isn't.


A) In many cases you don't, but if you store data about authors in a bibliographic database then you probably do. And it's just a part of the package: for example, Donald Knuth went as far as to print people's names in their native language in addition to English in his books' indices :)


This comment is culturally insensitive. You are implicitly assuming that your culture's way of expressing names is all that matters.


Matters for what? A database is a tool to solve a problem, not a timeless repository of deep cultural meaning.

If I make a webapp that is only localized in English, then display name, first name, and last name fields will let me solve my common problems: I need to say "My account (John Smith)" in the corner (display name), I want to be able to write emails that say "John, we know you're wondering how the company that sold you a pair of pants three years ago is feeling about COVID-19…" (first name), and I want to list articles by author sorted by last name. Those are all common use cases for apps used by English speakers.

Now, if I were making an app to do voting in Myanmar, I would need to deal with there not being last names for many Burmese. If I were trying to track Arabic-speaking terrorists around the world, I'd want a long list of their aliases and kunya in multiple romanizations (I think the CIA used to prefer "UBL" for Osama bin Laden because they called him "Usama"). If I were making a library app, I might want to have the English romanized name plus the Unicode native language name. Tons of possible problem spaces with different solutions.

I just think tagging the different name parts for "Gabriel José de la Concordia García Márquez" is going to be overkill for most English language apps, because when are you going to look someone up by maternal family name versus just doing a full text search for "Garcia Marquez"?


Missing the point. If you just have a single "name" field then you can accommodate any culture perfectly.


This fits the 75% use case, but misses things that do come up, like alphabetizing by last name and emails with casual forms of address.


I'm not arguing for storing a serialized format; instead I suggest parsing it and ending up with something like this:

    NAME ( 1, "Fyodor Mikhailovich Dostoyevsky" )
    TAG ( FIRSTNAME, 1, 0, 5 ) 
    TAG ( LASTNAME, 1, 20, 30 ) 
    TAG ( PATRONYMIC, 1, 7, 18 )
I'm not saying it's efficient (UPD: speedy), but it's sufficient to get the required details and can be mechanically transformed into a more efficient form. UPD: And it's very flexible and easy to extend. I believe it can handle just about any kind of naming from "Falsehoods about names".
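For the curious, this is roughly how the tags resolve back to name parts. It's a sketch that assumes the last two numbers in each TAG row are inclusive start and end character indices (they could equally be read some other way), and `tagged_parts` and `in_text_order` are names I made up.

```python
NAME = "Fyodor Mikhailovich Dostoyevsky"

# (tag, start, end) copied from the TAG rows above.
# Assumption: both indices are 0-based and inclusive.
TAGS = [("FIRSTNAME", 0, 5), ("LASTNAME", 20, 30), ("PATRONYMIC", 7, 18)]

def tagged_parts(name, tags):
    """Map each tag to the substring it covers (end index inclusive)."""
    return {tag: name[start:end + 1] for tag, start, end in tags}

def in_text_order(name, tags):
    """List (tag, substring) pairs in the order they appear in the name."""
    ordered = sorted(tags, key=lambda t: t[1])
    return [(tag, name[start:end + 1]) for tag, start, end in ordered]

print(tagged_parts(NAME, TAGS))
print(in_text_order(NAME, TAGS))
```

The original string stays untouched, so the ordering of the parts is never lost; the tags only point into it.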


But why include all the cruft in the first place? You could just save this information (in normalized form):

  FIRSTNAME: Fyodor
  LASTNAME: Dostoyevsky
  PATRONYMIC: Mikhailovich


This one lacks ordering. It's not a problem for Russian names as the rules are fixed, but may be a problem for other ones.

And how would it scale? E.g. John Ronald Reuel Tolkien and George Herbert Walker Bush have two middle names, while the full name of García Márquez is Gabriel José de la Concordia García Márquez. Markup handles this uniformly, creating only as many metadata records as necessary, but a field-based model will need fields for all of this that will stay empty most of the time, and likely some meta fields to set additional quirky flags.
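A quick sketch of the uniform handling I mean, under my own assumptions: tags are (label, start, end) with an inclusive end, the label names (MIDDLENAME, PATERNAL_SURNAME, MATERNAL_SURNAME) are invented for illustration, and `tag` and `parts` are hypothetical helpers.

```python
def tag(name, part, label):
    """Locate the first occurrence of `part` in `name`; return (label, start, end)."""
    start = name.index(part)
    return (label, start, start + len(part) - 1)  # end index is inclusive

def parts(name, tags, label):
    """All substrings carrying `label`, in the order they appear in the name."""
    hits = sorted((s, e) for l, s, e in tags if l == label)
    return [name[s:e + 1] for s, e in hits]

tolkien = "John Ronald Reuel Tolkien"
tolkien_tags = [tag(tolkien, p, l) for p, l in [
    ("John", "FIRSTNAME"),
    ("Ronald", "MIDDLENAME"),   # same label used twice, no extra column needed
    ("Reuel", "MIDDLENAME"),
    ("Tolkien", "LASTNAME"),
]]

marquez = "Gabriel José de la Concordia García Márquez"
marquez_tags = [
    tag(marquez, "García", "PATERNAL_SURNAME"),
    tag(marquez, "Márquez", "MATERNAL_SURNAME"),
]

print(parts(tolkien, tolkien_tags, "MIDDLENAME"))
print(parts(marquez, marquez_tags, "MATERNAL_SURNAME"))
print(parts(marquez, marquez_tags, "MIDDLENAME"))
```

A name with no middle name simply has no MIDDLENAME rows, and a name with three has three; nothing sits empty.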


Regardless of how complex the naming schemes you need to support are, you just make things more complex and harder to query by storing tag names and character indexes and whatnot.


Please don't do that. I hope you haven't implemented such a set of tables in any database.


Not yet :) But I may just for the sake of it.


That's a good idea. You'll learn firsthand why it's a bad idea.


Isn’t that what RDF is all about? Each field is a named property of a resource, and the properties themselves are resources with their own properties describing the structure of the first ones. And so on, going as meta as necessary.



