NoSQL – Back to the Future or Yet Another DB Feature? (speakerdeck.com)
33 points by itcmcgrath on May 30, 2012 | 41 comments



I hate it when speakers talk about "NoSQL" without specifying what they are actually bitching about, which in this case is clearly Key-Value/Document oriented databases. Of the non-relational databases, there are actually many which provide everything you want in a relational DB (separation of logical and physical within the DB) as well as storing the schema as PART of the data instead of a physically configured separate entity. These databases are called graph databases, and they are awesome. I am convinced they would have been the major databases of today, if not for the fact that it takes way more computing power and fancy algorithms to work with graph data models.


The real problem with graph databases is that they do not scale because the algorithms used parallelize poorly. Partitioning graphs is a longstanding challenge in computer science.

It is literally equivalent to the NoSQL "we don't do joins" algorithm problem. Unlike most NoSQL databases, the basic operation of a graph database is built on join algorithms (edge traversal is a relational join). If you figure out how to parallelize ad hoc joins on large clusters then graph databases become a viable solution for non-trivial databases.
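
To make the "edge traversal is a relational join" point concrete, here is a minimal sketch using Python's built-in sqlite3. The `edges` table, its contents, and the `two_hop_neighbors` helper are made up for illustration; a real graph database would not necessarily store or traverse edges this way.

```python
import sqlite3

# Hypothetical toy graph stored as a plain edge table (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e")],
)


def two_hop_neighbors(conn, start):
    """Follow two edges from `start` by joining the edge table with itself."""
    rows = conn.execute(
        """
        SELECT DISTINCT e2.dst
        FROM edges AS e1
        JOIN edges AS e2 ON e1.dst = e2.src  -- one hop == one join
        WHERE e1.src = ?
        ORDER BY e2.dst
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


two_hop = two_hop_neighbors(conn, "a")
```

Each additional hop adds another self-join, which is why the partitioning problem for graphs and the "we don't do joins" problem look like the same problem once the data is spread over many machines.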


I always cringe a bit when people point to papers[0] when it comes to software that is supposed to have real-world usage. Just because a paper describes a system that handles distributed transactions well doesn't mean that this software exists or will be anywhere near usable in the next 5 years. Even so, only very few papers actually talk about a system that is already in production use. The few that do (Bigtable, Dynamo) are usually great though.

0: http://cs-www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.p...


The paper describes a system that is a library which adds distributed transactions to effectively every data store available (distributed and non-distributed). I see no reason why this technology should not be available really soon given the broad applicability.


The main reason: it's academia and they have already written the paper :)


AFAIK the scalable SQL work from the 1980s was actually shipped by Tandem.


for your reading pleasure, please take a look at this paper: http://www.hpl.hp.com/techreports/tandem/TR-85.6.pdf

What they describe there is basically the current state of affairs: different database technology for different purposes (in-memory, hierarchical and journal). I love the paper because it shows how little things actually change.


(re that Tandem paper: it's IMS / a hierarchical DB that they describe though)


I'm not sure what you're referring to; Jim Gray's work looks pretty relational to me. http://research.microsoft.com/en-us/um/people/gray/papers/Be... is just one example.


I can't help but agree with this. However, I think NoSQL is an important tool when dealing with 'big' data (billions of records). NoSQL is also awesome for 'accelerating'/caching standard SQL (memcache to cache SQL results, etc).

At the end of the day, both have important roles to play and, in many respects, complement each other.
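
A rough sketch of the "cache SQL results in a key-value store" pattern mentioned above. A plain Python dict stands in for memcached here, and the `users` table, key format, and `get_user_name` helper are all assumptions made for the example, not anyone's actual setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")])

cache = {}                 # stand-in for memcached: simple key -> value store
stats = {"db_hits": 0}     # counts how often we actually go to the SQL database


def get_user_name(user_id):
    """Cache-aside lookup: try the key-value store first, fall back to SQL."""
    key = f"user:{user_id}:name"
    if key in cache:
        return cache[key]
    stats["db_hits"] += 1
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    value = row[0] if row else None
    cache[key] = value     # a real cache would also need expiry/invalidation
    return value


first = get_user_name(1)   # misses the cache, queries SQL
second = get_user_name(1)  # served from the cache
```

The sketch ignores invalidation entirely: once the row changes in SQL, the cached value is stale until something evicts it.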


By the time a NoSQL "database" is ready for prime-time, it will have all the features of a real database.


What is "unreal" about current NoSQL Databases?


I guess you're new to Hacker News?

http://news.ycombinator.com/item?id=3202081

http://news.ycombinator.com/item?id=3982142

http://news.ycombinator.com/item?id=3954596

Real databases have real features like transactions, and guaranteed writes to disk, and not retarded security. Crazy shit like a commonly-used query language and tool support.


You do know that companies like Amazon and Google rely primarily on NoSQL databases.

If they aren't "prime time" I don't know what is.


I learned today that for data analysis they still stick with SQL (and some fancy MapReduce stuff underneath)


While the author may be making a good point, I couldn't understand it as presented. Instead of reading an interesting article, I stared at small screens of half sentences, tables, and pictures, with very little explanation. Slides without a narrator are a poor way to communicate.


Yes, several slides require more context. The talk was recorded and, hopefully, will be available online at http://www.nosql-matters.org/ soon.


I'm surprised to see no mention of postrelational/multivalue databases. Postrelational DBs played in the 80's and 90's, and are still widely used in the world of big iron.

It's historically inaccurate to state we went from file-based databases to relational ones and are now slipping back to file-based ones.

Rather, both the relational and non-relational DB worlds have taken many ideas from the post-relational world (even if not intentionally.)

The SQL/NoSQL + ORM + MVC stack of today is much closer to what postrelational DB's were like rather than what SQL apps were like back in the day. Every generation thinks they invented sex, and high level database programming...

http://en.wikipedia.org/wiki/MultiValue


You are right. In the talk I gave I briefly referred to Hierarchical Database for instance. It's not in the slides because of the timeframe for the talk.


Beat a dead horse much? This has already been slashdotted to death. I like Erik Meijer's take: it should've been named CoSQL (dual to SQL): http://bit.ly/feNxRE


A high level and theoretic NoSQL debate doesn't help. We know all the strengths and limitations of both SQL and NoSQL. And we know when to use what.

Moreover, NoSQL does not equal NoSQL: there are huge differences between a Mongo, a Riak and a Couch. And there's much more than that beyond SQL (graph-based DBs etc.).

What matters: The experience.

I used MongoDB with Node one or two weeks ago for the first time and I was really impressed once I got it. There are many use cases where I wouldn't employ NoSQL or Mongo, but there are as many where I would go with Mongo where I previously relied on SQL.

Why: working completely without schemas and even migrations is awesome. Just save a record in Node/Mongo with the native interface and the system sets up the respective collection and even the DB at that moment if they don't exist yet. Schemaless is not always the way to go, but if you want to prototype or to get out of the door quickly, it's a mind-blowing experience (and it's enough for many web projects). And with the JS interface it's just incredible and totally different from other NoSQL counterparts.


The biggest problem with NoSQL (whatever that may mean) is that there is no formal framework to reason about NoSQL in a uniform manner as opposed to SQL systems and bare data structures where strong formalisms exist that can give hard results on costs. IMHO, this is because NoSQL is a "conceptual bag" in which anything is thrown that doesn't have a SQL interpreter in it. That has many contradictions.

E.g. Google App Engine's GQL, Facebook's FQL and Yahoo YQL not to mention Hive all have a very SQL like syntax. So what about all this is NoSQL ? SELECT, WHERE and FROM are OK in a NoSQL database?

Dig further and you'll find that these are all "No-Join" databases, and it's the badly scaling cost of the Join operation that gave SQL a bad name and the NoSQL community a bad category name.

Bottom line: there needs to be a unifying formalism to talk about NoSQL as a category. Which ("Tada Boom!") gives me a segue and a pun to refer to a paper published in the IEEE about using Category Theory to describe NoSQL.

I will critique it elsewhere, but suffice it to say that if I have to do a PhD in Pure Math to design a database it's a non-starter. (Aside I have a "PhD" in Pure Math - everything but thesis)

So anyhoo bottom line - Missing: a formal framework to reason about databases that don't use SQL or Joins.

Without that it is not even possible to decide what is the membership criterion for something to be called NoSQL, especially when NoSQL is defined as "Not Only SQL" = SQL UNION ~SQL = the whole Fing universe.

So NoSQL is a label that has no ability to make logical distinctions. And NoSQL as a grab bag of technologies has no formal way to reason about it.

To make any sense this needs to be fixed, before any arguments about this are even worth having.

[My creds: 20+ years in the DB world as developer, architect, instructor; including 4+ yrs in the NoSQL business, including one year as emp #2 and VP of BizDev at CouchOne, and including lead developer of a project that used CouchDB in a National Science Foundation-funded project back in 2008/2009 (slashdotted and survived) and currently using Couch, Mongo, NodeJS in various projects]


NoSQL is trading off ACID transactions for scalability/throughput/latency. Both Calvin and Omid can add ACID transactions to any NoSQL for about nx decrease in throughput.

NoSQL is about choice. Period.


Yes, and fragmentation of your data. Resulting in e.g. data quality problems. It's not just choice, it's choice with downsides that many seem to not really care about at all.


That is simply untrue. You can have fragmentation of your data with SQL databases as well.


NoSQL is giving up FAR more than ACID. It is giving up the ability to consume the stored data in a beautifully concise and expressive manner.

edit: 'the ability to'


The consensus of NoSQL these days is Not only SQL. Nobody is disagreeing that SQL (and the underlying relational algebra) is expressive and general.

For a lot of applications/use cases, schema-free get/put/delete is a lot more concise than any SQL concoction. Besides, using RDBMS/SQL to store/retrieve a complex graph is neither beautiful nor concise.

Just because you can use your Swiss army knife to kill a few roaches doesn't mean you should.


If my data is document based then how is SQL going to be more concise and expressive than JSON ?

Say I have a domain model that involves a primary object, e.g. User, with lots of maps and arrays. How is SQL going to be better then? Lots of tables and joins better than a single document? I think not.


If your data is a User object with maps and arrays, and you design a data structure which gets you the result in exactly that format, you _are_ giving up the ability to ask other questions of this data. You are giving up the ability to ask _many_ different kinds of questions (expressive) in a relatively simple way (concise).
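
A small sketch of that trade-off, with invented data. The document shape answers the question it was designed for directly; a different question needs hand-written traversal code, while the same facts in a flat table (here `user_langs`, a hypothetical name) answer it with one declarative query.

```python
import sqlite3

# Hypothetical "User with arrays" documents.
users = [
    {"name": "ada", "langs": ["python", "sql"]},
    {"name": "grace", "langs": ["cobol", "sql"]},
    {"name": "linus", "langs": ["c"]},
]

# The question the document was designed for: a direct lookup.
ada_langs = next(u["langs"] for u in users if u["name"] == "ada")

# A different question ("how many users per language?") means walking
# every document and every array by hand.
users_per_lang_doc = {}
for user in users:
    for lang in user["langs"]:
        users_per_lang_doc[lang] = users_per_lang_doc.get(lang, 0) + 1

# The same facts as a flat relation: the new question is one GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_langs (name TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO user_langs VALUES (?, ?)",
    [(u["name"], lang) for u in users for lang in u["langs"]],
)
users_per_lang_sql = dict(
    conn.execute("SELECT lang, COUNT(*) FROM user_langs GROUP BY lang").fetchall()
)
```

With three documents the loop is trivial; the argument is about what happens when the questions keep changing and the data does not fit in one process.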


I don't agree with calling these databases "NoSQL", because it grants the undeserved premise that SQL is somehow the center of the database universe. Politically that may be so, but technically, it's not. Nothing anoints SQL databases as being somehow intrinsically "right" or the logical starting point or standard by which all databases should be compared. SQL is merely popular.


I don't think the premise is undeserved. SQL IS the center of the database universe. If we use SQL to mean the general concept of relational databases, then it is true across even more dimensions of the universe.

Relational databases and sql should be considered the pride and joy of computer science AND software engineering (with logic and math being the grand parents in this increasingly confused and mixed metaphor).


"SQL IS the center of the database universe."

Politically, yes. SQL/relational wins the popularity contest. But so what?

Technically, no, it's not the center. It's merely the most popular, for the time being. This is a temporary state of affairs.


I think the current confusion is due to the fact that relational databases are too often thought of as data stores. It is true that these DBs are not always the best solution. Sometimes it is better to store data in various NoSQL databases. Sometimes it is even better to store data in a flat file.

However, relational databases provide unprecedented ability to query information. Imagine one person puts data into a black box. 100 other people can ask it questions which were never intended by the original designer, and the system is able to answer those questions fairly efficiently.

The language used to query is orders of magnitude simpler than 'normal' programming languages.

If you query your database and the answer takes too long, you can add an index and the exact same query will start running more quickly!
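
Here is a minimal way to watch that happen with Python's built-in sqlite3. The table, column, and index names are made up; `EXPLAIN QUERY PLAN` is SQLite-specific, and the exact wording of its output varies between SQLite versions, so the sketch only compares plans rather than asserting a particular string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [(f"user{i}", f"city{i % 100}") for i in range(10_000)],
)

# The query text never changes; only the physical design does.
QUERY = "SELECT name FROM users WHERE city = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("city7",)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = query_plan(conn)                  # expected: a full table scan
result_before = conn.execute(QUERY, ("city7",)).fetchall()

conn.execute("CREATE INDEX idx_users_city ON users (city)")

plan_after = query_plan(conn)                   # expected: a search via the index
result_after = conn.execute(QUERY, ("city7",)).fetchall()
```

Printing `plan_before` and `plan_after` shows the planner switching strategy while the query and its results stay the same, which is the separation of logical and physical that the comment is describing.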

Two different people can create two different tables. A third person can come in and 'join' these two tables.

I see nosql being a low level technology, higher than blocks/files/b-trees but lower than relational databases. Another way to say it is that relational algebra is the theoretical model while large parts of modern nosql tools are implementation details.


"Merely the most popular"? LOL. It's the only formally proven, mathematically correct representation and querying of data sets.

Not that most RDBMSs conform to relational theory 100% (or even 90%), but everything else is merely a bad re-invention of the wheel.

As in: "Hey, let's trade ACID, security, uniform access to data by all apps" for cheap speed and ill-thought developer convenience.


> It's the only formally proven,

No. RDF is formally proven via its Model Theory (as was KIF before it). That's arguably a stronger basis than relational "algebra".


"LOL. It's the only formally proven, mathematically correct representation and querying of data sets."

Someone led you down the garden path.

There is absolutely no proof that RDBMS is objectively correct in any meaningful sense whatsoever. Did someone invent an arbitrary standard to measure it by, and then prove that it met that standard? Sure. But that's a far cry from a claim that RDBMS's are somehow "formally proven." That's just pure mathematical silliness and marketing propaganda.


Who said anything about RDBMS? It's relational algebra we're talking about, and the reasonings pertain to the relational algebra operators.
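
For readers who have not seen the operators written down, a toy sketch in Python: relations are sets of rows, and selection, projection, and natural join are plain set operations. The representation (rows as frozensets of attribute/value pairs) and the sample data are choices made for this example only, not how any database implements them.

```python
def relation(*rows):
    """Build a relation: a set of rows, each row a frozenset of (attr, value)."""
    return {frozenset(row.items()) for row in rows}


def select(rel, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return {row for row in rel if predicate(dict(row))}


def project(rel, attrs):
    """Projection: keep only the named attributes (duplicates collapse)."""
    return {frozenset((k, v) for k, v in row if k in attrs) for row in rel}


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for l_row in left:
        l = dict(l_row)
        for r_row in right:
            r = dict(r_row)
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.add(frozenset({**l, **r}.items()))
    return out


people = relation({"name": "ada", "dept": 1}, {"name": "grace", "dept": 2})
depts = relation({"dept": 1, "title": "math"}, {"dept": 2, "title": "navy"})


def is_math(row):
    return row["title"] == "math"


# Filter after joining, or push the filter below the join: same result.
filter_late = select(natural_join(people, depts), is_math)
filter_early = natural_join(people, select(depts, is_math))
math_names = project(filter_late, {"name"})
```

The equality of `filter_late` and `filter_early` is one small instance of the kind of reasoning about operators being referred to: a query optimizer relies on equivalences like this to reorder work without changing the answer.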


Not to mention SQL isn't technically a database type, just the query language most commonly used for relational DBs. Baby/bathwater and all the usual tendencies to ignore true meanings of terms when we use them...


> Politically that may be so, but technically, it's not.

Actually it's both technically AND scientifically.

Relational algebra is MATH describing the properties of relational data (that is, all kinds of data one would ever care to use), based on set theory.

All the other ways are just ad-hoc BS non-solutions, that happen to have this or that property because they forgot the tradeoffs.


The fact that it's based on set theory is utterly irrelevant. The question is: what is the best database from an engineering perspective? It has never been proven that RDBMS's are the best in any respect whatsoever in this sense.

It's irrelevant that you can concoct an arbitrary mathematical theory and then create a system to match. You must also prove that that theory is the most relevant and important way of looking at the data for the given problem at hand. RDBMS theory doesn't even bother trying to worry about the real world.


This math thing: you're doing it wrong. Theories are not "arbitrary", and you cannot outbest theory "in practice".

The only way to do this is if you have an ad-hoc faulty theory.



