I have a much more mature version internally, and my understanding is that you can get gem-ified libraries that do async database stuff for EventMachine today.
I lost interest in this pretty quickly once I added Redis to my stack. Redis is trivial to talk to asynchronously, and by sticking a queue in between your async components and your database layer, it becomes easy to implement the half-async model --- plus you get very smart caching and another layer of indexing "for free".
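For concreteness, here's a minimal sketch of that queue-in-the-middle arrangement, assuming redis-py and MySQLdb; the queue names and connection details are made up for illustration:

    import json
    import MySQLdb
    import redis

    r = redis.Redis()

    def enqueue_query(sql, params):
        # Called from the async web layer: never touches the database,
        # just pushes a job onto a Redis list and returns immediately.
        r.rpush("query_jobs", json.dumps({"sql": sql, "params": params}))

    def worker_loop():
        # Runs in a separate worker process with a plain blocking MySQL client,
        # so slow queries stall this worker rather than the event loop.
        db = MySQLdb.connect(user="app", passwd="secret", db="app")
        while True:
            _, raw = r.blpop("query_jobs")   # blocks here, not in the event loop
            job = json.loads(raw)
            cur = db.cursor()
            cur.execute(job["sql"], job["params"])
            # Works as-is only for JSON-serializable column types.
            r.rpush("query_results", json.dumps(cur.fetchall()))

The async side can then watch "query_results" (or a per-request key) however it likes.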
> While this isn’t a big deal in nearly all production environments (you want to know when your database gets slow, so you can optimize queries or add indexing), it matters a lot to us because a slow query can affect other customers' queries.
One more reason to be leery of single-threaded eventing systems. You'd never run into this issue with a threaded web app, and it would perform just as well provided you kept your data structures as independent as they are in your current eventing setup.
Comparing an eventing system and a threading system, the eventing system is inherently less shared-nothing. It shares everything the threading system shares, plus the event loop thread itself.
You appear only to have read the first graf of this post, as the whole point of eventing the database access layer is to eliminate the "slow query blocks" problem.
But that's not why I'm commenting. Rather:
You're able to make that last assertion only by shifting the meaning of the word "shared" and denuding it of all its concurrency implications. Yes, event systems "share" the event loop, in all the glory of the word "shared". However, no two contexts in an evented system ever step on each other for access to a shared resource.
> You appear only to have read the first graf of this post
I read a little more than that, but I commented on what was interesting to me. If I'm reading the rest of it right, it's basically a tutorial on making a Python extension for two specific MySQL API functions. That's fine, but it's not that interesting (to me).
> However, no two contexts in an evented system ever step on each other for access to a shared resource.
Isn't that exactly what is happening when other requests are blocked by a blocking mysql call? They are stepping on each other for access to the shared event thread resource, which they need concurrent access to. Is this not the case? Please help me understand if I am misreading you.
Also, making everything non-blocking will not necessarily make the user experience better. It will just allow you to have more users concurrently, all waiting for your slow queries. In other words, user A no longer has to wait on users B, C, and D, but still has to wait on his/her own slow query. Depending on your application usage patterns, this could be enough to go from unusable to awesome. Or it could be that you just need to add a ton of indices.
Just to reiterate Justin's warnings: while the thought of making everything non-blocking is exciting, this should probably only be used if you're expecting very slow queries (and can't do anything about them).
Really? I've never scaled up a consumer web app this way, but is it really considered sound to insert blocking database calls in async web apps? This sounds wrong to me.
It works if the service runs a sufficient number of application instances behind a load balancer. In one of the videos about Tornado, a person from FriendFeed stated that FriendFeed runs multiple instances of the application with blocking calls to MySQL. There were many complaints about FriendFeed, but being slow was not one of them. (See slide 21 here http://bret.appspot.com/entry/tornado-tech-talk).
At scales where it matters, you're correct that it's really wrong.
The rational hope is that by the time you get to that scale you have people/code that know what they're doing and can fix it. Alas, there are many cases where that hasn't been true.
Why wouldn't you use it for normal operation? It would not make the user's experience better, but it would certainly increase the capacity of your service.
In search of performance we also developed a non-blocking MySQL driver for Python (and gevent). It's used by our real-time web framework, Planet (http://www.planetframework.com).
While we could've gotten better performance with more processes, we wanted to stick with one process per core. Given that constraint, a non-blocking driver naturally gave us a good performance gain.
This should be called "not-as-blocking" MySQL queries. As they say in the blog post, the calls to send and read still block, but they've made it so you don't call read until there is at least something to read. I imagine this will be an improvement over many setups.
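In other words, the pattern looks roughly like this (a sketch only, assuming a hypothetical wrapper object that exposes mysql_send_query/mysql_read_query_result as send_query() and read_result(); the extension in the post may differ):

    import tornado.ioloop

    def async_query(conn, sql, callback):
        # Hand the query to the server; the write itself can still block briefly.
        conn.send_query(sql)
        io_loop = tornado.ioloop.IOLoop.instance()

        def on_readable(fd, events):
            # The socket now has data, so this read should return quickly,
            # though it can still block while draining a large result set.
            io_loop.remove_handler(fd)
            callback(conn.read_result())

        io_loop.add_handler(conn.fileno(), on_readable, io_loop.READ)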
This is great. I was going to write my own lib that would use a whole bunch of threads and a non-blocking queue system for executing queries. While I find Tornado somewhat half-baked (it defaults to returning errors in HTML, but was built for handling feeds; its content-encoding header handling is broken, etc), there are lots of applications where non-blocking MySQL would be great.
The title is a little misleading. The method here certainly isn't dependent on Tornado, but for explanation/demo purposes it's a common context to explain it in.
Interesting: Oracle is currently recommending against using async OCI calls in the 11g documentation (altho' the feature is obviously there); instead, they recommend blocking calls on separate threads. This means you can be notified on the client when your query completes, with no network round trip to find out whether it's still executing.
There's no reason completion notifications can't occur in an async API. It sounds like Oracle calls polling "asynchronous", which is not particularly accurate. If the polling operations also involve network round trips to check for query completion, that is a tremendously stupid API.
The reason's simple: your 11g server has to support clients written in the dim and distant past, so you can't monkey with the protocol too much. You could do this as you suggest by subscribing to DCN events when your transaction completes, but obviously not with SELECTs.
Oracle is a funny beast. Like other gold-tooth-money databases it has some amazingly advanced features. But in other ways it is more 1980s than Top Gun crossed with Duran Duran.
It still doesn't have a native boolean type in SQL, for example. After 30 years.
Believe it or not, sometimes MongoDB is the perfect tool for the job. It actually does support safe-write features, so some of the hype in that video is false. MySQL is a great piece of the stack, but it's not always the right choice.
The video also shows its age; MongoDB hasn't stood still in the meantime. I put up a transcript of it because I found the mingling of technical discussion with musings about working on a horrible farm an interesting concept.
In all of our surveys (of web companies) the majority of people use MySQL. That certainly doesn't mean that it's the best choice, but using the most common software does come with a lot of obvious benefits.
It has a lot of weaknesses compared to postgres but it also has some definite advantages:
- Clustered and covering indexes. If the data needed by a query is in an index, there's no need to retrieve it from the table itself, which leads to better memory usage and cache locality.
- MySQL has non-transactional tables, which use a lot less memory. If you store a large number of small records (say, three ints each), such a table will use about a third of the memory of a similar postgres (or Oracle) table.
- Has more tested/reliable replication than anything else, possibly including Oracle (haven't tried the wacky 3rd party replication things like Goldengate or whatever, and don't believe in spending 6 figures on software licenses in general).
- Supports SELECT, INSERT and UPDATE
Why would I get anything else for a database? For more complicated stuff I have another approach that I employ called "programming".
We're a bit off-topic, but... Transactions on DDL, a lack of tons of gotchas, good replication, partial indexes, document orientation and querying, a pretty query language, ripping speed, horizontal scalability, a history of good architecture, consistent hashing, etc are reasons to select something other than MySQL.
http://jan.kneschke.de/2008/9/9/async-mysql-queries-with-c-a...
with a little more in-depth exploration of the MySQL async API. The problem seems to be that the (undocumented) API does not handle EAGAIN and that there's no way to connect asynchronously (mysql_connect always blocks).
Compare with PostgreSQL, which has had an async API for a long time:
http://developer.postgresql.org/pgdocs/postgres/libpq-async....
and for which I wrote a Twisted library that does fully asynchronous connection building and query execution: https://github.com/wulczer/txpostgres.
Since the asynchronous API is exposed in psycopg2, the Python connector library, it should be trivial to hook it up to Tornado as well.
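For the curious, here's a rough sketch of what that hookup could look like, using psycopg2's documented async mode and Tornado's IOLoop; the callback-style wrapper class is my own invention, not part of either library:

    import psycopg2
    import psycopg2.extensions
    import tornado.ioloop

    class AsyncQuery(object):
        """Connect, run one query, and hand the rows to a callback, all non-blocking."""

        def __init__(self, dsn, sql, callback):
            self.sql = sql
            self.callback = callback
            self.cursor = None
            self.conn = psycopg2.connect(dsn, async_=1)  # spelled async=1 in older psycopg2
            self.io_loop = tornado.ioloop.IOLoop.instance()
            # The connection is still being established; wait for the socket first.
            self.io_loop.add_handler(self.conn.fileno(), self._io_ready,
                                     self.io_loop.WRITE)

        def _io_ready(self, fd, events):
            state = self.conn.poll()
            if state == psycopg2.extensions.POLL_OK:
                if self.cursor is None:
                    # Connection established; issue the query and keep polling.
                    self.cursor = self.conn.cursor()
                    self.cursor.execute(self.sql)
                    self._io_ready(fd, events)
                else:
                    # Query finished; hand the rows back and stop watching the fd.
                    self.io_loop.remove_handler(fd)
                    self.callback(self.cursor.fetchall())
            elif state == psycopg2.extensions.POLL_READ:
                self.io_loop.update_handler(fd, self.io_loop.READ)
            elif state == psycopg2.extensions.POLL_WRITE:
                self.io_loop.update_handler(fd, self.io_loop.WRITE)

Usage from a Tornado handler would be something like AsyncQuery(dsn, "SELECT 1", self.on_rows).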