I have a much more mature version internally, and my understanding is that you can get gem-ified libraries that do async database stuff for EventMachine today.
I lost interest in this pretty quickly once I added Redis to my stack. Redis is trivial to talk to asynchronously, and by sticking a queue in between your async components and your database layer, it becomes easy to implement the half-async model --- plus you get very smart caching and another layer of indexing "for free".
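For concreteness, here's a minimal sketch of that queue-in-the-middle arrangement, assuming redis-py and MySQLdb; the queue names and connection details are made up for illustration:

    import json
    import MySQLdb
    import redis

    r = redis.Redis()

    def enqueue_query(sql, params):
        # Called from the async web layer: never touches the database,
        # just pushes a job onto a Redis list and returns immediately.
        r.rpush("query_jobs", json.dumps({"sql": sql, "params": params}))

    def worker_loop():
        # Runs in a separate worker process with a plain blocking MySQL client,
        # so slow queries stall this worker rather than the event loop.
        db = MySQLdb.connect(user="app", passwd="secret", db="app")
        while True:
            _, raw = r.blpop("query_jobs")   # blocks here, not in the event loop
            job = json.loads(raw)
            cur = db.cursor()
            cur.execute(job["sql"], job["params"])
            # Works as-is only for JSON-serializable column types.
            r.rpush("query_results", json.dumps(cur.fetchall()))

The async side can then watch "query_results" (or a per-request key) however it likes.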
> While this isn’t a big deal in nearly all production environments (you want to know when your database gets slow, so you can optimize queries or add indexing), it matters a lot to us because a slow query can affect other customers' queries.
One more reason to be leery of single-threaded eventing systems. You'd never run into this issue with a threaded web app, and it would perform just as well provided you kept your data structures as independent as they are in your current eventing setup.
Comparing an eventing system and a threading system, the eventing system is inherently less shared-nothing. It shares everything the threading system shares, plus the event loop thread itself.
You appear only to have read the first graf of this post, as the whole point of eventing the database access layer is to eliminate the "slow query blocks" problem.
But that's not why I'm commenting. Rather:
You're able to make that last assertion only by shifting the meaning of the word "shared" and denuding it of all its concurrency implications. Yes, event systems "share" the event loop, in all the glory of the word "shared". However, no two contexts in an evented system ever step on each other for access to a shared resource.
> You appear only to have read the first graf of this post
I read a little more than that, but I commented on what was interesting to me. If I'm reading the rest of it right, it's basically a tutorial on making a Python extension for two specific MySQL API functions. That's fine, but it's not that interesting (to me).
> However, no two contexts in an evented system ever step on each other for access to a shared resource.
Isn't that exactly what is happening when other requests are blocked by a blocking mysql call? They are stepping on each other for access to the shared event thread resource, which they need concurrent access to. Is this not the case? Please help me understand if I am misreading you.
Also, making everything non-blocking will not necessarily make the user experience better. It will just allow you to have more users concurrently, all waiting for your slow queries. In other words, user A no longer has to wait on users B, C, and D, but still has to wait on his/her own slow query. Depending on your application usage patterns, this could be enough to go from unusable to awesome. Or it could be that you just need to add a ton of indices.
Just to reiterate Justin's warnings: while the thought of making everything non-blocking is exciting, this should probably only be used if you're expecting very slow queries (and can't do anything about them).
Really? I've never scaled up a consumer web app this way, but is it really considered sound to insert blocking database calls in async web apps? This sounds wrong to me.
It works if the service runs a sufficient number of application instances behind a load balancer. In one of the videos about Tornado, a person from FriendFeed stated that FriendFeed runs multiple instances of the application with blocking calls to MySQL. There were many complaints about FriendFeed, but being slow was not one of them. (See slide 21 here http://bret.appspot.com/entry/tornado-tech-talk).
At scales where it matters, you're correct that it's really wrong.
The rational hope is that by the time you get to that scale you have people/code that know what they're doing and can fix it. Alas, there are many cases where that hasn't been true.
Why wouldn't you use it for normal operation? It would not make the user's experience better, but it would certainly increase the capacity of your service.
In search of performance we also developed a non-blocking MySQL driver for Python (and gevent). It's used by our real-time web framework, Planet (http://www.planetframework.com).
While we could've gotten better performance with more processes, we wanted to stick with one process per core. Given that constraint, a non-blocking driver naturally gave us a good performance gain.
This should be called "not-as-blocking" MySQL queries. As they say in the blog post, the calls to send and read still block, but they've made it so you don't call read until there is at least something to read. I imagine this will be an improvement over many setups.
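In other words, the pattern looks roughly like this (a sketch only, assuming a hypothetical wrapper object that exposes mysql_send_query/mysql_read_query_result as send_query() and read_result(); the extension in the post may differ):

    import tornado.ioloop

    def async_query(conn, sql, callback):
        # Hand the query to the server; the write itself can still block briefly.
        conn.send_query(sql)
        io_loop = tornado.ioloop.IOLoop.instance()

        def on_readable(fd, events):
            # The socket now has data, so this read should return quickly,
            # though it can still block while draining a large result set.
            io_loop.remove_handler(fd)
            callback(conn.read_result())

        io_loop.add_handler(conn.fileno(), on_readable, io_loop.READ)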
This is great. I was going to write my own lib that would use a whole bunch of threads and a non-blocking queue system for executing queries. While I find Tornado somewhat half-baked (it defaults to returning errors in HTML, but was built for handling feeds; its content-encoding header handling is broken, etc), there are lots of applications where non-blocking MySQL would be great.
The title is a little misleading. The method here certainly isn't dependent on Tornado, but for explanation/demo purposes it's a common context to explain it in.
Interesting: Oracle is currently recommending against using async OCI calls in the 11g documentation (altho' the feature is obviously there); instead, they recommend blocking calls on separate threads. This means you can be notified on the client when your query completes, with no network round trip to find out whether it's still executing.
There's no reason completion notifications can't occur in an async API. It sounds like Oracle calls polling "asynchronous", which is not particularly accurate. If the polling operations also involve network round trips to check for query completion, that is a tremendously stupid API.
The reason's simple: your 11g server has to support clients written in the dim and distant past, so you can't monkey with the protocol too much. You could do this as you suggest by subscribing to DCN events when your transaction completes, but obviously not with SELECTs.
Oracle is a funny beast. Like other gold-tooth-money databases it has some amazingly advanced features. But in other ways it is more 1980s than Top Gun crossed with Duran Duran.
It still doesn't have a native boolean type in SQL, for example. After 30 years.
Believe it or not, sometimes MongoDB is the perfect tool for the job. It actually does support safe-write features, so some of the hype in that video is false. MySQL is a great piece of the stack, but it's not always the right choice.
The video also shows its age; MongoDB hasn't stood still in the meantime. I put up a transcript of it because I found the mingling of technical discussion with musings about working on a horrible farm an interesting concept.
In all of our surveys (of web companies) the majority of people use MySQL. That certainly doesn't mean that it's the best choice, but using the most common software does come with a lot of obvious benefits.
It has a lot of weaknesses compared to postgres but it also has some definite advantages:
- Clustered and covering indexes. If the data needed by a query is in an index, there's no need to retrieve it from the table itself, which leads to better memory usage and cache locality.
- MySQL has non-transactional tables, which use a lot less memory. If you store a large number of small records (say, three ints each), such a table will use about a third of the memory of a similar postgres (or Oracle) table.
- Has more tested/reliable replication than anything else, possibly including Oracle (haven't tried the wacky 3rd party replication things like Goldengate or whatever, and don't believe in spending 6 figures on software licenses in general).
- Supports SELECT, INSERT and UPDATE
Why would I get anything else for a database? For more complicated stuff I have another approach that I employ called "programming".
We're a bit off-topic, but... Transactions on DDL, a lack of tons of gotchas, good replication, partial indexes, document orientation and querying, a pretty query language, ripping speed, horizontal scalability, a history of good architecture, consistent hashing, etc are reasons to select something other than MySQL.
http://jan.kneschke.de/2008/9/9/async-mysql-queries-with-c-a...
with a little more in-depth exploration of the MySQL async API. The problem seems to be that the (undocumented) API does not handle EAGAIN and that there's no way to connect asynchronously (mysql_connect always blocks).
Compare with PostgreSQL, which has had an async API for a long time:
http://developer.postgresql.org/pgdocs/postgres/libpq-async....
and for which I wrote a Twisted library that does fully asynchronous connection building and query execution: https://github.com/wulczer/txpostgres.
Since the asynchronous API is exposed in psycopg2, the Python connector library, it should be trivial to hook it up to Tornado as well.
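For the curious, here's a rough sketch of what that hookup could look like, using psycopg2's documented async mode and Tornado's IOLoop; the callback-style wrapper class is my own invention, not part of either library:

    import psycopg2
    import psycopg2.extensions
    import tornado.ioloop

    class AsyncQuery(object):
        """Connect, run one query, and hand the rows to a callback, all non-blocking."""

        def __init__(self, dsn, sql, callback):
            self.sql = sql
            self.callback = callback
            self.cursor = None
            self.conn = psycopg2.connect(dsn, async_=1)  # spelled async=1 in older psycopg2
            self.io_loop = tornado.ioloop.IOLoop.instance()
            # The connection is still being established; wait for the socket first.
            self.io_loop.add_handler(self.conn.fileno(), self._io_ready,
                                     self.io_loop.WRITE)

        def _io_ready(self, fd, events):
            state = self.conn.poll()
            if state == psycopg2.extensions.POLL_OK:
                if self.cursor is None:
                    # Connection established; issue the query and keep polling.
                    self.cursor = self.conn.cursor()
                    self.cursor.execute(self.sql)
                    self._io_ready(fd, events)
                else:
                    # Query finished; hand the rows back and stop watching the fd.
                    self.io_loop.remove_handler(fd)
                    self.callback(self.cursor.fetchall())
            elif state == psycopg2.extensions.POLL_READ:
                self.io_loop.update_handler(fd, self.io_loop.READ)
            elif state == psycopg2.extensions.POLL_WRITE:
                self.io_loop.update_handler(fd, self.io_loop.WRITE)

Usage from a Tornado handler would be something like AsyncQuery(dsn, "SELECT 1", self.on_rows).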