mattyb's comments

The recently released 'PostgreSQL: Up and Running' may be right up your alley, at only 168 pages:

http://shop.oreilly.com/product/0636920025061.do


We've seen the same behavior, so we do the draining via HAProxy, which does the right thing.


I mentioned this in a thread the other day; I've yet to see a good use case for hot code reloading. Can you really not drain requests to that host via HAProxy (or similar), and then actually restart the service? The nice thing about that approach is that your choice of service runtime doesn't matter.
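For what it's worth, the draining step itself is tiny: HAProxy's admin socket can take a server out of rotation before the restart and put it back afterwards. A rough sketch (the socket path and the "web/app1" backend/server names are made up, and it assumes the socket is exposed with "stats socket ... level admin" in haproxy.cfg):

    import socket

    def haproxy_cmd(cmd, sock_path="/var/run/haproxy.sock"):
        # Send one command to HAProxy's stats/admin socket and return its reply.
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(sock_path)
        s.sendall((cmd + "\n").encode())
        reply = s.recv(4096).decode()
        s.close()
        return reply

    # Stop routing new connections to the host; in-flight requests finish normally.
    print(haproxy_cmd("disable server web/app1"))
    # ... restart the service, wait for its health check to go green ...
    print(haproxy_cmd("enable server web/app1"))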


Your method will stall responses for server shutdown + server startup time, which for Ruby/Python apps is usually measured in tens of seconds, and for other web servers can be much worse. Hot code reloading lets you avoid any downtime at all, and since it's usually built into the framework- or language-specific server, you get the functionality "for free".

Zdd (the project I linked to) is all about spawning a new process in parallel. All the advantages of your approach (switch to an entirely different language? Who cares) but without the stalls.

Zdd also lets you keep the old process alive through the duration of the deploy (and after), and with a little work could let you switch back in the event of a bad deploy without having to start the old version up again.
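(To make the parallel-process idea concrete, this is only an illustration, not zdd's actual code: on Linux both the old and new process can listen on the same port via SO_REUSEPORT; once the new one is healthy, the old one stops accepting, finishes its in-flight requests, and exits.)

    import socket

    def shared_listener(port=8080):
        # Both the old and the new process can call this and accept on the same port.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # Linux 3.9+
        s.bind(("0.0.0.0", port))
        s.listen(128)
        return s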


The method he's describing requires no downtime or stalled requests. Connections are drained from some pool in a load-balanced set of servers. The service is restarted with the new code. The hosts are given traffic again once they are initialized and healthy.

The advantages to this method include being completely platform agnostic, as well as giving you a window to verify successful update without worrying about production traffic.


Thanks for the clarification. The downsides to that approach are that you need multiple machines, and the duration of your deploys is much longer. Not to mention, you'd have to script a deploy process across multiple machines (which is not easy, in the way that "SIGHUP Gunicorn" is easy).

Personally I've found the "put new instances into a load balancer" method to make more sense for system changes (packages, kernels, OS versions) where deploying the change is inherently slow or expensive, but the method doesn't make sense for code deploys where deploy time is important.


Any idea if HBase can work with this? Looks like there are patches for hadoop-core, a quick skim didn't reveal if HBase changes are needed.


I've never been interested in hot code-swapping; I find it's sufficient to just take that resource out of the pool, give it the new goodies and then boot it up again. I know the Erlang runtime can do this, but I've yet to find a good use case.

As far as std{err,out} redirection goes, does Node.js have no Unicorn equivalent?

Daemonization (and respawning) should be handled by a process supervisor, like supervisord/Upstart/Monit or whatever. They can do more than just 'this process died, restart it.'

Also, most of this logging code didn't need to be written:

https://github.com/indabamusic/naught/blob/master/src/log.co

Just use logrotate.
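A few lines in /etc/logrotate.d gets you rotation, compression, and retention. Something like this (the path and options are just an example; copytruncate means the app never has to reopen its log file):

    /var/log/myapp/*.log {
        daily
        rotate 14
        missingok
        notifempty
        compress
        delaycompress
        copytruncate
    }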


I apologize - "hot code swapping" may be a misnomer. Naught does the process that you describe about taking the resource out of the pool, giving it the new goodies, and booting it up again.

The native cluster API is the Unicorn equivalent in the node.js world. Naught is essentially unicorn for node. And naught is the equivalent of a process supervisor.

Fair point about logrotate. Here's a counter one: The server that you describe requires 4 moving parts: 1. the app code, 2. the unicorn equivalent, 3. a process supervisor, 4. logrotate in a cron job. An app deployed with naught requires 2: 1. the app code, 2. naught.


By 'taking the resource out of the pool', I had HAProxy in mind, not the service itself.

And naught is the equivalent of a process supervisor.

No, it's not. Can Naught do this?

http://mmonit.com/monit/documentation/monit.html#service_tes...

That's legit process supervision. Your process may still be running, but be wedged somehow. Monit can perform remediation based on signals other than process status. Naught can't (and shouldn't) do this.
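For example, a Monit stanza along these lines (names and paths made up) restarts the app when its health endpoint stops answering, even though the process is technically still up:

    check process myapp with pidfile /var/run/myapp.pid
        start program = "/etc/init.d/myapp start"
        stop program  = "/etc/init.d/myapp stop"
        if failed port 8080 protocol http request "/health" then restart
        if 5 restarts within 5 cycles then timeout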

The server that you describe requires 4 moving parts:

True, but our servers are running all of this anyway; you should use the best tool for the job. Instead of rewriting a fraction of logrotate, you should read `man 8 logrotate` and move on to writing code that will help your business. When you deploy an app written using another runtime, are you going to rewrite 10% of logrotate in that language too?


Fair points, all. Thanks for the information.


+1!


That is a good paper. Thanks for the pointer!


You don't use this over collectd; you can use BatsD with it. These tools are complementary.

collectd is very good at collecting system-level metrics and sending them somewhere. Batsd is a system for receiving those metrics and storing them.

(Note that collectd has a graphite plugin, but not one for (B|St)atsD. You'd need to write a proxy.)
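The proxy is small, though: accept collectd's write_graphite plaintext lines ("metric value timestamp") and re-emit each sample as a StatsD gauge over UDP. A rough sketch, with ports and addresses set to the usual defaults:

    import socket

    GRAPHITE_LISTEN = ("0.0.0.0", 2003)  # point collectd's graphite plugin here
    STATSD_ADDR = ("127.0.0.1", 8125)    # statsd/batsd UDP listener

    def serve():
        statsd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(GRAPHITE_LISTEN)
        srv.listen(5)
        while True:
            conn, _ = srv.accept()
            buf = b""
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                buf += chunk
                while b"\n" in buf:
                    line, buf = buf.split(b"\n", 1)
                    parts = line.decode(errors="replace").split()
                    if len(parts) == 3:  # "metric.path value timestamp"
                        path, value, _ts = parts
                        statsd.sendto(("%s:%s|g" % (path, value)).encode(), STATSD_ADDR)
            conn.close()

    if __name__ == "__main__":
        serve()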


While graphite at least tried to mimic the RRDTool file format, 37signals just skips over that whole "complicated binary stuff" and writes the data as newline-delimited ASCII text...

What benefit lies in trying to mimic RRDTool's file format?


Scalability.


The Etsy blog post that introduced StatsD is helpful:

http://codeascraft.etsy.com/2011/02/15/measure-anything-meas...
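The wire format it describes is about as simple as it gets; for example (host/port and metric names are made up):

    import socket

    statsd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    statsd.sendto(b"signups:1|c", ("127.0.0.1", 8125))        # increment a counter
    statsd.sendto(b"render_time:32|ms", ("127.0.0.1", 8125))  # record a timing, in ms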

