Use rsync to deploy your website

kylec · on Dec 21, 2009

If you're already using Git for version control, why not also deploy from it? I personally use it to simultaneously push to GitHub and production and it works very well:

http://stackoverflow.com/questions/279169/deploy-php-using-g...

dschobel · on Dec 21, 2009

Ditto (but with Mercurial). Then you have the bonus of having your entire deployment configuration versioned for easy rollback.

Pistos2 · on Dec 22, 2009

I deploy everything with git. With branches and tags, that also lets me deploy different states of a given project to different servers (development, staging, production). Rolling back or switching between tags is easy (migrations aside).
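Here is a sketch of deploying a tag to a given server over ssh. The host names, paths and the `deploy_ref` name are placeholders, and it assumes the server's working copy has an `origin` remote. Rolling back works the same way: you check out an older tag.

```shell
#!/bin/bash
# deploy_ref, the hosts and the paths are placeholders.
deploy_ref() {
    local host=$1 dir=$2 ref=$3
    # printf %q quotes the values so the remote shell sees them unchanged.
    ssh "$host" "cd $(printf %q "$dir") && git fetch --tags origin && git checkout -q $(printf %q "$ref")"
}

# Usage:
#   deploy_ref staging.example.com /var/www/app v1.4
#   deploy_ref www.example.com /var/www/app v1.3   # roll production back
```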

gouki · on Dec 21, 2009

I usually have a few files that I don't want on the server, or vice versa. The option --exclude-from=ignore-files is useful for maintaining a list of files not to be synced.

The file ignore-files would contain the keywords or files to ignore, one per line.
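Below is a minimal sketch of that setup. The helper names, the sample patterns and the destination are hypothetical. rsync reads the ignore file one pattern per line and skips lines that start with `#`, so comments must sit on their own lines.

```shell
#!/bin/bash
# Hypothetical helper names; "ignore-files" is the file name from the comment above.

# Write a sample ignore list: one pattern per line, '#' lines are comments.
write_ignore_file() {
    cat > "$1" <<'EOF'
# editor swap files and OS clutter
*.swp
.DS_Store
# local-only settings that must never reach the server
config.local.php
EOF
}

# Sync the contents of src to dest, skipping every pattern in the ignore list.
deploy_with_ignores() {
    local src=$1 dest=$2 ignore=$3
    rsync -avz --exclude-from="$ignore" "$src/" "$dest"
}

# Usage:
#   write_ignore_file ignore-files
#   deploy_with_ignores ./site user@example.com:/var/www/site ignore-files
```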

qjz · on Dec 22, 2009

-a is already recursive, so -r is unnecessary. For most sites, it would also be wise to include --delete so renamed or locally deleted files would be removed on the destination.
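Because `--delete` removes files on the destination, you may want to preview a run first. This sketch uses a hypothetical `preview_then_sync` helper. `-n` (`--dry-run`) makes no changes, and `-i` (`--itemize-changes`) lists each change; lines marked `*deleting` show what `--delete` would remove.

```shell
#!/bin/bash
# preview_then_sync is a hypothetical helper name.
preview_then_sync() {
    local src=$1 dest=$2 answer
    # -a already implies -r. -n changes nothing; -i prints one line per change,
    # and "*deleting" lines are what --delete would remove.
    rsync -a --delete -n -i "$src/" "$dest"
    printf 'Apply these changes? [y/N] '
    read -r answer
    if [ "$answer" = "y" ]; then
        rsync -a --delete "$src/" "$dest"
    else
        echo "Aborted; nothing was changed." >&2
        return 1
    fi
}

# Usage: preview_then_sync ./site deploy@example.com:/var/www/site
```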

0xbadcafebee · on Dec 22, 2009

Here are some more useful options:

* "-C" to exclude all common revision-control working directory files. Its rules are appended after your own, so it has the lowest filter priority; keep the rest of your filter args in mind.

* Look up backup rsync guides and add snapshot or "--backup-dir" support to your deploys for a cheap alternative to system snapshots and other deployment reversion techniques.

* "-E" (with "--no-perms", since -a implies -p) for when you just want to carry over execute bits without blowing away whatever custom permissions you may have on your deployed files ("--chmod" is also helpful here). "-X" additionally preserves extended attributes.

* "--delete-after" and "--delete-excluded" to clean up files you've deleted from your source files.

* "--timeout=120 --contimeout=60" Because nobody needs to wait 5 minutes to find out their deploy failed.

* "--compress-level=9"

* If you have a huge deploy tree, it may be faster (though less reliable) to run

    (cd SRC && find . -type f -mtime -8 -print0) > files.txt && rsync OPTIONS --from0 --files-from=files.txt SRC DEST

This builds a list of only the files modified in roughly the past week, with paths relative to SRC as --files-from expects, and rsyncs just those. That may cut the total deploy time considerably.

* "--log-file=/big/partition/deploy.log" To get a better idea of how the deploy is going or how it went.

* Consider if you need --delay-updates to make the deploy more atomic. If files get deployed a few at a time, will this cause the user experience to suffer? Load as well as long file lists can cause long periods between updates.

* If you have more than one developer that can deploy to the site at the same time, consider that rsync should only be your file-transfer tool. You need a whole other layer to account for transaction locking, who is deploying or reverting files, and basic logging is very handy to track down problems. But that's a bit out of scope for this link :)
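The wrapper below combines several of the options above. Treat it as a sketch, not a recommended set: the function name, the backup location and the 120-second timeout are assumptions. A relative `--backup-dir` is resolved against the destination directory. `--contimeout` is left out because it only applies to connections to an rsync daemon.

```shell
#!/bin/bash
# deploy_site and the backup location are illustrative.
deploy_site() {
    local src=$1 dest=$2 log=$3
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    # -z/--compress-level=9  compress as hard as possible on the wire
    # -C               skip CVS/.svn-style clutter (appended after your own rules)
    # --delete-after   remove files deleted from the source, once the transfer is done
    # --delay-updates  move updated files into place together at the end
    # --backup(-dir)   keep replaced/deleted files in a dated directory
    #                  next to the destination, for cheap reverts
    # --timeout=120    give up if no data moves for two minutes
    # --log-file       keep a record of what happened
    rsync -az --compress-level=9 -C \
        --delete-after --delay-updates \
        --backup --backup-dir="../deploy-backups/$stamp" \
        --timeout=120 \
        --log-file="$log" \
        "$src/" "$dest"
}

# Usage: deploy_site ./build deploy@example.com:/var/www/site /var/log/deploy.log
```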

Vitaly · on Dec 22, 2009

I cringe every time I hear something like this called a "deployment". This is a lazy hack, not a deployment. You can't deploy with ftp/rsync/put_your_own_tool_here sync. Well, you can, kind of, but you better not.

A "proper" deployment must be (at least):

* Completely automated. Should be just a simple command line.

* Atomic. As in all the files are changed at once.

* Easily revertible.

If you are deploying a db-backed application, the deployment process must also manage db versioning (including the possibility of rollbacks).

All of the above is trivially provided by Vlad or Capistrano. And since they are super easy to use even for non-Ruby projects, I fail to see any reason other than laziness to keep using ftp or rsync.

I've seen some people using git for deployment, which is better than ftp/rsync, but it still lacks atomicity, and rollbacks can be quite tricky.

pan69 · on Dec 21, 2009

We just do subversion checkouts. Of course, we have the proper .htaccess rules set up to prevent access to the .svn folders. Subversion also allows us to commit database exports straight back into the version control system from the server.

potatolicious · on Dec 21, 2009

I personally do not feel secure at all with production boxen having the ability to push back into repo.

I use svn export for deployments - all sorted by revision into static cached directories, then symlinked to HTTP root.

This has the added advantage of being able to roll back to any previous version instantly.
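Here is a sketch of the export-and-symlink scheme. It assumes a hypothetical layout: one `rN` directory per revision under a releases directory, and a web root that is itself a symlink. The link is replaced by creating a new one beside it and renaming it over the old one, which happens atomically. `mv -T` is a GNU coreutils option, so this assumes a Linux host.

```shell
#!/bin/bash
# deploy_revision, switch_docroot and the directory layout are hypothetical.
deploy_revision() {
    local repo_url=$1 rev=$2 releases=$3 docroot=$4
    local target="$releases/r$rev"
    mkdir -p "$releases" || return 1
    # svn export writes a clean tree with no .svn directories in it;
    # an already exported revision is reused as-is.
    if [ ! -d "$target" ]; then
        svn export -q -r "$rev" "$repo_url" "$target" || return 1
    fi
    switch_docroot "$target" "$docroot"
}

switch_docroot() {
    local target=$1 docroot=$2
    # New link beside the old one, then rename(2) it over the old link.
    ln -sfn "$target" "$docroot.tmp" && mv -Tf "$docroot.tmp" "$docroot"
}

# Rolling back is just another switch:
#   switch_docroot /srv/releases/r1041 /var/www/site
```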

Legion · on Dec 22, 2009

That's clever and I feel stupid for not having come up with it myself (the keeping separate revisions and symlinking part). Thanks for sharing.

sjs · on Dec 22, 2009

Don't feel stupid. Not everyone thinks of everything.

pan69 · on Dec 22, 2009

For the server to commit we create a special user with limited access to a single branch.

cominatchu · on Dec 22, 2009

Takes too long if your project is of reasonable size. Why not just send the diff?

potatolicious · on Dec 22, 2009

We're talking about deployment to production machines - what's the difference between 3 seconds and 5 minutes? Your machine is sitting on a fat pipe in a datacenter, where you can push a 300MB deploy (which is a LOT) trivially. I highly doubt you're rolling out code changes every 10 minutes... not to production anyway.

Sending diffs removes your ability to roll back at all. Even if you're somewhat devious about it, they still won't let you roll back more than a single version.

I'm surprised you're so concerned about time-to-deploy for an action that happens, what, once a day at most? Twice? The safeguards you gain are massive compared with the trivial increase in machine time (it's not as if some guy is sitting there copying files).

cominatchu · on Dec 24, 2009

As other people mentioned, if you send the diff by, say, pulling from your git repository, then you can roll back to any tag or revision.

What happens when your 300MB deploy needs to be pushed from your development machine out to 15 different production servers? I wouldn't want to send ~4.3GB for every deployment. Furthermore, a quick deployment time is valuable when a crucial fix needs to be pushed to production.

This is not to mention the space that your 300MB deploy is going to take up on the hard drives of your production machines if you deploy twice a day. That's over 213GB a year per machine, sounds like you would need to start deleting old revisions. Or start just sending the diffs.
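The disk-space concern can be handled by pruning old releases after each deploy. This minimal sketch makes two hypothetical assumptions: release directories are named with sortable timestamps such as `20091224-1530`, and the live release is always among the newest ones kept.

```shell
#!/bin/bash
# prune_releases is a hypothetical helper; names must sort oldest first.
prune_releases() {
    local releases=$1 keep=$2
    local names excess
    names=$(ls -1 "$releases" | LC_ALL=C sort)
    excess=$(( $(printf '%s\n' "$names" | grep -c .) - keep ))
    # Nothing to do if there are already keep or fewer releases.
    [ "$excess" -gt 0 ] || return 0
    # Delete the oldest $excess entries; ${releases:?} refuses an empty path.
    printf '%s\n' "$names" | head -n "$excess" | while IFS= read -r name; do
        rm -rf -- "${releases:?}/$name"
    done
}

# Usage: prune_releases /var/www/releases 5   # keep the five newest
```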

prodigal_erik · on Dec 22, 2009

We've been putting our code in rpms and it works well. Every box knows exactly what's deployed right now (along with dependencies and whether anything is tweaked), local edits to config files can be preserved, and "ssh yum install" gets a box from clean to production ready in under a minute.

0xbadcafebee · on Dec 22, 2009

Wait - preserve local edits? That sounds bad.

It is a handy tool for dealing with versioning and rollback. But I'd hate to be the guy who deploys a hot fix and suddenly rpm is out of locker entries.

prodigal_erik · on Dec 22, 2009

Renaming local edits out of the way or preserving them is up to you, depending on whether you put %config or %config(noreplace) in the spec. It does need to be handled carefully, but it can be handy if you want one machine giving special treatment to a representative sample of your requests or something.

Xichekolas · on Dec 22, 2009

I use this little bash script (as a daily cron job) to selectively mirror directories on two hard drives in case I lose one (very poor man's backup solution). I imagine it could be used for deploying code (modified for remote machine of course):

  #!/bin/bash
  # Uses bash features ([[ ]] and arrays), so it needs bash rather than plain sh.

  RSYNC="/usr/bin/rsync"        # Verify with 'which rsync'
  DIRS="/home/xich /etc"        # Directories to be backed up.
  TARGET="/mnt/secondhd/backup" # Directory into which all backups are placed.

  # For instance, if TARGET is /mnt/secondhd/backup and DIRS is "/home/user1 /home/user2"
  # then after running, there will exist a /mnt/secondhd/backup/user1 and
  # /mnt/secondhd/backup/user2. DO NOT use trailing slashes for any of these paths,
  # as that will change the behavior of rsync.

  LOGFILE="/var/log/mirror_hds.log" # For errors only.

  for dir in $DIRS; do
  	INEX=()
  	# Include rules go first, so they take precedence over the excludes.
  	if [[ -e "$dir/.mirror_include" ]]; then
  		INEX+=(--include-from "$dir/.mirror_include")
  	fi

  	if [[ -e "$dir/.mirror_exclude" ]]; then
  		INEX+=(--exclude-from "$dir/.mirror_exclude")
  	fi
  	# Discard normal output and append only errors, so the log keeps every run's failures.
  	"$RSYNC" -a --delete "${INEX[@]}" "$dir" "$TARGET" > /dev/null 2>> "$LOGFILE"
  done

The .mirror_include and .mirror_exclude files are just newline-delimited lists of file masks (they do what you would expect). I did it like this so each user can modify his own exclusion/inclusion lists (the file belongs to the user), and doesn't have to mess with the cron script (which belongs to root). Inclusion takes precedence. As an example, my exclude file:

  # any file that starts with a period
  .*
  # my tv shows directory
  tv
  movies

And my include file:

  # all of these would be filtered out by ".*" above
  .mirror_include
  .mirror_exclude
  .conkyrc

noonespecial · on Dec 22, 2009

Recommend -e ssh with keys as well. Safer, no rsync port hanging out.

pingswept · on Dec 22, 2009

A good point, but I believe that most recent versions (since around 2004, I think) of rsync now use ssh by default, so the -e flag is not needed to use ssh.

noonespecial · on Dec 22, 2009

Ouch, my age is showing. Again.

bcl · on Dec 22, 2009

rsync uses ssh by default, so there's no need for -e anymore unless you need to specify a key, a port, or blowfish (all of which you can also set in ~/.ssh/config).
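If you do need non-default ssh settings, you can pass them through `-e`. This sketch uses a hypothetical port and key path. The comment at the end shows the equivalent `~/.ssh/config` entry, which lets plain `rsync` pick up the same settings without `-e`.

```shell
#!/bin/bash
# sync_over_ssh, the port and the key path are illustrative.
sync_over_ssh() {
    local src=$1 dest=$2 port=$3 key=$4
    # -e replaces the remote shell command; rsync already uses ssh by default.
    rsync -az -e "ssh -p $port -i $key" "$src/" "$dest"
}

# Usage: sync_over_ssh ./site deploy@example.com:/var/www 2222 ~/.ssh/deploy_key
#
# The same settings in ~/.ssh/config let "rsync -az ./site/ deploybox:/var/www" work:
#   Host deploybox
#       HostName example.com
#       User deploy
#       Port 2222
#       IdentityFile ~/.ssh/deploy_key
```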

javert · on Dec 21, 2009

I do this too, and it's very convenient. Especially if you have multiple websites.

I actually put the rsync command in a Makefile so I can update by typing `make`.

I have a folder in my ~ directory which contains a folder for each of the remote machines I work with. I can update any remote machine by going into its folder and typing `make`.

steveklabnik · on Dec 21, 2009

I dunno, this is fine for small projects, but pretty soon, something like Capistrano works much, much better.

mcantor · on Dec 22, 2009

Could you perhaps explain why?

mey · on Dec 22, 2009

Automating rollbacks, re-HUPing (sending SIGHUP to) associated services/servers, rolling out database changes, or coordinating across a cluster of servers.

webology · on Dec 22, 2009

We recently changed our deployment model from an all-Capistrano model to a Capistrano plus rsync model. Our old model would take 20 to 45 minutes to update every server in our farm. Our new model updates one staging / deployment server via Capistrano, then rsyncs to each of our production nodes. This process normally takes less than a minute and spikes to a few minutes for really large changes. We still have the ability to roll back code, and this model is actually quicker at fixing bad deployments than our old model was.

Most of our time before was spent updating our codebase and then checking external dependencies on each server. Rsync made this process much quicker, since these checks are completed once and then pushed to each server.
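On the staging server, the fan-out step of that model might look roughly like the sketch below. The node names, paths and function name are assumptions. The loop keeps going when one node fails and reports the failure at the end. That is one possible policy, not necessarily theirs.

```shell
#!/bin/bash
# fan_out, the node names and paths are placeholders.
fan_out() {
    local src=$1 dest_path=$2
    shift 2
    local node failed=0
    for node in "$@"; do
        # Same tree to every node; a failure is reported but does not stop the loop.
        if ! rsync -az --delete "$src/" "$node:$dest_path"; then
            echo "deploy to $node failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage: fan_out /srv/build/current /var/www/app web1 web2 web3
```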

Nycto · on Dec 22, 2009

rsync works really well. It's what we use at our company. One trick we have considered implementing is pointing the HTML root to a symlink that points to the current release. When you rsync, create a new directory named after the version. After you have verified the transfer, flip the symlink over to point to the new code base. Doing it like this makes quick rollbacks easy and protects you from interrupted transfers or users hitting your site mid-upload (it has happened... we spotted an anomaly in the error logs and our "wtf"s per minute shot through the roof).
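Here is a sketch of that trick, run from the machine you deploy from. It assumes a hypothetical layout on the server: releases live under `BASE/releases/VERSION`, and the web root points at the symlink `BASE/current`. `--link-dest` hard-links files that are unchanged relative to the live release, so each release only costs the space of what changed. The sketch also assumes a previous release already exists. The symlink is flipped only after rsync succeeds, by renaming a new link over the old one (`mv -T` is GNU coreutils).

```shell
#!/bin/bash
# release_and_flip and the BASE/releases + BASE/current layout are assumptions.
release_and_flip() {
    local src=$1 host=$2 base=$3 version=$4
    local target="$base/releases/$version"
    # A relative --link-dest is resolved from the destination directory,
    # so ../../current is the release that is live right now.
    rsync -az --link-dest=../../current "$src/" "$host:$target/" || return 1
    # Flip only after a successful transfer: create the new link beside
    # the old one, then rename it over the old one in a single step.
    local q_target q_base
    q_target=$(printf %q "$target")
    q_base=$(printf %q "$base")
    ssh "$host" "ln -sfn $q_target $q_base/current.tmp && mv -Tf $q_base/current.tmp $q_base/current"
}

# Usage: release_and_flip ./build web1.example.com /var/www/site 20091222-1530
```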

DrJokepu · on Dec 22, 2009

I love rsync, use it for deployment all the time. However, it is quite a pain to set it up for Windows. All the Windows rsync ports I know of run on top of Cygwin and Cygwin changes the permissions of the files to the Windows equivalent of 000 all the time. It is possible to get around this but not entirely straightforward. I'm seriously considering writing a step-by-step quick guide for setting up an rsync daemon on Windows as it is not trivial at all and it might save some time for other people.

riobard · on Dec 22, 2009

My solution using git:

    #!/bin/bash
    # Run in the project root.
    set -e       # stop at the first step that fails
    git add -A   # stage all changes, including new and deleted files
    git commit   # local commit
    git push     # push to the remote bare repository
    # Update the server's working copy (which usually contains the public www folder) from the bare repository
    ssh REMOTE_HOST 'cd PROJECT_FOLDER && git pull'

Extra benefit: two complete copies of the history, local and remote, as backup.

cloudkj · on Dec 22, 2009

How do you ensure that the .git directory isn't accessible?

Pistos2 · on Dec 22, 2009

In my case, it's never part of the publicly-served tree. public/ is a subdir within the repo.

riobard · on Dec 23, 2009

I have a public www folder too if I only want to open up part of the whole repo. But since there is usually no sensitive data in a public-facing repo anyway, I don't care if people can access .git or not. They may clone it if they want! :D

riobard · on Dec 22, 2009

no safeguard whatsoever. i'm the only user so ... :)

njharman · on Dec 22, 2009

I use this for simple, single server, small codebase sites.

But for various reasons (mainly rollback, history and consistency: rsync takes time, during which your site has files from different versions), I much prefer the "schlep the entire codebase into a new directory, then switch symlinks when you're ready to go live" approach, as Fabric, Capistrano, or your own custom deploy scripts do.

"schlep" is the technical term for scp/rsync/checkout/etc.

cmelbye · on Dec 22, 2009

Wikipedia (and its sister projects) uses rsync to deploy new code updates and configuration changes to the production cluster.

dangrossman · on Dec 22, 2009

Springloops (http://www.springloops.com) hosts my Subversion repositories and can also be set up to deploy each repository to a set of servers, either manually or automatically upon each commit.

garnet7 · on Dec 22, 2009

In the article, the author seems to be switching their use of trailing slashes. At the top, (paraphrasing) it's

    rsync -arvuz /src/foo /dest/foo/

but at the bottom it's

    rsync -arvuz /src/foo/ /dest/foo

Which is correct?
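The rule is about the *source*. A trailing slash on the source means "the contents of this directory"; without it, the directory itself is copied into the destination. A trailing slash on the destination makes no difference. So `/src/foo/ /dest/foo` (or `/src/foo /dest/`) fills `/dest/foo`, whereas `/src/foo /dest/foo/` ends up with `/dest/foo/foo`. Here is a small demonstration in a scratch directory (the function name is made up):

```shell
#!/bin/bash
# demo_trailing_slash is a made-up name; it only touches a temporary directory.
demo_trailing_slash() {
    local tmp
    tmp=$(mktemp -d)
    mkdir -p "$tmp/src/foo" "$tmp/c/foo"
    touch "$tmp/src/foo/index.html"

    rsync -a "$tmp/src/foo" "$tmp/a/"       # no slash on source: copies foo itself
    rsync -a "$tmp/src/foo/" "$tmp/b"       # slash on source: copies foo's contents
    rsync -a "$tmp/src/foo" "$tmp/c/foo/"   # first form quoted above: foo inside foo

    # List what each form produced, relative to the scratch directory.
    (cd "$tmp" && find a b c -type f | LC_ALL=C sort)
    rm -rf "$tmp"
}

# demo_trailing_slash prints:
#   a/foo/index.html
#   b/index.html
#   c/foo/foo/index.html
```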

mark_h · on Dec 21, 2009

I've been doing this using fabric, which provides the rsync_project built-in: http://docs.fabfile.org/0.9.0/api/contrib/project.html

idebug · on Dec 22, 2009

is that a bit like puppet?

anamax · on Dec 22, 2009

How robust is rsync? How does it report failures? (Is that one line inside something that tells a human that something went wrong?)

What about partial successes?

_zhqs · on Dec 22, 2009

We use rsync to deploy all our front-end code/files.

It is pretty robust, but syncing multiple files is not atomic across the batch as a whole. If one file fails you do get an error report, but the other files will already have been synced. Once you fix whatever the problem is and re-run your script, that one file will then be synced.
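rsync's exit status tells these cases apart, so a wrapper can turn it into a message someone will notice. This sketch uses a hypothetical `checked_sync` helper. The codes it handles (23 and 24 for partial transfers, 30 and 35 for timeouts) are documented in the rsync man page; any other non-zero code is reported generically.

```shell
#!/bin/bash
# checked_sync is a hypothetical wrapper name.
checked_sync() {
    local status=0
    rsync -az --delete "$1/" "$2" || status=$?
    case $status in
        0)     echo "deploy OK" ;;
        23)    echo "partial transfer: some files failed, the rest were synced" >&2 ;;
        24)    echo "partial transfer: some source files vanished during the run" >&2 ;;
        30|35) echo "rsync timed out" >&2 ;;
        *)     echo "rsync failed with exit code $status" >&2 ;;
    esac
    return "$status"
}

# Usage: checked_sync ./site deploy@example.com:/var/www/site || exit 1
```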

In practice, rsync is really great for deploying. It is also neat for backups: before I got my Mac and started using Time Machine, I used rsync for backups (and rsync.net for offsite backups).

0xbadcafebee · on Dec 22, 2009

The man page and docs in the source detail how it handles these different circumstances. The short answer is "very robust" and there are options to customize how it behaves during partial or total success/failure.

terrellm · on Dec 22, 2009

I use a Rakefile with a deploy task that uses rsync to send the HTML to our server and s3sync to send our images to Amazon CloudFront.

nick007 · on Dec 21, 2009

nice tip... love it