I am a co-founder of a small SaaS company. We recently decided to make the investment of upgrading our infrastructure setup process from "Hey David go do that" to being 100% chef-based.
I managed the process, and consider it a success. However, here are some points I would make:
1. It took a long time. Let's be generous and say it took 2-3 man-months of time to set up 4-5 different projects and roles. This was probably 10-20x what it would've taken to set up the servers directly. Why? Learning curve with chef for both our programmer and sysadmin. Figuring out how to make config changes automatable and idempotent.
2. The scale you get from chef is bigger than managing production infrastructure. We now use chef for not only production deployment, but also dev. Once paired with Vagrant, we are able to get new devs up with a complete stack in about 10m of keyboard time. If we need to upgrade to some new version of something, only one person has to deal with the sysadmin; everyone else can just update their box.
3. I think it will save money in the long-term. A good sysadmin is $100/hr+. Unfortunately you have to pay that rate whether they're doing architecture, security review, or just editing text files. With chef, a non-sysadmin resource can generate recipes with just architectural advice and review from a sysadmin. This is much more efficient, especially for small shops where a sysadmin is an expensive and not immediately available resource.
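For anyone unsure what "idempotent" means in point 1, here is a minimal sketch in plain Python. It is not Chef code, and the helper name `ensure_line` and the file name are made up for illustration. It shows a config change that is safe to run repeatedly: the first run edits the file, later runs notice that nothing needs doing.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was changed, False if it was already correct.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Calling `ensure_line(Path("sshd_config"), "PermitRootLogin no")` twice changes the file only once, which is roughly the property a recipe needs before you can rerun it on every box without fear.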
We've been in business for about 8 years and use about ~100 sysadmin hours a year.
I interviewed several sysadmins over the years and no one that I deemed competent charged less than $100/hour.
I did find people charging as low as $50/hour but they didn't seem that great, or that reliable, or that available.
For me as well, you pretty much need to trust your sysadmin more than almost anyone else in your company (except people that can sign checks).
Trust comes at a premium.
I will yield that I don't like working with substandard people. I hate having to manage people and am willing to pay a premium for people I trust to work on the right things with the right skills in a timely manner. Besides, it doesn't scale. One of my business goals is to never have middle management.
You'd be more convincing if you didn't imply that anyone who worked for less than $100/hr was "substandard". For full-time salaried, I assure you, $200k/yr is not the going rate for sysadmin.
You can't directly equate a contract hourly rate with a yearly salary. If you don't have the budget to hire someone at a $100k/yr salary, you'll have to pay contract rates, which could be over $100/hr.
I buy that the valley is so hot right now that a sysadmin commands $100k/yr there, but they don't in Chicago, Seattle, or New York.
I'm not looking to argue so much as to inject some more data into the price point that was casually dropped on this thread earlier. I do not think the other guy overpaid for sysadmin; if he's got an amazing admin, great! I can see paying a premium for that.
Another point I'd like to raise is, if you're paying $100/hr for sysadmin, and using them frequently, contracting instead of fulltiming sysadmin might be penny-wise-pound-foolish. But maybe not, if you're only paying $10,000/yr in sysadmin. We've used over 100 hours of admin in just the last couple weeks. A great hire; one we made "37signals-style", after realizing that doing all the sysadmin chores ourselves was subtly making us all miserable and ineffective.
Believe me, I know the difference between contracting rates and salary. ;)
I don't think I implied that. I said only that I interviewed a handful of people and of that sample set, the people that charged less than $100/hour were not competent. They suffered from things like only knowing one flavor of UNIX, or having never heard of chef/puppet/etc, or not really knowing shell scripting, or not having heard of security stuff that even I know about.
I am not an experienced sysadmin, but when I feel like I could do a better job than the person I'm interviewing, I consider that substandard.
He is absolutely correct. A senior Linux admin is going to charge $75 an hour, and the contracting company that finds them will charge $25 an hour for handling the paperwork.
You could hire someone full time for about $120K a year but then with benefits and employer taxes it will end up being closer to $150K a year anyway.
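A rough back-of-the-envelope version of that comparison, using only figures quoted in this thread ($75 + $25 per contract hour, about $150K loaded full-time cost, about 100 hours a year). The function names are hypothetical and real costs will vary:

```python
CONTRACT_RATE = 75 + 25    # $/hour: admin's rate plus the agency's cut (thread figures)
FULL_TIME_COST = 150_000   # $/year: salary plus benefits and employer taxes (thread figure)


def contract_cost(hours_per_year: float) -> float:
    """Yearly spend when buying sysadmin time by the hour."""
    return hours_per_year * CONTRACT_RATE


def break_even_hours() -> float:
    """Hours per year at which contracting costs as much as a full-time hire."""
    return FULL_TIME_COST / CONTRACT_RATE


yearly = contract_cost(100)       # the ~100 hours/year shop mentioned earlier
threshold = break_even_hours()
```

Under these assumptions the 100-hour shop spends $10,000 a year, and contracting only becomes the more expensive option at around 1,500 hours a year.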
Are you sure we're talking about "system administrator" and not a more specialized role like "devops"? I know to us nerds those are basically the same thing, but they really aren't. If you're doing sysadmin for the primary purpose of deploying and maintaining boxes designed to run one proprietary application, you might be a devops person, and not a sysadmin.
Again, this might be a valley thing. I've got a bunch of friends who have complained how hard it is to get sysadmin in SFBA. Just know, if that's the case, the dropoff in salary outside the valley is waaaaaaaaaay sharper than it is for dev, which is pretty much just COLA adjusted from place to place.
(We staff offices in Manhattan, Chicago, and Mountain View, for what it's worth).
For my part, I suppose I was talking devops, or at least a sysadmin that thinks devops is a good idea and can help make it happen.
It's probably wrong from a larger perspective to lump the two together, but for me sysadmin as a role has been supplanted with devops. If you are a sysadmin that doesn't want to automate away their work via devops, you are not doing it right.
I agree that as a career choice, "system administrator" seems less lucrative than "devops". Remember, though, that not every sysadmin job is for an agile software shop. There are plenty of places that care a lot more about patch cycles than they do about single-click deployments.
More importantly though, bill rates and capability do not track each other. Bill rates track risk. There is a lot of amazing-but-unproven systems talent out there, and there are some very expensive pikers on the market too. Generally I would agree that the more you pay, the less likely you are to have to fire someone 3 months later.
It helps a lot to be able to do the job yourself, soup-to-nuts, so that you'll have a better shot at screening candidates.
Not sure where you're getting your numbers from, but $100K is entry level for a sysadmin in NYC, and $120K is normal for a sysadmin with a few years of experience. I'm sure northern California is similar.
$100k is not entry level for a sysadmin in NYC. I get my numbers from the fact that just about half my company, including its headquarters, is in NYC.
I am absolutely confident that there's some role/vertical definition you can come up with where sysadmins are making $100k in the door. But I think you're discounting most of the market for sysadmins, just like the people on the other current HN sysadmin thread who all seem to believe that sysadmins also optimize SQL queries, fix C code, and teach systems programming to stupid developers.
You've just echoed my experience with chef, but I'm still on the fence as to whether I think it was a success or not and whether I'd use it again. We were able to achieve everything we wanted, but to say it took a metric fuckton of time would be a grave understatement. Even what are ordinarily trivial things took a lot longer than I expected. Some things I just found to be oddly designed and surprisingly inflexible, especially the recipe construct. I still to this day see no point in having a folder with 2 subfolders with 3 files combined for what amounts to "apt-get install somepackage". And having to learn a DSL to replace the already very effective bash is in my opinion not a very enticing trade.
Although it's very new, I think I'll try out juju.ubuntu.com for my next large deployment instead of sticking with chef or puppet, if only for the flexibility of the hooks-in-any-language thing and a chance to feed my bash fetish.
We consistently hear 2-6 weeks to get to full production, with 30-120 minutes for some proof of concept work.
And one of the great things about Puppet is you can start very small - automate the parts that are hardest, most critical, etc., and do bits manually as it makes sense for your problem set.
And yeah, managing dev and prod the same way is absolutely critical for clean deployments.
Is there something like Foreman[1] for Chef? Something able to create and spin up a new VM and execute receipts after install (using just a "web browser").
Oh nice I hadn't seen Foreman before. Looks like EC2 console but for your own stack. We use XEN, that might be nice.
I don't know what you mean by "execute receipts"; however, in the video they kick off a puppet run, which makes me think you could similarly kick off a chef run instead.
I agree with the author that Chef is an incredibly powerful tool and that it has numerous benefits over plain old bash. However, I'm reluctant to actually use it because of its dependencies. Chef relies on CouchDB, RabbitMQ, and Solr, and all of those have non-trivial dependencies as well. Then, with a stack like that, I worry about the overhead involved.
FWIW, Puppet's dependencies are much simpler. I don't know much about Chef vs Puppet, but I can say that from an installation and dependency maintenance POV, Puppet wins.
To be clear, those dependencies are required for running your own open source chef server.
You can use chef without the server, as chef-solo, or let Opscode run it for you in the form of Opscode Hosted Chef. We do make it easy to install Chef Server through Chef Solo itself, or with our Ubuntu/Debian apt repository.
I never understood what kind of person/company would trust a hosted chef.
The chef databags/cookbooks tend to contain rather sensitive information (ssh-keys, passwords). Handing all that stuff over to a third-party borders on criminal negligence to me.
If you've already decided on Chef then the Opscode platform provides great value. You get the best and most knowledgeable Chef admins handling your Chef server for a few bucks per hour.
Just to be clear, this has always been a core goal with Puppet - very low dependencies to make it easy to adopt.
There are some real downsides - we have to do a lot more coding, and it can be tough to get all of the hot newness - but we think the users benefit from a much simpler solution that's much easier to support.
You really should check out CFEngine 3. Very few dependencies (pcre,berkeleydb,openssl), and they also provide free packages with all the dependencies included: http://cfengine.com/download
The memory footprint is about 10 MB, install size maybe 30 MB.
Speaking of dependencies, CFEngine 3 is written in C and has 3 dependencies:
berkeley db,
libcrypto,
and PCRE.
It compiles into small binaries and is usable anywhere - in the cloud, on supercompute clusters, on the desktop or laptop, on a smartphone, in embedded devices.
1. The node object ends up being too large, which leads to memory issues when a search returns more than 200 nodes (800MB of memory).
2. Chef discourages declarative configuration.
3. Chef lacks a remote trigger mechanism.
Issue one starts to kill you once you have a large number of nodes. Dedicating a quarter of the available memory to configuration management seems like a poor financial choice. We've come up with workarounds at my company (generate files centrally, and distribute with chef remote_file syntax), but I still feel they are hacky.
Issue two is more serious. Reindexing on chef only occurs after a node has submitted its node object back to the chef server. This results in incomplete searches until a node successfully completes a run. If you wish to remove a node from a particular role or attribute from a host, you may have a hard time doing so until the next chef run completes.
Problem three is really a result of the expense of running chef. If the memory and CPU costs were lower, there wouldn't be any real issues running Chef more frequently. Some changes I need to go out immediately, some don't matter. I end up back in the world of the SSH loop too often with Chef.
1. We are working on making the search more performant and use less memory.
2. Chef definitely does not discourage declarative configuration. Chef recipes include declarative resources for configuring your infrastructure. Since recipes are an internal Ruby DSL, there may be nondeclarative code in them.
3. Chef itself doesn't have a remote trigger mechanism because the Chef run is all about configuring the local node. Nothing prevents you from using the Ruby language in a recipe to hook up some kind of remote trigger though. People in the chef community are doing this with projects like Noah and Pylon.
1. I understand that, and have heard that from you guys several times. The issue is the deserialization from JSON. The monkey patch solution I've seen so far stores less data, essentially whitelisting the attributes for search. It sort of destroys the value of search.
2. I should be more specific. Generally chef relies on the information that ohai provides, not with information enumerated by the administrator. There is a general assumption that the systems are properly configured, and chef is only furthering that, since the hosts provide most of the configuration details. (Yes, you could do stuff with data bags to address this issue.)
3. Thanks for the links. I've not seen those projects previously. I've solved the issue for myself using a much lighter weight solution.
1. Yeah, the stop-gap "solution" is to whitelist a number of attributes in order to reduce the data set, because in the most common use case we see, there are only a few attributes that people actually care about in the chef-client context. The node's run list, its IP address or FQDN. Really. That is the most common. For like, 90%+ of the use cases out there. Everyone has a unique snowflake and that's cool, but really, not that much.
2. There's no assumption that systems are correctly configured other than they start from a baseline configuration in the most common use case. We have worked with several customers managing existing infrastructures of running systems that had an unknown baseline and Chef was able to automate the pieces they cared about.
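The whitelisting idea in point 1 can be pictured with a small Python sketch. This is an illustration only, not the actual Chef patch: the function name `slim_node` is hypothetical, and the key names are chosen to mirror the run list, IP address and FQDN mentioned above.

```python
# Illustrative whitelist: the handful of attributes most searches need.
WHITELIST = ("run_list", "ipaddress", "fqdn")


def slim_node(node: dict, keep=WHITELIST) -> dict:
    """Return a copy of `node` holding only the whitelisted top-level keys."""
    return {key: node[key] for key in keep if key in node}


full_node = {
    "fqdn": "web1.example.com",
    "ipaddress": "10.0.0.5",
    "run_list": ["role[web]"],
    "kernel": {"modules": {"ext4": {}, "nfs": {}}},  # bulky data searches rarely use
}
small_node = slim_node(full_node)
```

The trade-off raised earlier still applies: anything left off the whitelist can no longer be searched on.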
* Why not use transactions? (freebsd jails, linux namespaces) and allow integration testing for the applications in a given environment? Think rspec for integration testing applications and rolling back on failure.
* Why not modularize chef, right now it's a nightmare to get working, emerge makes a graph about 200 nodes long...
I plan on sometime making a puppet / chef alternative because they both seem lacking
I've looked at cfengine, bfg and others and they all miss the point of what a configuration management system should do: 1. It should be simple. 2. It should have integration testing. 3. It should be simple. None do this or even try to.
I think this post should be titled Why Configuration Management?, with a subtext of using Chef as an example. The main points the author is making are true of CFEngine, Puppet, Chef, and other config management software. The question of Why Chef? can be answered very simply: because the author likes it. Which is a perfectly valid way to choose your tool chain, assuming the technical requirements are met.
As a Ruby developer, I like Chef's Ruby DSL; but I somehow feel that its imperative DSL will lead to something similar to Bash Hell. I'd like to read more about the declarative properties of Puppet vs the imperative way of Chef, and why one should prefer one over the other.
Secondly, as some earlier commenters mention, Chef's dependencies seem quite excessive.
The main difference with Chef is that it has imperative structure (i.e. ruby-based scripting) to fall back on, whereas Puppet forces you to go out to a script if you want to be imperative.
In the long run, it's arguable that a strict declarative structure has more interesting properties when dealing with query of status & handling changes or drifts rather than "stamping out a server".
Puppet has been great for me to detect and correct drift, for example, all the while ensuring pre-requisites are executed in the proper order. The tradeoff is that you have to think through the various dependencies at a detailed level, which can be difficult. An extreme analogy would be programming with a logic language vs. an imperative language.
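To make the drift and ordering ideas concrete, here is a toy sketch in Python. It is not how Puppet is implemented; the function names and the NTP resource names are invented for illustration. Desired state is declared as data, drift is whatever differs from the observed state, and prerequisites are resolved with a topological sort from the standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (actual_value, desired_value)} for every key that differs."""
    return {
        key: (actual.get(key), want)
        for key, want in desired.items()
        if actual.get(key) != want
    }


def apply_order(requires: dict) -> list:
    """Order resources so that each one comes after everything it requires."""
    return list(TopologicalSorter(requires).static_order())


desired = {"ntp_package": "installed", "ntp_service": "running"}
actual = {"ntp_package": "installed", "ntp_service": "stopped"}
drift = detect_drift(desired, actual)

# Each resource maps to the set of resources it depends on.
requires = {"ntp_service": {"ntp_config"}, "ntp_config": {"ntp_package"}}
order = apply_order(requires)
```

Here `drift` reports only the stopped service, and `order` places the package before the config and the config before the service. The hard part described above is writing out `requires` correctly for a real system.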
My experience has been Puppet is growing in popularity with enterprises, not just web companies, and it seems Puppet Labs is targeting this audience more than Opscode is, comparing customer lists. I'm not quite sure why this is - could be attitude - Chef is more about "get it done now", Puppet more about "get it right for the long haul", but even that is a caricature.
We at Puppet Labs have always focused on building tools that anyone can use, not just the best hackers in the world. We've got some of the best sysadmins working with us, but we punish them by making them write software that even people who aren't the best sysadmins can use.
So yeah, we get a ton of enterprise adoption as a result, but we've also got a ton of web companies using Puppet. It's true that in the Rails web startup, we're not always the number 1 choice, though. :)
The number of different configurations is far more important than the number of boxes, no? If you have 1000 completely identical boxes (e.g. big compute cluster), a shell script that sets up a freshly installed box to your requirements is quite possibly sufficient.
I've been working with juju https://juju.ubuntu.com/ (a new Ubuntu server tool) and been very happy with it. It does not directly compete with puppet/chef for enforcing a server configuration however. Juju operates at the higher level of "services" that can be deployed to a cloud, to hardware, or to local LXC containers! It's basically like "apt-get" except for servers. The dev team hangs out at #juju on freenode irc