Why Chef?

apinstein · on Oct 16, 2011

I am a co-founder of a small SaaS company. We recently decided to make the investment of upgrading our infrastructure setup process from "Hey David go do that" to being 100% chef-based.

I managed the process, and consider it a success. However, here are some points I would make:

1. It took a long time. Let's be generous and say it took 2-3 man-months of time to set up 4-5 different projects and roles. This was probably 10-20x what it would've taken to set up the servers directly. Why? Learning curve with chef for both our programmer and sysadmin. Figuring out how to make config changes automatable and idempotent.

2. The scale you get from chef is bigger than managing production infrastructure. We now use chef for not only production deployment, but also dev. Once paired with Vagrant, we are able to get new devs up with a complete stack in about 10m of keyboard time. If we need to upgrade to some new version of something, only one person has to deal with the sysadmin; everyone else can just update their box.

3. I think it will save money in the long-term. A good sysadmin is $100/hr+. Unfortunately you have to pay that rate whether they're doing architecture, security review, or just editing text files. With chef, a non-sysadmin resource can generate recipes with just architectural advice and review from a sysadmin. This is much more efficient, especially for small shops where a sysadmin is an expensive and not immediately available resource.
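
To make "automatable and idempotent" from point 1 concrete, here is a minimal Python sketch. It is not Chef code; `ensure_line` is a hypothetical helper, and the config lines are only illustrative. The point is that running the change a second time leaves the result unchanged, which is the property a config management run relies on.

```python
def ensure_line(text, line):
    """Return `text` with `line` present; unchanged if it is already there."""
    lines = text.splitlines()
    if line in lines:
        # Already in the desired state: do nothing (this is what makes it idempotent).
        return text
    lines.append(line)
    return "\n".join(lines) + "\n"


# Illustrative config content, not taken from the thread.
config = "PermitRootLogin no\n"
once = ensure_line(config, "PasswordAuthentication no")
twice = ensure_line(once, "PasswordAuthentication no")

# Applying the same change twice gives the same result as applying it once.
print(once == twice)
```

A naive "append this line" shell one-liner would add a duplicate on every run; checking the current state first is the extra work the original comment describes.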

tptacek · on Oct 16, 2011

A good sysadmin costs $200k/year? Or do you have a contract sysadmin? Because that number sounds way, way, way high.

(We recently hired a sysadmin "+").

apinstein · on Oct 16, 2011

We've been in business for about 8 years and use about 100 sysadmin hours a year.

I interviewed several sysadmins over the years and no one that I deemed competent charged less than $100/hour.

I did find people charging as low as $50/hour but they didn't seem that great, or that reliable, or that available.

For me as well, you pretty much need to trust your sysadmin more than almost anyone else in your company (except people that can sign checks).

Trust comes at a premium.

I will yield that I don't like working with substandard people. I hate having to manage people and am willing to pay a premium for people I trust to work on the right things with the right skills in a timely manner. Besides, it doesn't scale. One of my business goals is to never have middle management.

tptacek · on Oct 16, 2011

You'd be more convincing if you didn't imply that anyone who worked for less than $100/hr was "substandard". For full-time salaried, I assure you, $200k/yr is not the going rate for sysadmin.

nestlequ1k · on Oct 16, 2011

You can't directly equate a contract hourly rate with a yearly salary. If you don't have the budget to hire someone at a $100k/yr salary, you'll have to pay contract rates, which could be over $100/hr.

tptacek · on Oct 16, 2011

I buy that the valley is so hot right now that a sysadmin commands $100k/yr there, but they don't in Chicago, Seattle, or New York.

I'm not looking to argue so much as to inject some more data into the price point that was casually dropped on this thread earlier. I do not think the other guy overpaid for sysadmin; if he's got an amazing admin, great! I can see paying a premium for that.

Another point I'd like to raise is, if you're paying $100/hr for sysadmin, and using them frequently, contracting instead of fulltiming sysadmin might be penny-wise-pound-foolish. But maybe not, if you're only paying $10,000/yr in sysadmin. We've used over 100 hours of admin in just the last couple weeks. A great hire; one we made "37signals-style", after realizing that doing all the sysadmin chores ourselves was subtly making us all miserable and ineffective.

Believe me, I know the difference between contracting rates and salary. ;)
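
The two figures being argued over come from the same hourly rate. A rough sketch of the arithmetic, assuming a 2,000-hour working year (50 weeks x 40 hours, an assumption not stated in the thread) and the $100/hour and roughly 100 hours/year figures given above:

```python
HOURLY_RATE = 100        # dollars per hour, the contract rate quoted in the thread
FULL_TIME_HOURS = 2000   # assumption: 50 weeks x 40 hours per week
CONTRACT_HOURS = 100     # hours per year actually used, per the thread


def annual_cost(hourly_rate, hours_per_year):
    """Yearly spend in dollars at a flat hourly rate."""
    return hourly_rate * hours_per_year


# What $100/hour would mean if billed full time all year.
full_time_equivalent = annual_cost(HOURLY_RATE, FULL_TIME_HOURS)
# What an occasional contractor at the same rate actually costs.
occasional_contract = annual_cost(HOURLY_RATE, CONTRACT_HOURS)

print(full_time_equivalent, occasional_contract)
```

Under those assumptions the full-time equivalent is $200,000/yr and the occasional contract is $10,000/yr, which is why the same rate can sound both outrageous and cheap.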

apinstein · on Oct 16, 2011

I don't think I implied that. I said only that I interviewed a handful of people and of that sample set, the people that charged less than $100/hour were not competent. They suffered from things like only knowing one flavor of UNIX, or having never heard of chef/puppet/etc, or not really knowing shell scripting, or not having heard of security stuff that even I know about.

I am not an experienced sysadmin, but when I feel like I could do a better job than the person I'm interviewing, I consider that substandard.

illumin8 · on Oct 16, 2011

He is absolutely correct. A senior Linux admin is going to charge $75 an hour, and the contracting company that finds them will charge $25 an hour for handling the paperwork.

You could hire someone full time for about $120K a year but then with benefits and employer taxes it will end up being closer to $150K a year anyway.

tptacek · on Oct 16, 2011

$120k is nosebleed high for full-time sysadmin.

Are you sure we're talking about "system administrator" and not a more specialized role like "devops"? I know to us nerds those are basically the same thing, but they really aren't. If you're doing sysadmin for the primary purpose of deploying and maintaining boxes designed to run one proprietary application, you might be a devops person, and not a sysadmin.

Again, this might be a valley thing. I've got a bunch of friends who have complained how hard it is to get sysadmin in SFBA. Just know, if that's the case, the dropoff in salary outside the valley is waaaaaaaaaay sharper than it is for dev, which is pretty much just COLA adjusted from place to place.

(We staff offices in Manhattan, Chicago, and Mountain View, for what it's worth).

apinstein · on Oct 16, 2011

For my part, I suppose I was talking devops, or at least a sysadmin that thinks devops is a good idea and can help make it happen.

It's probably wrong from a larger perspective to lump the two together, but for me sysadmin as a role has been supplanted with devops. If you are a sysadmin that doesn't want to automate away their work via devops, you are not doing it right.

tptacek · on Oct 16, 2011

I agree that as a career choice, "system administrator" seems less lucrative than "devops". Remember, though, that not every sysadmin job is for an agile software shop. There are plenty of places that care a lot more about patch cycles than they do about single-click deployments.

More importantly though, bill rates and capability do not track each other. Bill rates track risk. There is a lot of amazing-but-unproven systems talent out there, and there are some very expensive pikers on the market too. Generally I would agree that the more you pay, the less likely you are to have to fire someone 3 months later.

It helps a lot to be able to do the job yourself, soup-to-nuts, so that you'll have a better shot at screening candidates.

illumin8 · on Oct 17, 2011

Not sure where you're getting your numbers from, but $100K is entry level for a sysadmin in NYC, and $120K is normal for a sysadmin with a few years of experience. I'm sure northern California is similar.

tptacek · on Oct 17, 2011

$100k is not entry level for a sysadmin in NYC. I get my numbers from the fact that just about half my company, including its headquarters, is in NYC.

I am absolutely confident that there's some role/vertical definition you can come up with where sysadmins are making $100k in the door. But I think you're discounting most of the market for sysadmins, just like the people on the other current HN sysadmin thread who all seem to believe that sysadmins also optimize SQL queries, fix C code, and teach systems programming to stupid developers.

Dalves · on Oct 16, 2011

You've just echoed my experience with chef but I'm still on the fence as to whether I think it was a success or not and whether I'd use it again. We were able to achieve everything we wanted but to say it took a metric fuckton of time would be a grave understatement. Even what are ordinarily trivial things took a lot longer than I expected. Some things I just found to be oddly designed and surprisingly inflexible, especially the recipe construct. I still to this day see no point in having a folder with 2 subfolders with 3 files combined for what amounts to "apt-get install somepackage". And having to learn a DSL to replace the already very effective bash is in my opinion not a very enticing trade.

Although it's very new, I think I'll try out juju.ubuntu.com for my next large deployment instead of sticking with chef or puppet, if only for the flexibility of the hooks-in-any-language thing and a chance to feed my bash fetish.

lkanies · on Oct 17, 2011

(disclaimer - Puppet Labs employee)

2-3 months is a long time.

We consistently hear 2-6 weeks to get to full production, with 30-120 minutes for some proof of concept work.

And one of the great things about Puppet is you can start very small - automate the parts that are hardest, most critical, etc., and do bits manually as it makes sense for your problem set.

And yeah, managing dev and prod the same way is absolutely critical for clean deployments.

devmach · on Oct 16, 2011

Is there something like Foreman[1] for Chef? Something able to create and spin up a new VM and execute receipts after install (by using just a "web browser").

[1]: http://theforeman.org/projects/foreman/wiki/Screencasts

apinstein · on Oct 16, 2011

Oh nice I hadn't seen Foreman before. Looks like EC2 console but for your own stack. We use XEN, that might be nice.

I don't know what you mean by "execute receipts"; however, in the video they kick off a puppet run, which makes me think you could similarly kick off a chef run instead.

manvsmachine · on Oct 16, 2011

I'm guessing that 'receipts' was meant to be 'recipes'.

devmach · on Oct 16, 2011

Sorry, my bad. I meant to say "recipes".

bryanwb · on Oct 16, 2011

awesome comments, apinstein

btw, I am copying your method of managing dotfiles with a Rakefile as we speak.

I counter that while it takes a while to learn chef, it also takes many times more effort to maintain a custom set of shell scripts over the long term.

sjwright · on Oct 16, 2011

I learned a new word today: Idempotence. Thank you.

codypo · on Oct 16, 2011

I agree with the author that Chef is an incredibly powerful tool and that it has numerous benefits over plain old bash. However, I'm reluctant to actually use it because of its dependencies. Chef relies on CouchDB, RabbitMQ, and Solr, and all of those have non-trivial dependencies as well. Then, with a stack like that, I worry about the overhead involved.

FWIW, Puppet's dependencies are much simpler. I don't know much about Chef vs Puppet, but I can say that from an installation and dependency maintenance POV, Puppet wins.

jtimberman · on Oct 16, 2011

(disclosure, I work for Opscode.)

To be clear, those dependencies are required for running your own open source chef server.

You can use chef without the server, as chef-solo, or let Opscode run it for you in the form of Opscode Hosted Chef. We do make it easy to install Chef Server through Chef Solo itself, or with our Ubuntu/Debian apt repository.

moe · on Oct 17, 2011

I never understood what kind of person/company would trust a hosted chef.

The chef databags/cookbooks tend to contain rather sensitive information (ssh-keys, passwords). Handing all that stuff over to a third-party borders on criminal negligence to me.

jtimberman · on Oct 20, 2011

Cookbooks are accessible via your private key, of which Opscode Hosted Chef does not have a copy.

You can choose to encrypt the contents of a data bag using a locally generated (on your hardware, nothing we control) key.

jethroalias97 · on Oct 17, 2011

So... should nobody use ec2? Or any host for that matter? Sooner or later you're going to have to trust some third-party.

moe · on Oct 17, 2011

There's a difference between trusting someone with your physical hardware and handing them your credentials on a silver plate.

There's also a pretty harsh difference between the security practices at Amazon and the practices that Opscode displays in their OSS-code.

Fluxx · on Oct 16, 2011

You can use "Hosted Chef," where Opscode hosts all the dependencies for you:

http://www.opscode.com/hosted-chef/

sjs · on Oct 16, 2011

If you've already decided on Chef then the Opscode platform provides great value. You get the best and most knowledgeable Chef admins handling your Chef server for a few bucks per hour.

dlsspy · on Oct 16, 2011

If you haven't decided, it's a great way to try it out.

I'm an opscode customer, but before that, I was playing around with their free plan.

lkanies · on Oct 17, 2011

(disclaimer - founder of Puppet)

Just to be clear, this has always been a core goal with Puppet - very low dependencies to make it easy to adopt.

There are some real downsides - we have to do a lot more coding, and it can be tough to get all of the hot newness - but we think the users benefit from a much simpler solution that's much easier to support.

dholmen · on Oct 17, 2011

You really should check out CFEngine 3. Very few dependencies (pcre,berkeleydb,openssl), and they also provide free packages with all the dependencies included: http://cfengine.com/download

The memory footprint is about 10 MB, install size maybe 30 MB.

atsaloli · on Oct 17, 2011

Speaking of dependencies, CFEngine 3 is written in C and has 3 dependencies:

berkeley db,

libcrypto,

and PCRE.

It compiles into small binaries and is usable anywhere - in the cloud, on supercompute clusters, on the desktop or laptop, on a smartphone, in embedded devices.

gpapilion · on Oct 16, 2011

I have three fundamental issues with Chef;

1. The node object ends up being too large, which leads to memory issues when a search returns more than 200 nodes (800MB of memory).

2. Chef discourages declarative configuration.

3. Chef lacks a remote trigger mechanism.

Issue one starts to kill you once you have a large number of nodes. Dedicating a quarter of the available memory to configuration management seems like a poor financial choice. We've come up with workarounds at my company (generate files centrally, and distribute with chef remote_file syntax), but I still feel they are hacky.

Issue two is more serious. Reindexing on chef only occurs after a node has submitted its node object back to the chef server. This results in incomplete searches until a node successfully completes a run. If you wish to remove a node from a particular role or attribute from a host, you may have a hard time doing so until the next chef run completes.

Problem three is really a result of the expense of running chef. If the memory and CPU costs were lower, there wouldn't be any real issues running Chef more frequently. Some changes I need to go out immediately, some don't matter. I end up back in the world of the SSH loop too often with Chef.

jtimberman · on Oct 16, 2011

1. We are working on making the search more performant and use less memory.

2. Chef definitely does not discourage declarative configuration. Chef recipes include declarative resources for configuring your infrastructure. Since recipes are an internal ruby dsl, there may be nondeclarative code in them.

3. Chef itself doesn't have a remote trigger mechanism because the Chef run is all about configuring the local node. Nothing prevents you from using the ruby language in a recipe to hook up some kind of remote trigger though. People in the chef community are doing this with projects like Noah and Pylon.

Github.com/lusis/Noah Github.com/fujin/pylon

gpapilion · on Oct 16, 2011

1. I understand that, and have heard that from you guys several times. The issue is the deserialization from JSON. The monkey patch solution I've seen so far stores less data, essentially white listing the attributes for search. It sort of destroys the value of search.

2. I should be more specific. Generally chef relies on the information that ohai provides, not with information enumerated by the administrator. There is a general assumption that the systems are properly configured, and chef is only furthering that, since the hosts provide most of the configuration details. (Yes, you could do stuff with data bags to address this issue.)

3. Thanks for the links. I've not seen those projects previously. I've solved the issue for myself using a much lighter weight solution.

jtimberman · on Oct 20, 2011

1. Yeah, the stop-gap "solution" is to white list a number of attributes in order to reduce the data set, because in the most common use case we see, there's only a few attributes that people actually care about in the chef-client context. The node's run list, its IP address or FQDN. Really. That is the most common. For like, 90%+ of the use cases out there. Everyone has a unique snowflake and that's cool, but really, not that much.

2. There's no assumption that systems are correctly configured other than they start from a baseline configuration in the most common use case. We have worked with several customers managing existing infrastructures of running systems that had an unknown baseline and Chef was able to automate the pieces they cared about.

Again with "most common use case."
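
The white-listing workaround discussed in point 1 can be sketched in a few lines of Python. This is an illustration of the idea only, not Chef's or Opscode's actual implementation; `whitelist_attributes` is a hypothetical helper and the node data below is made up, loosely modeled on the attributes named above (run list, IP address, FQDN).

```python
def whitelist_attributes(node, allowed_paths):
    """Return a copy of `node` holding only the attribute paths in `allowed_paths`.

    Each path is a tuple of keys, e.g. ("kernel", "release").
    Paths that do not exist in `node` are silently skipped.
    """
    slim = {}
    for path in allowed_paths:
        value = node
        try:
            for key in path:
                value = value[key]
        except (KeyError, TypeError):
            continue  # attribute not present on this node
        target = slim
        for key in path[:-1]:
            target = target.setdefault(key, {})
        target[path[-1]] = value
    return slim


# Made-up node object; a real one carries far more host-reported data.
node = {
    "run_list": ["role[web]"],
    "ipaddress": "10.0.0.5",
    "fqdn": "web1.example.com",
    "kernel": {"release": "3.0.0", "modules": {"ext4": {"size": "1"}}},
}

slim = whitelist_attributes(
    node,
    [("run_list",), ("ipaddress",), ("fqdn",), ("kernel", "release")],
)
print(slim)
```

The trade-off raised earlier in the thread is visible here: anything left off the list (such as `kernel.modules` in this example) is gone from the stored object, so it can no longer be searched on.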

nwmcsween · on Oct 16, 2011

* Why not use transactions? (freebsd jails, linux namespaces) and allow integration testing for the applications in a given environment? Think rspec for integration testing applications and rolling back on failure.

* Why not modularize chef, right now it's a nightmare to get working, emerge makes a graph about 200 nodes long...

I plan on sometime making a puppet / chef alternative because they both seem lacking

atsaloli · on Oct 17, 2011

You might want to take a look at CFEngine too. It's in that puppet/chef family:

http://verticalsysadmin.com/blog/uncategorized/relative-orig...

And Bcfg2.

Not to discourage you from innovating; just so you know what's out there today.

nwmcsween · on Oct 17, 2011

I've looked at cfengine, bfg and others and they all miss the point of what a configuration management system should do: 1. It should be simple. 2. It should have integration testing. 3. It should be simple. None do this or even try to.

adient · on Oct 16, 2011

I think this post should be titled Why Configuration Management?, with a subtext of using Chef as an example. The main points the author is making are true of CFEngine, Puppet, Chef, and other config management software. The question of Why Chef? can be answered very simply: because the author likes it. Which is a perfectly valid way to choose your tool chain, assuming the technical requirements are met.

bryanwb · on Oct 16, 2011

I wrote about that in the previous post, see here: http://devopsanywhere.blogspot.com/2011/10/puppet-vs-chef-fi...

nona · on Oct 16, 2011

Two reasons I'm wary of Chef wrt Puppet

1) the declarative vs imperative aspect

2) Chef's heavy dependencies

As a Ruby developer, I like Chef's Ruby DSL; but I somehow feel that its imperative DSL will lead to something similar to Bash Hell. I'd like to read more about the declarative properties of Puppet vs the imperative way of Chef, and why one should prefer one over the other.

Secondly, as some earlier commenters mention, Chef's dependencies seem quite excessive.

parasubvert · on Oct 16, 2011

Chef is becoming more declarative as it grows.

The main difference with Chef is that it has imperative structure (i.e. ruby-based scripting) to fall back on, whereas Puppet forces you to go out to a script if you want to be imperative.

In the long run, it's arguable that a strict declarative structure has more interesting properties when dealing with query of status & handling changes or drifts rather than "stamping out a server".

Puppet has been great for me to detect and correct drift, for example, all the while ensuring pre-requisites are executed in the proper order. The tradeoff is that you have to think through the various dependencies at a detailed level, which can be difficult. An extreme analogy would be programming with a logic language vs. an imperative language.

My experience has been Puppet is growing in popularity with enterprises, not just web companies, and it seems Puppet Labs is targeting this audience more than Opscode is, comparing customer lists. I'm not quite sure why this is - could be attitude - Chef is more about "get it done now", Puppet more about "get it right for the long haul", but even that is a caricature.
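
The "ensuring pre-requisites are executed in the proper order" part of the declarative approach can be illustrated with a small Python sketch. This is not how Puppet is implemented; the resource names are made up, and it assumes Python 3.9+ for the standard-library `graphlib` module. You declare only what each resource requires, and the ordering is derived from those declarations rather than written out by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each key maps to the set of resources it requires.
requires = {
    "service:nginx": {"package:nginx", "file:/etc/nginx/nginx.conf"},
    "file:/etc/nginx/nginx.conf": {"package:nginx"},
    "package:nginx": set(),
}

# static_order() yields every resource after all of its prerequisites.
order = list(TopologicalSorter(requires).static_order())
print(order)
```

In an imperative script the same ordering lives implicitly in the sequence of lines; here it has to be spelled out as dependencies, which is the "think through the various dependencies at a detailed level" cost mentioned above.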

lkanies · on Oct 17, 2011

(disclaimer - founder of Puppet)

We at Puppet Labs have always focused on building tools that anyone can use, not just the best hackers in the world. We've got some of the best sysadmins working with us, but we punish them by making them write software that even people who aren't the best sysadmins can use.

So yeah, we get a ton of enterprise adoption as a result, but we've also got a ton of web companies using Puppet. It's true that in the Rails web startup, we're not always the number 1 choice, though. :)

grandalf · on Oct 16, 2011

For < 50 servers, can anyone comment on my choice of fabric and cuisine as an alternative to chef?

JoachimSchipper · on Oct 16, 2011

The number of different configurations is far more important than the number of boxes, no? If you have 1000 completely identical boxes (e.g. big compute cluster), a shell script that sets up a freshly installed box to your requirements is quite possibly sufficient.

thibaut_barrere · on Oct 16, 2011

I decided to use chef-solo instead, mostly as a way to start learning chef, but it ended up being good enough for me.

This way I'll be able to invest in the server part of chef later on, when I need it.

kim0 · on Oct 17, 2011

I've been working with juju https://juju.ubuntu.com/ (a new Ubuntu server tool) and been very happy with it. It does not directly compete with puppet/chef for enforcing a server configuration however. Juju operates at the higher level of "services" that can be deployed to a cloud, to hardware, or to local LXC containers! It's basically like "apt-get" except for servers. The dev team hangs out at #juju on freenode irc

leandrod · on Oct 16, 2011

I cringe at CouchDB. Gimme good, old PostgreSQL.