
I know what you're saying, but even if you're off by an order of magnitude, at least it's some sort of starting point. Estimating that your key-space in Redis is between 5MB and 500MB isn't particularly useful, but it is a big improvement over having no idea. I don't disagree that in most cases there is no substitute for real-world data, especially if you can come by it without much pain, but I maintain you should always start with some sort of estimate, however wrong it may be. It's good to take the time to reason out a system or problem, and then see whether your mental model matches real-world reality.
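For concreteness, the napkin math here is just a handful of multiplications. A minimal sketch in Python, where every input is made up for illustration and the per-key overhead in particular is a guess that varies across Redis versions and encodings:

    # Back-of-the-envelope Redis memory estimate; all inputs hypothetical.
    n_keys = 1_000_000       # expected number of keys
    avg_key_bytes = 40       # average key-name length
    avg_value_bytes = 100    # average value size
    overhead_bytes = 90      # guessed per-key bookkeeping (dict entry,
                             # object headers); version-dependent

    total = n_keys * (avg_key_bytes + avg_value_bytes + overhead_bytes)
    print(f"~{total / 2**20:.0f} MiB")  # prints ~219 MiB

The overhead term is the easiest factor to get wrong, which is why the result should be read as an order-of-magnitude figure rather than a prediction.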

In this particular example the napkin analysis is so trivial and so accurate, and the 'real world' data is so expensive (a linear lookup of the entire key-space, followed by data destruction), that it immediately jumps out at me.




When you forget even one factor, you can be off by many orders of magnitude.

And you're measuring the cost of the real-world data in totally useless terms. Who gives a shit about linear scans and "data destruction", lolz. What matters is how long it took the OP to figure out what command to write, and write it, and write the cleanup commands. If he already knew how to do it, then it was basically free. If he didn't, then he learned something, which pays for most of the cost of doing it.
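If you already know how, it's only a few lines. Something like this sketch (redis-py, hypothetical key prefix; a guess at the shape of the OP's experiment, not his actual script):

    import redis

    r = redis.Redis()  # assumes a local test instance, not production

    before = r.info("memory")["used_memory"]

    # The "linear scan": SCAN walks the entire key-space, and RENAME
    # destroys the original names, so undoing this needs its own script.
    # Renaming while SCAN runs can also make the cursor miss or revisit
    # keys, which is one of the gotchas raised further down the thread.
    for key in r.scan_iter(match=b"user-profile-cache:*"):
        r.rename(key, b"u:" + key.split(b":", 1)[1])

    after = r.info("memory")["used_memory"]
    print(f"used_memory delta: {before - after} bytes")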

And the napkin analysis is so trivial, but you don't EVER know that it's accurate, because there's no error bound on "oh I forgot the important part".


I don't think you know what you are actually arguing against. I'm pretty sure I'm not arguing against performing real-world tests to diagnose a problem; I'm not doing that because I'm not insane. I'm also fairly sure I didn't argue that you have to do one or the other but not both. Though I didn't say it outright, I'm pretty sure I implied, and I stand by it, that it is good practice to at least do a fast mental (or napkin) estimate when it is warranted. Debugging is a hard process, but it shouldn't be a random one. You should have a mental model of your system, and it is good practice to go through the exercise of defining that mental model and, yes, checking your assumptions when it is warranted.

I'm clearly not a fan of the direction the OP took to verify his assumption that his key sizes have an impact on his Redis DB size. I think there's a better, more accurate way of checking his hypothesis: napkin math. His approach wasn't great, doesn't work for anything other than toy or test deployments, and the results aren't as clear as you may think (see below). I stand by that.

>And the napkin analysis is so trivial, but you don't EVER know that it's accurate, because there's no error bound on "oh I forgot the important part".

We're still talking about the same situation, correct? I suppose if it's pedantry you want, pedantry I can give. Tell me, how does the OP know that his "real world analysis" is correct? Because at some point in time, for some sort of input, the system gave him one kind of result? Apparently a simple multiplication (# of keys x avg. key size) is fraught with errors, but issuing a command (RENAME) against a black-box datastore the OP probably doesn't fully understand provides a clear, unambiguous result? What if Redis kept (in memory or on disk) all the original values across a key rename and then cleared them out over a period of time, or better yet, didn't clear them out until it needed the space? So you run your script, check memory usage, and see no difference... and of course, since we're you, we trust the "real-world result" and we live happily ever after... yes?

Obviously this is a contrived example and the OP most likely got the right result, but I think I've made my point. "Real-world results" are full of gotchas and ambiguities (and sometimes require a great deal of background knowledge to interpret properly), and in such cases it is nice to have a mental model of the expected result so that it can be either verified or proven wrong (and thereby provide a direction for further investigation).
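For what it's worth, there's also a non-destructive way to get a "real-world" number that still exercises the mental model: sample per-key costs and extrapolate. A minimal sketch, assuming redis-py, a non-empty database, and a Redis new enough for MEMORY USAGE (4.0+):

    import redis

    r = redis.Redis()

    # Sample up to 1000 keys, ask Redis what each one actually costs
    # (MEMORY USAGE includes per-key overhead), then extrapolate.
    sample = [k for _, k in zip(range(1000), r.scan_iter())]
    avg = sum(r.memory_usage(k) or 0 for k in sample) / len(sample)
    print(f"estimated total: {avg * r.dbsize() / 2**20:.0f} MiB")

If that extrapolation and the napkin estimate disagree wildly, one of them is missing a factor, and the disagreement itself points to where to investigate next.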



