1. Juice up your Traces with every attribute possible
2. Use a telemetry backend that relies on cheap object storage so that your costs don't explode.
3. ...profit?
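To make step 1 concrete, here's a minimal sketch using the OpenTelemetry Python API. The attribute names, the `request` shape, and the release constant are illustrative assumptions, not any particular service's conventions:

```python
import platform
from opentelemetry import trace

RELEASE_VERSION = "2024.06.1"  # however your build pipeline stamps releases

tracer = trace.get_tracer("example-service")

def do_work(request):
    # Stand-in for the real business logic.
    return {"status": 200}

def handle_request(request):
    # One wide span per request: record every attribute you could
    # conceivably want to filter or group by later, not just the
    # handful today's dashboards need.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("http.method", request["method"])
        span.set_attribute("http.route", request["path"])
        span.set_attribute("app.release_version", RELEASE_VERSION)
        span.set_attribute("host.kernel_version", platform.release())
        span.set_attribute("user.plan", request["plan"])
        # ...plus cache hit/miss, shard, feature flags, retry count, etc.
        response = do_work(request)
        span.set_attribute("http.status_code", response["status"])
        return response
```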
Ok, but now we are exporting and storing everything about every request just so we can derive previously cheap metrics like server CPU consumption? I guess for most applications the overhead of buffering, formatting, and sending all of this telemetry data just doesn't matter to folks?
Yes, it is absurdly expensive no matter what the marketing says. It’s only “cheap” if you’re setting VC cash on fire.
The benefit is that you can retroactively extract reports filtered with very complex predicates.
Sure, aggregated metrics are cheap and efficient, but trivial metrics like CPU usage just tell you that there is a problem, not what the problem is. If you need to “deep dive”, you can’t, not without a time machine to go back and configure a filtered metric looking for the specific info you need.
Most sysadmins at this point would just configure a new filtered metric and start collecting data… for a month. While the system is broken. Wrong needle? Start looking through the haystack again with another new custom metric for another month.
As a random example, many systems will track 5xx errors per minute. Great, but are those timeouts or instant failures? I want to group those 5xx errors by duration bucket! Are they correlated with app release version? With free memory on the server? With instance? Kernel version? Etc…
Wide events let you do all those and more, trivially and quickly: seconds instead of months.
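For instance, if each request lands as one wide event in Parquet on object storage, the “timeouts vs. instant failures” question is a single ad-hoc query. A minimal sketch with DuckDB (the schema, column names, thresholds, and path are illustrative assumptions, not anyone's actual setup):

```python
import duckdb

# Assumed schema: one wide event (row) per request, with columns such as
# status_code, duration_ms, release_version, kernel_version.
# In practice the glob would point at object storage (s3://...) via the
# httpfs extension; a local path keeps the sketch self-contained.
con = duckdb.connect()
report = con.sql("""
    SELECT
        release_version,
        kernel_version,
        CASE
            WHEN duration_ms >= 30000 THEN 'timeout (>=30s)'
            WHEN duration_ms <  100   THEN 'instant failure (<100ms)'
            ELSE                           'slow failure'
        END AS failure_shape,
        COUNT(*) AS errors
    FROM read_parquet('events/*.parquet')
    WHERE status_code BETWEEN 500 AND 599
    GROUP BY ALL
    ORDER BY errors DESC
""")
print(report)
```

Wrong needle? Change the `CASE` expression or the `GROUP BY` and re-run in seconds; the raw events are already there.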
> Most sysadmins at this point would just configure a new filtered metric and start collecting data… for a month. While the system is broken. Wrong needle? Start looking through the haystack again with another new custom metric for another month.
In this example I feel like it treats metrics as the only telemetry signal operators have access to. Once the metrics indicate an issue, we can pull existing logs, traces, and profiles to dig into it, and eventually capture dumps.
I'm totally on board with the idea of rich trace metadata, but it seems more evolutionary than revolutionary.