
I think there's an alternate universe out there where:

- we collectively realized that logs, events, traces, metrics, and errors are actually all just logs

- we agreed on a single format that encapsulated all that information in a structured manner

- we built firehose/stream processing tooling to provide modern o11y creature comforts

I can't tell if that universe is better than this one, or worse.




Traces are just distributed "logs" (in the data-structure sense: records ordered only by when they were appended) where you also pass around the tiniest bit of correlation context between apps. Traces are structured, timestamped, and can be indexed into much more debug-friendly structures like a call tree. But you could just as easily ignore the correlation data and print them out in streaming sorted order.
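
As a toy illustration (the record shapes and ids here are made up), the same flat records can either be streamed in timestamp order or folded into a call tree using nothing but the span/parent ids:

    # Made-up log records: ordinary structured lines plus a bit of correlation context.
    from collections import defaultdict

    records = [
        {"ts": 3, "msg": "query db",      "span_id": "c", "parent_id": "b"},
        {"ts": 1, "msg": "handle /users", "span_id": "a", "parent_id": None},
        {"ts": 2, "msg": "auth check",    "span_id": "b", "parent_id": "a"},
    ]

    # Option 1: ignore the correlation context and just print in timestamp order.
    for r in sorted(records, key=lambda r: r["ts"]):
        print(r["ts"], r["msg"])

    # Option 2: use the correlation context to rebuild a call tree.
    children = defaultdict(list)
    for r in records:
        children[r["parent_id"]].append(r)

    def show(parent=None, depth=0):
        for rec in sorted(children[parent], key=lambda rec: rec["ts"]):
            print("  " * depth + rec["msg"])
            show(rec["span_id"], depth + 1)

    show()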

Honestly it sounds like you're pitching OpenTelemetry/OTLP, but where you only emit traces and leave all the other bits for later to your OpenTelemetry Collector, which can turn traces into metrics or into logs.


So this is kind of what I was talking about, but it's more than that -- if your default is structured logs (the simplest example being JSON lines), then all you have to do is put the data you care about into the log.

So I'm imagining something more like:

   {"level":"info", "otlp": { "trace": { ... }}}

   {"level":"info", "otlp": { "error": { ... }}}

   {"level":"info", "otlp": { "log": { ... }}}

   {"level":"info", "otlp": { "metric": { ... }}}
(standardizing this format would be non-trivial of course, but I could imagine a really minimal standard)
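
A minimal sketch of what emitting that envelope could look like from inside an app -- the emit helper and field names are illustrative only, not a proposed standard:

    # Hypothetical emit() helper: wrap every signal type in one structured-log envelope.
    import json, sys, time

    def emit(kind, payload, level="info"):
        sys.stdout.write(json.dumps({"level": level, "otlp": {kind: payload}}) + "\n")

    emit("log",    {"ts": time.time(), "msg": "request received"})
    emit("trace",  {"ts": time.time(), "trace_id": "t1", "span_id": "a", "name": "GET /users"})
    emit("metric", {"ts": time.time(), "name": "requests_total", "value": 1})
    emit("error",  {"ts": time.time(), "msg": "upstream timeout"}, level="error")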

Your downstream collector only needs one API endpoint/ingestion mechanism -- unpacking the actual type of telemetry that came in (and persisting it where necessary) can be left to other systems.

Basically I think the systems could have been massively simpler in most UNIX-y environments -- just hook up STDOUT (or scrape it, or use syslog, or whatever) and you're done: no opening outbound ports for jaeger, no dealing with complicated buffering, etc. Just log and forget.
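
A rough sketch of that single ingestion mechanism, assuming the envelope format above (the handler is a placeholder, not a real sink):

    # Sketch of a collector: read envelope lines from stdin, fan out by signal type.
    import json, sys

    def handle(kind, payload):
        # Placeholder: a real collector would forward to per-signal processors/storage.
        print(f"[{kind}] {payload}")

    for raw in sys.stdin:
        if not raw.strip():
            continue
        envelope = json.loads(raw).get("otlp", {})
        for kind, payload in envelope.items():  # "trace", "error", "log", "metric"
            handle(kind, payload)

Wiring it up then really is just a pipe (e.g. some_app | collector.py), or a syslog/STDOUT scraper feeding the same script.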


That's more or less the model Honeycomb uses. Every signal type is just a structured event. Reality is a bit messier, though. In particular, metrics are the oddball in this world and required a lot of work to make economical.


Ah thanks for noting this -- that's exactly the insight I mean here.

Yeah, I think in the worst case you basically just exfiltrate metrics out to other subsystems (honestly, you could exfiltrate all of this), but the default is to pipe heavily compressed data to short- and long-term storage, plus some processors for the real-time path, and so on.

Obviously Honeycomb is actually doing the thing and it's not as easy as it sounds, but it feels like if we had all thought like this earlier, we might have skipped making a few protocols (zipkin, jaeger, etc.) and focused on just data layout (JSON vs protobuf vs GELF, etc.) and figuring out what shapes to expect across tools.


Is that really an alternate universe? That's the universe that Splunk and friends are selling: everything's a log. It's really expensive.


Splunk does have margins and I think they're quite high. Same with Datadog (see: all the HN startups that are trying to grab some of that space).

There's a big gap between what it takes for the engineering to work and what all these companies charge.

My point is really more about the engineering time wasted on different protocols and formats when we could have stuffed everything into minimally structured log lines (and figured out the rest of the insight machinery later). Concretely, the zipkin/jaeger/prometheus protocols may not have needed to exist.


Once you have logs, you can index them in a variety of ways to turn them into metrics, traces, etc., but having logs as the fundamental primitive is powerful.
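
As a toy sketch (record shapes are made up), the same stream of log records can be rolled up into a per-route counter or regrouped into traces after the fact:

    # Toy example: fold the same log records into a metric and back into traces.
    from collections import Counter, defaultdict

    records = [
        {"route": "/users", "trace_id": "t1", "duration_ms": 12},
        {"route": "/users", "trace_id": "t2", "duration_ms": 30},
        {"route": "/login", "trace_id": "t3", "duration_ms": 8},
    ]

    print(Counter(r["route"] for r in records))  # a "metrics" view: requests per route
    by_trace = defaultdict(list)
    for r in records:
        by_trace[r["trace_id"]].append(r)
    print(dict(by_trace))  # a "traces" view: the same records grouped by trace id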



