Suppose you have a long data pipeline that you want to trace jobs across. There aren't an enormous number of jobs, but each one takes 12 hours across many phases. In theory tracing works great here; in practice most tracing platforms can't handle it. This is especially true with tail-based tracing, since traces can be unbounded and the platform has to assume that at some point they time out. You can certainly build your own, but most of the value of a tracing solution is the user experience, which is also the hardest part to build.
For stream processing, I've generally found it too expensive to instrument stream processors with tracing, and there's usually not enough variability between events to make the traces interesting. Context stitching, span management, and the sweeping and shipping of traces are costly in many implementations, and stream processing is often CPU-bound.
In both cases, a simple log annotated with a transaction id makes a lot more sense, queried in a log analytics platform.
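As a minimal sketch of what that looks like in practice: emit one JSON log line per event, each carrying the same transaction id for the life of the job, so the whole run can be reassembled in the log platform with a single filter. The names here (`emit`, `run_job`, the phase names) are illustrative, not from any particular pipeline.

```python
import json
import logging
import sys
import uuid

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")


def emit(txn_id: str, phase: str, event: str, **fields) -> None:
    """Write one JSON log line annotated with the transaction id."""
    log.info(json.dumps({"txn_id": txn_id, "phase": phase, "event": event, **fields}))


def run_job() -> None:
    # One id for the entire 12-hour job, threaded through every phase.
    txn_id = uuid.uuid4().hex
    for phase in ("ingest", "transform", "load"):
        emit(txn_id, phase, "phase_start")
        # ... hours of actual work ...
        emit(txn_id, phase, "phase_end", rows_processed=42)  # placeholder metric


if __name__ == "__main__":
    run_job()
```

Filtering on `txn_id` in the log platform then gives you the full timeline of a job, and phase durations fall out of a simple group-by on the id and phase fields, with none of the span lifecycle or timeout machinery a tracing backend would impose.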