Grafana offers one of the best solutions for storing and navigating telemetry data, but turning that data into insights remains a challenge. Our goal is to address this issue, even if it means occasionally reinventing the wheel, which is a necessary step at this stage.
With ebpf-exporter it is not possible to implement complex logic, such as converting the PID of each TCP connection into a container name and the destination IP into a real IP according to the conntrack table.
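To illustrate the kind of user-space logic meant here, below is a minimal sketch of the first half of that mapping: resolving a PID to a container ID by reading `/proc/<pid>/cgroup`. This is not Coroot's actual implementation. The function names are hypothetical, and the 64-hex-character ID pattern is an assumption that matches the usual Docker and containerd cgroup paths. Turning the ID into a container name would additionally require asking the container runtime, which is not shown.

```python
import re
from typing import Optional

# Assumption: container IDs appear in the cgroup path as 64 lowercase hex chars,
# which is the usual form for Docker and containerd.
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first container ID found in the contents of a cgroup file."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None


def container_id_for_pid(pid: int) -> Optional[str]:
    """Look up the container ID for a PID, or None if it cannot be determined."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return container_id_from_cgroup(f.read())
    except OSError:
        # The process may have exited, or /proc may not be readable.
        return None
```

An agent would typically call something like `container_id_for_pid(pid)` for each PID reported by its eBPF programs and cache the result, since connections from the same process share a container.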
This seems like a limitation that could be lifted instead of introducing a separate product (disclaimer: I’m familiar with ebpf-exporter but haven’t dug into OP).
IIRC ebpf-exporter had some limitations, but they weren’t fundamental. However, it was also fairly lightweight, so maybe another tool is just the right solution.
Coroot's agent collects data from various sources to cover all aspects of container behavior. ebpf-exporter perfectly solves the problem of running custom eBPF programs and turning their output into metrics, but using it as a foundation for more specific solutions doesn't seem reasonable.
Take a look at https://github.com/coroot/coroot (Apache 2.0), a zero-instrumentation observability tool for microservice architectures.
Thanks to eBPF, it can be integrated in minutes.
When we started working on Coroot, we weren't sure whether such a product was even possible. To verify our initial hypotheses, we reproduced real-world failure scenarios in our staging environment and recorded their telemetry. Now we can replay this telemetry and apply Coroot's inspections to it to detect particular failures.
Along the way we discovered two things:
* Real-life telemetry data is noisy and inaccurate, so working with recorded real-world data allows us to develop more accurate inspections.
* Replaying these scenarios is actually very entertaining!
At Coroot we call our library of recorded failure scenarios Failurepedia, and we want to share it with you.
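For readers wondering what replaying recorded telemetry against an inspection might look like, here is a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `Sample` record, the `latency_ms` metric name, the 500 ms threshold, and the fixed 60-second window. It does not reflect how Coroot stores recordings or implements its inspections.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    timestamp: int   # seconds since the start of the recording
    metric: str      # hypothetical metric name
    value: float


# An inspection takes one window of samples and returns human-readable findings.
Inspection = Callable[[List[Sample]], List[str]]


def high_latency_inspection(samples: List[Sample]) -> List[str]:
    """Flag latency samples above an (assumed) 500 ms threshold."""
    return [
        f"t={s.timestamp}: latency {s.value:.0f}ms exceeds 500ms"
        for s in samples
        if s.metric == "latency_ms" and s.value > 500
    ]


def replay(recording: Iterable[Sample],
           inspections: List[Inspection],
           window: int = 60) -> List[str]:
    """Feed a recording to the inspections one time window at a time."""
    ordered = sorted(recording, key=lambda s: s.timestamp)
    findings: List[str] = []
    if not ordered:
        return findings
    t, end = ordered[0].timestamp, ordered[-1].timestamp
    while t <= end:
        chunk = [s for s in ordered if t <= s.timestamp < t + window]
        for inspect in inspections:
            findings.extend(inspect(chunk))
        t += window
    return findings


recording = [
    Sample(0, "latency_ms", 120),
    Sample(70, "latency_ms", 900),   # the injected failure
    Sample(130, "latency_ms", 110),
]
findings = replay(recording, [high_latency_inspection])
```

Because the recording is fixed, the same scenario can be replayed after every change to an inspection to check that the failure is still detected.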