1. Regarding overhead: we ran a benchmark focused on real performance impact rather than raw resource overhead [1]. TL;DR: we didn’t observe any noticeable impact at 10K RPS, and CPU usage stayed around 200 millicores (about 20% of a single core).
2. Coroot’s agent captures pseudo-traces (individual spans) and sends them to a collector via OTLP. This stream can be sampled at the collector level, and in high-load environments you can disable span capturing entirely and rely solely on eBPF-based metrics for analysis (see the OTLP/sampling sketch after this list).
3. We’ve built automated root cause analysis to help users explain even the slightest anomalies, whether or not SLOs are violated. Under the hood, it traverses the service dependency graph and correlates metrics, for example linking increased service latency to CPU delay or to network latency to a database [2] (a simplified traversal sketch also follows below).
4. Currently, Coroot doesn’t support off-CPU profiling. The profiler we use under the hood is based on Grafana Pyroscope’s eBPF implementation, which focuses on CPU time.
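To make item 2 a bit more concrete, here's a minimal, hypothetical Go sketch of exporting spans to a collector over OTLP with ratio-based sampling, using the OpenTelemetry Go SDK. This is not Coroot's agent code (the agent builds spans from eBPF events, and sampling is normally applied in the collector pipeline rather than at the producer); the endpoint, span name, and sampling ratio are assumptions.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Hypothetical collector endpoint.
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Keep only ~10% of traces; in practice this kind of sampling is usually
	// configured in the collector itself rather than in the span producer.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.1)),
	)
	defer tp.Shutdown(ctx)
	otel.SetTracerProvider(tp)

	// Emit a single example span.
	_, span := otel.Tracer("example").Start(ctx, "GET /healthz")
	time.Sleep(10 * time.Millisecond)
	span.End()
}
```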
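And for item 3, a deliberately simplified, hypothetical sketch of the traversal idea: walk the upstream dependencies of an affected service and flag metrics that move together with its latency. The real analysis is much more involved (see [2]); all types, metric names, and thresholds here are made up.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical model: a service node with its upstream dependencies and
// per-metric time series observed over the same window.
type Service struct {
	Name     string
	Upstream []*Service
	Metrics  map[string][]float64 // e.g. "latency", "cpu_delay"
}

// pearson computes the Pearson correlation coefficient of two equal-length series.
func pearson(a, b []float64) float64 {
	n := float64(len(a))
	var sa, sb, saa, sbb, sab float64
	for i := range a {
		sa += a[i]
		sb += b[i]
		saa += a[i] * a[i]
		sbb += b[i] * b[i]
		sab += a[i] * b[i]
	}
	den := math.Sqrt(n*saa-sa*sa) * math.Sqrt(n*sbb-sb*sb)
	if den == 0 {
		return 0
	}
	return (n*sab - sa*sb) / den
}

// explain walks the dependency graph from the affected service and reports
// metrics that correlate with its latency above a made-up threshold.
func explain(svc *Service, latency []float64, visited map[string]bool) {
	if visited[svc.Name] {
		return
	}
	visited[svc.Name] = true
	for name, series := range svc.Metrics {
		if name == "latency" {
			continue
		}
		if r := pearson(latency, series); r > 0.8 {
			fmt.Printf("%s: %s correlates with the observed latency (r=%.2f)\n", svc.Name, name, r)
		}
	}
	for _, up := range svc.Upstream {
		explain(up, latency, visited)
	}
}

func main() {
	db := &Service{Name: "db", Metrics: map[string][]float64{
		"cpu_delay": {1, 2, 4, 8, 16},
	}}
	app := &Service{Name: "app", Upstream: []*Service{db}, Metrics: map[string][]float64{
		"latency": {10, 20, 40, 80, 160},
	}}
	explain(app, app.Metrics["latency"], map[string]bool{})
}
```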
From a user’s perspective, it doesn’t really matter how the data is collected. What actually matters is whether the tool helps you answer questions about your system and figure out what’s going wrong.
At Coroot, we use eBPF for a couple of reasons:
1. To get the data we actually need, not just whatever happens to be exposed by the app or OS.
2. To make integration fast and automatic for users.
And let’s be real, if all the right data were already available, we wouldn’t be writing all this complicated eBPF code in the first place :)
Initially, we relied on the ClickHouse OTEL exporter and its schema, but we later modified our ClickHouse schema for performance reasons, so the two are no longer compatible :(
At Coroot, we solve the same problem, but in a slightly different way. The traffic source is always a container (Kubernetes pod, systemd slice, etc.). The destination is initially identified as an IP:PORT pair, which, in the case of Kubernetes services, is often not the final destination. To address this, our agent also determines the actual destination by accessing the conntrack table at the eBPF level. Then, at the UI level, we match the actual destination with metadata about TCP listening sockets, effectively converting raw connections into container-to-container communications.
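Here's a rough, user-space Go illustration of that resolution step. In reality the conntrack lookup happens in eBPF inside the agent, and the listen-socket matching happens server-side; the types, addresses, and container names below are made up.

```go
package main

import "fmt"

// endpoint is an IP:port pair as observed on the wire.
type endpoint struct {
	IP   string
	Port uint16
}

// Hypothetical conntrack-style NAT table: the destination the client connected to
// (e.g. a Kubernetes ClusterIP) mapped to the translated, actual destination.
var conntrack = map[endpoint]endpoint{
	{"10.96.0.15", 5432}: {"10.244.1.7", 5432}, // ClusterIP -> pod IP
}

// Hypothetical inventory of TCP listening sockets, keyed by IP:port,
// built from per-container metadata.
var listeners = map[endpoint]string{
	{"10.244.1.7", 5432}: "k8s/prod/postgres-0",
}

// resolve turns an observed destination into a container:
// first translate it through conntrack, then match against listening sockets.
func resolve(dst endpoint) (string, bool) {
	if actual, ok := conntrack[dst]; ok {
		dst = actual
	}
	container, ok := listeners[dst]
	return container, ok
}

func main() {
	if c, ok := resolve(endpoint{"10.96.0.15", 5432}); ok {
		fmt.Println("destination container:", c) // k8s/prod/postgres-0
	}
}
```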
Coroot builds a model of each system, allowing it to traverse the dependency graph and identify correlations between metrics. On top of that, we're experimenting with LLMs for summarization — here are a few examples: https://oopsdb.coroot.com/failures/cpu-noisy-neighbor/
Currently, you can define custom SLIs (Service Level Indicators, such as service latency or error rate) for each service using PromQL queries. In the future, you'll be able to define custom metrics for each application, including explanations of their meaning, so they can be leveraged in Root Cause Analysis.
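For a sense of what such PromQL-based SLIs can look like, here's a small, hypothetical Go program that evaluates a latency and an error-rate expression against a Prometheus endpoint. The service name, metric names, and endpoint are assumptions, and this is not Coroot's SLI configuration format.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Hypothetical Prometheus endpoint and metric names.
	client, err := api.NewClient(api.Config{Address: "http://prometheus:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	// Example latency SLI: p99 latency of a hypothetical "checkout" service.
	latencySLI := `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="checkout"}[5m])) by (le))`

	// Example error-rate SLI: share of 5xx responses for the same service.
	errorSLI := `sum(rate(http_requests_total{service="checkout",status=~"5.."}[5m])) / sum(rate(http_requests_total{service="checkout"}[5m]))`

	for _, q := range []string{latencySLI, errorSLI} {
		result, warnings, err := promAPI.Query(context.Background(), q, time.Now())
		if err != nil {
			panic(err)
		}
		if len(warnings) > 0 {
			fmt.Println("warnings:", warnings)
		}
		fmt.Println(result)
	}
}
```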
(I'm a co-founder). At Coroot, we're strong believers in open source, especially when it comes to observability. Agents often require significant privileges, and the cost of switching solutions is high, so being open source is the only way to provide real guarantees for businesses.