

deleting data has a cost.

deleting data early, after moving it to cold storage, has additional costs.


> having a decent way to get metrics from logs ad-hoc completely solves the metric cardinality explosion.

last i checked, the span metrics connector[1] was supposed to "solve" this in otel; but i'm not particularly sold on it, since the configuration is fixed up-front rather than queryable ad-hoc.

any data analytics platform worth its money should be able to do this at runtime (within specified data volume constraints, in reasonable time).

in general, structured logging should also help with this; as much as i love regex, i do not think extracting "data" from raw logs is lossless.

[1] https://github.com/open-telemetry/opentelemetry-collector-co...
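
to make the "at runtime" point concrete, here's a minimal sketch (the field names and log path are made up) of deriving per-route latency percentiles from structured JSON log lines, with nothing pre-registered and no cardinality committed to at write time:

    import json
    from collections import defaultdict
    from statistics import quantiles

    # hypothetical structured log file: one JSON object per line,
    # with "route" and "duration_ms" fields (both names are assumptions)
    buckets = defaultdict(list)
    with open("app.log") as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that aren't structured
            if not isinstance(event, dict):
                continue
            if "route" in event and "duration_ms" in event:
                buckets[event["route"]].append(float(event["duration_ms"]))

    # percentiles are computed at query time, over whatever dimensions you want
    for route, durations in sorted(buckets.items()):
        if len(durations) < 2:
            continue  # quantiles() needs at least two samples
        q = quantiles(durations, n=100)
        print(f"{route}: count={len(durations)} p50={q[49]:.1f}ms p95={q[94]:.1f}ms")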


> observing it will drastically impact the event

this presumes 'metrics' are 'cheaper' to set up than 'traces' / observability 2.0; is that purely from an implementation perspective?


Wide events seem like they would require more memory and CPU to combine, and more bandwidth due to their size.

I've implemented services with loggers that gather data and statistics and write out just one combined log line at the end. It's certainly more economical in terms of dev time; I'm not sure how "one large" compares to "many small" resource-wise in practice.
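
For illustration, the pattern looks roughly like this (a minimal sketch, not the actual implementation): accumulate fields on a per-request object and emit a single structured line at the end.

    import json
    import time

    class WideEvent:
        """Accumulates fields over the lifetime of one request and
        emits a single structured log line at the end."""

        def __init__(self, **fields):
            self.fields = dict(fields)
            self.start = time.monotonic()

        def add(self, **fields):
            self.fields.update(fields)

        def emit(self):
            self.fields["duration_ms"] = round((time.monotonic() - self.start) * 1000, 2)
            print(json.dumps(self.fields))

    # usage: one event per request, many fields, one line of output
    event = WideEvent(route="/checkout", user_id=42)
    event.add(cache_hit=False, db_calls=3)
    event.add(status=200)
    event.emit()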


i mean.. from your blog post linked in the repo; this isn't eBPF based?

https://clickhouse.com/blog/kubenetmon-open-sourced

the data collection method says: "conntrack with nf_conntrack_acct"
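
for context: with net.netfilter.nf_conntrack_acct=1 the kernel keeps per-flow packet/byte counters. a rough sketch of what reading them out of /proc/net/nf_conntrack looks like (not kubenetmon's actual code; the field layout varies by kernel, and the file needs the nf_conntrack module, root, and the acct sysctl enabled):

    import re

    # with accounting enabled, each conntrack entry carries packets=/bytes=
    # counters for both the original and reply direction
    flow_re = re.compile(
        r"src=(\S+)\s+dst=(\S+)\s+sport=(\d+)\s+dport=(\d+)\s+packets=(\d+)\s+bytes=(\d+)"
    )

    with open("/proc/net/nf_conntrack") as f:
        for line in f:
            matches = flow_re.findall(line)  # [original direction, reply direction]
            if not matches:
                continue
            src, dst, sport, dport, _, _ = matches[0]
            total_bytes = sum(int(m[5]) for m in matches)
            print(f"{src}:{sport} -> {dst}:{dport} bytes={total_bytes}")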


so they didn't want to pay for AWS CloudWatch [1]; decided to roll their own in-house network flow log collection; and had to re-implement attribution?

i wonder how many hundreds of thousands of dollars network flow logs were costing them; obviously at some point it becomes cheaper to re-implement monitoring in-house.

[1]: https://youtu.be/8C9xNVYbCVk?feature=shared&t=1685


Because the vanilla flow logs you get from VPC/TGW are nearly useless outside the most basic use cases. All you get is how many bytes and which TCP flags were seen per connection per 10-minute window. Then you need to attribute IP addresses to actual resources yourself, which isn't simple when you have containers or k8s service networking.

Doing it with eBPF on the end hosts, you get the same data, but you can attribute it directly since you know which container it originates from, snoop DNS, and collect extremely useful metrics like per-TCP-connection ACK delay and retransmissions.
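
To gesture at the attribution step, a toy sketch (using the kubernetes Python client; the flow record shape and IPs here are made up) that joins flow source/destination IPs against pod metadata:

    from kubernetes import client, config

    # build an ip -> (namespace, pod) map from the cluster; pod IPs churn,
    # so a real collector would keep this fresh via a watch, not a one-shot list
    config.load_kube_config()
    v1 = client.CoreV1Api()
    ip_to_pod = {
        pod.status.pod_ip: (pod.metadata.namespace, pod.metadata.name)
        for pod in v1.list_pod_for_all_namespaces().items
        if pod.status.pod_ip
    }

    # hypothetical flow records, e.g. parsed out of VPC flow logs
    flows = [
        {"src_ip": "10.42.0.17", "dst_ip": "10.42.1.3", "bytes": 48211},
    ]

    for flow in flows:
        src = ip_to_pod.get(flow["src_ip"], ("unknown", flow["src_ip"]))
        dst = ip_to_pod.get(flow["dst_ip"], ("unknown", flow["dst_ip"]))
        print(f"{src} -> {dst}: {flow['bytes']} bytes")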

AWS recently released CloudWatch Network Monitoring, which also uses an eBPF-based agent, but it's almost like a children's toy compared to something like Datadog NPM. I was working on a solution similar to Netflix's when NPM was released; there was no point after that.


This is spot on. The AWS logs can also be orders of magnitude more expensive.


I recall a time when we managed network flows by manually parsing /proc/net/tcp and correlating PIDs with netstat outputs. eBPF? Sounds like a fancy way to avoid good old-fashioned elbow grease.
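
For the curious, the elbow-grease version looks roughly like this (a sketch; Linux-only, and you need enough privileges to read other processes' fds): parse /proc/net/tcp for socket inodes, then walk /proc/<pid>/fd to find the owning PID.

    import os
    import socket
    import struct

    def parse_proc_net_tcp(path="/proc/net/tcp"):
        """Yield (local_addr, local_port, inode) for each socket entry."""
        with open(path) as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                hex_ip, hex_port = fields[1].split(":")
                # the IPv4 address is stored as little-endian hex
                ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
                yield ip, int(hex_port, 16), fields[9]  # fields[9] is the inode

    def inode_to_pid():
        """Map socket inodes to owning PIDs by scanning /proc/<pid>/fd."""
        mapping = {}
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                for fd in os.listdir(f"/proc/{pid}/fd"):
                    target = os.readlink(f"/proc/{pid}/fd/{fd}")
                    if target.startswith("socket:["):
                        mapping[target[8:-1]] = pid
            except (PermissionError, FileNotFoundError):
                continue
        return mapping

    pids = inode_to_pid()
    for ip, port, inode in parse_proc_net_tcp():
        print(f"{ip}:{port} -> pid {pids.get(inode, '?')}")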


good to know pub-sub shenanigans are ubiquitous lol

here's my implementation from a while back with `setTimeout`-like semantics; i used it to avoid prop-drilling in an internal dashboard (sue me)

https://gist.github.com/thewisenerd/768db2a0046ca716e28ff14b...


    sub => ref = 0
    sub => ref = 1
    unsub(0)
    sub => ref = 1 (two subs with same ref!)
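
a minimal sketch of the fix (not the gist's actual code): hand out monotonically increasing tokens instead of array indices, so an unsubscribe can never leave two live subscriptions sharing a ref.

    import itertools

    class Bus:
        """setTimeout-like pub-sub: subscribe() returns an opaque, never-reused token."""

        def __init__(self):
            self._subs = {}                # token -> callback
            self._ids = itertools.count()  # monotonically increasing, never reused

        def subscribe(self, callback):
            token = next(self._ids)
            self._subs[token] = callback
            return token

        def unsubscribe(self, token):
            self._subs.pop(token, None)

        def publish(self, message):
            for callback in list(self._subs.values()):
                callback(message)

    bus = Bus()
    a = bus.subscribe(lambda m: print("a:", m))  # token 0
    b = bus.subscribe(lambda m: print("b:", m))  # token 1
    bus.unsubscribe(a)
    c = bus.subscribe(lambda m: print("c:", m))  # token 2, not 1 -- no collision
    bus.publish("hello")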


there are also similar limits on the width of elements, which I found out recently when trying to set a width of "45678910px" [1]

[1] https://thewisenerd.com/works/45678910px.html


this builds upon PEP 723, which is "accepted", so it's likely here to stay.

https://peps.python.org/pep-0723/

I've been very slowly migrating scripts to work with this and `pipx run`; glad to know uv has also picked it up.
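
for anyone who hasn't seen it, the inline metadata block looks like this (the dependency is just an example); both `pipx run script.py` and `uv run script.py` will resolve an environment from it:

    # /// script
    # requires-python = ">=3.11"
    # dependencies = [
    #     "requests",
    # ]
    # ///
    import requests

    print(requests.get("https://peps.python.org/pep-0723/").status_code)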


as a former record holder (1m18s, 2015), it's been fascinating to see this number go down over the years

