
> I send about 100GB a day to Amazon ES

This is why you haven't noticed any issues.




Are you insinuating that's a small amount of data? You've got to be kidding.


I've seen devs toss crud into infra with debug logging enabled, millions of lines of deprecated log messages, etc., and the infra budget eats the cost.

It's insane. Unless you're literally Facebook, or ingesting data from CERN's LHC... what possible use case requires 100GB of text data ingest per day?

Maybe it's a case of someone throwing a Service Mesh into a Microservices K8s cluster and logging all the things?


4-5MB/min per VM of input/output traffic in compressed application-server logs, across roughly 100 VMs per site. That's about half a GB/min, or 700GB+/day in plain text from a single site's app servers alone.
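The arithmetic above checks out; a quick back-of-the-envelope sanity check in Python, using the upper end of the 4-5MB/min figure (the exact rates are of course this commenter's estimates, not measurements):

```python
# Assumed inputs from the comment above.
mb_per_min_per_vm = 5      # upper end of the 4-5MB/min per-VM rate
vms_per_site = 100         # "around 100ish VMs/site"

mb_per_min = mb_per_min_per_vm * vms_per_site    # 500 MB/min, i.e. ~0.5 GB/min
gb_per_day = mb_per_min * 60 * 24 / 1024         # minutes per day, MB -> GB

print(f"{mb_per_min} MB/min -> {gb_per_day:.0f} GB/day")  # ~703 GB/day
```

So ~703GB/day per site, consistent with the 700GB+ figure, before any plain-text expansion of the compressed stream.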

Normally that's no issue, as the data is stored on SANs rather than sent to the cloud for analysis; just giving some perspective.


It's absolutely caused by exactly the things you've mentioned. I think we could cut it by 75% easily if people simply set severity levels correctly and we disabled storing debug logs outside experimental environments.

90%+ of our logs are severity INFO or have no severity at all. It's like pulling teeth to even get devs to output logs using the corporate standard json-per-line format with mandatory fields.
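A minimal sketch of the kind of shipping-side filter that would enforce this, assuming json-per-line records with a `severity` field (the field name, level names, and `keep_line` helper are illustrative assumptions, not any particular corporate standard):

```python
import json

# Assumed severity ordering; adapt to your own schema.
SEVERITY_ORDER = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}

def keep_line(raw_line, min_severity="INFO", drop_debug=True):
    """Decide whether a json-per-line log record should be shipped."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return True  # malformed line: keep it, so the problem stays visible
    sev = record.get("severity")
    if sev is None:
        return True  # no severity at all: can't safely drop it
    if drop_debug and sev == "DEBUG":
        return False  # debug output stays in experimental environments only
    return SEVERITY_ORDER.get(sev, 1) >= SEVERITY_ORDER[min_severity]

# Example: debug records get dropped, errors get shipped.
print(keep_line('{"severity": "DEBUG", "msg": "hi"}'))  # False
print(keep_line('{"severity": "ERROR", "msg": "boom"}'))  # True
```

The awkward part, as noted, is that records with no severity at all can't be dropped safely, which is exactly why getting devs to set the field correctly matters more than any filter.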

Still, once you're running hundreds of VMs processing a big data pipeline it's not hard to end up with massive amounts of logs. It's not just logging, really, it's also metrics and trace information.


Was at a small startup, a SaaS company with a handful of large customers. Logs going into ES were north of 1TB/day.


I’ve run logging infrastructure for a shopping platform with a low-four-digit machine count. Ingest rates were around a handful of terabytes a day.


It is, relative to the scale AWS ES is apparently built to support.

> Amazon Elasticsearch Service lets you store up to 3 PB of data in a single cluster, enabling you to run large log analytics workloads via a single Kibana interface.

https://aws.amazon.com/elasticsearch-service/


That's the absolute maximum case though, literally.

"You're not the DoD so of course you're not having issues".

It can't be more than a minuscule fraction of Amazon ES customers that are getting anywhere close to 3PB.


Yes, that's a small amount of data. I've worked at small companies with an order of magnitude more data per day and larger companies with three orders of magnitude more data per day flowing into a text indexing service.



