I've seen devs toss crud into infra with debug logs enabled, millions of lines of deprecated log messages, etc., and the costs just get absorbed into the infra budget.
It's insane. Unless you're literally Facebook, or ingesting data from CERN's LHC... what possible use case requires 100GB of text data ingest per day?
Maybe it's a case of someone throwing a Service Mesh into a Microservices K8s cluster and logging all the things?
4-5 MB/min per VM of compressed application-server logs (input/output traffic), across roughly 100 VMs per site. That's about half a GB/min, or 700 GB+/day in plain-text logs from a single site's app servers alone.
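To make the arithmetic explicit, here's a quick back-of-envelope sketch in Python using the upper end of the figures above (the 5 MB/min and 100 VM numbers are the rough ones quoted, not measurements):

```python
# Rough sanity check of the per-site log volume figures above.
mb_per_min_per_vm = 5    # ~4-5 MB/min of compressed logs per VM
vms_per_site = 100       # ~100 VMs per site

mb_per_min_site = mb_per_min_per_vm * vms_per_site      # ~500 MB/min, i.e. ~0.5 GB/min
gb_per_day_site = mb_per_min_site * 60 * 24 / 1024      # minutes per day, MB -> GB

print(f"{mb_per_min_site} MB/min per site, ~{gb_per_day_site:.0f} GB/day")
# -> 500 MB/min per site, ~703 GB/day (before decompression)
```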
Normally that's no issue, as the data is stored on SANs and not sent to the cloud for analysis; just giving some perspective.
It's absolutely caused by exactly the things you've mentioned. I think we could easily cut it by 75% if people simply set severity levels correctly and we stopped storing debug logs outside experimental environments.
90%+ of our logs are severity INFO or have no severity at all. It's like pulling teeth to even get devs to output logs using the corporate standard json-per-line format with mandatory fields.
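For what it's worth, a minimal sketch of that kind of setup with Python's stdlib logging. The mandatory field names here (ts, severity, service, env, message) and the APP_ENV switch are hypothetical stand-ins, not anyone's actual corporate standard:

```python
import json
import logging
import os
import time

# Hypothetical json-per-line formatter with an explicit severity on every record.
class JsonLineFormatter(logging.Formatter):
    def __init__(self, service: str, env: str):
        super().__init__()
        self.service = service
        self.env = env

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime(record.created)),
            "severity": record.levelname,   # every line carries a real severity
            "service": self.service,
            "env": self.env,
            "message": record.getMessage(),
        })

env = os.environ.get("APP_ENV", "production")
handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter(service="billing-api", env=env))

root = logging.getLogger()
root.addHandler(handler)
# Only keep DEBUG in experimental environments; everywhere else start at INFO.
root.setLevel(logging.DEBUG if env == "experimental" else logging.INFO)

root.debug("dropped outside experimental envs")
root.info("kept everywhere")
```

Setting the level per environment is the cheap win: debug lines never even reach the log shipper outside experimental environments, instead of being filtered (and billed) downstream.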
Still, once you're running hundreds of VMs processing a big data pipeline, it's not hard to end up with massive amounts of logs. And it's not just logging, really; it's also metrics and trace information.
It is, relative to the scale AWS ES is apparently built to support.
> Amazon Elasticsearch Service lets you store up to 3 PB of data in a single cluster, enabling you to run large log analytics workloads via a single Kibana interface.
Yes, that's a small amount of data. I've worked at small companies with an order of magnitude more data per day and larger companies with three orders of magnitude more data per day flowing into a text indexing service.
This is why you haven't noticed any issues.