It should exist, and it would be super powerful given all the recent advances in language ML. Here is the mental model I had in mind: the canonical representation of a log line (i.e., the line with its run-time populated fields taken out) is the smallest meaningful unit of this "log language": _a word_. Taking the analogy further, an event is a collection of log lines that occur together (mostly in order), just as words spoken together form _a sentence_. Collections of events that occur close together in time form _paragraphs_, and paragraphs occurring in a certain order make up _chapters_. This mental model opens the door to applying the new AI techniques for text extraction, summarization and generation to extract the semantic structure of any "log language", and then to learn and classify behaviors observed at run time. The eventual objective, though, is not generation: it is reasoning with the optimal FP/TP tradeoff on a ROC curve.
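To make the "words" and "sentences" part of the analogy concrete, here is a minimal sketch. It assumes that run-time populated fields can be masked with a few regexes (a crude stand-in for a real log template miner such as Drain), that each distinct canonical template becomes one word ID, and that consecutive lines closer than a hypothetical `max_gap` seconds belong to the same sentence. `MASKS`, `canonicalize`, `Vocabulary`, `LogRecord` and `to_sentences` are illustrative names, not an existing library.

```python
import re
from dataclasses import dataclass

# Hypothetical masking rules: each regex stands in for one kind of run-time populated field.
# Order matters: timestamps and IPs are masked before bare numbers.
MASKS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?\b"), "<TS>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def canonicalize(line: str) -> str:
    """Replace run-time fields with placeholders, yielding the line's canonical 'word'."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


@dataclass
class LogRecord:
    ts: float  # seconds since epoch
    text: str


class Vocabulary:
    """Maps each distinct canonical template to a stable integer word ID."""

    def __init__(self) -> None:
        self.ids: dict[str, int] = {}

    def word_id(self, template: str) -> int:
        # len(self.ids) is evaluated before insertion, so new templates get 0, 1, 2, ...
        return self.ids.setdefault(template, len(self.ids))


def to_sentences(records, vocab: Vocabulary, max_gap: float = 1.0) -> list[list[int]]:
    """Group time-ordered records into 'sentences', splitting when the gap exceeds max_gap."""
    sentences: list[list[int]] = []
    current: list[int] = []
    last_ts = None
    for rec in sorted(records, key=lambda r: r.ts):
        if last_ts is not None and rec.ts - last_ts > max_gap:
            sentences.append(current)
            current = []
        current.append(vocab.word_id(canonicalize(rec.text)))
        last_ts = rec.ts
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    vocab = Vocabulary()
    logs = [
        LogRecord(0.0, "Accepted password for user 42 from 10.0.0.5"),
        LogRecord(0.2, "Session opened for user 42"),
        LogRecord(5.0, "Accepted password for user 7 from 10.0.0.9"),
        LogRecord(5.1, "Session opened for user 7"),
    ]
    print(to_sentences(logs, vocab))  # [[0, 1], [0, 1]]
```

Two logins by different users from different hosts collapse into the same two-word sentence, which is exactly the property that would let sequence models treat recurring behavior as recurring "phrases". The same grouping idea, applied with a larger gap over sentences, would give the paragraph level.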
I haven't seen anyone do it yet. Maybe companies like Splunk and Elastic will take the lead here. I am happy to engage, advise and contribute if there is an open source project around this.
Has anyone else seen something remotely close to this?