The approach in https://arxiv.org/abs/2301.11886 (MAT) for storage and CDN caches achieves pretty remarkable improvements over standard heuristics by using common heuristics as a filter for ML decisions. I imagine even TLBs might get some benefit from ML models with batched pre-prediction as done in MAT, but L1-L3 caches are almost certainly too high-throughput for that.
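
To make the "heuristic as a filter" idea concrete, here is a rough sketch, not the paper's actual implementation. Plain LRU proposes eviction candidates from its tail, and a model only gets asked about those candidates, so it runs on evictions rather than on every access. The names `FilteredLRU`, `keep`, and `max_skips` are my own placeholders, and `keep` stands in for whatever trained predictor you would plug in.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class FilteredLRU:
    """LRU cache whose tail candidates are vetted by a predictor before eviction.

    Illustrative only: `keep` is a hypothetical stand-in for an ML model that
    returns True when an object looks likely to be reused soon.
    """

    def __init__(self, capacity: int, keep: Callable[[Hashable], bool],
                 max_skips: int = 4) -> None:
        self.capacity = capacity
        self.keep = keep
        self.max_skips = max_skips  # bound on model calls per eviction
        self.items: OrderedDict = OrderedDict()  # oldest entry first

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: Hashable, value: object) -> None:
        if key in self.items:
            self.items[key] = value
            self.items.move_to_end(key)
            return
        while len(self.items) >= self.capacity:
            self._evict_one()
        self.items[key] = value

    def _evict_one(self) -> None:
        for _ in range(self.max_skips):
            candidate = next(iter(self.items))  # the heuristic's choice: LRU tail
            if not self.keep(candidate):
                del self.items[candidate]  # model agrees: evict
                return
            self.items.move_to_end(candidate)  # model disagrees: second chance
        self.items.popitem(last=False)  # give up and fall back to plain LRU


# Toy predictor (assumption): only the key "hot" is worth keeping.
cache = FilteredLRU(capacity=2, keep=lambda k: k == "hot")
cache.put("hot", 1)
cache.put("a", 2)
cache.put("b", 3)  # plain LRU would evict "hot"; the filter evicts "a" instead
```

The `max_skips` bound is what keeps the worst case cheap: if the model wants to keep everything, the cache degrades to ordinary LRU instead of looping. Batching predictions ahead of time, as the comment above mentions for MAT, would be a further step that this sketch does not attempt.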