> Bounded to...a model trained on virtually all publicly available text ever generated by humans
Don't forget there's a lot of non-public data too!
I don't disagree, but my point is that some bound is better than no bound, and I think we can agree that some bounds are better than others. We'll never have a fully equal comparison, but the bounds do allow some insights to be gained. We just need to be careful that those insights account for those bounds. (I believe we're both cautious about what insights can be drawn; if you doubt me, see my other comments. I push back on OP pretty hard.)
> unless they're turning up data from after the model was trained.
Only under the condition that the models perform lossless compression of all the data they were trained on. If the compression is lossy, then search can recover information the compression discarded.
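To make the lossy-vs-lossless point concrete, here's a toy Python sketch (purely illustrative, not any real model's behavior; the corpus and the prefix-truncation "compressor" are made up): a "model" that lossily retains only a prefix of each document will fail a query that exact search over the raw corpus still answers.

```python
# Toy illustration: a lossy "memory" that keeps only the first few words
# of each document, versus exact substring search over the raw corpus.
# Search recovers detail the lossy store dropped.

corpus = [
    "The Battle of Hastings was fought on 14 October 1066.",
    "Water boils at 100 degrees Celsius at sea level.",
]

def lossy_compress(doc, keep_words=4):
    """Stand-in for lossy compression: retain only a prefix of each doc."""
    return " ".join(doc.split()[:keep_words])

# What the hypothetical "model" retains after lossy compression.
memory = [lossy_compress(d) for d in corpus]

def search(query, docs):
    """Exact (lossless) substring search over a document collection."""
    return [d for d in docs if query.lower() in d.lower()]

query = "14 October"
print(search(query, memory))  # [] -- the date was compressed away
print(search(query, corpus))  # full sentence -- search recovers the loss
```

If the compression were lossless, the two searches would always agree, which is exactly the condition above.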