"You may not use the AI services, or data from the AI services, to create, train, or improve (directly or indirectly) any other AI service" - It's not like anyone chose to indirectly train the existing systems. Does this mean that publishing anything online that could be scraped and used to train something is not allowed?
It wouldn't be an online service's T&C document if it didn't include at least one vague, threatening, unworkable, and unenforceable condition.
The useless but true answer is that nobody knows what's allowed and what isn't until it's tested in court. Practically (though I'm not a lawyer), I suspect the clause will never be pursued on its own, because it's bullshit and everyone involved knows it.
In your scenario, though, assuming you publish in a way that's not overtly and primarily meant for AI training, I think the "use" of the data is the scraper's, not yours, and it would be hard to argue that you violated the terms of the agreement.
Of course, we could take this line of reasoning to its absurd end and demand that any code base Copilot was involved in carry a license term preventing the training of any other AI on it. We'd wind up in a place where all AIs are either trained on source material they're explicitly licensed not to be trained on, or trained only on a mostly static set of "pre-AI" publications.