The Open Source AI Definition (OSAID) is a slap in the face to anyone who has been part of the open source community. Allowing companies to redefine "Open" to allow closed components is a complete betrayal of everything the OSI should stand for, and it was done purely so large companies can pretend their closed models are open.
To be explicit, I believe your concern is that they don't require the training data and training methodology used to produce the model to be made available, so that anyone could essentially rebuild the model themselves from raw ingredients, right? That is, assuming for a moment that people have access to the kind of compute needed to do that.
Nevertheless, giving people a building block they can do whatever they want with certainly seems like free as in freedom to me. So I personally sympathize with the OSI approach, but in general I'm not big on the zealotry in the open source community.
It's almost like we have a third category here: free as in freedom but you can't necessarily rebuild it yourself.
In practice, I would argue that intellectual talent has always been a hidden ingredient anyway, so it's intellectually dishonest to imply that this hasn't always been the de facto reality, even for traditional software.
It's not just about reproducibility (although I do think that's important); it's about being able to analyze the model. With traditional software you have a pretty well-defined "this code does this", but with machine learning models, one of the only ways to verify that bias or propaganda hasn't been inserted during training is to inspect the training data.
Nobody owns their training data. They just scrape the internet or pirate massive troves of books. Merely forcing companies to get a license for all the data they use, let alone an open license, would be a massive impediment to the development of open models.
It is definitely doable to get openly licensed data; you just have to do it through voluntary, crowdsourced data collection programs. For example, the RNNoise model was retrained on such crowdsourced data.