LLM chat is compute-heavy, not bandwidth-heavy, so anywhere with reliable fiber and cheap electricity is suitable. Ping is lower than the average gap between keystrokes for anyone who hasn't explicitly trained in speed typing: at 60–120 WPM the inter-keystroke delay is roughly 100–200 ms, which covers round-trip times to everything from intercontinental servers to pathological ones (other end of the world).
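Back-of-the-envelope check (the 5-characters-per-word conversion and the RTT figures below are ballpark assumptions, not measurements):

    # Inter-keystroke delay vs. network round-trip time, roughly.
    def ms_between_keystrokes(wpm, chars_per_word=5):
        chars_per_sec = wpm * chars_per_word / 60
        return 1000 / chars_per_sec

    for wpm in (60, 120):
        print(f"{wpm} WPM -> ~{ms_between_keystrokes(wpm):.0f} ms per keystroke")

    # Assumed RTTs: ~100-150 ms intercontinental, ~250-300 ms antipodal.

So at 60 WPM you leave ~200 ms between keystrokes, which is already more than a typical intercontinental round trip.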
Bandwidth matters a bit more for multimodal interaction, but it's still rather minor.
Disagree on Nvidia; most folks fine-tune models. Evidence: there are about 20k models on Hugging Face derived from Llama 2, virtually all of them trained on Nvidia GPUs.
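A rough way to check that figure yourself; the search string here is a loose proxy for "derived from Llama 2", not an exact lineage filter, so counts will vary:

    # Count Hugging Face Hub models matching "llama-2" in their metadata.
    # This paginates through the full result set, so it can take a while.
    from huggingface_hub import HfApi

    api = HfApi()
    count = sum(1 for _ in api.list_models(search="llama-2"))
    print(f"~{count} models match 'llama-2'")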
I suppose it could be hallucinations about itself.
I suppose it's perfectly fair for large language models not to inherently know these things, but with manual fine-tuning I think it would be reasonable to build models that can answer questions about which model they are, their training cutoff date, their parameter count, how they differ from other models, and so on. It would be helpful for a model to actually know this rather than make a best guess and potentially hallucinate. In my experience Llama 3 seemed to know what it was, but generally speaking that doesn't appear to always be the case.
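As a concrete sketch of what that fine-tuning data could look like, here are a few self-identity Q&A pairs written out as a tiny SFT dataset; the model name, parameter count, and cutoff date are all hypothetical:

    # Minimal sketch: bake identity facts into a model via a small
    # supervised fine-tuning set. Every fact below is made up.
    import json

    facts = {
        "name": "ExampleLM-7B",       # hypothetical model name
        "params": "7 billion",        # hypothetical parameter count
        "cutoff": "September 2023",   # hypothetical training cutoff
    }

    pairs = [
        ("What model are you?", f"I am {facts['name']}."),
        ("How many parameters do you have?",
         f"I have roughly {facts['params']} parameters."),
        ("When does your training data end?",
         f"My training data ends around {facts['cutoff']}."),
    ]

    with open("identity_sft.jsonl", "w") as f:
        for q, a in pairs:
            f.write(json.dumps({"prompt": q, "completion": a}) + "\n")

Mixing a set like this into the fine-tuning data is how a model can come to "know" what it is instead of guessing.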