From what I've read, a 4090 should blow an A100 away if you can fit within its ~22GB of usable VRAM, which a 7B model should do comfortably (roughly 14GB of weights in fp16).
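Rough math on why it fits, as a back-of-envelope sketch (the 20% overhead factor for KV cache/activations is my assumption, and real usage varies by runtime and context length):

```python
def estimate_vram_gb(n_params_b: float, bytes_per_param: int = 2,
                     overhead_fraction: float = 0.2) -> float:
    """Rough VRAM estimate: weights plus a fudge factor for KV cache,
    activations, and framework overhead."""
    weights_gb = n_params_b * bytes_per_param  # 1B params * 2 bytes = 2 GB in fp16
    return weights_gb * (1 + overhead_fraction)

# 7B params in fp16: ~14 GB of weights, ~17 GB with overhead --
# comfortably under ~22 GB usable on a 24 GB RTX 4090.
print(f"~{estimate_vram_gb(7):.1f} GB")  # ~16.8 GB
```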
And the latency (along with its variability and availability) of the OpenAI API is terrible because of the load they're getting.