They embed biases from the training data, which is taken from the internet at large. The models themselves aren't inherently biased. They're just trying to generate the next token or scene. And these models aren't from the 1950s, or made by researchers in the 1950s. The models have guardrails added to try to prevent bias (and other content deemed harmful) from being generated.
None of these data sets are based on "the Internet". They are a specific subset of the internet, and the training, reinforcement, and guardrails are in no way neutral (because neutrality is not a thing).