A friend and I started a project similar to this some 6-7 years ago. We wanted all the generated data to be consistent: if, for example, you needed data about a German person, it would return a real city name, a real (or at least plausible) street name, a postal code, mobile and landline phone numbers with the proper country and operator codes, an email address on the national domain, a personal name chosen from names plausible for that country, etc. In the end it was so much work that we abandoned it :(
I have a similar project that generates random SQL rows, and yes, getting everything consistent is a huge pain. Mine generates matching city/country pairs, and for a limited set of countries it also generates matching phone numbers. It ignores area/operator code differences.
For email addresses, it uses first.last@$RANDOM.com; nothing country-specific.
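A minimal sketch of that kind of consistency, assuming Python (the comment doesn't say which language the generator uses): pick the country first, then derive everything else from it. The `COUNTRIES` table, name lists, and `make_row` are hypothetical and illustrative only; like the generator described above, it ignores area/operator codes and uses a random `.com` domain for email.

```python
import random

# Hypothetical lookup table: each country maps to a few of its cities and its
# international calling code. Coverage is deliberately tiny and illustrative.
COUNTRIES = {
    "Germany": {"cities": ["Berlin", "Hamburg", "Munich"], "calling_code": "+49"},
    "France": {"cities": ["Paris", "Lyon", "Marseille"], "calling_code": "+33"},
    "Japan": {"cities": ["Tokyo", "Osaka", "Kyoto"], "calling_code": "+81"},
}

FIRST_NAMES = ["anna", "luca", "mei", "jonas"]
LAST_NAMES = ["schmidt", "martin", "tanaka", "weber"]


def make_row(rng: random.Random) -> dict:
    """Build one row whose city and phone prefix agree with its country."""
    country = rng.choice(list(COUNTRIES))
    info = COUNTRIES[country]
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    # Subscriber digits are random; area/operator codes are ignored.
    subscriber = "".join(str(rng.randint(0, 9)) for _ in range(9))
    # Email is first.last@<random>.com, not country-specific.
    domain = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
    return {
        "name": f"{first.title()} {last.title()}",
        "city": rng.choice(info["cities"]),
        "country": country,
        "phone": f"{info['calling_code']} {subscriber}",
        "email": f"{first}.{last}@{domain}.com",
    }


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(make_row(rng))
```

Choosing the country first and looking everything else up from it is what keeps the columns consistent; adding a country is then just another table entry rather than new logic.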
The other struggle I have is that to properly test a DB, you need millions of rows, not thousands. Mine does quite well up to a few million, but then starts struggling. I need to overhaul the generator functions to use threading. I hadn't done so initially because I assumed the CPU would already be too busy for extra threads to help, but then I tried a smaller example and found I was wrong: massive speedup.
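One way to structure that overhaul, sketched in Python as an assumption (the comment doesn't name a language): split the total row count into chunks and hand each chunk to a worker with its own seeded RNG, so workers share no state and the output is reproducible. `generate_chunk`, `generate_rows`, and the row shape are hypothetical. Note that in CPython the GIL means `ThreadPoolExecutor` mainly pays off when workers also wait on I/O (e.g. writing batches to the DB); for pure-Python, CPU-bound generation, passing a `ProcessPoolExecutor` instead is the usual route, which works here because `generate_chunk` is a top-level, picklable function.

```python
import random
from concurrent.futures import Executor, ThreadPoolExecutor


def generate_chunk(seed: int, count: int) -> list:
    """Generate `count` rows using a private RNG so workers share no state."""
    rng = random.Random(seed)
    rows = []
    for _ in range(count):
        user_id = rng.getrandbits(63)
        name = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(10))
        age = rng.randint(18, 90)
        rows.append((user_id, name, age))
    return rows


def generate_rows(total: int, chunk_size: int, executor: Executor, base_seed: int = 0) -> list:
    """Split `total` rows into chunks and generate them concurrently."""
    sizes = [min(chunk_size, total - start) for start in range(0, total, chunk_size)]
    seeds = [base_seed + i for i in range(len(sizes))]
    rows = []
    # Executor.map yields results in submission order, so the output is
    # deterministic for a given base_seed regardless of worker scheduling.
    for chunk in executor.map(generate_chunk, seeds, sizes):
        rows.extend(chunk)
    return rows


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        data = generate_rows(100_000, 10_000, pool)
    print(len(data))
```

In a real generator you would likely have each worker write its chunk straight to the database (or a CSV for bulk loading) instead of collecting everything in one list, since holding millions of rows in memory is itself a common reason generation slows down at scale.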