If one query matches multiple results, you can observe which result users tap and sort that one to the top. Or, if a user types in a query, gets no results, then types in a similar query and selects a result, you can infer that the first query was likely a misspelling or alternate spelling. If lots of users type in a query and get no results, that query can be flagged for further investigation. And so on.
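To make those signals concrete, here is a minimal Python sketch of mining them from a search log. The `LogEvent` record, the 60-second retry window, the 0.75 similarity cutoff and the three-failure threshold are all hypothetical choices for illustration, not anything Apple is known to use; string similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class LogEvent:
    # Hypothetical log record: one search issued by one session.
    session: str
    time: float               # seconds since epoch
    query: str
    num_results: int
    tapped: Optional[str]     # ID of the result the user tapped, if any


def rerank(results, query, clicks):
    """Sort results for `query` by how often users tapped each one; ties keep their order."""
    counts = clicks.get(query, Counter())
    return sorted(results, key=lambda r: -counts[r])


def mine_log(events, window=60.0, min_similarity=0.75, zero_threshold=3):
    clicks = defaultdict(Counter)   # query -> Counter of tapped result IDs
    zero_counts = Counter()         # query -> how often it returned nothing
    respellings = Counter()         # (failed query, successful query) -> count
    last_failure = {}               # session -> its most recent zero-result event
    for e in sorted(events, key=lambda ev: (ev.session, ev.time)):
        if e.tapped is not None:
            clicks[e.query][e.tapped] += 1
        if e.num_results == 0:
            zero_counts[e.query] += 1
            last_failure[e.session] = e
            continue
        # A search that returned results: was it a quick, similar retry after a failure?
        prev = last_failure.pop(e.session, None)
        if (prev is not None and e.tapped is not None
                and e.time - prev.time <= window
                and SequenceMatcher(None, prev.query.lower(), e.query.lower()).ratio()
                >= min_similarity):
            respellings[(prev.query, e.query)] += 1
    # Queries that failed often enough to deserve a human look.
    flagged = sorted(q for q, n in zero_counts.items() if n >= zero_threshold)
    return clicks, respellings, flagged
```

Raw tap counts are a crude ranking signal on their own; a real system would presumably also correct for position bias, since top results get tapped more simply because they are on top.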
And the address and location queries themselves (whether they succeed or not, and whether the user's satisfaction is signaled implicitly or explicitly) give Apple bulk who-queries-what-from-where data that can be reviewed manually or tested automatically in follow-up processes.
For example, they know how dense their geodata is, so where does the volume of queries most exceed the existing density of data? Which queries actually result in someone following the directions to their destination and then not searching again for a while?
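Both questions could be approached roughly like this: bucket query locations and known points of interest (POIs) into a coarse grid and rank cells by queries per POI, and count a trip as a success only if navigation ended near the destination and no new search followed soon after. The grid size, the 50 m arrival radius and the 30-minute quiet period are assumptions picked for the sketch.

```python
import math
from collections import Counter


def cell(lat, lon, size_deg=0.01):
    """Bucket a coordinate into a grid cell (0.01 degrees is about 1.1 km of latitude)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def density_gaps(query_points, poi_points, top=5, size_deg=0.01):
    """Return the `top` cells with the most queries per known POI."""
    queries = Counter(cell(lat, lon, size_deg) for lat, lon in query_points)
    pois = Counter(cell(lat, lon, size_deg) for lat, lon in poi_points)
    # +1 keeps cells with no POI data at all from dividing by zero.
    scores = {c: n / (pois[c] + 1) for c, n in queries.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


def trip_succeeded(end_distance_m, secs_until_next_query,
                   arrive_radius_m=50.0, quiet_secs=1800.0):
    """Navigation ended near the destination and the user didn't search again soon after."""
    arrived = end_distance_m <= arrive_radius_m
    quiet = secs_until_next_query is None or secs_until_next_query >= quiet_secs
    return arrived and quiet
```

Cells at the top of `density_gaps` are where users search a lot but the map knows little, which is where new data collection would plausibly pay off first.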
Also: while Apple doesn't have a toolbar reporting every query and click on Google Maps (as in the February 2011 dustup in which Google accused Bing of using its toolbar to copy Google's search results), they could be using their own automated process to trickle queries into the Google Maps APIs and flag major discrepancies with their own answers for engineers or data-entry staff to look at.
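A sketch of that comparison, assuming both geocoders can be wrapped as functions from a query string to an optional (lat, lon) pair. `reference` is a hypothetical stand-in, not a real Google Maps client, the 500 m threshold is arbitrary, and throttling the trickle of queries is left out.

```python
import math
from typing import Callable, Iterable, List, Optional, Tuple

LatLon = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLon]]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    earth_radius_m = 6_371_000.0  # mean Earth radius
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def find_discrepancies(queries: Iterable[str], ours: Geocoder, reference: Geocoder,
                       threshold_m: float = 500.0) -> List[Tuple[str, Optional[float]]]:
    """Flag queries where our answer is missing or far from the reference's.

    Missing answers come first, then mismatches from largest to smallest distance.
    """
    flagged = []
    for q in queries:
        mine, theirs = ours(q), reference(q)
        if theirs is None:
            continue  # nothing to compare against
        if mine is None:
            flagged.append((q, None))  # the reference knows a place we don't
            continue
        d = haversine_m(mine, theirs)
        if d > threshold_m:
            flagged.append((q, d))
    return sorted(flagged, key=lambda t: math.inf if t[1] is None else t[1], reverse=True)
```

Sorting the largest disagreements first means the engineers' queue starts with the answers most likely to be outright wrong rather than slightly offset pins.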
All of this in addition to users reporting erroneous results.