To reinforce this point, we have known for a very long time that better UX and data quality has innumerable benefits. Remember back when everybody was excited about web 2.0 and all the amazing mashups which would soon become possible?
If you are producing data then exposing it in a nice programmable format is an extra cost and generally provides you no benefit. It usually hurts you, if people stop visiting your site and see fewer of your ads!
This is "really" a problem of incentives. It is usually not possible to capture any of the positive externalities of exposing your data. So maybe we could convince everybody in the world to switch to using different browsers with a native micropayment system; that might incentivize everybody to release all data as clean machine-readable tuples.
What I'm saying is, the phrase "Better UX and data quality" ignores just how hard that solution really is. It turns out training an LLM over most of the internet is _easier_ than global coordination.
But chatgpt can crawl a semantic web and use it without us knowing.
I have asked a langchain bot about wikidata ids for specific places, links to the page, to read it and then to answer facts about places and got very good results instead of made up numbers.
Wikidata links to FIPS codes, OSM ids, GeoNames and that gives us an opening to link against the cool datasets from Flickr, Foursquare and others who have created gazetteers.
To me, Semantic Web was dead on arrival because of its UX, but now a semi-smart agent can help us get past the UX problems and jump from plain text to json output.
If you are producing data then exposing it in a nice programmable format is an extra cost and generally provides you no benefit. It usually hurts you, if people stop visiting your site and see fewer of your ads!
This is "really" a problem of incentives. It is usually not possible to capture any of the positive externalities of exposing your data. So maybe we could convince everybody in the world to switch to using different browsers with a native micropayment system; that might incentivize everybody to release all data as clean machine-readable tuples.
What I'm saying is, the phrase "Better UX and data quality" ignores just how hard that solution really is. It turns out training an LLM over most of the internet is _easier_ than global coordination.