Many of the comments in this thread seem to not have anything to do with the content of the article, or the researchers' blog post cited within the article. There also seem to be some mistaken assumptions about the purpose of this AI.
Here is what the model is actually (attempting) to do:
>It calls attention to questionable citations, allowing human editors to evaluate the cases most likely to be flawed without having to sift through thousands of properly cited statements. If a citation seems irrelevant, our model will suggest a more applicable source, even pointing to the specific passage that supports the claim. Eventually, our goal is to build a platform to help Wikipedia editors systematically spot citation issues and quickly fix the citation or correct the content of the corresponding article at scale.
Even a pretty poor model (and it probably will be) would be a huge boon to editors in an encyclopedia full of references which are missing, out of date, a bit stretchy or plain bad faith. Wikipedia already has bots on it doing grunt work, and plenty of editors that like doing grunt work of checking minor details more than writing new content.
It's just one of the low hanging AI fruits we can pick now, nothing unexpected.
We're doing similar things to find errors in our regular ML datasets - a large proportion of the examples the model can't predict turn out to be mislabeled. Those mislabeled examples impose a big penalty on performance. Since Wikipedia is often used in ML, it was time to clean it up.
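For anyone curious what that kind of cleanup can look like in practice, here's a minimal sketch (assuming scikit-learn, a feature matrix X, and integer-encoded labels y - the estimator and dataset are placeholders, not what we actually use): score every example with an out-of-fold model and surface the ones whose given label the model finds least plausible, then hand those to a human for review.

    # Flag likely mislabeled examples: get out-of-fold predicted probabilities,
    # then rank examples by how little probability the model assigns to the
    # label the dataset claims. Assumes y is integer-encoded as 0..n_classes-1.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def suspicious_labels(X, y, top_k=100):
        probs = cross_val_predict(
            LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
        )
        given_label_prob = probs[np.arange(len(y)), y]
        # Lowest probability on the claimed label = best candidates for relabeling.
        return np.argsort(given_label_prob)[:top_k]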
The article states the training set included only webpage/URL citations, 134 million of them.
Since it is open-source, I'd imagine some researchers with computer vision expertise and skill in document/text digitization could greatly enhance this AI by digitizing the old books cited on Wikipedia and including them in the training data.
You know how when you click a link and find that it's a PDF, you suddenly trust it as credible science? The AI is going to learn some superficial-yet-amusingly-accurate heuristic similar to that.
There was, IIRC, a German politician who was part of a prank involving Wikipedia: someone claimed in his Wikipedia article that he had 200+ names. Lazy news writers quoted the Wikipedia article in their "fun fact" segments, and the Wikipedia article then turned around and said "here's an authoritative source for those 200+ names..."
> Many of the comments in this thread seem to not have anything to do with the content of the article, or the researchers' blog post cited within the article.
I know, right? The halo effect is strong in this thread.
HN on FAANG-related threads may as well be a default sub on Reddit. Practically guaranteed to get a heaping dose of childish, manic hysteria instead of a basic understanding of the article's content.
I've got no love for FB or "fact-checking", but I knew much of this thread would be a dumpster fire.
By very amusing coincidence, one of my favorite YouTube subject matter experts posted a video on this exact problem (questionable citations) less than 24 hours ago - https://www.youtube.com/watch?v=_1HwANCSJMU
(The video's context is the sinking of the German battleship Bismarck. Whether or not you've any interest in that subject, the video can still give you a far better sense of the problem than Singularity Hub's article.)
I would like this to be available as a search engine too, so I can input an idea and get relevant citations for it. This would help explore prior work in any field.
Wow, I wonder if this can also be applied to academic papers. Not for detecting plagiarism (there are already tools for that), but for 'improper' (?) citations.
Most of my life I have been told: "do not rely on Wikipedia, it is inaccurate". And I get it, it's true if you're going through academia...
But. Compare to the rest of the internet. Compare to every single propaganda website. Text on Wikipedia has one of the highest chances of being true by default. If a random website contradicts Wikipedia one shouldn't trust that website.
I'm sick of people comparing Wikipedia to peer reviewed journals... When instead people get their knowledge from tabloids, random newspapers, individuals getting mad in youtube videos, and websites like Facebook.
If Facebook claims to know more about the world than Wikipedia, it should factcheck itself.
Anyone who says you shouldn't rely on wikipedia because it's inaccurate is confused about what wikipedia is, and what encyclopedias are supposed to be used for.
The lesson teachers should have been teaching was that you shouldn't cite the thing in a scholarly work as if it was a primary (or even secondary) source. Just like you shouldn't cite the Encyclopedia Britannica, or any other encyclopedia, in a paper.
You seem to be falsely equating the Wiki model with the Encyclopedia model. Encyclopedias could be, and regularly were, referenced as factual and primary sources. The reason is that encyclopedias in the past were predominantly written by subject matter experts. As an example, writers for Britannica included no less than Milton Friedman, Carl Sagan, Albert Einstein, Marie Curie, and even Leon Trotsky, to go in a different direction. [1]
Wikipedia, by contrast, is written by whoever can engage in the Wiki-world equivalent of getting the most likes - its writers are whoever wins what is effectively a glorified Reddit-like upvote system. Ostensibly this is balanced by the fact that editors cannot themselves act as primary sources on Wikipedia: Einstein himself would not be allowed to write about physics on Wikipedia, but it also means Joe Editor can't write as a primary source either.
Instead everything is exclusively sourced from other sites. The problem is that this is where bias comes into play. If Joe Editor wants to publish that 2+2=5 in the mathematics section and he can find an article on The Verge claiming so, then he can do so. If somebody wants to correct him, and he doesn't want to change, then edit wars begin. And in these cases, the truth itself is secondary to more normal aspects of social media. This is where Wikipedia fails.
I'll address just one of your points for now:
For as long as possible, people held off on any kind of voting, using a rough consensus system instead. This was chosen because voting would lead to too much inaccuracy.
I know about this because at one point I researched how and why the system was working, and helped write some of the documentation on it.
My question to you is where, when, and/or why you think the system looks like reddit voting, as you claim? I would be rather sad if it has broken down of late.
note 1: Ironically, the No Original Research [3] rule came about to counter claims of unreliability. I do agree with you that, while it sets a lower limit on reliability, it also sets an upper limit. Besides your objections, it also removes a channel for establishing priority [1], and makes it harder for experts to contribute from memory. [2]
note 2: Wikipedia actually incorporates the 11th edition of Encyclopedia Britannica as the basis for a number of articles. [4]
I don't deny that intentional or unintentional misinformation occurs on Wikipedia. However, that does not qualify as an argument against Wikipedia as a reliable source as a whole, in particular in comparison to most other sources, including many TV channels, newspapers and tabloids. In fact, many if not most Wikipedia pages are written by subject matter experts too (whatever exactly that is taken to mean).
In spite of occasional quality concerns, Wikipedia is one of the most trustworthy sources on the internet. That does not mean you should blindly believe everything you read there, as for any other source. Also, as stated by others, it is not a scholarly source nor a scientific publication.
Which is about as bad. This can directly translate to "the only person that put effort into writing it, no matter how competent they are in the subject".
I’m a wikipedia contributor (so I’m quite familiar with how wikipedia works), and I wasn’t falsely equating the wiki model, I just wasn’t talking about it.
It is, as they say, "perfect being the enemy of good". There are just too many people who create this perception because of a one-off bad experience. Given the Wikipedia model of open edits, they should go fix it rather than create a perception of it being bad.
Of course editing may not be possible on contentious subjects, but most of us know that.
>If Facebook claims to know more about the world than Wikipedia, it should factcheck itself.
This seems like a misunderstanding of what the AI is doing. It does not "know about the world" or assert knowledge of any kind of truth, other than to check a citation given in a wikipedia article, and attempt to verify the reference actually contains said information.
For example, if I edited a wiki article to state "over 12 million people visit Hacker News every hour" and linked a random HN article claiming nothing of the sort, this AI would attempt to parse my citation and, if successful, determine that the reference didn't support the claim.
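To make that concrete, here is roughly what such a check looks like when framed as natural language inference: the cited passage is the premise, the claim is the hypothesis, and a low entailment score flags the citation. This is only a rough illustration using an off-the-shelf MNLI model from Hugging Face transformers, not Meta's actual model or pipeline.

    # Illustrative reference check as natural language inference: does the
    # cited passage entail the claim? Uses a generic MNLI model purely as a
    # sketch of the task, not the system described in the article.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL = "roberta-large-mnli"  # any MNLI-style model would do
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)

    def citation_supports_claim(cited_passage: str, claim: str) -> dict:
        inputs = tokenizer(cited_passage, claim, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = logits.softmax(dim=-1).squeeze()
        # Report scores under the model's own label names
        # (entailment / neutral / contradiction for MNLI models).
        return {model.config.id2label[i]: probs[i].item() for i in range(probs.numel())}

    # The passage says nothing about traffic numbers, so entailment should score low.
    print(citation_supports_claim(
        "Hacker News is a social news website focusing on computer science.",
        "Over 12 million people visit Hacker News every hour.",
    ))

Doing this at Wikipedia scale - first retrieving candidate passages from millions of pages, then ranking them - is where the hard part comes in.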
I understand what the AI model claims to want to do. And I can see how Wikipedia can be a decent testbed for the model.
But the title "factcheck wikipedia" is not what the model does. At best, the model would push bad actors to game references the same way people game PageRank on Google. Sure, that's a deterrent, but meh.
"Reference check" is a better way to describe what it does. Under the hood I think it does the classical natural language inference task. There are benchmarks, datasets and papers on this topic.
Pretty much any article on gender claims it's a social construct using citations from "sources" like Mother Jones and fringe extremist social sciences professors. Biology is barely a consideration.
If you look at articles on IQ, the entire page claims any link between IQ and race, which has been repeatedly proven in countless independent studies over decades, is wrong and that all IQ differences are environmental, a lie that seems blatantly political.
These pages are locked to anyone except the usual few Wikipedia elites with an agenda.
I am interested in philosophy. Wikipedia’s articles on the topic are very uneven - some are okay, others are really bad.
There are two freely accessible online philosophy encyclopaedias written by actual philosophy academics - the Stanford Encyclopedia of Philosophy and the Internet Encyclopedia of Philosophy (IEP) - and both are miles better than Wikipedia. (IEP tends to be more accessible for beginners - some of Stanford's articles can get quite esoteric and technical.) Another very good philosophy encyclopedia is Routledge's, though it is not freely available online (or at least, not legally). Why rely on Wikipedia when there are much more reliable and higher quality alternatives?
In the context of debates, the person who needed to resort to saying "Wikipedia is an unreliable resource" misses the point that Wikipedia is still orders of magnitude more reliable than the claims of some random debate person.
From the article and the cited Meta blog post, the AI is trained on millions of wiki citations. As for what it's attempting to do:
>Automated tools can help identify gibberish or statements that lack citations, but helping human editors determine whether a source actually backs up a claim is a much more complex task — one that requires an AI system’s depth of understanding and analysis.
>we’ve developed the first model capable of automatically scanning hundreds of thousands of citations at once to check whether they truly support the corresponding claims.
I suppose it helps that truth is self-consistent? You could write articles explaining a flat earth model but that would introduce many ad-hoc arguments that violate other observations and confirmed physics phenomena.
People who are woke often think that they are good, and therefore their feelings and opinions are objectively correct. It follows that anyone who disagrees is wrong, and therefore bad. It’s a dangerous ideology because it discounts the fact that we all have a capacity for good and evil. No one is good all the time, and certainly no one is good without intense introspection.
Looking for accurate historical or political information from any single source is going to lead to inaccuracies. Every single one has an interest in promoting a certain view of both historic and current events. Usually math or science facts are wrong on Wikipedia simply due to human error, unless they somehow have bearing on a contentious historic or current event. In that case you are back to being unable to trust any single source, Wikipedia or otherwise, without corroboration.
Fact-check is already a sullied overloaded term. It would be better replaced and served by something like “citation-check”.
There is a panoply of cited sources ("facts") that need to be properly vetted ("aligned") to contribute toward the article's premise ("fact").
This is why Wikipedia can often lay claim to being more scientific (through the sheer volume of citations containing "facts") assembled by its editors (citation scientists). This is also why many educators teach their students not to cite "Wikipedia", which is (a poor attempt?) at indoctrinating students into learning how to root out misleading sources (often mistaken for "fact").
Mmmm, but it's SCIENCE! That doesn't necessarily mean it's a fact.
Meta ("Facebook") would be venturesome to claim the science of citogenesis, as there are money, prestige, and power to be gained by shaping the "science" of these citations. That is, by using artificial intelligence (AI).
Arguably, today's fact-checkers try to use a scientific process that rarely reaches its "factual" claim (really a premise) in a clean, unarbitrary manner free of bias: always with unnecessary filler, aiming to sway readers with cemented anchors that keep them away from the basic but unwanted "fact". We call those "fact-checkers" opinionated citation checkers; they save readers from doing the work through the artificial power of a singular analysis curated toward its premise ("fact").
Fact-checkers are basically wannabe citogenesis scientists who are merely presenting their points of view. And readers (students) who failed their educators' lesson will claim it as "fact".
AI may or may not help, and it can cut both ways, toward or away from their desired premise.
How many different algorithms would it take to dislodge this badly abused citing of these singular-analysis efforts by would-be "fact-checkers"?
Who would be in control of this Machine Learning of AI? Meta (Facebook)!
Who oversees these AI algorithms? Meta!
And who would be the one that watches the watchers? Meta?
And will today's educators teach the future generation of discerning readers the much-needed distinction between citation checkers and today's "fact-checkers"? (*cricket*)
Have you ever opened an encyclopedia? For some perspective, the Britannica, Americana, and Collier's together don't even contain 1% of what wikipedia contains.
It’s not clear to me on what basis one could make that claim, and as a subjective, empirical observation, I disagree: that has not been my impression or experience at all.
It would be more in their line to build a fact-checker for the sort of Metaverse nonsense Zuck seems to be heavily invested in.
The only reason for doing something like this is to ultimately subvert the Wikimedia editors, setting up Factbook/Meta as the sole arbiter of what's correct and true on Wikipedia.
Reading through it, I strongly disagree with FB's example for "Better citations in action".
I don't see an improvement in the wording and IMO they would be making it worse by switching from an official first party source to a third party one.
While their tool might be useful to find semantic (mis)matches, a much more important part of verifying citations is to verify that the source has any business to make claims about the matter in the first place. https://xkcd.com/978/
QUOTE
Better citations in action
Usually, to develop models like this, the input might be just a sentence or two. We trained our models with complicated statements from Wikipedia, accompanied by full websites that may or may not support the claims. As a result, our models have achieved a leap in performance in terms of detecting the accuracy of citations. For example, our system found a better source for a citation in the Wikipedia article “2017 in Classical Music.” The claim reads:
“The Los Angeles Philharmonic announces the appointment of Simon Woods as its next president and chief executive officer, effective 22 January 2018.”
The current Wikipedia footnote for this statement links to a press release from the Dallas Symphony Association announcing the appointment of its new president and CEO, also effective January 22, 2018. Despite their similarities, our evidence-ranking model deduced that the press release was not relevant to the claim. Our AI indices suggested another possible source, a blog post on the website Violinist.com, which notes,
“On Thursday Los Angeles Philharmonic announced the appointment of Simon Woods as its new Chief Executive Director, effective Jan. 22, 2018.”
The evidence-ranking model then correctly concluded that this was more relevant than Wikipedia’s existing citation for the claim.
/QUOTE
>I strongly disagree with FB's example for "Better citations in action".
I think you misread something. The article that was originally cited is not relevant. It was about someone else becoming CEO for a different organization.
You're right, my mistake. Happy to see, though, that the current Wikipedia page cites an archived version of a first-hand source rather than FB's suggestion.
I can't help but think of The Onion's take on Wikipedia [1].
It's more accurate to say that this AI is fact-checking citations. There are lots of ways you can skew or fabricate information on Wikipedia. One well known way is to create some source for a claim and then cite it on Wikipedia. What will an AI do in this case?
3-4 years ago I stood in Menlo Park when Mark Zuckerberg got up and announced in response to the misinformation issues of the 2016 election that an effort would be made to fact check articles. My immediate thought, which hasn't changed, is "that's never going to work". You will always find edge cases where reasonable people would disagree but that's not even the big problem.
The big problem is that there are lots of people who aren't the slightest bit interested in the "truth". I've heard it said that whatever ridiculous claim you want to fabricate, you can find 30% of Americans who will believe it. As soon as you start trying to label content as truthful or not, you won't change the minds of most people. For many you will be contradicting their world view and you'll simply be dismissed as "biased" or "fake news".
I honestly don't know what the solution to this is. I do think sharing links on Facebook itself was probably a mistake for many reasons.
So this effort to fact check citations just seems more of the same doomed policy.
The solution comes from Sir Francis Bacon who said:
“Read not to contradict and confute; nor to believe and take for granted; nor to find talk and discourse; but to weigh and consider. Some books are to be tasted, others to be swallowed, and some few to be chewed and digested: that is, some books are to be read only in parts, others to be read, but not curiously, and some few to be read wholly, and with diligence and attention.”
Look, I understand that AI as a tech will evolve and become better, but right now when I hear something switches to AI I immediately assume low quality and bugs. Brace for false positives and an overall average experience.
Not scary at all that the company that censored news articles at the behest of the FBI in order to swing an election is going to do even more "fact checking".
(And, related questions, like "Who trolls the trolls?", "Who thought-polices the thought police?", "Who propagandizes the propagandists?", "Who spams the spammers?", etc., etc.! <g>)
For now, as far as I know, bots are not allowed to make edits to Wikipedia. This could only ever be used to make suggestions to a human editor. If Mark wanted to make targeted changes to Wikipedia, there would be easier ways to do it than this.
For edits that require judgement, a common mode is to use semi-automated tools where there is still a Human-In-The-Loop.
(edit) And if you look at their demo, the latter is exactly how it works. https://verifier.sideeditor.com/ . With a human in the loop like this, this could be a powerful tool.
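To make the human-in-the-loop point concrete, here's a minimal sketch of what such a semi-automated flow could look like (the data shapes and field names are hypothetical, not taken from the demo): the model only proposes ranked suggestions, and nothing changes unless a human accepts it.

    # Hypothetical review queue: the model proposes citation fixes with a
    # confidence score, a human approves or skips each one, and only the
    # accepted suggestions go on to be applied.
    from dataclasses import dataclass

    @dataclass
    class Suggestion:
        article: str
        claim: str
        current_citation: str
        proposed_citation: str
        model_score: float  # model's confidence that the proposed source is better

    def review_queue(suggestions):
        accepted = []
        # Show the most confident suggestions first.
        for s in sorted(suggestions, key=lambda s: s.model_score, reverse=True):
            print(f"[{s.article}] {s.claim}")
            print(f"  current:  {s.current_citation}")
            print(f"  proposed: {s.proposed_citation} (score {s.model_score:.2f})")
            if input("  accept? [y/N] ").strip().lower() == "y":
                accepted.append(s)
        return accepted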
Fortunately I've got a two-year-old clone of Wikipedia in a ZIM file. 2.6 GB isn't all that much, and it has pictures.
I don't think I will trust wiki ever again if they do this. It is already like an encyclopedia made of bar-bathroom graffiti. Now they want to turn it into 4chan.
>>>It is already like an encyclopedia made of bar-bathroom graffiti. Now they want to turn it into 4chan.
Are you familiar with Encyclopedia Dramatica? It's basically Wiki for Internet culture, if all of Wiki's editors were autistic drama merchants from 4chan. It's glorious: https://encyclopediadramatica.online/Category:Weeaboos
An invaluable tool for Internet culture! I loved comparing the different definitions for explicit or semi-explicit terms like "thot", "coal burner", etc.
On the one hand, good on them for tryin' to do somethin' properly innovative with "AI", but on the other hand, AHAHAHAHAHAHAHAAHAHAHAHAH! Meta "fact-checking"? Hahahahahah!
Fact is pain and toil which AI will never understand. It can never understand what it means to be a people hunted down and exterminated. It means pain and toil.
Actually the image of the beast is removed a long time from now. Not seen it yet and not sure I will live to see it but it isn't hard to see what it would be.
It’s taking subtle human judgement out of decision making.
Some systems don’t need subtle social decision making. Using AI for those systems is great. But a lot do. And in those that do, if you’re not careful and automate too much, errors compound. Quickly.
AI recommendation engines have already ripped society apart and increased division, because the subtle bridge building that human curation enables is removed.
Hospital systems have been increasingly systematized and are becoming increasingly hellish and expensive. They're a non-AI example of the same phenomenon. The risks lie in trying to systematize things that have a social aspect.
I love automation and think we should strive to free up as much time for exploration, creativity, and human connection as we can. But AI is just plain bad at understanding us, and always will be, in my view. It will always be a mimic. It simply cannot do the subtle social things humans do because it does not have the motivations that enable that subtle behavior.
There has to be a term for comments like the above - simple one-liners that at first might seem true; they appeal to a basal, irrational nature within us that reacts impulsively, as if to a pointed jab. But the more you think about them, the less sense they make.
The comment above is nonsense; dangerous at worst, idiotic at best. There's no comparison between AI and writing.
And then what is it going to do once it "grades" the information pulled from sources?
Edit the article? I don't expect it to be accurate enough to avoid getting quickly banned or auto-reverted.
Raise some issue for volunteers to manually review? Probably not accurate or important enough to be a priority given the likely volume.
Honestly, a good portion of the sources I check out in a Wikipedia article just 404 or have become paywalled, and that would be pretty trivial for a bot to detect, so there's obviously not a huge desire to have bots checking sources in the first place.
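For what it's worth, the dead-link half of that really is a few lines of Python. The sketch below (using the requests library, with a placeholder URL list) only checks HTTP status codes; actually detecting paywalls or soft 404s that still return 200 would take more than this.

    # Flag citation URLs that return HTTP errors (404s, 410s, etc.).
    # This does not detect paywalls that still answer with a 200 status.
    import requests

    def flag_dead_citations(urls, timeout=10):
        flagged = []
        for url in urls:
            try:
                # Some servers mishandle HEAD, so fall back to GET on an error code.
                resp = requests.head(url, allow_redirects=True, timeout=timeout)
                if resp.status_code >= 400:
                    resp = requests.get(url, stream=True, timeout=timeout)
                if resp.status_code >= 400:
                    flagged.append((url, resp.status_code))
            except requests.RequestException as exc:
                flagged.append((url, str(exc)))
        return flagged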
If they have a machine which is good at identifying suspicious information, on a huge scale, they may also be able to control which information is suspicious, to a limited extent. It could easily be part of an effective apparatus of repression.
There's nothing in the article about a machine which identifies suspicious information, or information control, or an apparatus of repression. Perhaps you're referring to some other thread?
There is however, an open-source AI, and a github link to it. If you're familiar with Python you can literally read the code yourself and look for anything nefarious.
I'm not religious anymore but this feels applicable:
> Thou hypocrite, first cast out the beam out of thine own eye; and then shalt thou see clearly to cast out the mote out of thy brother's eye. - Matthew 7:5
(Sorry, I grew up on KJV; it's what I remember and what sounds right in my head.)