Wikipedia Is Nearing Completion, in a Sense

ChuckMcM · on Oct 28, 2012

This is the other end of extrapolation. There is a lot of stuff in the world, but it's finite stuff. And information, like lots of things, has its own inverse power law. When building a search engine, folks say "Gee, you don't have nearly the hardware that Google does, how can you ever hope to compete?" and the answer is the inverse power law. Comparing queries served out of the index with those served off the long tail, the drop-off is dramatic.

How many companies are there in the world? 100M? Give them each a megabyte for pictures and their information; that is 100 TB of data, 50 disk drives, 150 if they are triply replicated. And what are the net new businesses in a given year? At a 2% annual growth rate for the world's economies over the long scale, that's maybe 2M new businesses a year, or 2 TB of data, or 1 (or 3) new disks a year?

A billion people on Facebook? 1/7th of the world population? What, a megabyte for each of them is a PB of data? The Backblaze guys can put a petabyte of storage in a single cabinet.
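The arithmetic above is easy to check. A minimal sketch in Python, assuming decimal units (1 MB = 10^6 bytes) and 2 TB drives; the drive size is not stated in the comment but is implied by "100 TB of data, 50 disk drives", and `disks_needed` is just an illustrative helper name:

```python
# Back-of-envelope storage estimate; all inputs are the rough figures from the comment.
MB = 10**6
TB = 10**12
PB = 10**15

DISK_BYTES = 2 * TB  # assumed drive size, implied by "100 TB ... 50 disk drives"


def disks_needed(records: int, bytes_per_record: int, replicas: int = 1) -> int:
    """Whole drives needed to hold `records` items, each stored `replicas` times."""
    total = records * bytes_per_record * replicas
    return -(-total // DISK_BYTES)  # ceiling division


companies = 100_000_000
print(companies * MB / TB)                      # total size in TB
print(disks_needed(companies, MB))              # single copy
print(disks_needed(companies, MB, replicas=3))  # triply replicated

new_per_year = companies * 2 // 100             # 2% annual growth
print(disks_needed(new_per_year, MB), disks_needed(new_per_year, MB, replicas=3))

facebook_users = 1_000_000_000
print(facebook_users * MB / PB)                 # total size in PB
```

With these inputs the single-copy and triply replicated counts come out to 50 and 150 drives, and the yearly growth to 1 or 3 drives, which matches the figures in the comment.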

For a long time the Internet was playing 'catch-up'; now it is asymptotically approaching 'caught up.'

Different problems and different opportunities for folks now who are basing their endeavors on the net.

tisme · on Oct 28, 2012

Storing data and using data are two completely different things. You can store all of FB and lots of other companies in a fairly small volume these days. But as soon as you want to access that data to process or update it the game changes rapidly. Suddenly that one cabinet explodes into a datacenter full of cabinets, or even several data centers.

Storage is a solved problem for just about any amount that an ordinary company might need. Getting that data delivered to a CPU at speeds that are still usable in practice, if you want to say something about all of that data, is a completely unrelated problem, one that moves the amount of technology and funds required from easy to extremely hard and beyond.

ChuckMcM · on Oct 28, 2012

"Storing data and using data are two completely different things."

That is so true. When people ask "How hard can it really be to write a search engine these days?" I have been known to ask them to speculate on how they might go about it and then point out the challenges of knowing what data you have vs what data is asked for. Search is particularly interesting because the more time you spend the better your answer can be, and it's always challenging to 'draw the line' between fast and relevant. But that is also what makes it so fun :-)

ErrantX · on Oct 29, 2012

As a highly active Wikipedian this article is rather wrong: Wikipedia is far from complete. It has a lot of articles, covering a huge range of topics (and in that sense is perhaps nearing "completion"). But those articles are almost always far from complete.

As someone who has written multiple peer reviewed "Good Articles" and one "A-Class" (a step below "Featured Article") - as well as numerous other fairly well rounded articles - the act of taking material from a stub or short overview to a complete encyclopaedic article is MASSIVE.

I wrote an article about Dudley Clarke; a British chap who was responsible for military deception planning in North Africa during the Second World War. That took nearly 3 months of research and writing, and cost me around £50 in text books.

"but the bulk of the work, the actual writing and structuring of the articles, has already been done"

No. The easy thing of slapping up an article with some information in it has been done. The bulk of the work is completing each article.

This article does actually nail why Wikipedia editorship is declining: "With the exciting work over, editors are losing interest" They just get the reason wrong. This is why we are plagued by a massive community of vandalism patrol/administrative types and those who treat it as a social network - content creation is a minor facet of Wikipedia because of the massive investment required to finish up an article.

kiba · on Oct 28, 2012

Wikipedia may be getting more mature, but I think the barrier to entry is also a cause of the lack of outsider edits. Also, being a Wikipedian isn't like being a professor. You don't get much outsider respect.

Due to their policies, you also see "forks" or more specialist wikis such as comixpedia, or libregamewiki (a wiki that I actually found), for subjects like webcomics or open source video games that keep getting deleted.

Gwern, who is a veteran editor, got so tired of the wikipolitics (even though he's really good at wikilawyering to protect his contributions when needed) that he started http://gwern.net

He benefits from the reputation and traffic that would otherwise go to Wikipedia.

DanBC · on Oct 28, 2012

There is less for people to do, but there is still plenty of "wiki gnoming" to do. This is exactly the kind of thing that could attract some new editors. The minor fixing of small details should be an excellent use of crowdsourcing.

I seem to have lousy experiences at Wikipedia whenever I try to edit anything, even if it's a minor edit.

Last time I tried:

My IP had already been blocked because it had been used by a vandal. I thought that was a bit odd (blocking a dynamic IP range), so I made an unblock request.

The template is confusing, and doesn't tell users to include the reason for being blocked as well as the reason for being unblocked, so I got an error message.

While I was fixing the mistake I made, my unblock request was declined. (This happened within a few minutes of me posting it.) Thus, I fixed the mistake, saved the page (working past the edit conflict), and found that my request had been declined.

I read why it's declined.

I make another unblock request.

I get a friendly, polite, sympathetic reply. But it's not actually much use - it doesn't help new users to understand WP. And the block remains.

I try to reply. Because the talk page includes links to templates, WP thinks I'm entering external links and asks me to enter CAPTCHAs.

I enter the CAPTCHAs and reply. I say that blocking a troll by blocking some dynamic IPs is odd - they just log off the Internet and log back on to get a new IP address, which may be outside the blocked range. Determined trolls (the kind who attract IP range blocks) will find this trivial to do. Newbies, the kind of people who are being targeted for this retention programme, may not find this quite so easy.

What kind of person will willingly jump through these hoops? People who are great at grammar and copy editing? Or cranks who want to put homeopathy in every article, or who want to mention Armenian genocide in many articles?

ErrantX · on Oct 29, 2012

Putting my admin hat on for a moment... rangeblocks are fairly rare for the reason you cite. But when used they are deployed in situations where a vandal has been noted to be using IP addresses from the same range.

Although you seem skeptical, it often works, as these are not the smartest cookies in the crumble.

Unfortunately, the only way to fix your problem is to create an account on another IP address and log in (or alternatively ask someone to create it for you, a service Wikipedia can provide).

> Because the talk page includes links to templates WP thinks I'm entering external links and asks me to enter CAPTCHAs.

That's rather odd, I've never come across that before! I thought internal Wikipedia links were excluded from that filter. I will raise that when I find someone who is responsible for such things :)

benmanns · on Oct 28, 2012

Here is the data on a linear, rather than logarithmic scale[0]. Y axis is number of articles and X axis is "years since 2001". I copied the data as best I could from the chart given by the author of this article. Wikipedia has a page itself about its own size which shows a similar linear growth rate[1].

[0] http://www.wolframalpha.com/input/?i=plot+%7B20%2C15000%2C10...

[1] http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

ygra · on Oct 29, 2012

That's what I wondered too. It's easy to "show" a certain "saturation" when using a log scale and that only means it doesn't grow relative to its current size.

binxbolling · on Oct 28, 2012

I think this article makes a great point, but there might be other, more unpopular reasons fewer people are editing. For example, in my opinion, the barrier to entry has gotten much higher. Fanatical editors revert aggressively, red tape holds up even great changes, and the rites & mores of the Wikipedia community become ever more convoluted & impenetrable to outsiders.

It's a myth, albeit popular, that one can just read an article and quickly edit a mistake (whether major or minor). Perhaps this facade is crumbling a bit, and more people are realizing that "contributing" is not actually all that easy. I actually love Wikipedia, so don't take this as petty sniping, it's just what I've experienced.

jarek · on Oct 29, 2012

I'm actually rather curious, as I see this argument repeatedly and it doesn't match my personal experience (I've corrected plenty of mistakes both with a user account and as an IP and I don't think I've been reverted once). Do you mind sharing what kind of articles or mistakes you have had this experience with?

JustSomeAnon · on Oct 29, 2012

Unfortunately I have to comment as anonymous.

My experience totally matches binxbolling. As a new contributor my entries were either blanked by trolls (and left for dead for weeks without end) or have been deleted by over-zealous editors whose only reference seems to be their own opinion and the spouting of endless WP this and WP that rules. I was appalled by the bullying and mob mentality I found in there.

jarek · on Oct 29, 2012

Can you comment on the articles or fields you were editing in, or would that jeopardize your anonymity?

JustSomeAnon · on Oct 30, 2012

The field is, broadly speaking, Computer Science.

aprescott · on Oct 28, 2012

It seems a little misleading to have a graph with log-y values suggesting that growth has stopped because it's reaching some kind of peak. log(number of articles) looking like log(t) means it might still be growing pretty linearly.
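A quick way to see this: take a made-up series that grows by a fixed number of articles every year and compare the yearly step on a linear axis with the step on a log axis. The sketch below uses a hypothetical 500,000 articles per year, not real Wikipedia data:

```python
import math

# Hypothetical series: exactly 500,000 new articles every year (constant absolute growth).
ARTICLES_PER_YEAR = 500_000
counts = [ARTICLES_PER_YEAR * year for year in range(1, 11)]

# Yearly step on a linear axis vs. on a log10 axis.
linear_steps = [b - a for a, b in zip(counts, counts[1:])]
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(counts, counts[1:])]

for year, (lin, lg) in enumerate(zip(linear_steps, log_steps), start=2):
    print(f"year {year:2d}: +{lin:,} articles, +{lg:.3f} on the log axis")
```

The linear steps stay constant while the log-axis steps shrink every year, so a flattening log plot by itself does not show that absolute growth has stopped.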

yk · on Oct 29, 2012

Wikimedia has the numbers (some scrolling required):

http://stats.wikimedia.org/EN/ChartsWikipediaEN.htm

Growth has recently been slightly lower than linear, but it is by no means looking as bad as the article suggests. In fact (reading from the plot in the Atlantic) it took Wikipedia five years (2001 - 2005) to reach 1M articles, two years for the second million, and roughly one and a half years each for the next two million. From the more detailed statistics at Wikimedia, new articles are now added at roughly 1000 a day, down from 1600 a day in 2006.
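Those milestone spans can be turned into rough average daily rates. A small sketch, assuming 365.25 days per year and treating the spans read off the plot as exact (they are eyeballed, so the results are only coarse averages and need not match the per-day statistics from Wikimedia):

```python
# Milestone spans as read off the plot by the commenter (approximate).
DAYS_PER_YEAR = 365.25
years_per_million = {
    "1st million": 5.0,   # 2001 - 2005
    "2nd million": 2.0,
    "3rd million": 1.5,
    "4th million": 1.5,
}


def articles_per_day(years: float, articles: int = 1_000_000) -> float:
    """Average daily article creation over a span of `years`."""
    return articles / (years * DAYS_PER_YEAR)


for label, years in years_per_million.items():
    print(f"{label}: about {articles_per_day(years):,.0f} articles/day")
```

Under these assumptions the average rate rises from the first million to the later ones instead of collapsing, which is the commenter's point.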

[added] I just found the linear plot of en.wikipedia articles: https://en.wikipedia.org/wiki/File:EnwikipediaArt.PNG

britta · on Oct 28, 2012

I found this response interesting from a person who works for Wikimedia (http://branch.com/b/surmounting-the-insurmountable-wikipedia...):

> ...at the Foundation, we tend to call this "the Gold Rush theory" - the idea that Wikipedia's gold rush days are over.

> My personal view is that the problem isn't that everything is done, but that it's harder for willing new editors to find things to do. (For existing power contributors, they have no hard time finding topics, which is why overall article growth remains steady).

> The answer here is that the Foundation needs to work on ways to surface interesting and useful stuff to write about. Wikipedia can't afford to hope people stumble on these topics.

He works on making Wikipedia's UI better in ways that encourage useful contributions, especially for new editors - see his blog posts at http://blog.wikimedia.org/2012/09/24/giving-new-wikipedians-... and http://blog.wikimedia.org/2012/10/24/fix-this-broken-workflo... for examples.

AustinGibbons · on Oct 28, 2012

I did some research on the declining editor rates for a course project. Without recreating our entire paper, I want to make this important point. Wikipedia is not (necessarily) nearing the completion of all knowledge, but rather what is feasible for its editors to contribute - indeed, there is much work to be done on "the history of sanitation and sewage in ancient Carthage" and "topic sensitive page rank", which at least has a page, let alone "trust sensitive page rank" which seems to be missing one. Our core piece of evidence was observing the _same_ trends across different languages (in both # edits and # page views). If you are interested, check out our final report here - it has its flaws, but I believe the three pieces of evidence we explored hold some merit in their own right. (http://www.stanford.edu/class/cs341/reports/09-GibbonsVetran...)

robryan · on Oct 29, 2012

The more specific a topic is, the fewer people there are who could actually write an accessible article on it. Paying specialized writers, or even paying or asking for donations of other works they have done, could fill out these areas.

jevinskie · on Oct 28, 2012

"Jensen believes that there is a way out of this: "Wikipedia is now a mature reference work with a stable organizational structure and a well-established reputation. The problem is that it is not mature in a scholarly sense." Wikipedia should devote more resources toward getting editors access to higher-quality scholarship (in private databases like JSTOR), admission to military-history conferences, and maybe even training in the field of historiography, so that they could bring the articles up to a more polished, professional standard."

I think that this is the most interesting part of the article. What if Wikipedia was funded well enough to act as both a traditional encyclopedia with full-time editors/writers and as a place where everyone can add/edit content? We need a modern Andrew Carnegie to step up.

gojomo · on Oct 28, 2012

They're not hurting for funding. The Wikimedia Foundation had a $28 million budget for its current year and is projecting $46 million in revenues for 2012-2013:

https://wikimediafoundation.org/wiki/2012-2013_Annual_Plan_Q...

Also, hiring editors/writers might crowd out the volunteer motivations, costing the project far more in the loss of donated effort. I believe Wikipedia's challenges are more strategic -- deciding the right things to do and in what proportions -- than financial.

jessriedel · on Oct 29, 2012

What makes you think $40 million is a lot? That seems like a very tiny amount to me, especially considering that most (nearly all?) is spent on technical requirements rather than editing.

gojomo · on Oct 29, 2012

I believe $40 million per year is a lot because it could, just based on crude rules-of-thumb, support a strong technical staff of well over 100 people indefinitely. Indeed, according to the 2012-2013 Plan document, current Wikimedia Foundation headcount was about 119 at the time of writing and expected to grow to 174 next year.

I'm not sure what you mean by 'technical requirements', but if you specifically mean hosting costs (other than staffing) and capital expenditures on equipment, per the plan's page 69 that is forecast to be about $5.3MM of the total $42MM budget -- around 13%.
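The share quoted here is a one-line check. A tiny sketch using only the two figures from the comment (the variable names are illustrative):

```python
# Figures quoted in the comment (millions of USD), from the 2012-2013 plan.
total_budget_mm = 42.0
hosting_and_capex_mm = 5.3

share = hosting_and_capex_mm / total_budget_mm
print(f"hosting + capex share: {share:.1%}")
print(f"left for everything else: ${total_budget_mm - hosting_and_capex_mm:.1f}MM")
```

The share rounds to the "around 13%" given above, leaving the bulk of the budget for staff and other costs.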

Staff is the major cost -- I believe it is chiefly technical, administrative, legal, and community/chapter outreach roles. I also believe there's not a single person with explicit 'editing' duties on their payroll, as a matter of strategy and doctrine: Wikipedia is edited by unpaid volunteers, the Foundation supports those volunteers. Adding any 'official' and compensated editors would tamper with the formula/culture that's worked so far.

In one respect, crossing that threshold by adding just a single paid editor would be 'cheap', but if that step makes the volunteers feel less appreciated, or start thinking of things in terms of a salary (aka: 'motivation crowding'), the loss in productive contributions could be much larger.

Why do you think $40MM-per-year is a "very tiny amount", and how big of an editorial staff do you think they'd need under some new model including professionalized editors?

jessriedel · on Oct 30, 2012

There are 4 million articles on Wikipedia and 50k active editors. 100 people, even working full time, would hardly make a dent. This is true even ignoring the serious impact of tampering with the all-volunteer philosophy. I'd wager you'd need at least 10 times the number of people/money to seriously improve the content of wikipedia by hiring editors.

gojomo · on Oct 30, 2012

The definition of 'active editor' that gives totals in the tens of thousands is anything more than 5 edits a month. That could just be minutes of time, so I doubt you can draw any sensible conclusions about sizing a paid full-time staff from '5+ edit' volunteer counts.

FWIW, the Wikipedia article on Encyclopedia Britannica suggests that in 2007, Britannica had a credited staff of about 60, with another dozen editorial advisors. (While thousands of other advisors contributed over the decades, they're not required as continuing full-time staff.)

So your suggestion that the Foundation would need "at least 10 times the number of people/money" -- 1000+ full-time paid editors, a $400 million/year+ budget?!? -- for serious improvements seems wildly extravagant. Is it extrapolated from the known size/cost of any real-world professional/academic editorial efforts?

jessriedel · on Oct 30, 2012

I'm saying that if you added the staff of Encyclopedia Britannica to Wikipedia you wouldn't make a dent in Wikipedia. Britannica is about 40 million words. Wikipedia is 2 billion words, or 50 times larger. 2% is a blip.
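The ratio is simple to verify from the two word counts quoted above. A minimal sketch (both counts are the commenter's rough figures, not measured values):

```python
britannica_words = 40_000_000      # figure quoted in the comment
wikipedia_words = 2_000_000_000    # figure quoted in the comment

ratio = wikipedia_words / britannica_words
share = britannica_words / wikipedia_words
print(f"Wikipedia is {ratio:.0f}x Britannica by word count")
print(f"Britannica is {share:.0%} of Wikipedia by word count")
```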

gojomo · on Oct 30, 2012

And Wikipedia got the 2 billion words with zero paid editors, and indeed a culture that is suspicious of financial motivations, which is a big reason why the Foundation/community doesn't seem to have any interest in paid editors.

But if they thought they could use them, it doesn't follow that they'd need a staff of a thousand-plus. It depends on the (unstated, entirely-hypothetical) strategy for using them. Superficial word-count-output extrapolations wouldn't be part of such a strategy.

They've got plenty of money (provided each year's donation campaign meets its goals). They've got plenty of words. There's just not a clear path where "if they just had enough funding from some modern Andrew Carnegie" they could throw paid editors at their mission and improve things. (That was the particular suggestion that started this tangent about whether they need more money or not.)

Other tangential evidence: while the Foundation reaches its fundraising target each year, they often spend less than planned. See for example:

https://wikimediafoundation.org/wiki/2012-2013_Annual_Plan_Q...

saurik · on Oct 29, 2012

While I agree that that would be interesting, that seems incredibly far-fetched as every one of those categories of scholarship--even including reading JSTOR to a real extent--is what Wikipedia would call "original research".

Instead, Wikipedia believes strongly that it should only base itself on secondary sources, and thereby would much prefer to cite a New York Times article over a peer-reviewed journal article (which you are technically allowed to do, but the rules severely discourage it).

The problem, therefore, isn't about helping editors get access to these more professional sources, it is making Wikipedia believe that they are even valuable sources in the first place. Until then, it simply will not happen.

(To demonstrate the issue, I have pasted below a section of the rules on the usage of primary sources on Wikipedia. The idea of sending someone to a "military-history conference" with this rule still in place would obviously be no help.)

> A primary source may only be used on Wikipedia to make straightforward, descriptive statements of facts that can be verified by any educated person with access to the source but without further, specialized knowledge.

ErrantX · on Oct 29, 2012

A journal article would certainly count as a secondary source (unless of course you were talking about the source itself)! And it is always preferred over a media report.

Now if your argument was that Wikipedia suffers from "recentism", whereby lots of topics are based on media reports and lack more in depth literature to establish historical interest, then we would be in agreement.

But the Wikipedia community is well aware of the importance of peer reviewed literature. The Medicine wiki-project, for example, has specific sourcing policies[1] for medical articles that specify literature reviews as good sources.

In fact, Wikipedians are working to get access to professional literature. There is a resource exchange page where we help each other get access to specific journal articles. And one particular editor has been working hard to get free accounts for things like Credo reference etc.

1. http://en.wikipedia.org/wiki/WP:MEDRS

saurik · on Oct 29, 2012

I found your response highly surprising, and I see my confusion now: Wikipedia is making (I will even admit correctly) a distinction between actual research (which it very clearly calls a "primary source") and a "review paper" (which it calls a "secondary source"). [references below]

You really thereby can't say "a journal article would certainly count as a secondary source", as most journal articles are research, not reviews of research: the page you link to even very directly states "many, but not all, papers published in medical journals are primary sources".

Of course, I will totally admit that at least some of the articles in JSTOR are going to be review papers, and thereby must retract part of my previous statement as written. In the specific case of JSTOR from those examples, and thereby my entire first aside, there would be at least some marginal benefit, even under Wikipedia's rules.

However, I will point out that the underlying argument I was making quite stands: sending people to conferences, teaching them more about historiography, and getting them access to more sources of research, are all ways to make people better at performing what Wikipedia continues to call "original research" (and clearly wants no part in). Put differently, the goal of these activities is to make people better at interpreting primary information, not copy/pasting secondary information.

Indeed, Wikipedia makes it quite clear that primary sources may only be used in cases where people with no specialized knowledge can plainly understand the information (which, honestly, to me seems unlikely to be plausible: journal articles are not written for the layperson), and even goes so far as to be quite explicit that secondary sources must be used to make the primary sources trustworthy.

<edit>

It does, however, at least seem like the medical parts of Wikipedia do not believe that a newspaper article is even a valid secondary source, so good on them. ;P I tried to find a similar document for the field of History, and they do seem to be working on a similar document (still marked "essay") and it also looks like it could be pretty good (as in, "no journalism").

http://en.wikipedia.org/wiki/Wikipedia:Identifying_reliable_...

(I imagine an interesting benefit with Wikipedia/History is that you don't run experiments, so of course much of the material you find is technically a secondary source; you thereby don't actually find the usual discussion of "primary" vs. "secondary" in this essay: "primary" gets defined as the actual source documents, as arguably history is all opinion ;P.)

Even reading the history-specific documentation, however, does not seem to indicate that you could both follow the rules and get much value out of having a strong background in history and history research: I maintain that attending conferences and taking courses in the field is seemingly not something that Wikipedia would look favorably on.

This quote, however, was quite uplifting: "Historical articles on wikipedia should be the result of scholarly works.".

</edit>

(additional quotes from Wikipedia backing up my statements:)

"...similarly, a scientific paper documenting a new experiment is a primary source on the outcome of that experiment."

"A primary source may only be used on Wikipedia to make straightforward, descriptive statements of facts that can be verified by any educated person with access to the source but without further, specialized knowledge."

"For example, a review article that analyzes research papers in a field is a secondary source for the research"

-- http://en.wikipedia.org/wiki/Wikipedia:No_original_research

"A primary source in medicine is one in which the authors directly participated in the research or documented their personal experiences. ... Many, but not all, papers published in medical journals are primary sources for facts about the research and discoveries made."

"A secondary source is... Examples include literature reviews or systematic reviews found in medical journals, specialist academic or professional books, and medical guidelines or position statements published by major health organizations."

"All Wikipedia articles should be based on reliable, published secondary sources. Reliable primary sources may occasionally be used with care as an adjunct to the secondary literature, but there remains potential for misuse." (emphasis copied from original)

-- http://en.wikipedia.org/wiki/WP:MEDRS

ErrantX · on Oct 29, 2012

I chose the Medicine project because it is an example where they have identified an issue (bad sourcing can have real world consequence) and built on the Wikipedia sourcing policies to fit with how medicine operates academically.

> I imagine an interesting benefit with Wikipedia/History is that you don't run experiments, so of course much of the material you find is technically a secondary source; you thereby don't actually find the usual discussion of "primary" vs. "secondary" in this essay: "primary" gets defined as the actual source documents, as arguably history is all opinion ;P.

Absolutely this! And outside of science this is how it almost always works. A journal article will 99% of the time be a secondary source.

> I maintain that attending conferences and taking courses in the field is seemingly not something that Wikipedia would look favorably on.

It depends what the intended outcome is. Wikipedia deliberately pushes the idea that editors should not argue from the point of an expert - as such qualifications are impossible to prove, and other editors are badly placed to peer review your argument/work.

On the other hand training in your chosen topic is useful! I've attended a number of sessions organised by Wikimedia chapters which put Wikipedians together with, say, historians so they can collaborate and learn from each other.

> sending people to conferences, teaching them more about historiography, and getting them access to more sources of research, are all ways to make people better at performing what Wikipedia continues to call "original research" (and clearly wants no part in).

I'm not sure I agree at all here. Anyone wishing to write about a topic should have a good understanding of it, an understanding of research and citation and so forth. So those skills would be encouraged! (as an aside; it is this skill deficit that puts many Wikipedians off and ties back into the decline the article mentions)

Original research is another matter, unrelated to tracking down source material and understanding context.

> Put differently, the goal of these activities is to make people better at interpreting primary information, not copy/pasting secondary information.

Again I disagree. Wikipedia doesn't want editors interpreting sources, but it also doesn't want blind copy/paste. Writing tertiary articles requires us to maintain an objective view of the material and to have good editorial judgement in choosing how to summarise the topic. The point of "no original research" is to stop Wikipedians inserting their own views or conclusions - so long as a view we record can be verified in a source and is balanced within the context of the article/topic then that meets this rule.

To put it another way. Writing a tertiary article often requires extensive skill in researching the secondary material, summarising the academic consensus on the topic and making sure that summary is objective and unbiased by your own views. The key word in "no original research" is original :)

The problem of primary versus secondary sourcing is one that is not even well understood by a lot of editors - so it is no wonder it gets questioned. But in practice the policy is trying to avoid the situation where Wikipedians themselves research primary sources to draw their own (often non-expert) conclusions. Therefore avoiding material subjective to an individual editor.

A core problem that does exist is that outside of science and history there is often a strong reliance on media reporting. And scholarly literature either doesn't exist for that topic, or is rudimentary (I mean, who is going to write a scholarly article about Justin Bieber?). Policies that allow modern-style sourcing for such articles but also strongly favour scholarly material when it is available are hard to write :)

saurik · on Oct 29, 2012

> ... primary versus secondary sourcing ... in practice the policy is trying to avoid the situation where Wikipedians themselves research primary sources to draw their own (often non-expert) conclusions.

FWIW, I would not (and do not) have a problem with this in isolation: I entirely accept that Wikipedia is itself not a journal, and it is going to be really weird if Wikipedia is a place where information is "first published".

Though, I do have a problem with the kinds of secondary sources (mostly: "newspaper articles") that then end up getting used to demonstrate a lot of things for which a primary source would be much more accurate and trustworthy.

> A core problem that does exist is that outside of science and history there is often a strong reliance on media reporting. ... Policies that allow modern-style sourcing for such articles but also strongly favour scholarly material when it is available are hard to write :)

If that is the intent then Wikipedia could at least say it somewhere (as opposed to having a handful of articles about how to handle sourcing for specific subdisciplines): as it stands, they seem to have one overall policy, with some exceptions.

However, there are numerous cases in these non-scholarly-yet-"noteworthy" (to remove "notable" for a moment) topics where a primary source is simply more accurate than a secondary one, and yet people prefer to cite the newspaper articles.

Newspaper articles are written by busy people who have next to no domain-specific knowledge; they often don't even have first-hand vantage points: they are retelling stories heard by people told to them by other people and then "spiced up" for consumption.

I thereby contend that the problem, to me, seems deeper than just allowing "modern-style sourcing": the idea of using secondary sources to establish fact is simply incorrect. Even in scholarly fields, you usually don't cite review articles.

(I moved this comment lower, as it is the mainline argument in this thread, so I wanted to conclude on it.)

> It depends what the intended outcome is. Wikipedia deliberately pushes the idea that editors should not argue from the point of an expert - as such qualifications are impossible to prove, and other editors are badly placed to peer review your argument/work.

> On the other hand training in your chosen topic is useful! I've attended a number of sessions organised by Wikimedia chapters which put Wikipedians together with, say, historians so they can collaborate and learn from each other.

I am not certain how this fits with the policy. It is fairly obvious that writing in an article "I am a totally epic person in this field, seriously; thereby, trust me on this: X is true" would be incorrect.

However, Wikipedia makes it very clear in numerous places that I have referenced that you are not supposed to source something back to something that would require specialist knowledge to understand or verify.

If I wanted to, thereby, source an article on a subject in Mathematics back to the original article that proved it, I would be unable to, as only a mathematician is likely to understand the notation and terminology of the article.

This is the case even if the information that is in the article did not require any "interpretation" and is not "drawing conclusions": the rules are quite clear that facts must be able to "be verified by any educated person with access to the source but without further, specialized knowledge" [1].

Instead, I'd have to cite a secondary source. However, summary/review articles (the scholarly "secondary source") are actually kind of rare... you don't see them published often, because there is really no need to do so until there is a large amount of work to condense.

As a specific example of this "getting weird", the article on "Coppersmith%27s_Attack" (a way to break RSA encryption in some limited settings) is citing a review article "Twenty Years of attacks on the RSA Cryptosystem" (specifically rather than the actual paper).

What if that review article didn't exist yet? The attack in question came out two years before that summary article did, and AFAIK there haven't been any new attempts at an updated review in the subsequent 13 years (partly as there simply hasn't been "enough" new stuff).

The result is that there are a few interesting new attacks that aren't covered in the main "RSA_(algorithm)" article, all of which came out after 1999 (the date of the aforementioned summary article), such as even one just this year. It would be great to include them.

Yet, there is seriously an extended discussion about a recent "attack" on RSA that wasn't even an attack on RSA at all. This does reference the primary source, but only seemingly because it also was able to reference a New York Times article (?!) and... a blog post. Yay! :(

[1]: http://en.wikipedia.org/wiki/Wikipedia:No_original_research

(everything after this point is after this point because you added it in an edit, and I had already written the rest of my response; it thematically doesn't belong here, but I don't "have it in me" to rewrite the above to fit these comments in)

> Again I disagree. Wikipedia doesn't want editors interpreting sources, but it also doesn't want blind copy/paste.

The reason I use the term "copy/paste" is because you are explicitly disallowed from using any specialized knowledge to understand the citation. This is very different from interpreting (at least, the way Wikipedia defines this) the citation. There is no original research happening when I read a paper on number theory, but it is due to the specialized knowledge I gained from attending college as a math major that lets me understand what it says.

> Original research is another matter, unrelated to tracking down source material and understanding context.

This is not true according to Wikipedia's written guidelines, and this is certainly not how I've seen it work out "in the field": articles are often purposely and almost twisted-ly written in ways that rely on "for the layman" summary articles when primary sources are trivially available (as they were cited by the summary article) or which rely on articles in Wired (yes, Wired) even when all the reporter for Wired did is visit the web page of the actual primary source and then write an obviously-confused "scoop" of the topic at hand.

ErrantX · on Oct 29, 2012

I've enjoyed this conversation: but HN is not really set up for lengthy discussions :) (I lose track) so rather than clog it up if you want to chat more feel free to email me!

saurik · on Oct 30, 2012

Thank you, but sadly I am currently not set up to pretty much use email at all. ;P I thereby have enjoyed being able to have a long conversation on HN (which I do use quite often), but totally understand it must be infuriating for others. :(

Regardless, thank you very very much for taking the time to respond to my comments and to take them seriously enough to read in the first place.

(As opposed to another in-depth response, I thereby have simply clarified my previous argument regarding notability-from-verifiability in the other thread, but otherwise left all of the other points alone: maybe some day we can continue the conversation, possibly via email. ;P)

nevinera · on Oct 28, 2012

That first chart really bugs me. You're showing us an apparent asymptote, a totally reasonable thing to do to prove your point.. but then I notice it's a log plot. That isn't showing us anything, number of seconds would make the same shape of plot.

rhplus · on Oct 28, 2012

Yes, the log version is a little misleading, but the linear version shows that growth is slowing down too:

http://en.wikipedia.org/wiki/File:EnwikipediaGom.PNG

benmanns · on Oct 29, 2012

Note the light green line under the blue growth rate line. The Size of Wikipedia article[0] says that this was the modeled growth rate as of 2010, which it only held to until 2011. This shows the danger of modeling existing data and extrapolating to the future.

[0] http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

Jabbles · on Oct 29, 2012

It's still measuring number of articles, not, say, total word count. Do you have a graph of that too?

Jabbles · on Oct 28, 2012

I suspect many articles are approaching a local maximum, and it will be very difficult to do a significant reorganisation of an article in order to reach "the global maximum".

It is possible to have two very dissimilar, balanced and detailed articles on the same subject. Wikipedia makes an arbitrary choice and this is difficult to undo. I can't think of any articles that I can point to and say "this was the wrong choice", but it's certainly something to think about.

mattparlane · on Oct 28, 2012

I can't help but think that presenting this graph on a log scale isn't the best way to do it. I mean, if the number went up linearly year-over-year, wouldn't it look like that on a log scale anyway?

rm999 · on Oct 29, 2012

Yeah the log scale hides his point. A log scale is useful when you want to highlight the rate of growth, e.g. for something that grows exponentially. But I'd expect Wikipedia to grow linearly, not exponentially - on a log scale it is hard to distinguish anything that is subexponential (which includes linear growth).
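To make that concrete, here is a minimal sketch with made-up numbers (the article counts below are hypothetical, not real Wikipedia data). It compares the year-over-year step on a log10 axis for a series that grows by a fixed amount each year against one that doubles each year:

```python
import math

def log_steps(values):
    """Return the successive differences of log10(values)."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical article counts over six years (assumed for illustration only).
linear = [1_000_000 * (year + 1) for year in range(6)]      # +1M articles per year
exponential = [1_000_000 * 2 ** year for year in range(6)]  # doubles every year

linear_steps = log_steps(linear)
exponential_steps = log_steps(exponential)

print("linear:     ", [round(s, 3) for s in linear_steps])
print("exponential:", [round(s, 3) for s in exponential_steps])
```

The steps for the linear series shrink every year, so on a log axis it bends toward flat even though it never stops growing, while the exponential series takes the same step each year and plots as a straight line.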

wavesounds · on Oct 29, 2012

Totally agree with you guys and came here to post the same thing, he may make some good points but this seems somewhat deceptive to the average American reader.

chris_wot · on Oct 29, 2012

I think the best metric I'd use to see if Wikipedia is "complete" is to look at the major articles and see what proportion have [citation needed] on them.

Of course, I'm a little biased as that tag is the one I created so it's really my baby. But I designed it for obsolescence, so I suppose I can't complain :-)

Xcelerate · on Oct 29, 2012

Whoa, hold on... you were the guy that invented that tag?!

chris_wot · on Oct 29, 2012

Yup. You can find my greatest hits under ta bu shi da yu.

_feda_ · on Oct 29, 2012

It's always seemed weird to me that wikipedia has a problem with scaring away potential editors. I for one love editing wikipedia and never found it in any way intimidating when I started editing in my teens, although I'm probably part of a sizable edge case of humanity that enjoys the relative complexity of the process of editing itself, markdown et al. I don't find the culture to be off-putting either, because the culture gels nicely with my personality traits and those of any person who would enjoy contributing to an encyclopedia.

Wikipedia has a learning curve sure, but like learning to use a shell instead of a gui, the pain is certainly ultimately worth the gain. The barriers to entry are nowhere near as high as those I imagine serious, academic editors face in their careers.

mjn · on Oct 29, 2012

As an academic I generally agree. I have mostly good experiences editing Wikipedia, with a handful of bad ones. By and large it's much more reasonable than some of the things I've run into in academia. My main wish on Wikipedia is that I got more feedback of any kind. Often I will write an article and never hear from anyone about it. Maybe I pick too obscure topics, but it'd be nice to hear some constructive feedback more often.

_feda_ · on Oct 30, 2012

It's regrettable you're not getting the feedback you should be getting, as this is something that will really help you stick to it. On the other hand, the very reason you aren't getting feedback is a good one: you're working on obscure articles, which is precisely the area wikipedia needs work done on in order to become the fully-fledged, echt encyclopedia (as I'm sure you know in the case of proprietary equivalents professionals are hired to fulfill the role you are fulfilling). Anyway, consider yourself a lonely pioneer ;)

tokenadult · on Oct 28, 2012

I have been a Wikipedian with a registered account for a few years, and I have reflected on these issues. The article asks, "But what if the decline in engagement has little to do with culture or the design of the site? What if, instead, it's that there's just less for new Wikipedians to do?"

It's the culture. There is plenty to do. Indeed, every time I read a Wikipedia article, I find something in every article that turns on my inner editor mode. But my log-in information is only stored on my office computer, not on my wife's computer or my children's computer, so when I am reading along in Wikipedia with my family and I see a mistake, I usually grumble rather than Wikignome and fix the problem.

http://en.wikipedia.org/wiki/Wikipedia:SOFIXIT

I do Wikignome regularly when I'm reading Wikipedia for one reason or another on my office computer. That gives me solely the satisfaction of fixing a problem, and no satisfaction at all of supposing that that builds up my reputation among other Wikipedians or that the fixes will even persist in further updates to the pages.

Quite apart from the articles that have jump-right-out-at-you errors, convenient to fix for anyone who knows grammar, spelling, or basic facts of the world, there are many, many articles that are readily apparent examples of factual mistakes from more subtle causes such as edit wars

http://en.wikipedia.org/wiki/Wikipedia:Lamest_edit_wars

that often persist even after action of the Arbitration Committee.

http://en.wikipedia.org/wiki/Wikipedia:Arbitration/Requests/...

A user who is knowledgeable about SOURCES and who has professional editing experience and academic editing experience (as I have) is hardly likely to find Wikipedia welcoming. Wikipedia's culture was set in the days when it was the "encyclopedia anyone can edit," that is, the encyclopedia where anyone can make something up, and then as Wikipedia grew an administrative apparatus, administering was used more for turf battles and point-of-view pushing than it was used for editing to encyclopedic quality or fact-checking.

I think a philanthropist with a budget the size of Wikipedia's budget over a few years (a large amount of money, but pocket change for a billionaire) could build a Wikipedia competitor that could do quite a lot better. Maybe no one will ever think that that is worthwhile. For sure, many people who are well aware of how incomplete Wikipedia still is regret the off-putting culture of Wikipedia, and will be dissuaded from spending much volunteer time to improve it.

AFTER EDIT: A grandchild comment noted that "One perk of competing with Wikipedia is that you can use Wikipedia articles as a base." And indeed, if a philanthropist's goal for a competing project were improving the quality of online encyclopedias, licensing the competing project's articles in the same way Wikipedia's are licensed would mean that good articles could work their way back into Wikipedia--a very disruptive strategy indeed for improving content.

atombender · on Oct 29, 2012

I think a contributing factor is that Wikipedia is not particularly easy to edit, especially for non-techies.

As a casual editor -- and as a seasoned techie, even -- the markup language used by Wikipedia is daunting. For example, the documentation on available templates (which, incidentally, are completely different for each sub-language wiki; in fact, most non-English wikis lack a full set of templates found in the 'en' wiki) is so bad that I usually have to spend several minutes hunting down information about the "citation needed" template, because I keep forgetting it, or the ways to write a <ref> tag or even links. Every time I want to create a brand new wiki page, I have to copy another page as a starting point because of all the arcane templates, tags and conventions I need to use.

The default editor mode is a mess, with a tiny proportional font used for the edit text field, no autocompletion or syntax highlighting, no "draft mode", no diff-based conflict resolution, etc. I suspect that many experienced editors probably use apps or a collection of GreaseMonkey scripts to fix the editing experience, but to casual users it's pretty unbearable.

Keeping track of edited articles is also a mess, to the point of being useless. There's an Atom feed, but the item content is just the edit description, no diff. The article history page itself is like something out of Bugzilla, designed by some Perl guy with no notion of UX, with radio buttons as a way to navigate among diffs.

These little annoyances add up to the point where I hesitate before considering whether I should bother to perform an edit and create a page, dreading the technical hurdles.

mjn · on Oct 29, 2012

This only addresses a minor part of your (legitimate) complaints, but fwiw, it's okay to add references without using the appropriate templates. You can just write the reference, in plain text in whatever format you want, between the <ref></ref> tags. Someone else can fix up the formatting later if they want. The only really important thing is to mention where the information came from in some kind of readable way, with something like: <ref>A. Author. Article title. In Proceedings of Conference 2012, pp. 11-12</ref>

atombender · on Oct 29, 2012

I didn't know that. Thanks.

waqf · on Oct 29, 2012

It is possible that you could generate a better 'pedia than Wikipedia simply by datamining the version history of a Wikipedia article and trying to resurrect, or piece together, a better version of the article from the previous versions.

It's not quite as stupid an idea as it sounds: many Wikipedia article histories are a series of small improvements punctuated by massive destructions (large deletions or misguided rewrites), and if the damage isn't caught immediately it doesn't get fixed at all.
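As a rough sketch of the first step, assuming you already have the byte size of each revision in order (the function name, the 50% threshold, and the sample history are all hypothetical, chosen only for illustration), you could flag the revisions where the article suddenly shrank:

```python
def flag_large_deletions(sizes, threshold=0.5):
    """Return indices of revisions that shrank the article by more than `threshold`.

    `sizes` is a list of article sizes in bytes, oldest revision first.
    `threshold` is the fraction of the previous size that must be removed.
    """
    flagged = []
    for i in range(1, len(sizes)):
        prev, cur = sizes[i - 1], sizes[i]
        if prev > 0 and (prev - cur) / prev > threshold:
            flagged.append(i)
    return flagged

# Hypothetical revision sizes in bytes (not taken from any real article).
history = [1200, 1350, 1500, 400, 450, 520]
candidates = flag_large_deletions(history)
print(candidates)
```

This only produces candidates to look at: it finds where the text before the drop might be worth resurrecting, and says nothing about whether what was deleted was any good.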

jarek · on Oct 29, 2012

The only challenge is distinguishing large deletions of worthwhile content from large deletions of crap. Both programmatically and philosophically.

unavoidable · on Oct 28, 2012

I'm not sure if an alternative competitor is necessarily possible or easy. Google tried an alternative model presumably to address issues with turf wars and credibility with Google Knol, but that burned out pretty quickly even though the model was worth pursuing (giving expert editors incentives to create their own pages in an area they have specialized domain knowledge).

tdoggette · on Oct 28, 2012

One perk of competing with Wikipedia is that you can use Wikipedia articles as a base.

mjn · on Oct 29, 2012

A perk but also a logistical challenge. In the small number of attempts I've observed so far, a big problem is that if you start with a small community but all of Wikipedia as your starting content, you immediately get overrun by spammers, because your small community can't police editing of 2m+ articles, and you also lack all the anti-vandalism "immune system" bot infrastructure that Wikipedians currently run.

One way to sidestep it would be if your experiment is with a community structure that's much more restrictive than Wikipedia's. If, for example, only approved editors can edit articles, then you avoid most of the spam problem. But it's not clear to me that insufficient barriers to editing are the main problem with Wikipedia.

saraid216 · on Oct 29, 2012

It seems to me that Wikia should be an effective disruptor, but it isn't and I'm not sure why. Culture? Stigma? Awareness?

Reasons I think it should be: it's very domain-specific, which makes it possible for qualified editors to flock to appropriate domains; there is a separate command structure per Wikia, enabling access on a per-qualification basis; it practically begs you to copy content from Wikipedia as a base; it already has a financial support system in place with the ads, and presumably has solid infrastructure backing it.

I can see how some of those strengths would double as weaknesses, though.

pm90 · on Oct 29, 2012

There is another project called Scholarpedia, but that is restricted to Science/Math only, I think.

treelovinhippie · on Oct 29, 2012

Simple: remove the "must be noteworthy" rule. I remember many years ago I was heavily into the Diggnation podcast and a few others coming out of Revision3 and Leo Laporte's network. Anyway, somehow I did a search on Wikipedia and found that these podcasts were not on there at all (despite having many viewers).

So long story short, I created pages for these podcasts, filled them with content, and a few days later they were removed by mods for not being "noteworthy" enough. I attempted to plead with them that these podcasts were in fact the most popular tech podcasts on the web, but the mods didn't care.

I haven't edited Wikipedia since.

britta · on Oct 29, 2012

The notability guideline (http://en.wikipedia.org/wiki/Wikipedia:Notability) is closely related to the verifiability policy (http://en.wikipedia.org/wiki/Wikipedia:Verifiability) — from Wikipedia's perspective, if a subject has not been covered by multiple reasonably reliable secondary sources (in other words, if it isn't "notable"), we can't write a reasonably verifiable article about that subject. Every article has to include secondary sources as references, so that editors and readers can quickly fact-check.

There are plenty of thoughtful discussions elsewhere about why these notability and verifiability rules are flawed, but I wanted to point out this connection because it shows that the notability guideline is possibly not as arbitrary as it seems at first. I wish I could figure out a way to help Wikipedia explain this better. Maybe "notability" is the wrong name for this guideline.

mjn · on Oct 29, 2012

I agree with you as it currently stands, though I think there's been a shift on that. There used to be an attempt to enforce an independent notability guideline, that Wikipedia would only cover subjects that were "encyclopedic". I believe that even predates the verifiability/referencing push. But where I agree is that I believe the latter has mostly obsoleted the former: nowadays, if you have good references for your article, it's ipso facto notable.

I wrote a bit about that evolution elsewhere, on the off chance it's interesting: http://www.kmjn.org/notes/wikipedia_notability_verifiability...

saurik · on Oct 29, 2012

> ...if a subject has not been covered by multiple reasonably reliable secondary sources (in other words, if it isn't "notable"), we can't write a reasonably verifiable article about that subject...

It is nearly impossible to verify an article based on secondary sources. To take a non-Wikipedia view of this for a second, I went and pulled the descriptions of these sources from academic institutions. Princeton's reference desk describes a secondary source as follows. After that, a similar description from the UCSC library.

> A secondary source interprets and analyzes primary sources. These sources are one or more steps removed from the event. Secondary sources may have pictures, quotes or graphics of primary sources in them.

> The function of these is to interpret primary sources, and so can be described as at least one step removed from the event or phenomenon under review. Secondary source materials, then, interpret, assign value to, conjecture upon, and draw conclusions about the events reported in primary sources.

(At this point I also feel it important to point out that both of these sources believe that encyclopedias are "secondary sources". If you do a search on Google for "tertiary source", which Wikipedia adamantly believes is a third category that includes encyclopedias, you get only 43k hits, half of which mention "wikipedia". The few universities that mention "tertiary source" list encyclopedias as being in both categories: only Wikipedia seems to believe that encyclopedias are clearly and definitively a "tertiary source".)

If you are interested in opinions and analysis, you can happily refer to a secondary source, but if you want to verify what actually happened, you cannot be taken seriously unless you can show a clear trail of evidence that terminates in a primary source.

(Certainly, if it is impossible to obtain primary sources, then one can use a secondary source, but that is something that should only be used as a last resort: if you have access to the primary sources that a secondary source used, you should verify them yourself.)

The result of attempting to find truth from secondary sources is that you will forever be plagued by horrible bias, both in terms of what things you can find to be "notable" and in terms of the validity of the opinions you can find in them.

Worse, as encyclopedias--which today pretty much means "Wikipedia" to a very large percentage of the public--are used as reference material for the construction of secondary sources, everything from newspaper articles to books, claiming secondary sources to have anything to do with "validation" just makes your chain of evidence circular.

And, in fact, there have been multiple published large-scale examples of such circular information ending up in Wikipedia, as temporarily un-sourced information gets used in newspaper articles which then reinforces the information in Wikipedia when editors attempt to find sources.

Regardless, the entire notion is kind of preposterous anyway, as the way that most secondary sources operate they just print what they are told by the people they interview for articles without the citations required to verify the ultimate source of the information.

This means that if someone, whether it be the Roth example from a few days ago or anyone else wanting to provide a paper-trail for information on Wikipedia, wants to be able to get something into Wikipedia, they really just need to be in the position to tell a reporter the information: they do not need to post it themselves directly.

As an interesting contextual example of this, Wikipedia now actually does have an article on Diggnation. In this article, it states quite firmly that "there are an estimated 250,000 regular subscribers to the show" along with a sourced citation to... the New York Times.

However, the article in the New York Times quite clearly was entirely itself sourced by simply asking the people who worked for Diggnation and Revision3 a bunch of questions; it even quite clearly states that that number came from them: "Revision3 says it counts roughly 250,000 views each week".

In such a situation, the way the New York Times operates (and I know this first hand, as there have been articles about the things that I do published by them) is that they will happily publish whatever number you tell them, and at best "fact check" it by calling you back to verify they heard you correctly.

This does not make the number true; it barely even demonstrates "Diggnation was important enough to have an article written about them", due to the "slow news day" phenomenon. The idea that this source is somehow different from any other source is just a fantasy, and one that, as far as I've been able to tell, only Wikipedia believes.

Wikipedia, in fact, seems to take the exact opposite stance on all of this, claiming that "Wikipedia articles should be based on reliable, published secondary sources and, to a lesser extent, on tertiary sources", along with a long list of rules about how primary sources (which are apparently even more dangerous in their world-view than tertiary sources) can technically be used, but only in highly limited and nearly useless circumstances.

http://en.wikipedia.org/wiki/Wikipedia:No_original_research

The result of this insanity is then numerous situations like the one that treelovinhippie ran into. Re-telling him what the rules of Wikipedia are--when it is quite clear that he got to experience them first-hand and was sufficiently unimpressed as to ask for their removal--seems to be missing the point: for this purpose, the rule really is "arbitrary", and couldn't possibly have anything to do with "verifiability"; if anything, Wikipedia's rules on "notability" at the same time cause numerous topics to be unable to be covered and cause the information that does get published to be based on shaky foundations and unreliable sources.

ErrantX · on Oct 29, 2012

> Worse, as encyclopedias--which today pretty much means "Wikipedia" to a very large percentage of the public--are used as reference material for the construction of secondary sources, everything from newspaper articles to books, claiming secondary sources to have anything to do with "validation" just makes your chain of evidence circular.

This is a phenomenon Wikipedians are aware of. But what you refer to here are often tertiary sources, and not necessarily the sort of material you'd want to use.

> The result of attempting to find truth from secondary sources is that you will forever be plagued by horrible bias, both in terms of what things you can find to be "notable" and in terms of the validity of the opinions you can find in them.

You're arguing to swap that bias for your own? i.e. you look at primary source material and decide which items are notable for their own article. What about if another editor disagrees?

In academia secondary sources are an established way of processing primary information; an expert reviews the primary material and, citing it, draws conclusions. Other experts may do the same, and disagree with the first one.

Wikipedia is a tertiary source, which draws on secondary and, yes, primary material to summarise the current status of a topic.

> However, the article in the New York Times quite clearly was entirely itself sourced by simply asking the people who worked for Diggnation and Revision3 a bunch of questions; it even quite clearly states that that number came from them: "Revision3 says it counts roughly 250,000 views each week".

> In such a situation, the way the New York Times operates (and I know this first hand, as there have been articles about the things that I do published by them) is that they will happily publish whatever number you tell them, and at best "fact check" it by calling you back to verify they heard you correctly.

This is a problem, and one with no easy solution. The approach of requiring sources to have their own review process (i.e. media has editors who theoretically vet content) should fix this, but as you point out, in practice it is not 100% reliable.

saurik · on Oct 29, 2012

> You're arguing to swap that bias for your own? i.e. you look at primary source material and decide which items are notable for their own article. What about if another editor disagrees?

treelovinhippie's stated opinion was "remove the "must be noteworthy" rule" (from which I am mentally rewriting his statement to "notability"). I personally agree with that, as I do not believe that the notability requirement is leading to increased article trustworthiness.

I would rather see the idea of "notability" replaced with an "article score" based upon the history: there have been a few glorious visualizers for Wikipedia designed to really make this information hit center, and I think any of them would be a better global solution than "notability".

> In academia secondary sources are an established way of processing primary information; an expert reviews the primary material and, citing it, draws conclusions. Other experts may do the same, and disagree with the first one.

I imagine this is determined heavily by the field? I can't imagine in either mathematics or algorithmic computer science there being any need to wait until a review paper is published "drawing conclusions" (whatever that would mean) from the information in the primary research.

However, I honestly feel like you are dragging me down an academic rathole in these very specific cases, such as medicine: the kind of material that we are explicitly discussing here, Diggnation, is not going to be covered in anything remotely approaching a scholarly source.

Instead, what we are comparing are situations like 1) referencing a New York Times article stating that "person Y said X" and 2) directly linking to the blog or Twitter feed of person Y and demonstrating first-hand that they said "X": in these situations, removing a layer of required trust.

In my experience discussing situations like this with Wikipedia editors, in addition to my direct attempts to read through the rules that Wikipedia publishes for how their website should be used, it is quite clear that Wikipedia would prefer you to cite the NYT article rather than hotlink the information.

> This is a problem, and one with no easy solution. The approach of requiring sources to have their own review process (i.e. media has editors who theoretically vet content) should fix this, but as you point out, in practice it is not 100% reliable.

So, Wikipedia specifically states that "mainstream newspapers" are an example of "the most reliable sources". Can you provide me a reference to where Wikipedia is claiming that you can only depend on sources that "have their own review process"? (Or, was that just a potential idea for a fix?)

Personally, in this situation, I would much prefer a reality where Wikipedia simply admits that secondary sources (in this field; again: not medicine, and possibly not even history) are nothing more than opinions (in the general case, this is how Wikipedia defines secondary sources: opinions and interpretation).

If they did this, then they couldn't cite a New York Times article for a sentence that simply stated a fact: they could only cite a New York Times article and state "the New York Times states", for which the New York Times should be a perfectly valid (and I will argue "authoritative") reference.

Yes: in this case that would probably make it impossible to just state "Diggnation has 250,000 subscribers" without some kind of in-article qualification; however, I don't think that that is inherently a bad thing: I feel like if I were writing a scholarly article on this subject, I wouldn't be able to say anything definitive either.

ErrantX · on Oct 29, 2012

> I personally agree with that, as I do not believe that the notability requirement is leading to increased article trustworthiness.

I struggle to follow this because Wikipedia doesn't consider "notability" to relate to article trustworthiness. It's instead intended to act as a soft margin for the topics that deserve a standalone article (remember, noteworthiness is only related to whether an article should exist or not).

> I would rather see the idea of "notability" replaced with an "article score" based upon the history

Interesting idea, and I'd like to hear more of this idea! However, from this I suspect you are mixing up the reliability of an article and the specific phrase "notability" (about how deserving a topic is to have an article). When deciding if an article is notable it will usually have little or no history!

> I imagine this is determined heavily by the field?

Yes, a point Wikipedia admits.

> I can't imagine in either mathematics or algorithmic computer science there being any need to wait until a review paper is published "drawing conclusions" (whatever that would mean) from the information in the primary research.

Then you would, I am afraid, be wrong :) In such a field many papers are published contending a theory, and then other experts review the contention and submit responses/reviews/criticism.

> the kind of material that we are explicitly discussing here, Diggnation, is not going to be covered in anything remotely approaching a scholarly source.

Yes, which is the key to the problem.

> Can you provide me a reference to where Wikipedia is claiming that you can only depend on sources that "have their own review process"?

Yes, the core sourcing policy requires a "reliable source" (http://en.wikipedia.org/wiki/Wikipedia:RS). "Articles should be based on reliable, third-party, published sources with a reputation for fact-checking and accuracy". http://en.wikipedia.org/wiki/Wikipedia:SOURCES#Reliable_sour... goes into more detail.

> Instead, what we are comparing are situations like 1) referencing a New York Times article stating that "person Y said X" and 2) directly linking to the blog or Twitter feed of person Y and demonstrating first-hand that they said "X": in these situations, removing a layer of required trust.

Is it? Now I agree that if the NYT has simply asked Diggnation for those figures then Wikipedia should note that is where it comes from. However, citing it to NYT is intended to demonstrate that someone other than the editor inserting the material trusts its veracity. It's difficult because we simply do not know where NYT got that info: if Diggnation showed them some real figures (say, a screenshot) then the NYT article is better than a tweet!

Which is why this is not a simple problem.

saurik · on Oct 29, 2012

> Interesting idea, and I'd like to hear more of this idea!

I managed to dig up one of the examples: IBM's History Flow. I believe it was in a course in Cognitive Science that I was taking back in 2004 where I first heard of this research.

http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf

I feel like there was something similar that was much lighter-weight (as in, not so "large") that I had seen presented at one point, but I had a hard time finding even this one. :(

saurik · on Oct 29, 2012

> I personally agree with that, as I do not believe that the notability requirement is leading to increased article trustworthiness.

> Interesting idea, and I'd like to hear more of this idea! However, from this I suspect you are mixing up the reliability of an article and the specific phrase "notability" (about how deserving a topic is to have an article). When deciding if an article is notable it will usually have little or no history!

The reason I am "mixing this up" is that the only argument I have ever heard for why Wikipedia needs a notability policy at all is the one that was cited by britta earlier in this thread: that without such a clause there would be tons of articles that are hardly ever looked at by anyone, hardly ever edited by anyone, and containing information that is difficult to verify. I believe that there are numerous better solutions to this than attempting to use the "notability" filter, as I maintain that "notability" does not actually lead to "veracity".

To be very clear about this, I will repeat the context from britta that started my involvement in this discussion:

>> The notability guideline (http://en.wikipedia.org/wiki/Wikipedia:Notability) is closely related to the verifiability policy (http://en.wikipedia.org/wiki/Wikipedia:Verifiability) — from Wikipedia's perspective, if a subject has not been covered by multiple reasonably reliable secondary sources (in other words, if it isn't "notable"), we can't write a reasonably verifiable article about that subject. Every article has to include secondary sources as references, so that editors and readers can quickly fact-check.

Going back to your response:

> It's difficult because we simply do not know where NYT got that info: if Diggnation showed them some real figures (say, a screenshot) then the NYT article is better than a tweet!

I must apologize here, as I was intending to be using a different example, but you took me to mean the Diggnation example: this is my fault, I should have been more clear.

What was coming to mind with relation to the "Twitter post" example, is that there is a ton of journalism on topics I directly care about that is based on the Twitter feeds of people I work with: a lot of "person X said Y", which is then translated into some article on what is or is not possible with a tool that person X builds. The fact that a reporter read that statement and repeated it doesn't make it more true, and in fact the opposite is quite common: they are paying sufficiently little attention and have sufficiently little background knowledge that they repeat it wrong.

Please understand: I am not talking about a situation of "interpretation" or "research", but more like establishing dates on when things happened... I understand that this feels pretty blurry (but it seems equally silly to go into a detailed example of something where my example is itself biased; I am happier sticking with the examples such as Diggnation and RSA); it is simply a situation where "that dude at Wired that finds this stuff gets him readers" is not somehow more trustworthy than the place he got the information from... either the information shouldn't be published at all (I believe this is probably a quite reasonable course of action), or "that dude at Wired" should be skipped and the original source should be used.

That said, part of your comment doesn't rely on that misunderstanding I caused: it is true that we don't know where the NYT got that information, however that uncertainty really doesn't make it any more true; while it means there is some possibility that the information was gathered in a way that we should indirectly trust, as we can't see it we can't verify it, and it is honestly not in any way better than if some random person on Wikipedia just asserted it to us... reporters for major publications (such as the New York Times and Washington Post, both of which I have first-party experience with due to articles written about my work) really do trust that you are a reliable source on things that you control, and really do attempt to fact check by calling you back for verification.

ErrantX · on Oct 29, 2012

Ok, fair enough. Britta touches on some of the issues, but not on the matter in its entirety. Notability is about a) requiring there be at least one reliable third party source (so that the article has a chance of containing verifiable information) and b) ensuring that there is some limit on the scope of Wikipedia. It is this latter one that is the key facet.

Whilst Notability is closely related to Verifiability, it is not quite in the way britta cast it; rather, it requires that the material used to define a subject as notable (i.e. a significant claim to importance) be verifiable. That is, the relationship works the opposite way.

> I was intending to be using a different example, but you took me to mean the Diggnation example: this is my fault, I should have been more clear

Ah, my apologies, I'm reading quickly as it is a busy day.

Please don't get me wrong; the issue you highlight is a major problem, one I have raised a few times internally with the community. But there has been no easy resolution.

It's worth noting that the reliable sources policy explicitly notes that reliability hinges on not only the publisher but also the content and the author. If an author is seen to lack the qualifications, or has a bad reputation, these factor into consideration.

With that said, a lot of Wikipedians don't know this, a problem I run into constantly when discussing sources ("Well, it was published by the NYT so it doesn't matter what their reputation is"). It's not the policy at fault there, but the lack of interest of our own community in the rules...

> "that dude at Wired that finds this stuff gets him readers" is not somehow more trustworthy than the place he got the information from...

The intent of the policies (and bear in mind what I say above as to how much that holds up..) is that the secondary source is used to filter what in the primary material is considered important to the wider community. To take an example: when Microsoft released Windows 8 there was quite an extensive list of new features. Simply recording that isn't what Wikipedia aims to do; instead you would use secondary sources to highlight the new features that were considered by "experts" to be important, groundbreaking or otherwise worth a comment (of course, the full feature list would be linked to as well).

I'm not arguing this policy is perfect, nor that it doesn't break down in the scenario you cite, but it does have a solid basis.

One other policy is that Wikipedia does not have firm rules (for this very reason) and so you could say that making a convincing argument such as you have should keep the material out. In principle this works, in practice it doesn't but only because of the community dynamics (a whole other problem!).

> while it means there is some possibility that the information was gathered in a way that we should indirectly trust, as we can't see it we can't verify it, and it is honestly not in any way better than if some random person on Wikipedia just asserted it to us...

To an extent it does. Because the reporter who you cite has his/her real name attached to the article and has a public reputation to uphold.

saurik · on Oct 30, 2012

> ...ensuring that there is some limit on the scope of Wikipedia. It is this latter one that is the key facet.

Right... but in a world where I can store the entirety of Wikipedia on my mobile phone, you have to transitively ask why there is a need for a "limit on the scope of Wikipedia". I see no a priori reason why Wikipedia needs or even should tolerate such limits, so one must examine the arguments used to defend that policy.

So far, the only reasonable arguments I have heard (as in, discounting technology problems that never existed: you can easily scale Wikipedia to have a bunch of mostly-ignored articles) come down to "verifiability" through the argument path I elaborated (and which britta seeded), and that is precisely the path used by people defending "deletionism" on behalf of Wikipedia editors.

ErrantX · on Oct 30, 2012

There are a number of good arguments.

Where does the scope of Wikipedia end? Should there be an article about "saurik"?

How do you actively police articles for e.g. defamation (note, we already struggle to handle this problem and it is getting worse)?

How do you stop spam?

I like to come at this argument from the opposite direction: what need is there for Wikipedia to give an article to every single trivial thing? Is what the president had for breakfast in 2011 sufficiently interesting to the reader?

Wikipedia is not a dump of knowledge; it is supposed to be a curated summary of the sum of human knowledge. And as with an article where you make editorial decisions about the level of detail to go into, so the entire Wiki is scoped to a reasonable level of detail.

mjn · on Oct 29, 2012

The newspaper problem I agree is a problem. I would generally not consider newspapers to be "secondary sources" (or "reliable sources"), though, but actually something closer to primary sources. Historians typically treat them as primary sources, one source of contemporary information that needs to be carefully interpreted in order to determine what happened. But for recent events they might be the best that's available.

For example, Wikipedia's articles on World War II should be mainly based on existing WW2 scholarship, not on Wikipedians individually doing research into newspaper or military archives (many of which are now online so this is easy to do), because assessing contemporary newspaper accounts of WW2 for reliability is a difficult job, and Wikipedia isn't the place to peer-review attempts to do so when there is already so much good historical work on the subject, and better-suited venues to vet proposed revisions to the historical consensus. So Wikipedia's job should be to summarize the current historical literature on the subject, not to attempt a novel historical synthesis from primary sources.

With scientific articles it's a bit trickier. Individual articles are citeable, but I typically prefer meta-analyses or review papers. Since the goal of Wikipedia is to summarize existing consensus in a field (while also documenting any significant divergent views), those are the strongest indicators. An individual article sometimes indicates the existing state of scientific knowledge, but has to be used carefully, because it might already have been refuted by another article, or it might be a proposal that, while considered good enough to pass peer-review and be worth publishing, is so far only accepted as conclusive by a small minority of the relevant scientific community. (Some journals publish commentary/responses along with controversial articles, which can be useful to help determine if that's the case.) I guess that's also true in history: just because Speculum published an article where someone proposes a large-scale revision to our understanding of 13th-century France doesn't mean that medievalists generally agree with the new proposal. So if it's a recent article and there isn't indication of it having been taken up into the new consensus, I'd cite it in support of a "but some historians argue..." statement rather than in support of an unqualified statement.

This rule also, pragmatically, avoids re-litigating whole areas, like fringe physics and holocaust denial. Maybe someone has great original arguments for their physics theory, or for why the holocaust never happened, but because of "NoR", Wikipedia isn't the place to make those arguments.

saurik · on Oct 29, 2012

I am not in any way arguing that people should do "original research"; can you point out where I have done so?

You also seem to be taking a non-Wikipedia view of the definitions of these various varieties of "source".

o Wikipedia makes it clear that articles should be based on secondary sources, using primary sources only sparingly and preferably only when backed up by secondary sources.

o Wikipedia makes it clear that research in a field, such as the results of an experiment or the proof of a theorem in a field such as mathematics, is a primary source.

o Wikipedia makes it clear that for their purposes a newspaper article is a secondary source (not a primary one), and in fact is an example of "the most reliable sources" available.

o Wikipedia also, at least, makes it clear that a summary article written in an academic journal is a secondary source. This is really not that bad, but I demonstrate in another thread how summary articles are a painful limitation.

Yes: in the field of history, these are defined differently, and even Wikipedia drops their notion of "secondary source" and instead comes up with something different. However, I am not talking about scrounging up ancient newspapers when I say "primary source": I mean either bypassing the error-laden filter of the New York Times or citing the actual experiments when discussing those experiments, rather than citing summaries or even journalism.

derleth · on Oct 29, 2012

It isn't about explaining it better, it's about the fact people have a need to feel special and some people peg that specialness to having an article about Their Thing on Wikipedia. Patiently explaining to them that, without external verification, there's no way to prevent them from simply making shit up is not going to reduce their sense of indignation when their unverifiable stuff is deleted.

saurik · on Oct 29, 2012

The way these rules are currently set up does not solve this problem: you simply "make shit up" when talking to reporters and then let people reference the secondary source (which is usually designed to fact check information by simply calling you and asking "was this true", at which point you can easily just lie; more information on this in my upstream reply).

You simply cannot use secondary sources to solve the problem you are describing, and if you tried to do so in an academic article you would find yourself unable to get published: you would simply find a note attached from peer review asking you to instead cite primary sources... as otherwise there is nothing to prevent you from "simply making shit up".

gbog · on Oct 29, 2012

In the first comment on the linked page, a guy says rightly that the number of missing articles is still enormous, but in parts of knowledge that are further from that of the median Wikipedian.

In a comment here someone says the foundation has the goal to find and propose interesting things to write on Wikipedia to new editors.

The sum of these is the interesting problem of crossing culture barriers. How to get the regular geek interested in the many kings and warlords of China's Warring States?

davidw · on Oct 29, 2012

Am I the only one who has memories of 'Raid on Bungeling Bay' triggered by the words 'nearing completion'?

http://en.wikipedia.org/wiki/Raid_on_Bungeling_Bay

"Battleship nearing completion!" or something to that effect.

pixelcort · on Oct 29, 2012

One big opportunity is the differences in content between different language editions of Wikipedia. In many cases the content for an article varies widely between languages, and I still run across articles that are not yet available in English.

anuraj · on Oct 29, 2012

A very myopic view, restricted only to the English language. There are more than 100 major languages in the world. So now Wikipedia has to concentrate on other languages.

pebb · on Oct 28, 2012

Yup, from now on all edits done by outsiders shall be reverted with extreme prejudice.