It will be interesting to see how this impacts the Android/iOS battle. Search revenue funds almost all of Google's other activities so if people start using other search engines or find alternate ways to get their content it could impact the level they can spend on phones.
With a push to a mobile-first world the Android model is especially sensitive to spam. On a full-size browser you have a lot more context and results for a given search. Five results may be spam, but you can work around them. If the average phone screen shows 3-5 results and all of them are spam, you will quickly find alternate tools.
Google ignoring spam is like Microsoft ignoring the cloud.
It won't. Whoever comes up with a better search engine is going to get a large check from Google, and a monster check if he can make MS and Google go into a bidding war over the business.
A better search engine is not what will do Google in, because they would understand the danger it posed to them. What will do Google in is a business which they don't understand that can kill search engines as a place to do business.
Ignoring the damage of incentivized false-positives in searching our highly-connected global compendium of human knowledge and activity is at least as stupid as major producers of low cost consumer information goods and services failing to capitalize on the on-demand delivery and storage capacities of said compendium. Avoid both, please.
This is like writing stuff for the Voyager's Golden Record.
They would parse the words, but not the correct meanings.
The impression they'd come away with would be something like: (Extremely large number) ignoring canned meat is like (random company name) ignoring the high-altitude water vapor.
The article calls out two specific companies as "landfill in the garbage websites that you find all over the web." Reasonable people can disagree over whether such content is truly spam or low-quality content, and thus how to respond.
What is the difference and how should they respond? It seems to be a rising frustration among power users that Google is increasingly becoming a wasteland populated by spam. For example, Marco Arment recently commented on his podcast how hard it was to find answers to simple questions on Google these days. He was saying that the content farms have basically created a page for every PHP function with thin content and rendered it useless. For a company whose goal is to index all human information it is a pretty big warning flag.
What is the appropriate user response? Go to Stack Overflow? Find a branded knowledge base like O'Reilly's Safari? I'm genuinely curious to know what we can do.
Disagree on the PHP function thing.
For nearly all function names the php.net page is the top result, even when there is a C function of the same name.
Occasionally there is a w3schools or similar close to the top, but it's not like those guys have just wholesale ripped the docs.
I was referring to how Google should respond to content farms. Historically, Google has been willing to take manual action on webspam. With the rest of search quality and ranking, Google tries to use algorithms as much as we can. So the distinction of whether something is spam vs. low-quality is an important one within Google.
That's one of the reasons the SearchWiki approach was a good idea. Not everyone has the same opinion: what one person finds helpful, another finds low-quality content.
Neither spam nor low quality results should appear on the first page of results. I don't see how the distinction necessarily changes how you treat the useless sites.
If we were to pick a random Demand Media or Associated Content page, what are the odds a reasonable person would prefer its writing to a less-cynically-optimized, less-ad-drenched alternative page that the test page has bumped out of the top results?
Sounds more like a semantic excuse to justify the proliferation of AdSense-riddled spam content mills, which chase keyword search volume and make Google a hefty sum each year, than a legitimate conundrum.
What if this low-quality content is able to give the answers the user was looking for when they searched? A lot of the eHow-type stuff could be considered low quality by some and informative by others.
I am not opposed to the content mills per se, mainly because I think that they aren't doing anything "destructive", i.e. comment spamming, phishing, etc. They're using the tools and data that Google freely provides to produce laser-targeted content that the algorithm eats up and ranks high because of their domain authority and onsite interlinking. It's essentially a battle between them and Google, and clearly Google is fine with them doing what they do or else they would have stopped them years ago.
What most (power) users are opposed to are scraper sites that reuse other sites' content to rank higher than the original content source, and tactics like the one eHow uses, where they have 10 different articles about how to tie your shoe, each with a title that matches a different long-tail version of the search query.
Again though, the issue is that Google isn't reacting to and clearing out content spam. Most likely because the sites add to Google's bottom line and a spam engineer modifying the algo to remove the powerhouse content mill sites can actually negatively impact Google's revenue.
Also, when we complain about search quality, we're a vocal micro-minority. Most people, like you said, find these sites useful, and haven't even thought about the implications of these content mill sites.
"Most likely because the sites add to Google's bottom line and a spam engineer modifying the algo to remove the powerhouse content mill sites can actually negatively impact the Google's revenue."
"I was referring to how Google should respond to content farms. Historically, Google has been willing to take manual action on webspam. With the rest of search quality and ranking, Google tries to use algorithms as much as we can. So the distinction of whether something is spam vs. low-quality is an important one within Google." - Matt_Cutts
I know you are probably asked to respond about specific spam cases constantly, so don't take this as me demanding an answer for this specific instance. However, eHow is clearly leveraging their domain authority here to scrounge up the traffic for each different long tail variation of the term "How to Tie Your Shoelaces".
The reason they're targeting each of these phrases with a different page of content is the data Google gives them (and all of us) about who is searching for what and how many times per month, coupled with the fact that they have a mega-powerful domain: when a new page that uses an exact keyword in its title is added to it, that page will rank top 5 in Google almost every time.
Therefore, the data that they're using to come up with these keywords to feed their gaggle of writers is the related keyphrases data provided by your keyword tool. Algorithmically, this should be easily detectable, as you guys have the list of related keyword data that they're using in the first place.
Why then, are they in the top 5 for each of these keywords? Are 3+ different guides on how to tie shoe laces really necessary? Shouldn't 1 page be ranking for all 3+ of these tight variations? Shouldn't dozens of related pages of content targeting minute keyword variations be something relatively easy to detect?
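To make the "relatively easy to detect" point concrete, here is a minimal sketch of one way such a check could look. It is only an illustration under stated assumptions, not anything Google is known to run: it groups page titles by domain and flags pairs whose character trigrams overlap heavily (Jaccard similarity). The domain names, the sample titles, and the 0.5 threshold are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def shingles(title, n=3):
    """Character n-grams of the title with case, spaces and punctuation removed."""
    text = "".join(ch for ch in title.lower() if ch.isalnum())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_titles(pages, threshold=0.5):
    """pages: iterable of (domain, title) pairs.

    Returns {domain: [(title_a, title_b, score), ...]} for every pair of
    titles on the same domain whose similarity reaches the threshold.
    """
    by_domain = defaultdict(list)
    for domain, title in pages:
        by_domain[domain].append(title)

    flagged = {}
    for domain, titles in by_domain.items():
        pairs = []
        for a, b in combinations(titles, 2):
            score = jaccard(shingles(a), shingles(b))
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
        if pairs:
            flagged[domain] = pairs
    return flagged


# Hypothetical sample data: one domain with three long-tail variations
# of the same guide plus an unrelated page, and one unrelated domain.
pages = [
    ("example-farm.test", "How to Tie Your Shoelaces"),
    ("example-farm.test", "How to Tie Shoe Laces"),
    ("example-farm.test", "How to Tie a Shoelace"),
    ("example-farm.test", "How to Bake Sourdough Bread"),
    ("other-site.test", "How to Tie Your Shoelaces"),
]

flagged = near_duplicate_titles(pages)
for domain, pairs in flagged.items():
    for a, b, score in pairs:
        print(f"{domain}: {a!r} ~ {b!r} ({score})")
```

A real system would obviously need more than title overlap (the same keyword-tool data mentioned above, page content, traffic), but even this crude pass separates the three shoelace guides from the unrelated page on the same domain.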
Seeing multiple Adsense units on these obviously SEO-fueled pages I've linked to leads me to believe there's at least a little bit of truth to what you quoted me saying.
"Seeing multiple Adsense units on these obviously SEO-fueled pages I've linked to leads me to believe there's at least a little bit of truth to what you quoted me saying."
Speaking as someone who has worked at Google for ~11 years and worked on spam at Google for ~10 years, I can tell you that running AdSense doesn't get you any kind of special consideration in Google's rankings. You don't have to believe me, but it's true. :)
By the way, I talked a bit about content farms and Google's take on them in November at a search conference. Here's a blog post that covered it a bit: http://blog.search-mojo.com/2010/11/10/live-from-pubcon-vega... . That person wrote up the discussion as "Question: What is Google doing to detect content farms?
Matt: Google historically has tried to do most everything algorithmically. blekko does allow you to identify content farms, but blekko is more human based response. Google is having an active debate about this. If you can’t algorithmically identify a content farm, is it still ok to take action and remove a site?"
The other relevant write-up was at http://www.seroundtable.com/archives/023229.html and they transcribed the discussion as
"5:22
Barry Schwartz: Q: Brian asked, what is google doing in terms of content farms?
5:22
Barry Schwartz: A: Matt fed this Q to Brian earlier ... hehhehe
5:24
Barry Schwartz: Tricky, Matt's team is in charge of web spam. If web spam doesn't last long in the index, what do they do? So a content farm is the bare min someone can do to get in to the index, but its borderline
5:24
Barry Schwartz: Some people in Google dont consider content farms as web spam
5:24
Barry Schwartz: They have been a little worried about people passing judgement on sites if it is a content farm a useful site.
5:24
Barry Schwartz: Think of Mahalo, Wikia, Blekko
5:24
Barry Schwartz: Those sites provide a curated experience
5:25
Barry Schwartz: It is a really interesting tension here, they don't want to bring Humans into the mix... They will let computers do it
5:25
Barry Schwartz: This is an active debate
5:25
Barry Schwartz: May Day, at least partially, was a first pass at this.
5:25
Barry Schwartz: If you can't algorithmically detect content farms, then do you take manual action?
5:25
Barry Schwartz: This is the problem they are thinking
5:26
Barry Schwartz: So if they do anything on this, they will update their guidelines
5:26
Barry Schwartz: This is an active debate in Google and we will see where we go
5:26
Barry Schwartz: Someone asked, Matt, what side are you on?
5:26
Brian Ussery (@beussery):
Matt says users are angry with content farms
5:26
Barry Schwartz: Matt said, users are not happy with content farms so he wants them out of the index."
Matt, thanks for responding. I guess I'm not exactly sure where to end this back and forth, but your latest response does open up a few questions of mine.
I've never argued that running Adsense helps a site rank higher, I know that it doesn't. But I do believe that sites like eHow, who presumably make Google millions of dollars a year, are given a free pass to pursue content spam like I posted in my previous comment without any sort of repercussions that we can see. They're leveraging their domain authority and producing very low quality articles to target obscure long tail variations of keywords to keep getting that traffic.
What concerns me is "If you can’t algorithmically identify a content farm, is it still ok to take action and remove a site?" Is the issue that the algorithms aren't sophisticated enough to stop these content mills from spitting out article after article of low-quality, long-tail-targeted content, or that you guys have thrown in the towel and believe that if the algos aren't throwing flags, then the sites are fine?
I posted an example of a content-mill situation in my last response. For most people, a manual review would throw up a warning flag if the goal was to identify people targeting keywords rather than trying to help people. The top 5 rankings for each of those pages show that neither algorithmic nor manual measures are in place to deal with such a situation.
I have 10+ content sites targeting random niches. I know how the SEO game works. I know dozens of internet marketers who have dozens of their own sites each who know how to game the algo to rank high with low quality content sites like these. It's obvious people are taking advantage of the algorithm, but it doesn't appear to be drastically improving anytime soon.
I do appreciate the time and effort you have put into your responses. If you'd like to talk privately, I would love to. I'll try and watch the video you suggested tonight.
I wouldn't argue that just because current algos aren't throwing flags, the sites must be fine. We read TechCrunch and HN, hear the complaints, and see searches that we want to be better.
The challenge (in my mind, at least) is how to improve the algorithms more and when it's appropriate to say "This is low enough quality that it's actually spam, and thus we're willing to look at manual action." On the bright side, we've actually got a potential algorithm idea that we're exploring now.
I think identifying low-quality content is important, yes. But the topic I've brought up is dealing with somewhat decent-quality content (all of the guides do explain how to tie shoelaces) where each page individually targets a subtle long-tail keyword variation.
It's keyword variation content spam using hand written content and curated by very specific keyword data. So that seems to be a different algo trigger than a quality trigger.