
I understand why the author thinks this is a good idea, but 404 exists for a reason. Literally "Not Found", not "I will guess at what you are looking for".

The "implications for SEO" will almost certainly not be "positive" for quite a few reasons. As a general rule in my experience and learning, Google's black box algorithm doesn't like anything like this, and I expect you will be penalized for it.

There are many good comments already and my suggestions would merely repeat them, so I'm just adding my voice that this is likely to be a bad idea. Far, far better to simply have a useful 404 page.

Edit just to add: if you do something like this, make sure you have a valid sitemap and are using canonical tags.
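For example, here's a minimal sketch of the canonical part, assuming a Flask app (purely illustrative; nothing in the article says what stack is being used). Google accepts rel="canonical" either as a <link> tag in the page head or, as here, as an HTTP Link header:

    # Hypothetical Flask sketch: advertise a canonical URL via an HTTP header
    # so fuzzy-matched variants don't look like separate duplicate pages.
    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/supplements/spore-probiotic")
    def spore_probiotic():
        resp = make_response("<h1>Spore Probiotic</h1>")
        # Google also accepts rel="canonical" as a Link response header.
        resp.headers["Link"] = '<https://example.com/supplements/spore-probiotic>; rel="canonical"'
        return resp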



This is absolutely right. The duplicate content flags alone will cancel everything out.

I don't understand why people don't just read Google's own published guidelines ( https://developers.google.com/search/docs/fundamentals/seo-s...) on how to properly SEO one's site.

It's not some dark art. Google themselves tell you exactly how they index and why. All you have to do is read the guidelines.


In that document they explain that duplicate content is not a big deal: just set a canonical, and maybe respond with a 301.

I did this during a migration of a site that changed from '/<pk-int>/' URLs to '/<slug>/' URLs, with corresponding 301s. Not only did the migration not hurt the site's search ranking, the change actually seemed to help (except with Bingbot, which after five years still requests the old URLs).

The problem I see with the OP's strategy is that a bot can hit a URL that doesn't exist yet, get 301'ed by this technique, and then the URL gets reused later for new content. Following their example: somebody links to '/supplements/otherbrand-spore-probiotic', and a bot following that link gets 301'ed to '/supplements/spore-probiotic'. Later you actually add 'otherbrand-spore-probiotic', but the bot will never visit that URL again, because it has already been told the move was permanent.
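To make the distinction concrete, here's a rough sketch of the safe version (Flask is just an arbitrary framework for illustration, and the lookup table is made up): a 301 is only issued when the old identifier is known to map to a renamed page, and everything else stays an honest 404.

    from flask import Flask, abort, redirect

    app = Flask(__name__)

    # Explicit mapping recorded during the '/<pk-int>/' -> '/<slug>/' migration
    # (table contents invented for illustration).
    RENAMED = {42: "spore-probiotic"}

    @app.route("/<int:pk>/")
    def old_pk_url(pk):
        slug = RENAMED.get(pk)
        if slug is None:
            abort(404)  # unknown old id: an honest Not Found, no guessing
        return redirect(f"/supplements/{slug}", code=301)  # known rename: permanent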


What about using 302?


What about 308?


I mostly agree, but also... it kinda is a dark art? I just read the "Reduce duplicate content" section [1] because my gut reaction was "yeah, I agree, this guy is going to get SEO-penalized for this" but of course I don't actually know. And although my few years of experience dealing with this stuff still want to say "you will get SEO-penalized for this", Google's own guidelines do not: "If you're feeling adventurous, it's worth figuring out if you can specify a canonical version for your pages. But if you don't canonicalize your URLs yourself, Google will try to automatically do it for you. [...] again, don't worry too much about this; search engines can generally figure this out for you on their own most of the time. "

[1] https://developers.google.com/search/docs/fundamentals/seo-s...


> It's not some dark art. Google themselves tell you exactly how they index and why. All you have to do is read the guidelines.

This reminds me of something similar in the mobile world: Apple's App Review guidelines[1]. They're surprisingly clear and to the point. All you have to do is read and follow them. Yet, when I worked for an App developer, the product managers would act as though getting through App Store review was some kind of dark wizardry that nobody understood. Them: "Hey, we need to add Feature X [which is clearly against Apple's guidelines]." Me: "We're going to have trouble getting through app review if we add that." Them: "Nobody knows how app review works. It's a black box. There's no rhyme or reason. It's a maze. Just add the feature and we'll roll the dice as usual!" Me: "OK. I've added the feature and submitted it." App gets rejected. Them: "Shocked Pikachu!"

1: https://developer.apple.com/app-store/review/guidelines/


While these are clear and to the point, they are absolutely not an exhaustive list of rejection reasons.

Go through any forum thread of "reasons you've been rejected" and try to find those reasons in the guidelines; it isn't obvious.

I couldn't find anything in there pertaining to why we were rejected last month (we mentioned Android in our patch notes, as it's a third-party device we integrate with).


I haven't worked with iOS in a few years, so perhaps they are more consistent these days, but in the early days I built a web browser with a custom rendering engine to see if I could get it through review, and it was accepted just fine. It was accepted just fine through several releases, in fact. According to the guidelines at the time, alternative browser engines (i.e. anything other than WebKit) were prohibited. It should have been rejected if the guidelines were followed. But they weren't. It really was a dark art figuring out when they would and wouldn't follow them.

I'm with your colleague on this, at least in a historical context.


Doing something not mentioned by the guidelines and hoping it gets through, I can understand. Doing something explicitly prohibited by the guidelines and hoping it gets through seems like an unwise use of resources.


There seemed to be no illusions about the risk of the gamble in the OP's case. The comment states it was discussed; the consensus simply felt the gamble was worthwhile, even if they ultimately lost.

I was also well aware of the risk in my case, and accepted it. I wanted to learn more about certain iOS features, and I was able to do so while scratching an itch I felt like scratching. I would still have gained from the process even if the app had been rejected. But since it was approved, contrary to the guidelines, I also got a good laugh at how inconsistent review was, and a nice cash bonus on top!

You win some you lose some. Such is life.


If you look at the recently leaked documents, Google often do not tell the truth, or the whole truth.

301s are what you are supposed to use for things like changed URLs, so I cannot see them being a problem in this case. A 301 is not duplicate content - it is one of the things Google likes you to do to avoid duplicate content.

It would probably be good to add a similarity threshold, to prevent URLs from redirecting to something very different.
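Something like this rough sketch, for instance (using Python's difflib; the second known path and the 0.8 cutoff are arbitrary illustrations, not recommendations):

    import difflib

    # Paths that actually exist on the site.
    KNOWN_PATHS = [
        "/supplements/spore-probiotic",
        "/supplements/magnesium-glycinate",
    ]

    def guess_redirect(path, cutoff=0.8):
        # Only return a target if the closest known path clears the cutoff;
        # otherwise the caller should serve a plain 404 instead of redirecting.
        matches = difflib.get_close_matches(path, KNOWN_PATHS, n=1, cutoff=cutoff)
        return matches[0] if matches else None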


I would use 308/307 over 301/302.

Semantically, using a 303 redirect might be the most appropriate signal for what they are doing.

They could redirect to a specific result page if it's a clear, unambiguous match and they could redirect to a search results page if there are other possible matches.
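A sketch of that split might look like the following (hypothetical Flask handler; the /search route, the second known path and the cutoff are assumptions for illustration):

    import difflib
    from urllib.parse import quote

    from flask import Flask, redirect, request

    app = Flask(__name__)
    KNOWN_PATHS = ["/supplements/spore-probiotic", "/supplements/magnesium-glycinate"]

    @app.errorhandler(404)
    def not_found(error):
        matches = difflib.get_close_matches(request.path, KNOWN_PATHS, n=3, cutoff=0.8)
        if len(matches) == 1:
            return redirect(matches[0], code=303)  # one clear match: send them straight there
        if matches:
            # several plausible matches: hand off to a search results page instead
            return redirect("/search?q=" + quote(request.path), code=303)
        return "Not Found", 404  # nothing close enough: honest 404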


If the pages redirect and have a canonical URL, there should be no problem, right?


There is always a possibility of a problem, even if you do everything right. If you are redirecting multiple URLs to one page, for example, you may believe you are helping visitors reach the most relevant/helpful result, but it could also look like you are trying to artificially boost that one page.

Then there are the vagaries of a search engine becoming ever more dependent on AI, where not everything makes rational sense and sites can get de-indexed at the drop of a hat.


Do you have any evidence that redirects are bad for SEO? Redirects are a normal part of the web, and Google tells you they are fine with them: https://developers.google.com/search/docs/crawling-indexing/...


No, Google are infallible and never tell lies...


Except there's no duplicate content - the site redirects


The leaked documents are probably more useful than any "official" guide.


Agreed. You could make a very useful 404 page out of the list of candidate URLs being used to do the redirect: it's exactly what you want to display in a "Did you mean x?" message on that page.
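Something along these lines, as a hypothetical sketch (Flask and the candidate list are stand-ins; the matching could reuse whatever fuzzy logic the site already has):

    import difflib

    from flask import Flask, request

    app = Flask(__name__)
    KNOWN_PATHS = ["/supplements/spore-probiotic", "/supplements/magnesium-glycinate"]

    @app.errorhandler(404)
    def not_found(error):
        # Keep the 404 status, but surface the near-misses as "Did you mean ...?" links.
        candidates = difflib.get_close_matches(request.path, KNOWN_PATHS, n=3, cutoff=0.6)
        body = "<h1>Not Found</h1>"
        if candidates:
            links = "".join(f'<li><a href="{c}">{c}</a></li>' for c in candidates)
            body += f"<p>Did you mean:</p><ul>{links}</ul>"
        return body, 404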


Yes, I think this is a much better implementation of the idea.


I hate websites that redirect URLs that should be 404s to something else like their homepage or whatever they think is relevant. My brain has to slam on the brakes and make sense of the absolute dissonance between what I expected and what I'm seeing, which is taxing.

A 404 doesn't cause nearly as much dissonance, because the website is at least telling me why what I'm seeing is not what I expected.


Yes, especially when you run several searches, open links in the background, and only later see a bunch of homepages with no way to restore the search context and dig deeper. And then there are “special” sites that redirect to google.com.


The last SaaS application I worked on had issues similar to what you are describing. Too many distinct URLs were arriving at the same page content, and Google treats that as an attempt to do something naughty unless they all contain metadata pointing to the same canonical URL, which we weren't handling correctly.

Not only did it improve our page rank once we fixed it, it also reduced the amount of bot traffic we were fielding.
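For reference, the shape of the fix was roughly this kind of normalization (a hypothetical sketch with a placeholder domain, not our actual code): every URL variant that renders the same content declares one canonical in its <head>.

    from urllib.parse import urljoin, urlsplit

    SITE = "https://example.com"  # placeholder domain

    def canonical_tag(requested_url: str) -> str:
        # Drop the query string and trailing slash so /page, /page/ and
        # /page?ref=nav all declare the same canonical URL.
        path = urlsplit(requested_url).path.rstrip("/") or "/"
        return f'<link rel="canonical" href="{urljoin(SITE, path)}">'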


Why stop at fuzzy matching? Just pass the original page to ChatGPT, telling it to generate new content and Google will think you are producing loads of additional pages… no need to worry about duplicate warnings at all :-/


I don't understand why you think this is a bad idea - 301 redirects exist for a reason. Redirecting to what the user probably wants is better than a 404 page.

That said, typically most people are going to find a site via search or clicking a link, which is why I think basically no one bothers doing this.


301 is "Moved Permanently". The content in question wasn't (necessarily) moved. If you control and know for sure that old URL was renamed to something else, then yes - 301 redirect is appropriate. But guessing and redirecting isn't that.


How is this going to affect Google's indexing of the site? They hit the front page and follow links, right? They don't try random perturbations of the URLs and see if they resolve to something else? Or do they?


No, but Google will find and spider such URLs from at least two sources:

1) Users may (inevitably will) share links to the URLs on social media / wherever, and Googlebot et al will find those links and spider them.

2) The URLs will be sent to Google for any Chrome user with the "Make searches & browsing better" setting enabled.

https://support.google.com/chrome/answer/13844634?#make_sear...



