Supplemental index for dummies

I noticed that the subject of Google's supplemental index has been getting a lot of attention lately, and that the information about it is scattered all over the place. So I decided to tidy things up a bit, for you as well as for my own later reference. Here it goes:

What is the Google supplemental index?

The supplemental index is where Google puts pages that it considers low quality. These pages tend to appear deep in the search results (with some exceptions). Being supplemental does not depend on keywords: if a page is in the supplemental index, it is there regardless of the search term.

Finding pages in the supplemental index

So, how do you find pages in the supplemental index?
Until recently, you could tell that a page was in the supplemental index by a simple indicator next to its search result:

Supplemental result indicator example

All you had to do was dig deep enough into the search results, and you'd eventually stumble upon supplemental pages. However, a few weeks back Google removed the indicator. Don't worry, though: a different, "smart ass" method for detecting pages in the supplemental index still works. The following command lists all indexed pages for a site (including supplemental ones):

site:www.somesite.com

The following command, on the other hand, returns only the pages that reside in the main "good" index:

site:www.somesite.com/*

So, one step further: subtract one from the other, and what do you get? Voilà! The list of pages that reside in the supplemental index.

site:www.somesite.com/ -site:www.somesite.com/*

Yes, it's that simple. You might have to click the Omitted Results link below the first few results to see the whole list. Also, don't put a space between the minus (-) character and the second site: command.
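If you check several sites, it can help to build these queries programmatically instead of typing them. The Python sketch below assembles the three site: queries exactly as written above and turns them into search URLs. The helper names are hypothetical, the https://www.google.com/search?q=... URL format is an assumption about how Google accepts queries, and whether the /* trick keeps working is entirely up to Google.

```python
from urllib.parse import urlencode


def supplemental_queries(domain: str) -> dict:
    """Return the three site: queries described above for `domain` (hypothetical helper)."""
    return {
        # Every indexed page, supplemental ones included
        "all": f"site:{domain}",
        # Only pages in the main index, according to the trick described above
        "main": f"site:{domain}/*",
        # The difference; note there is no space between "-" and the second site:
        "supplemental": f"site:{domain}/ -site:{domain}/*",
    }


def search_url(query: str) -> str:
    """Encode a query into a search URL (assumed format: /search?q=...)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for name, query in supplemental_queries("www.somesite.com").items():
    print(f"{name:>12}: {search_url(query)}")
```

The sketch does not fetch or scrape anything; you would paste the printed URLs into a browser and read the results yourself.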

What causes pages to be indexed in the supplemental index?

If a page goes into the supplemental index, it means that something triggered Google to decide that it's not a high-quality page. The reasons vary, but the following ones are commonly agreed upon:

  • Duplicate content. If several pages (on the same website or on different ones) hold the same content, eventually all but one of them will go into the supplemental index.
  • Not enough link power (compared to other pages on the same site). This happens a lot with big websites: pages that reside deep in the site structure (typically the third level and below) tend to go supplemental if not enough link power flows to them from stronger pages higher up the hierarchy.
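To catch the simplest form of duplicate content on your own site, you could fingerprint each page's text and group pages whose fingerprints match. This is a minimal sketch, assuming you already have each page's main text in a dict keyed by URL (crawling and text extraction are out of scope). It only finds exact duplicates after normalizing case and whitespace; Google's own duplicate detection is not public and likely goes further. `fingerprint` and `find_exact_duplicates` are hypothetical names.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash page text after lower-casing it and collapsing whitespace runs."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict) -> list:
    """Group URLs whose normalized text is identical (hypothetical helper)."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/article?id=7": "Supplemental index for dummies ...",
    "/article?id=7&print=1": "Supplemental  index for DUMMIES ...",
    "/about": "About us",
}
print(find_exact_duplicates(pages))
```

Here the article and its printable copy end up in one group, which is exactly the kind of pair the next section suggests hiding from search bots.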

Getting out of the supplemental index

If pages deep in your site hierarchy have landed in the supplemental index, you need to pass more link power to them.

  • Dan Thies wrote a comprehensive article about using no-follow to direct link power correctly between the different levels of your site hierarchy. No-follow (rel="nofollow") is a simple HTML attribute value that can be added to links. It basically tells search engines not to pass link power through the link, but it doesn't prevent the target page from being indexed. No-follow is invisible to human users. Using it, you can keep link power from being wasted on unimportant pages and direct it toward the more important areas of your site.
  • In addition, simply strengthening your site as a whole makes more link power flow down to the deeper pages, making them stronger, which may lift them into the main Google index.
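Before moving link power around, it helps to see which links on a page currently carry rel="nofollow". The sketch below uses Python's standard html.parser to split a page's links into followed and nofollowed ones. It is an audit aid under the link-power model this article describes, not a statement of how any search engine actually weighs links; `LinkAuditor` and `audit_links` are hypothetical names, and the sample markup is made up.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect <a href> targets, split by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return  # anchors without a target pass no link power
        # rel may hold several space-separated values, e.g. "nofollow noopener"
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html: str):
    """Return (followed, nofollowed) href lists for one HTML page."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.followed, auditor.nofollowed


sample = (
    '<a name="top"></a>'
    '<a href="/products/widgets">Widgets</a> '
    '<a href="/terms" rel="nofollow">Terms</a> '
    '<a href="/login" rel="NoFollow noopener">Log in</a>'
)
print(audit_links(sample))
```

Running it over your templates would show, for example, whether boilerplate links such as terms or login pages are still soaking up link power that could go to deeper content.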

If the problem is duplicate content, make sure every page on your site holds unique content. Common causes of duplicate content within the same site are:

  • Versioning - some content management systems keep several versions of the same page, and sometimes all of them get indexed. Prevent that by blocking search bots from reaching every version except the main one.
  • Printable versions of articles - some sites can display pages in a printer-friendly format. Sometimes these pages get indexed, and you want to prevent that.
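One common way to block bots from such pages is a robots.txt file. The sketch below assumes a hypothetical URL layout where printable pages live under /print/ and older revisions under /versions/, and checks the rules with Python's standard urllib.robotparser. That parser does simple prefix matching and does not implement Google's * and $ wildcard extensions, so the rules here stick to plain path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes the CMS serves printable pages under /print/
# and older revisions under /versions/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /versions/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for url in (
    "http://www.somesite.com/articles/seo",
    "http://www.somesite.com/print/articles/seo",
    "http://www.somesite.com/versions/articles/seo?rev=3",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```

Testing the rules locally like this before publishing them reduces the risk of accidentally blocking the main version of a page.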

And finally, evil competitors might hijack your content, turning it into duplicate content. The simplest way to detect such cases is to search for randomly chosen, reasonably long sentences from your content in exact form (in quotes). Usually the only result will be your own page containing that sentence; if people have copied your content, their pages will show up in the results next to yours. For large sites, though, this method can become too time-consuming. Services like copyscape.com automate the process and send you alerts once a duplicate content case is detected online.
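Picking the probe sentences can be automated even if the searching itself stays manual. This Python sketch splits a page's text into sentences with a simple regular expression, keeps those with at least `min_words` words, and picks a few at random as quoted exact-phrase queries. The function name, the default thresholds, and the naive sentence splitting are all assumptions for illustration.

```python
import random
import re


def exact_phrase_queries(text, count=3, min_words=8, seed=None):
    """Pick random long sentences and wrap them in quotes (hypothetical helper)."""
    # Naive split: a sentence ends at ".", "!" or "?" followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Long sentences are distinctive enough that an exact match is unlikely by chance
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(count, len(candidates)))
    # Drop inner quotes and trailing punctuation so the phrase query stays well-formed
    return ['"' + s.replace('"', "").rstrip(".!?") + '"' for s in chosen]


article = (
    "Short one. This sentence is definitely long enough to be used as a probe query. "
    "Another reasonably long sentence that could reveal copied content elsewhere online! Tiny."
)
for query in exact_phrase_queries(article, count=5, seed=1):
    print(query)
```

Each printed query can then be pasted into the search box; any result other than your own page is worth a closer look.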

Hope this stuff helps, and good luck on your way up to the main index!

Great article

Although I am not sure about the accuracy of the statement "Supplemental index is the place where Google puts pages that have low quality".

Roni


Duplicate Content

*** duplicate content ***

You missed out about a dozen sub-topics that come under the banner of "Duplicate Content".

It is a huge area of problems, mainly caused by poor decisions made at the design stage of a site.