An “index” is generally an ordered directory or register, as in a reference work (e.g. a lexicon or telephone book). The so-called “Google Index” is the entirety of all websites that Google has recognized, i.e. crawled, and stored (= indexed). The SERPs are filled exclusively with pages from the index – a page that is not in the index will not appear in the SERPs either.
Unlike a lexicon, the Google Index is not static but highly dynamic. New websites are added, others are removed. New pages are discovered by crawlers jumping from link to link. If a website massively violates the Google guidelines, it is removed from the index and thus from the SERPs.
In addition, the Google Index has a complex structure: it is not simply ordered alphabetically; rather, various ranking criteria are layered over the index in order to deliver a certain set of websites in a certain order for a given search query. This, too, happens dynamically, since both the websites and the ranking criteria change constantly. Exactly how this works is Google’s trade secret.
The same applies to all other search engines. On the web, the German term “indiziert” is often used in place of “indexiert” (“indexed”), but this is incorrect.
How Google populates the index
Google populates its index with the help of crawlers (also called bots). A crawler jumps from one link to the next and thereby reaches websites that are linked to one another. Every new website it encounters is crawled, i.e. its source code is read out and sent to the index, where the page is sorted according to various ranking factors and other rules. If you want your website to be indexed this way, you have to make sure it receives a link from another website; a crawler then has to visit that other site and discover the link to yours. This route to indexing is quite tedious and unreliable.
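The link-following step described above can be sketched in a few lines of Python using only the standard library. This is a simplified illustration of the principle, not how Google’s crawler actually works, and the sample HTML is invented:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return all outgoing links a crawler would discover on this page."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# A crawler reads a page's source code, sends it to the index,
# then follows each discovered link and repeats the process.
page = '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>'
print(extract_links(page))  # ['https://example.com/a', 'https://example.com/b']
```

A real crawler would additionally fetch each discovered URL over HTTP, respect robots.txt, and avoid revisiting pages.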
Have a website indexed
To have your site indexed actively, you can submit your website directly to Google. There are three ways to do this:
- a) At http://www.google.de/addurl/ you can submit a website for indexing. However, successful transmission of the data is no guarantee of inclusion. You also need a Google account to access this service. If you already have a Google account, it makes more sense to use another option:
- b) In the Webmaster Tools (also called “Search Console”), you can send a sitemap directly to Google.
A sitemap can be created quite easily in .xml format; there are many free services on the internet for this. The .xml file is then added in the Webmaster Tools under “Sitemaps”:
After a while, usually within the next 24 hours, Google crawls the URLs listed in the sitemap. The progress of the indexing can be followed in the Webmaster Tools under “Sitemaps”.
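The sitemap itself is a simple .xml file following the sitemaps.org protocol. A minimal example might look like this (the URLs and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2016-01-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/page1</loc>
  </url>
</urlset>
```

Each `<url>` entry lists one page to be crawled; `<lastmod>` is optional and tells the crawler when the page last changed.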
- c) If you want to have a single page indexed, for example because it was added after the sitemap was created, you can send an individual URL to the index in the Webmaster Tools under “Crawling” → “Fetch as Google”.
Strictly speaking, this feature exists to check whether the crawler can see and understand all resources on a page (JavaScript, etc.). But after you have fetched the URL, Google offers the option of submitting it “to the index”.
However, this sends only that one URL – and, optionally, the pages of the domain linked from it – to the index. Each account has a monthly quota of 10 URLs that can be submitted this way.
There can be several reasons why a webmaster wants to keep a page out of the Google index and thus prevent it from appearing in the SERPs.
– The page is not yet finished or is being relaunched and should not be found until it is complete.
– There are copyright or data protection reasons not to make the page publicly available.
– Sometimes a webmaster does not want to make individual subpages public, e.g. admin areas or low-quality pages.
– The website is intended for private use only.
There are several ways to prevent indexing:
- a) Meta tag “noindex”
With the meta tag “noindex”, placed in the page’s <head> section, the crawler is instructed not to index the page:
<meta name="robots" content="noindex" />
Most search engine crawlers adhere to it, but the noindex tag is only a directive, not an enforceable block.
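Whether a page carries the noindex directive can be checked programmatically. A small sketch in Python using only the standard library; the sample HTML strings are invented:

```python
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Scans a page for <meta name="robots"> tags containing "noindex"."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            name = (d.get("name") or "").lower()
            content = (d.get("content") or "").lower()
            if name == "robots" and "noindex" in content:
                self.noindex = True

def has_noindex(html):
    """Return True if the HTML contains a robots noindex meta tag."""
    checker = RobotsMetaChecker()
    checker.feed(html)
    return checker.noindex

print(has_noindex('<meta name="robots" content="noindex" />'))        # True
print(has_noindex('<meta name="robots" content="index,follow" />'))   # False
```

A check like this is handy for auditing a site: pages that should rank must not accidentally carry the tag, and pages that should stay private must carry it.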
- b) Lock out crawlers with robots.txt
In the robots.txt you add the following lines to block the entire domain for all crawlers:
User-agent: *
Disallow: /
If you only want to exclude individual subfolders, it looks like this:
User-agent: *
Disallow: /subfolder1
Disallow: /subfolder2/subfolder/
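The rules above can be tested before deployment with Python’s built-in robots.txt parser. The paths match the example; the domain is a placeholder:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# Feed the rules directly instead of fetching them from a server.
rp.parse([
    "User-agent: *",
    "Disallow: /subfolder1",
    "Disallow: /subfolder2/subfolder/",
])

# Blocked path: falls under /subfolder1
print(rp.can_fetch("*", "https://www.example.com/subfolder1/page.html"))  # False
# Unblocked path: no Disallow rule matches
print(rp.can_fetch("*", "https://www.example.com/other/page.html"))       # True
```

Verifying the rules this way guards against the classic mistake of a stray `Disallow: /` that locks crawlers out of the whole site.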
- c) Exclude crawlers via .htaccess
With .htaccess you can set up password protection for the entire website or for individual areas of it. This option is also recommended by Google: Block URLs through password-protected server directories (https://support.google.com/webmasters/answer/93708?hl=de)
An automatic .htaccess generator can be found here: http://www.homepage-kosten.de/htaccess.php
Detailed instructions can be found here: http://www.grammiweb.de/anleitungen/ka_htaccess.shtml
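A typical password-protecting .htaccess for an Apache server looks like this; the file path and realm name are placeholders, and the referenced .htpasswd file holds the username/password hashes:

```apacheconf
# Require HTTP Basic authentication for this directory
AuthType Basic
AuthName "Protected area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Unlike noindex or robots.txt, this actually blocks access: a crawler that cannot log in never sees the content at all.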
Flying out of the index – and getting back in
Being thrown out of the index involuntarily is very annoying. To actually be de-indexed, you have to have done something seriously wrong – such as massive, broad-based spam link building or large-scale cloaking. But negative SEO, i.e. competitors or hackers taking SEO measures against your site, can also be a reason for being thrown out of the index.
– The first step is to send a re-entry request (reinclusion request or reconsideration request) to Google.
– If you caused the problem yourself, you have to prove that you have fixed it.
– If you have been the victim of an attack, you should be able to prove that too.
In the Webmaster Tools, you can then submit a “Request for a new review”, which can be found in the message about the penalty.
It can then take a while – in our experience between 2 and 12 weeks – until the page is back in the index.