In the previous lesson we discussed how crawler-based engines work. Typically, special crawler software visits your site and reads the source code of your pages. This process is called “crawling” or “spidering”. Your page is then compressed and put into the search engine’s repository, which is called an “index”. This stage is referred to as “indexing”. Finally, when someone submits a query to the search engine, it pulls your page out of the index and gives it a rank among the other results it has found for this query. This is called “ranking”.
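The three stages can be pictured with a toy sketch in Python. This is only an illustration under simplifying assumptions, not how any real search engine is implemented: the pages live in a hypothetical in-memory dictionary called `PAGES` instead of being fetched from the web, the "index" is a simple word-to-page lookup table, and ranking just counts how often the query words appear on each page. All function names here are made up for the example.

```python
import re
import zlib
from collections import defaultdict

# Hypothetical in-memory "web": URL -> page source.
# A real crawler would fetch these pages over HTTP.
PAGES = {
    "http://example.com/a": "<html><body>Cheap flights and cheap hotels</body></html>",
    "http://example.com/b": "<html><body>Hotels in Paris</body></html>",
}


def crawl(pages):
    """Crawling: visit each URL and read the source code of the page."""
    for url, html in pages.items():
        yield url, html


def index(crawled):
    """Indexing: store a compressed copy of each page and build a
    word -> {url: occurrences} lookup table."""
    repository = {}
    inverted = defaultdict(dict)
    for url, html in crawled:
        repository[url] = zlib.compress(html.encode("utf-8"))
        text = re.sub(r"<[^>]+>", " ", html).lower()  # strip HTML tags
        for word in re.findall(r"[a-z0-9]+", text):
            inverted[word][url] = inverted[word].get(url, 0) + 1
    return repository, inverted


def rank(inverted, query):
    """Ranking: order pages by how often the query words occur on them
    (a deliberately naive, on-page-only score)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in inverted.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


repository, inverted = index(crawl(PAGES))
results = rank(inverted, "cheap hotels")
print(results)
```

For the query "cheap hotels", page `a` comes first because it contains the query words three times, while page `b` contains only one of them once.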
When indexing, crawler-based engines usually consider many more factors than those they can find on your pages. Before putting your page into an index, a crawler will look at how many other pages in the index link to yours, the text used in the links that point to you, the PageRank of the linking pages, whether the page is listed in directories under related categories, and so on. These “off-page” factors carry significant weight when a crawler-based engine evaluates a page. In theory, you can artificially increase your page’s relevance for certain keywords by adjusting the corresponding areas of your HTML code, but you have much less control over the other pages on the Internet that link to you. For that reason, off-page relevance prevails in the eyes of a crawler.
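To make the idea of off-page factors more concrete, here is a minimal Python sketch that adds up the signals listed above: the number of inbound links, the anchor text of those links, the PageRank of the linking pages, and a listing in a related directory category. The weights, the 0 to 10 PageRank scale, and the names `InboundLink` and `off_page_score` are all assumptions invented for this illustration. Real search engines do not publish their formulas, so treat this as a picture of the principle only.

```python
from dataclasses import dataclass


@dataclass
class InboundLink:
    anchor_text: str        # the text used in the link that points to you
    source_pagerank: float  # PageRank of the linking page (assumed 0-10 scale)


def off_page_score(links, keyword, in_related_directory):
    """Combine off-page signals into one number (hypothetical weights)."""
    keyword = keyword.lower()
    score = 0.0
    for link in links:
        score += 1.0                         # every inbound link counts
        score += 0.5 * link.source_pagerank  # links from stronger pages count more
        if keyword in link.anchor_text.lower():
            score += 2.0                     # keyword appears in the anchor text
    if in_related_directory:
        score += 3.0                         # listed under a related directory category
    return score


links = [
    InboundLink("cheap hotels in Paris", 6.0),
    InboundLink("click here", 2.0),
]
score = off_page_score(links, "hotels", in_related_directory=True)
print(score)
```

Note that none of the inputs to this score come from your own HTML, which is why these signals are much harder to influence than on-page keywords.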
In this lesson, we look at the main spider-based search engines and learn how to get each of them to index our site and rank it highly. Although this step does not deal closely with the optimization process itself, we describe how each search engine looks at your pages so that you can come back to this section for later reference.
