Lesson (2): Crawler-Based Search Engines

In the previous lesson we discussed how crawler-based engines work. Typically, special crawler software visits your site and reads the source code of your pages. This process is called “crawling” or “spidering”. Your page is then compressed and stored in the search engine’s repository, which is called an “index”. This stage is referred to as “indexing”. Finally, when someone submits a query to the search engine, it pulls your page out of the index and ranks it among the other results it has found for that query. This is called “ranking”.
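To make the three stages concrete, here is a minimal sketch in Python. It is only a toy illustration, not how any real search engine is implemented: the `PAGES` dictionary stands in for the web, the function names are made up for this example, and the score is a simple count of matching words, which is an assumption chosen for brevity.

```python
import re
import zlib
from collections import defaultdict

# Hypothetical in-memory "web": URL -> HTML source.
# A real crawler would fetch these pages over HTTP.
PAGES = {
    "http://example.com/a": "<html><body><h1>Blue widgets</h1>"
                            "<p>Buy blue widgets here.</p></body></html>",
    "http://example.com/b": "<html><body><p>Red widgets and green gadgets.</p>"
                            "</body></html>",
}


def crawl(pages):
    """Crawling: visit each page and read its source code."""
    for url, html in pages.items():
        yield url, html


def tokenize(text):
    """Strip HTML tags and split the remaining text into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexing: keep a compressed copy of each page and note its words."""
    repository = {}               # url -> compressed page source
    postings = defaultdict(dict)  # word -> {url: number of occurrences}
    for url, html in crawl(pages):
        repository[url] = zlib.compress(html.encode("utf-8"))
        for word in tokenize(html):
            postings[word][url] = postings[word].get(url, 0) + 1
    return repository, postings


def rank(query, postings):
    """Ranking: order matching pages by a toy score (summed word counts)."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in postings.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


repository, postings = build_index(PAGES)
results = rank("blue widgets", postings)
print(results)
```

Real engines use far richer scoring than word counts, as the next paragraph explains, but the division of labour between crawling, indexing and ranking is the same.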

When indexing, crawler-based engines usually consider many more factors than those they can find on your pages. Before putting your page into an index, a crawler will look at how many other pages in the index link to yours, the text used in the links that point to you, the PageRank of the linking pages, whether the page is listed in directories under related categories, and so on. These “off-page” factors weigh heavily when a crawler-based engine evaluates a page. While in theory you can artificially increase your page’s relevance for certain keywords by adjusting the corresponding areas of your HTML code, you have much less control over the other pages on the Internet that link to you. For this reason, off-page relevance prevails in the eyes of a crawler.
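Two of the off-page factors described above, the number of linking pages and the text of their links, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the `LINKS` records and the `off_page_signals` function are hypothetical, and the PageRank of the linking pages and directory listings are left out entirely.

```python
from collections import defaultdict

# Hypothetical link records gathered during crawling:
# (linking page, linked-to page, anchor text of the link)
LINKS = [
    ("http://a.example/", "http://yoursite.example/", "Blue Widgets"),
    ("http://b.example/", "http://yoursite.example/", "cheap blue widgets"),
    ("http://a.example/", "http://yoursite.example/", "widget shop"),
    ("http://c.example/", "http://other.example/", "click here"),
]


def off_page_signals(links):
    """Collect, per target page, the linking pages and their anchor texts."""
    inbound = defaultdict(set)   # target -> set of distinct linking pages
    anchors = defaultdict(list)  # target -> anchor texts pointing at it
    for source, target, text in links:
        inbound[target].add(source)
        anchors[target].append(text.lower())
    return {
        target: {
            "inbound_pages": len(inbound[target]),
            "anchor_texts": anchors[target],
        }
        for target in inbound
    }


signals = off_page_signals(LINKS)
print(signals["http://yoursite.example/"])
```

Note that none of the input comes from the page being evaluated: every record describes someone else’s page, which is why these signals are so much harder for a site owner to influence.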

In this lesson, we look at the main spider-based search engines and learn how to get each of them to index our site and rank it highly. Although this step does not deal closely with the optimization process itself, we provide information on how each search engine looks at your pages so that you can come back to this section for later reference.
