Lesson (2): Crawler-Based Search Engines

In the previous lesson we discussed how crawler-based engines work. Typically, special crawler software visits your site and reads the source code of your pages. This process is called “crawling” or “spidering”. Your page is then compressed and stored in the search engine’s repository, which is called an “index”. This stage is referred to as “indexing”. Finally, when someone submits a query to the search engine, it pulls your page out of the index and ranks it among the other results it has found for that query. This is called “ranking”.
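To make the three stages concrete, here is a minimal Python sketch of crawling, indexing, and ranking. It is only an illustration under simplifying assumptions: the PAGES dictionary stands in for the web (nothing is fetched over the network), the function names crawl, build_index, and rank are made up for this example, and the ranking simply counts query words. Real search engines use far more elaborate methods.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML source. No network access is used.
PAGES = {
    "http://example.com/": (
        "<html><body>Widgets for sale. "
        '<a href="http://example.com/red">red widgets</a></body></html>'
    ),
    "http://example.com/red": (
        "<html><body>Red widgets, red widgets everywhere.</body></html>"
    ),
}


class PageParser(HTMLParser):
    """Collects the visible text and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl(start_url, pages):
    """Crawling: follow links breadth-first and read each page's source."""
    seen, queue, fetched = set(), deque([start_url]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in pages:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(pages[url])
        fetched[url] = " ".join(parser.text)
        queue.extend(parser.links)
    return fetched


def build_index(fetched):
    """Indexing: map each word to the pages containing it, with counts."""
    index = defaultdict(dict)
    for url, text in fetched.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def rank(index, query):
    """Ranking: order pages by how often they contain the query words."""
    scores = defaultdict(int)
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    index = build_index(crawl("http://example.com/", PAGES))
    print(rank(index, "red widgets"))
```

Note that this toy ranking uses only on-page word counts, which is exactly the kind of signal a page owner can adjust at will.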

When evaluating a page, crawler-based engines usually consider many more factors than those they can find on the page itself. Before a page is put into the index, the engine looks at how many other pages in the index link to it, the text used in those links, the PageRank of the linking pages, whether the page is listed in directories under related categories, and so on. These “off-page” factors are a significant consideration when a page is evaluated by a crawler-based engine. While you can, in theory, artificially increase your page’s relevance for certain keywords by adjusting the corresponding areas of your HTML code, you have much less control over the other pages on the Internet that link to you. Because they are harder to manipulate, off-page factors prevail in the eyes of a crawler.
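Two of the off-page factors mentioned above, the number of linking pages and the text used in their links, can be tallied with a short Python sketch. The LINKS records, the example domains, and the off_page_signals function are hypothetical and exist only for illustration. How an actual engine weighs these signals (or PageRank and directory listings, which are not modeled here) is not something this sketch claims to reproduce.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source_url, target_url, anchor_text).
LINKS = [
    ("http://a.example/", "http://yoursite.example/", "cheap widgets"),
    ("http://b.example/", "http://yoursite.example/", "widgets"),
    ("http://a.example/", "http://b.example/", "partner site"),
    ("http://b.example/", "http://b.example/", "home"),  # self-link, ignored
]


def off_page_signals(links):
    """Summarize, per target page, who links to it and with what words."""
    sources = defaultdict(set)       # target -> set of distinct linking pages
    anchors = defaultdict(Counter)   # target -> word counts from anchor text
    for source, target, anchor in links:
        if source == target:
            continue  # a page linking to itself is not an off-page signal
        sources[target].add(source)
        anchors[target].update(anchor.lower().split())
    return {
        target: {
            "inbound_pages": len(sources[target]),
            "anchor_words": dict(anchors[target]),
        }
        for target in sources
    }


if __name__ == "__main__":
    for url, info in off_page_signals(LINKS).items():
        print(url, info)
```

The point of the sketch is that every input comes from other people's pages: nothing in LINKS can be changed by editing the HTML of the target page itself.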

In this lesson, we look at the main spider-based search engines and learn how to get each of them to index our site and rank it highly. Although this step does not deal closely with the optimization process itself, we provide information on how each search engine looks at your pages so that you can come back to this section for later reference.
