Lesson (17): Building the Right Site Architecture

Let’s first define “site architecture”. In terms of SEO / SEM, it refers to the entire framework that supports your website content and thus defines the way search engine spiders index it. Site architecture consists of the navigation structure of your website, the page layout and the structure of various elements on your page, your file and directory system and the types of files you use.

Site architecture affects search engine rankings because it determines which pages will and will not be indexed. To make the architecture search-engine compatible, you should always have unique and relevant content (as we already know), use design elements that are spider-compatible, and use a navigation and linking structure that encourages regular indexing by search engines.

The elements of site architecture that influence your rankings include the file and directory system, file names and extensions, navigation menus on your pages, entry points / pages (e.g. landing pages), the robots META tag (or, alternatively, the robots.txt file), error pages, site maps, includes (Server Side Includes, or SSI), introduction pages and dynamic content. If you need to find out whether you’re using SSI, search for files with the .shtml or .shtm extension.
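The extension search mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample paths are made up for this example, and a matching extension only suggests that SSI may be in use, since servers can be configured differently.

```python
from pathlib import PurePosixPath

# Extensions that conventionally mark pages processed for Server Side Includes.
SSI_EXTENSIONS = {".shtml", ".shtm"}


def ssi_candidates(paths):
    """Return the paths whose extension suggests SSI (case-insensitive)."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in SSI_EXTENSIONS]


if __name__ == "__main__":
    # Hypothetical file listing; in practice it could come from walking
    # the site's directory tree on disk.
    listing = ["index.htm", "about/team.shtml", "News.SHTM", "robots.txt"]
    print(ssi_candidates(listing))
```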

File and folder structure

It’s a good idea to have SEO in mind from the moment you start creating your site. When you are building a site from scratch, you would normally first create the logical structure, that is, how users will see and operate the site from the outside. This scheme then defines the physical structure, i.e. how files and directories are placed on the hosting server.

Very often, there isn’t an opportunity to build a brand new file system and you are forced to redesign an existing structure.

In this case, it’s useful to have a broken link checker at hand, such as the Web CEO Site Quality Auditor, when you start making changes to your directory structure.
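To make the idea of a broken link check concrete, here is a minimal offline sketch in Python. It is not how any particular tool works: the function name, the `pages` dictionary (site-relative path mapped to HTML source) and the host www.yoursite.com are all assumptions made for this example. It only reports internal links that point to a path missing from the dictionary.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_internal_links(pages, site="http://www.yoursite.com"):
    """Return (page, href) pairs for internal links with no matching page.

    `pages` maps site-relative paths such as "/index.htm" to HTML source.
    """
    host = urlparse(site).netloc
    broken = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links against the page that contains them.
            target = urlparse(urljoin(site + path, href))
            if target.scheme not in ("http", "https") or target.netloc != host:
                continue  # skip mailto: links and external sites
            if target.path not in pages:
                broken.append((path, href))
    return broken
```

Running a check like this before and after moving files between directories shows which internal links the move has broken.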

So let’s get down to business. It is believed that most search engines only index websites to a depth of 2 directory levels, or a maximum of 50-60 files. This is usually the case, although some advanced spiders (like Google’s) can go 4 levels deep from your home directory or from the page where they started. Therefore, try to keep important content that you want indexed in the top two directory levels of your site, i.e. no deeper than www.yoursite.com/level1/level2/page.htm.
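The two-level guideline is easy to check mechanically. The sketch below counts the directories between the site root and the file in a URL; the function names and the rule that a final segment containing a dot is a file are assumptions made for this example, not part of any search engine's behavior.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the directory levels between the site root and the page."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumption: a last segment with a dot in it is a file, not a directory.
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    return len(segments)


def pages_too_deep(urls, max_depth=2):
    """Return the URLs that sit deeper than `max_depth` directory levels."""
    return [u for u in urls if directory_depth(u) > max_depth]


if __name__ == "__main__":
    print(directory_depth("www.yoursite.com/level1/level2/page.htm"))
    print(pages_too_deep([
        "http://www.yoursite.com/index.htm",
        "http://www.yoursite.com/a/b/c/page.htm",
    ]))
```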

As a general rule, pages closest to the root directory are considered the most important pages on your site by the search engine spiders. The two most important files residing here should be the home page, commonly named index with an extension that depends on the technology your hosting server uses (for example index.htm or index.php), and the Robots Exclusion Protocol file, named robots.txt.

For large sites, around 100 to 200 pages should be kept in the root directory; for smaller sites, it’s best to keep all pages under the root. While trying to place your most important pages at the first or second directory level, break the content up into about 50 files per directory. On a larger site (250-plus pages), organizing content-related files into separate directories is a sensible strategy. Be sure to name your files and directories with your keywords, and use hyphens to separate keywords in directory names. Don’t stuff too many keywords into your file or directory names: make them keyword-rich but not too long.
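The naming and sizing advice above could be sketched as two small Python helpers. Both function names are invented for this example, and the four-word cap in `keyword_slug` is an arbitrary assumption standing in for “not too long”; the limit of 50 files comes from the guideline in the text.

```python
import re
from collections import Counter
from posixpath import dirname


def keyword_slug(phrase, max_words=4):
    """Turn a keyword phrase into a short, hyphen-separated name."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return "-".join(words[:max_words])  # cap the length to avoid stuffing


def overfull_directories(paths, limit=50):
    """Return {directory: file_count} for directories holding more than `limit` files."""
    counts = Counter(dirname(p) or "/" for p in paths)
    return {d: n for d, n in counts.items() if n > limit}


if __name__ == "__main__":
    print(keyword_slug("Red Garden Widgets!"))
    # Hypothetical listing: 51 pages in one directory plus the home page.
    listing = [f"/widgets/page-{i}.htm" for i in range(51)] + ["/index.htm"]
    print(overfull_directories(listing))
```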
