Let’s first define “site architecture”. In terms of SEO / SEM, it refers to the entire framework that supports your website content and thus defines the way search engine spiders index it. Site architecture consists of the navigation structure of your website, the page layout and the structure of various elements on your page, your file and directory system and the types of files you use.
Site architecture affects search engine rankings because it determines which pages will and will not be indexed. To make the architecture search-engine compatible, you should always have unique and relevant content (as we already know), use design elements that are spider-compatible, and use a navigation and linking structure that encourages regular indexing by search engines.
The elements of site architecture that influence your rankings include the file and directory system, file names and extensions, navigation menus on your pages, entry points / pages (e.g. landing pages), the robots META tag (or, alternatively, the robots.txt file), error pages, site maps, includes (Server Side Includes, or SSI), introduction pages and dynamic content. If you need to find out whether you’re using SSI, search for files with the .shtml or .shtm extension.
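If you have a local copy of your site, the search for SSI files can be scripted. The following is a minimal sketch in Python, assuming the site files sit under a single folder on disk; the function name find_ssi_files and the constant SSI_EXTENSIONS are made up for this illustration. Keep in mind that a server can be configured to process includes in files with other extensions, so an empty result is a hint, not proof.

```python
from pathlib import Path

# Extensions that conventionally indicate Server Side Includes.
SSI_EXTENSIONS = {".shtml", ".shtm"}


def find_ssi_files(site_root):
    """Return the sorted, site-relative paths of files with an SSI extension."""
    root = Path(site_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SSI_EXTENSIONS
    )
```

Calling find_ssi_files("path/to/your/site") walks every subfolder and lists the matching files, ignoring upper or lower case in the extension.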
File and folder structure
It’s a good idea to have SEO in mind from the moment you start creating your site. When you are building a site from scratch, normally you’d first create the logical structure, that is, how users will see and operate the site from the outside. This scheme then defines the physical structure, i.e. how files and directories are placed on the hosting server.
Very often, there isn’t an opportunity to build a brand new file system and you are forced to redesign an existing structure.
In this case, it’s useful to have a broken link checker at hand, such as the Web CEO Site Quality Auditor, when you start making changes to your directory structure.
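To give an idea of what such a checker looks for, here is a simplified offline sketch in Python. It is not how Web CEO or any other product works; it only assumes you can load your pages into a dictionary that maps each site-relative path to its HTML. The names LinkCollector and broken_internal_links are hypothetical, and external links are skipped rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_internal_links(pages):
    """Return (page, href) pairs whose target is missing from `pages`.

    `pages` maps a site-relative path such as "/index.htm" to its HTML.
    """
    broken = []
    for page_path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links against the page that contains them.
            target = urlparse(urljoin(page_path, href))
            if target.scheme or target.netloc:
                continue  # external or mailto link; not checked in this sketch
            if target.path not in pages:
                broken.append((page_path, href))
    return broken
```

Running it before and after you move files shows which internal links the new directory structure has broken.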
So let’s get down to business. It is believed that most search engines only index websites to a depth of two directory levels, or a maximum of 50-60 files. This is usually the case, although some advanced spiders (like Google’s) can go four levels deep from your home directory or from the page where they started. Therefore, try to keep important content that you want indexed in the top two directory levels of your site, e.g. www.yoursite.com/level1/level2/page.htm.
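A quick way to check your own URLs against this two-level guideline is to count the directories in each path. The sketch below is illustrative only: the function names directory_depth and pages_too_deep are invented here, the limit of 2 simply mirrors the rule of thumb above, and the last path segment is assumed to be a file unless the URL ends with a slash.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the directory levels between the site root and the file."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    # Treat the last segment as the file name unless the path ends with "/".
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def pages_too_deep(urls, max_depth=2):
    """Return the URLs that sit below `max_depth` directory levels."""
    return [url for url in urls if directory_depth(url) > max_depth]


print(directory_depth("www.yoursite.com/level1/level2/page.htm"))  # prints 2
```

Feeding your full list of page URLs to pages_too_deep gives you the candidates to move closer to the root.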
As a general rule, pages closest to the root directory are considered the most important pages on your site by the search engine spiders. The two most important files residing here should be the home page, commonly named index with an extension that depends on the technology your hosting server uses (for example, index.html or index.php), and the robots exclusion file, robots.txt, which implements the Robots Exclusion Protocol.
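To see how a spider might read the robots.txt file, you can test a set of rules with the robots.txt parser from Python’s standard library. The rules below are a made-up sample, not a recommendation for your site, and the /private/ and /tmp/ directories, the ExampleBot user agent and the helper name is_crawlable are assumptions for the sake of the illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every spider may crawl everything
# except the two listed directories.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="ExampleBot"):
    """Return True if the sample rules allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://www.yoursite.com/index.html"))          # prints True
print(is_crawlable("http://www.yoursite.com/private/notes.html"))  # prints False
```

Checking a few of your important URLs this way before uploading a new robots.txt helps you avoid blocking pages you want indexed.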
For large sites, around 100 to 200 pages should be kept in the root directory; for smaller sites, it’s best to keep all pages under the root. While trying to place your most important pages at the first or second directory level, aim for no more than about 50 files per directory. On a larger site (250-plus pages), it makes sense to organize content-related files into separate directories. Be sure to use your keywords in file and directory names, and use hyphens to separate the keywords. Don’t stuff too many keywords into your file or directory names: make them keyword-rich but not too long.
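Both naming and directory size are easy to check with a short script. The sketch below is one possible approach, not a standard: keyword_slug turns a phrase into a hyphen-separated name and caps it at four words (an arbitrary limit chosen here to keep names short), while overfull_directories flags directories holding more than 50 files, following the figure above. Both function names are hypothetical.

```python
import posixpath
import re
from collections import Counter


def keyword_slug(phrase, max_words=4):
    """Build a lowercase, hyphen-separated name from a keyword phrase."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return "-".join(words[:max_words])


def overfull_directories(paths, limit=50):
    """Map each directory with more than `limit` files to its file count."""
    counts = Counter(posixpath.dirname(path) or "/" for path in paths)
    return {directory: count for directory, count in counts.items() if count > limit}


print(keyword_slug("Blue Widgets & Gadgets"))  # prints blue-widgets-gadgets
```

For example, keyword_slug("Blue Widgets & Gadgets") + ".htm" gives a usable file name, and overfull_directories run over your list of site paths shows where a directory should be split.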
