As stated earlier in our Training Course, both Internet users and search engines want fresh, unique, quality content. Nevertheless, online businesses often end up with repeated content on their Web pages, created either accidentally or deliberately. That is why duplicate content has recently become a major topic of discussion. The term now covers numerous cases, and it has acquired its negative connotation thanks to the new filters that search engines have implemented.
Duplicate content generally refers to multiple versions of the same content that exist on different pages, either within one domain or across different domains. These blocks of similar information might either completely match each other, or appear to be appreciably similar.
Many webmasters and SEO/SEM experts speculate about how similar two pages must be, as a percentage, before a duplicate content penalty applies. However, there is hardly any fixed percentage that separates absolute duplicates, which may trigger the duplicate content filter, from pages that are only slightly similar. In fact, duplicate content detection involves more than a simple direct comparison: when comparing two similar pages, search engines also consider factors such as site authority, link popularity and domain age.
As Google states in its Webmasters/Site Owners Help, it identifies the following types of non-malicious duplicate content:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of Web pages.
Besides, Google can already recognize navigation panels, common header text, ads, footer text and repetitive page links. These instances of duplicate content are not penalized; they are simply ignored.
Other types of content, which are deliberately duplicated across domains and created to manipulate search engine rankings, are considered malicious. These may include similar landing pages created to attract more visitors to your site, subdomains or domains with substantially duplicate content, and pages with stolen content.
In most cases, you are very unlikely to run the risk of being penalized for duplicate content if you do not create it deliberately. However, you should know enough to make sure that you do not use malicious duplicate content and accidentally trigger the search engines’ filters.
Most webmasters have already learned that search engines do not like duplicate content. The problem is that multiple pages with the same content confuse search engines (SEs), which aim to list the most relevant, unique and original results, not clutter. Thus, in an effort to provide more varied results to their users, search engines filter out pages that appear too similar to each other: only the most relevant result is kept, and the similar ones are excluded.
As stated in Google’s Webmasters/Site Owners Help, “Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a “regular” and “printer” version of each article, and neither of these is blocked in robots.txt or with a noindex meta tag, we’ll choose one of them to list.”
In other words, duplicate content filters are algorithms designed to compare one page against another. If a filter considers two or more pages to be substantially similar, it simply keeps the most trusted one in the primary index and moves the others to the supplemental index.
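To make the idea of comparing one page against another more concrete, here is a minimal sketch of one common text-similarity technique: word shingles compared with the Jaccard coefficient. This is an illustration only. Search engines do not publish their filters, so the function names, the shingle size `k=3` and the scoring are all assumptions, and the sketch ignores signals such as site authority or link popularity.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard_similarity(text_a, text_b, k=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)

# Hypothetical page texts, used only for illustration.
page_a = "the quick brown fox jumps"
page_b = "the quick brown fox sleeps"
print(jaccard_similarity(page_a, page_b))
```

A score close to 1.0 means the two texts share almost all of their word sequences. No particular score is known to trigger a filter, so treat the number as a rough self-check rather than a threshold.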
Penalties may arise when you start copying hundreds or thousands of pages from other domains or create exact replicas of existing sites. Moreover, you risk being penalized if the ratio of unique content to borrowed content on your site is too low.
As you can see, search engines face the hard task of excluding duplicate results from their indexes. Duplicate content appears as the result of many situations, such as article publishing, blog posts and different URLs of the same site leading to the same content. This trend called for some help for site owners, especially those running ecommerce sites with several pages listing the same set of products.
That is why the top search engines introduced a new standard, the “canonical tag”:
<link rel="canonical" href="http://www.example.com/product.php?item=product-name"/>
This tag is placed in the head section of a page to give search engines a canonicalization suggestion. The specified “canonical” page, http://www.example.com/product.php?item=product-name, thus becomes the preferred version of a set of pages with highly similar content.
The canonical tag is useful when multiple URLs point at the same page, but it can also be used when multiple versions of a page exist. For every URL that displays a page carrying the tag, it operates in a similar way to a 301 redirect. You can use relative or absolute links, but search engines recommend absolute links.
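If you want to check which canonical URL a page actually declares, a short script can read the tag from the HTML. The following is a minimal sketch using only the Python standard library. The class and function names (`CanonicalFinder`, `find_canonical`) and the sample markup are hypothetical, and the script simply resolves a relative link against the page URL, which is an assumption about how a relative canonical link should be interpreted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]

def find_canonical(html, page_url):
    """Return the canonical URL declared in html as an absolute URL, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # A relative href is resolved against the URL the page was fetched from.
    return urljoin(page_url, finder.canonical)

# Hypothetical page: a sorted product listing pointing at the plain product URL.
sample_html = """
<html><head>
<link rel="canonical" href="/product.php?item=product-name"/>
</head><body>Product page</body></html>
"""
print(find_canonical(
    sample_html,
    "http://www.example.com/product.php?item=product-name&sort=price",
))
```

Running such a check over every URL variant of a product page should report the same canonical URL each time. If it does not, the tag is probably generated inconsistently.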
“To migrate to a completely different domain, permanent (301) redirects are more appropriate. Google currently will take canonicalization suggestions into account across subdomains (or within a domain), but not across domains. So site owners can specify a canonical page on www.example.com from a set of pages on example.com or help.example.com, but not on example-widgets.com.”
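The scope rule in this quote can be turned into a quick sanity check. The sketch below is a deliberate simplification: it assumes that the "domain" is just the last two labels of the host name, which holds for example.com but not for suffixes such as co.uk (a real check would need a public suffix list). The helper names `naive_site` and `canonical_in_scope` are hypothetical.

```python
from urllib.parse import urlsplit

def naive_site(url):
    """Return the last two host labels, e.g. 'example.com' (naive assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])

def canonical_in_scope(page_url, canonical_url):
    """True when both URLs share the same naive site, differing at most by subdomain."""
    site = naive_site(page_url)
    return bool(site) and site == naive_site(canonical_url)

# Subdomain to subdomain: within the scope described in the quote.
print(canonical_in_scope("http://help.example.com/faq", "http://www.example.com/faq"))
# Different domain: outside that scope, so a 301 redirect is the better tool.
print(canonical_in_scope("http://example-widgets.com/faq", "http://www.example.com/faq"))
```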
Search engines hope this tag will help to resolve the duplicate content question. Where the tag is not implemented, they will keep using algorithms that compare one page against another to determine the canonical version.
