Have you been searching for an answer to the question “What is canonical?” Are you looking to learn what canonical tags are and how to add them so that you can avoid duplicate content issues?
Today, there are many cases in which different URLs lead to the same, or almost identical, webpages. Search engines have no mercy on duplicate content and tend to penalize it. Webmasters therefore need a way to tell search engines which version of a webpage is the primary or original one.
This is where canonical tags come in. These snippets inform search engines which particular version of a URL or webpage should be considered the main, or canonical, option.
These tags enable webmasters to avoid severe penalties from search engines. They can also save you from having to implement highly complex redirects.
The tags are not all that new, as they have been in use since 2009. Microsoft, Google, and Yahoo jointly created them for one reason: to provide webmasters with a way of solving duplicate content issues easily and quickly.
But what is canonical? How do you add a canonical tag? Do canonical tags even work? Read on to find answers to these questions, including how to add a canonical tag so that you can avoid duplicate content problems.
What is Canonical?
A rel=“canonical” tag refers to a snippet of HTML code that readily defines the main or primary version for similar, duplicate, and near-duplicate pages.
In other words, if you have similar content or the same content positioned under different URLs, the tags help to specify which of these versions is the main one that needs to be indexed.
Canonical tags are also called canonical links, rel=“canonical”, or rel canonical, so expect these terms to be used interchangeably.
Why Duplicate Content Exists
Before discussing how to add this tag, it makes sense to find out why duplicate content exists in the first place.
No blogger or webmaster sets out to create or produce duplicate content within a single website or blog.
Duplicate content usually occurs when a content management system (CMS) creates multiple URLs for the same page. This happens when:
- You launch a webpage.
- You have alternate versions for different types of devices.
- You have different versions of your website that are all indexable.
- You make use of dynamic URLs.
For instance, let’s assume the URLs displayed below show the same or similar content:
To a user, the URLs above display precisely the same content. But search engines will read them as 6 duplicate web pages:
- The first and second URLs are generated when the CMS saves product URLs with and without the category name.
- The third, fourth, and fifth URLs indicate that the website is accessible on ‘HTTP’ and ‘HTTPS.’
- The sixth URL is the mobile-friendly version of the website that is located on a subdomain.
You may also discover that duplicate content exists across multiple URLs such as:
By now, you can see how incredibly easy it is for duplicate content to occur. Many websites today have this issue without their webmasters realizing it.
Canonical URLs help search engines treat the multiple variations of a particular webpage as a single URL.
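To get a feel for how several URL variants can collapse onto one page, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `example.com` URLs are hypothetical, and the `normalize` helper simply forces HTTPS, drops a leading `www.` or `m.` subdomain, and removes the query string. A real site would need its own rules, since some query parameters do change the content.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical variants of one product page (illustrative only).
VARIANTS = [
    "http://example.com/shoes/red-sneaker/",
    "http://www.example.com/shoes/red-sneaker",
    "https://example.com/shoes/red-sneaker/",
    "https://www.example.com/shoes/red-sneaker/",
    "https://m.example.com/shoes/red-sneaker/",
    "https://www.example.com/shoes/red-sneaker/?sessionid=123",
]


def normalize(url: str) -> str:
    """Reduce a URL variant to one preferred form (assumed rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Drop a leading "www." or mobile "m." subdomain.
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    # Use a trailing slash consistently.
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"
    # Force HTTPS and discard the query string and fragment.
    return urlunsplit(("https", host, path, "", ""))


unique_pages = {normalize(url) for url in VARIANTS}
print(len(VARIANTS), "URLs map to", len(unique_pages), "page")
```

A search engine sees each entry in `VARIANTS` as a separate URL. The canonical tag is how you tell it that they all represent the single page that `normalize` arrives at here.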
What Canonical Tags Look Like
The tags use very simple and highly consistent syntax. They are usually placed within the <head> section of a webpage. For example:
<link rel="canonical" href="https://mywebsitesample.com/sample-page/" />
If you are somewhat confused, here is what the expression above means in plain English:
- link rel="canonical": The link in this tag points to the master, or canonical, version of this webpage.
- href="https://mywebsitesample.com/sample-page/": The canonical or main version can be found at this URL.
In most cases, visitors to your website will not see the rel canonical. A canonical URL is defined either within the page source, as outlined above, or in the HTTP header.
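Because the tag lives in the page source, you can check for it programmatically. The following is a rough sketch using only Python's standard library. The `CanonicalFinder` class and `find_canonicals` function are names invented for this example, and the page markup is a made-up sample built around the URL used above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(page_html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(page_html)
    return finder.canonicals


SAMPLE_PAGE = """
<html>
  <head>
    <title>Sample page</title>
    <link rel="canonical" href="https://mywebsitesample.com/sample-page/" />
  </head>
  <body><p>Hello</p></body>
</html>
"""

print(find_canonicals(SAMPLE_PAGE))
```

The function returns a list rather than a single value so that a page with no canonical tag, or with more than one, is easy to spot.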
The HTTP Header
Defining the canonical URL in the HTTP header is common when you need to set it on a non-HTML document such as a PDF.
In the HTTP header, the canonical URL looks like this:
HTTP/1.1 200 OK
Date: Sat, 07 Jan 2017 11:54:25 GMT
Last-Modified: Fri, 06 Jan 2017 17:50:17 GMT
Link: <http://www.sample.com/downloads/whitepaper.pdf>; rel="canonical"
One scenario in which you may have to use the HTTP header to define the canonical URL for non-HTML documents is when the same content is published both as a regular page (an HTML document) and as a PDF (a non-HTML document).
Take note that only Google currently supports defining the canonical URL using the HTTP header. For images, however, Google does not support a canonical URL defined via the HTTP header.
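If you want to confirm that a response carries the header, a small parser is enough. This is a simplified sketch, not a complete Link header parser: the `canonical_from_link_header` function is a hypothetical helper, and it assumes that no parameter value contains a comma.

```python
import re

# Matches one "<url>; param=value; ..." entry of a Link header.
# Simplification: assumes parameter values never contain a comma.
_LINK_ENTRY = re.compile(r"<([^>]*)>([^,]*)")


def canonical_from_link_header(header_value: str):
    """Return the URL whose rel includes "canonical", or None."""
    for url, params in _LINK_ENTRY.findall(header_value):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() != "rel":
                continue
            rel_tokens = value.strip().strip('"').lower().split()
            if "canonical" in rel_tokens:
                return url
    return None


header = '<http://www.sample.com/downloads/whitepaper.pdf>; rel="canonical"'
print(canonical_from_link_header(header))
```

The header value passed in is the part after `Link:` in the response shown above.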
Why Canonical Tags are Important for SEO
Google hates duplicate content because it makes it much harder for the search engine to choose:
- which version of a particular webpage to rank for relevant queries.
- which version of a specific webpage to index, since the search engine only indexes one.
- whether to consolidate what is known as “link equity” on one page or split it between different versions.
When you have too much duplicate content, it can have a significant impact on your “crawl budget.” This means that Google may waste a lot of time crawling different versions of the same webpage instead of discovering other vital content on your site.
This is why canonical tags are vitally important for search engine optimization.
What You Need to Know About Crawl Budget
You should never force Google to spend precious time crawling duplicate content. It is something that you should avoid as much as possible.
However, Google states that this is not a problem with most websites. According to the search engine giant, if new pages tend to be crawled the same day they are published, the crawl budget isn’t something website owners need to focus on.
Besides, if a website has fewer than several thousand URLs, it will be crawled efficiently most of the time.
Thanks to canonical tags, webmasters can solve all these issues. The tags allow you to tell Google which particular version of a webpage it should index and rank. They also tell the search engine where it needs to consolidate link equity.
However, if you fail to specify a rel canonical, Google will take matters into its own hands. The search engine giant will identify what it thinks is the best version of the URL.
Of course, it is not a good idea to rely on Google this way. The search engine may end up selecting a version of your webpage that you do not want to be canonical.
It is imperative to note that Google states that it generally respects the rel canonical that you set, but this is not always the case. That is because canonical URLs are only hints, not binding directives.
As long as those hints are respected, any signals, such as links, should consolidate to the canonical URL.
How to Create a Canonical URL
You already know how canonical tags work and the issues they solve. So, how do you create canonical tags for your site? Here are the steps to follow:
- Identify the multiple versions of your site within your Search Console homepage.
- Click the specific version of your website that you want.
- Click the gear icon, and then click on ‘Site Settings.’
- Within the preferred domain section, choose the version of the website you want to be preferred.
That was simple, wasn’t it? Yes, it was.
Okay, let’s assume that you have at least two versions of the same webpage and they each bear precisely the same content.
The only difference is that they appear in separate sections of your website, which causes the active menu item and the background color to differ.
Both versions have been linked to from other websites, which implies that the content is valuable. Here is the question: which version of the webpage do you want search engines to display in the search engine results pages (SERPs)?
For instance, let’s say here are the URLs of the scenario painted above:
This is precisely why canonical tags were invented. This scenario happens relatively often in many of the eCommerce systems out there. A particular product can easily have multiple URLs, depending on how you arrived at it.
In this particular case, you can apply the canonical URL as follows:
Step 1: Pick one of the two webpages as the canonical version. Make sure it is the version that you believe is the more important of the two.
If you do not care, go for whichever of the two webpages has the most visitors or links. Of course, you can go the old-school way by flipping a coin. Just choose one page.
Step 2: Add a canonical link from the non-canonical page to the canonical one. Let’s say you pick the shorter URL as the canonical version. The other page would then point to the shorter URL within its <head> section in the following way:
<link rel="canonical" href="http://mysample.com/homerun/seo-plan/" />
And that is it; you have just learned how to add a canonical tag, nothing more, nothing less. This action ‘merges’ the two webpages into one from a search engine’s perspective.
It acts as a “soft redirect” without actually redirecting any user. Links to both URLs now count toward the single canonical version of the URL.
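If your pages are generated by a script or template, Step 2 can be automated. The sketch below is one possible approach, not a prescribed method: `canonical_link_tag` and `add_canonical` are hypothetical helpers, and the page markup is a made-up sample. The canonical URL is the one from the example above.

```python
import re
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the tag, escaping characters such as & and quotes in the URL."""
    return '<link rel="canonical" href="{}" />'.format(
        escape(canonical_url, quote=True)
    )


def add_canonical(page_html: str, canonical_url: str) -> str:
    """Insert the canonical tag just before the closing </head> tag."""
    head_end = re.search(r"</head\s*>", page_html, re.IGNORECASE)
    if head_end is None:
        raise ValueError("page has no </head> tag")
    tag = canonical_link_tag(canonical_url)
    return page_html[:head_end.start()] + tag + page_html[head_end.start():]


# Made-up markup standing in for the non-canonical page.
duplicate_page = "<html><head><title>SEO plan</title></head><body></body></html>"
print(add_canonical(duplicate_page, "http://mysample.com/homerun/seo-plan/"))
```

This sketch does not check whether the page already has a canonical tag, so run it only on pages that do not have one.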
301 Redirects vs. Canonical URLs: What is the Difference?
Some webmasters may want to set up a 301 redirect instead of a rel canonical. A 301 redirect automatically sends website visitors to a new URL when they click an old link.
In most cases, webmasters use a 301 redirect when they update their webpages or URLs, or when they consolidate content into their archives. However, if they still want their website visitors to be able to access the page, whether the content is duplicated or not, they need a rel canonical.
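The difference is easiest to see from the server's side. In this simplified sketch, the `respond` function, the paths, and the `example.com` domain are all hypothetical. A retired URL gets a 301 and no page body, while a duplicate that should stay reachable is served normally with a canonical tag pointing at the main version.

```python
BASE = "https://example.com"  # hypothetical site

# Old URLs that visitors should no longer land on.
REDIRECTS = {"/old-seo-plan/": "/seo-plan/"}
# Duplicates that stay reachable but defer to a main version.
CANONICALS = {"/print/seo-plan/": "/seo-plan/"}


def respond(path: str):
    """Return (status, headers, body) for a requested path."""
    if path in REDIRECTS:
        # 301: the visitor is sent to the new URL and never sees this page.
        return 301, {"Location": BASE + REDIRECTS[path]}, ""
    # 200: the visitor sees this page; the tag is a hint for search engines.
    canonical_url = BASE + CANONICALS.get(path, path)
    body = (
        '<html><head><link rel="canonical" href="{}" /></head>'
        "<body>page content</body></html>"
    ).format(canonical_url)
    return 200, {"Content-Type": "text/html; charset=utf-8"}, body


print(respond("/old-seo-plan/")[0])    # 301
print(respond("/print/seo-plan/")[0])  # 200
```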
Common Canonicalization Mistakes You Should Avoid
Canonicalization is a tricky subject for most webmasters. As such, there are lots of misconceptions and misunderstandings about how to add canonical tags correctly.
Here are some of the common canonicalization mistakes you should avoid:
- Setting a Canonicalized URL to ‘noindex’
You should never mix canonical tags and noindex, because they are contradictory instructions.
Google generally prioritizes the rel canonical over the ‘noindex’ tag. If you want to both noindex and canonicalize a URL, you should use a 301 redirect instead. Otherwise, make use of canonical tags only.
- Blocking Rel Canonical via Robots.Txt
If you block a URL within robots.txt, Google will no longer be able to crawl it. Yes, Google crawls URLs, not webpages. And if Google cannot crawl the URL, it is prevented from seeing any canonical tag on that particular page.
And this, in turn, prevents the search engine from transferring any link equity from the non-canonical to the canonical.
- Having Multiple Canonical Tags
If you have multiple canonical tags, Google may choose to ignore them all. This happens often, since tags can be inserted into a system at different points, such as the theme, the CMS, and plugin(s).
This is why many plugins come with an overwrite option, ensuring that they remain the only source of the rel canonical.
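The three mistakes above can be checked with a short script. This is only a sketch built on Python's standard library: `audit_page` and the sample inputs are invented for illustration, and it expects you to supply the page HTML and the robots.txt text yourself.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _HeadScanner(HTMLParser):
    """Records canonical links and whether a robots meta tag says noindex."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link":
            if "canonical" in (attr.get("rel") or "").lower().split():
                self.canonicals.append(attr.get("href"))
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            if "noindex" in [part.strip() for part in content.split(",")]:
                self.noindex = True


def audit_page(url, page_html, robots_txt, user_agent="Googlebot"):
    """Return a list of canonicalization problems found for one page."""
    scanner = _HeadScanner()
    scanner.feed(page_html)
    problems = []
    if len(scanner.canonicals) > 1:
        problems.append("multiple canonical tags")
    if scanner.canonicals and scanner.noindex:
        problems.append("canonical combined with noindex")
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        problems.append("blocked by robots.txt")
    return problems


# Made-up inputs that trigger all three checks.
page = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/seo-plan/">'
    '<link rel="canonical" href="https://example.com/plan/">'
    "</head><body></body></html>"
)
robots_txt = "User-agent: *\nDisallow: /print/\n"
print(audit_page("https://example.com/print/seo-plan/", page, robots_txt))
```

An empty list means none of the three problems was detected. It does not guarantee that Google will honor the canonical you set.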
Canonicalization is a vital concept that most webmasters need to understand, especially when it comes to SEO. By now, you should have an answer to the question “What is canonical?” and know how to add a canonical tag when you discover duplicate content. Use this powerful snippet, and you should start noticing positive changes on your website.