A few days ago, Google officially introduced support for the rel=canonical tag via HTTP headers. For those who don’t know what a canonical tag is, in one short sentence: it is an HTML link tag that lets webmasters specify which URL of a page search engines should store in their index.
Why is using the rel=canonical tag important?
Well, there could be different reasons, but the most important is of course to avoid duplicate content, a topic Google has been hammering on quite a lot recently.
However, before this change, a technical limitation restricted the tag to web pages written in HTML. But we all know that content can also be a media file, like a video or a PDF. And how many times have you seen the same file hosted more than once on a web site by mistake?
I have personally seen this quite a lot, especially with companies that publish their timetables as both an HTML page and a PDF file, without the second version adding any extra value.
From now on, by “hacking” the HTTP header the server returns for those files, you can specify which version of the document needs to be indexed.
Duplicate content is bad
Duplicate content is quite a big issue on the web, and search engines are aggressively removing duplicate resources from their index to offer a better experience of the web.
More than once I’ve come across customers who absolutely want their content published in PDF as well. They believe that adding a couple of nice (I would say ridiculous) pictures to their Word document makes them the best designer in the world.
As a consequence, in order to make those (often irritating) customers happy, you have to comply with their wishes, which creates issues on their site.
Implementing rel=canonical at the HTTP header level is a great chance to make both parties happy.
At the moment there is no evidence that Bing is going to support the tag, so bear in mind that although this is a solution for Google, Bing will continue to consider your documents duplicates.
How can I implement the rel canonical tag via the HTTP headers?
Implementing the canonical tag can vary according to the hosting platform. In a *NIX environment it is of course easier, as you ultimately only need to change a text configuration file that is accessible even via FTP.
This implementation requires you to be very specific when targeting files: the header has to be attached to one named file at a time rather than to the whole site.
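As a minimal sketch, assuming an Apache server with mod_headers enabled and .htaccess overrides allowed, a rule for a single PDF could look like the one below. The file name timetable.pdf and the URL http://www.example.com/timetable.html are placeholders I made up for illustration, so replace them with your own duplicate file and the page you want indexed.

```apache
# Hypothetical example: timetable.pdf duplicates the HTML page at
# http://www.example.com/timetable.html (both names are placeholders).
<IfModule mod_headers.c>
  # Apply the header only to this one file, not to every response.
  <Files "timetable.pdf">
    # Sends: Link: <http://www.example.com/timetable.html>; rel="canonical"
    Header set Link "<http://www.example.com/timetable.html>; rel=\"canonical\""
  </Files>
</IfModule>
```

The IfModule wrapper is there so the rule is simply skipped when mod_headers is not loaded, instead of the server failing on an unknown directive.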
Also, be aware that any change to the .htaccess file can seriously compromise the visibility of the whole web site if you make a mistake.