HTML5 and SEO: how will the new syntax influence search engine optimisation?

Around the 20th of January, YouTube launched a fascinating experiment, introducing a new video format for supported browsers, which relies on the new HTML5 syntax.

HTML Evolution

It appears that the new standard, under development at the W3C since 2007 (the first public draft appeared in 2008), is behind it.

OK, it may be too early, but considering that all the new browsers, Internet Explorer 8 included, already support at least part of HTML 5, a new era is definitely coming.

The HTML5 specification is almost complete and its release shouldn’t take much longer, so all web developers will need to think about new approaches to building web pages.

Are coders the only ones to worry?

I guess not. HTML5 introduces a new way to implement and represent data, and the syntax is so different that everybody in the web industry will have to face it sooner or later.

From HTML 4 to HTML 5, a long story

HTML 4 is an old-fashioned markup language, first introduced in December 1997. One of its biggest limitations has always been the browser: although the language has its own syntax, browsers are “smart” enough to understand badly written code and render the page anyway.

To stem the problem, the W3C introduced an intermediate layer two years later: xHTML, a hybrid between XML (well-formed documents) and HTML, which was meant to bring uniformity by forcing coders to write well-formed HTML (e.g. enforcing rules such as closing every tag and writing tag names in lower case).

In 1999, web sites were almost entirely static; they were not media-rich and socially interactive as they are today, so nobody thought about specific tags for media embedding, nor about the more recent microformats that allow a computer to interpret the information with just a quick scan.

That is what HTML5 aims to do. HTML5 is a milestone and it will mark the beginning of the standardization of websites (I hope so). From now on, code will be divided into several specific parts, making it easier to read for humans and bots alike, and coders will be able to update web pages faster, even those they didn’t originally write.

How will HTML 5 affect SEO?

HTML 5 will allow for better cross-browser compatibility across mobile phones, desktops, netbooks, PDAs, e-readers and whatever else can display a web page.

The new HTML 5 mark-up will be closer to an XML structure than to traditional HTML. Meta tags and page parts should be easier for bots to understand, perhaps defeating the (ab)use of meta tags; a new array of elements will be available for specific document sections, such as the navigation menu, the article body and so on.

Let’s have a look at what is going to change.

To do that, I will borrow a couple of diagrams from an article on A List Apart. Here is how a typical web page is represented today:

Old HTML 4 Layout

Currently the structure of HTML tags is not semantic and follows no particular order, which makes it challenging for a search engine to figure out what is actually important. This is, instead, the new layout provided by HTML 5:

New HTML5 layout

The improved sectioning could make it easier for search engines to understand the page structure, possibly leaving the algorithm more time to concentrate on relevant content. Everyone involved in the industry should do the same.
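To make the idea concrete, here is a minimal sketch of how a bot could split a page into regions once the new sectioning elements are in place. It uses Python's standard html.parser module; the sample page, the OutlineParser class and the outline function are all hypothetical names invented for this illustration, and real search engines certainly work differently.

```python
from html.parser import HTMLParser

# Hypothetical HTML5 page built with the new sectioning elements.
PAGE = """
<!DOCTYPE html>
<html>
<body>
  <header><h1>My Blog</h1></header>
  <nav><a href="/">Home</a> <a href="/about">About</a></nav>
  <article>
    <h2>HTML5 and SEO</h2>
    <p>Main content of the post.</p>
  </article>
  <aside><p>Related topics</p></aside>
  <footer><p>Copyright notice</p></footer>
</body>
</html>
"""

# Elements that name a region of the page.
SECTIONING = {"header", "nav", "article", "section", "aside", "footer"}


class OutlineParser(HTMLParser):
    """Groups the visible text of a page by its enclosing sectioning element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # sectioning elements that are currently open
        self.regions = {}  # element name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.stack.append(tag)
            self.regions.setdefault(tag, [])

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # Attribute the text to the innermost open sectioning element.
            self.regions[self.stack[-1]].append(text)


def outline(html):
    """Returns a dict mapping each sectioning element to its joined text."""
    parser = OutlineParser()
    parser.feed(html)
    return {tag: " ".join(parts) for tag, parts in parser.regions.items()}


regions = outline(PAGE)
for tag, text in regions.items():
    print(tag, "->", text)
```

With an HTML 4 page made only of generic div elements, the same bot would have nothing to key on except class names and ids, which differ from site to site.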

New HTML5 Tags

Here are some (not all) of the tags that will matter most for SEO and for the categorization of a page:

Article

It marks the most important content on the page and lets spiders know which topic you are talking about. This could be a forum post, blog post, newspaper article, a user comment, or any other independent item of content.

Section

This tag specifies separate sections of an article. This means that (hopefully) a search engine will be able to pay attention to each section and evaluate it individually, according to its header.

Header

This tag holds the primary info about the content. It can be included more than once on a page but, as far as I understand, just once per article. This should allow search engines to rank pages with multiple topics more easily.

Footer

It can be used multiple times, like the header, and it should be the least important part of the page.
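As a rough illustration of how the article, section, header and footer tags could fit together, the sketch below pairs each section's heading with its body text, which is one way a bot might evaluate sections individually. The sample markup, the SectionParser class and the section_summaries function are hypothetical, and this simplified parser does not handle nested sections.

```python
from html.parser import HTMLParser

# Hypothetical article: one header, two sections, one footer.
ARTICLE = """
<article>
  <header><h1>HTML5 and SEO</h1></header>
  <section>
    <h2>Sectioning elements</h2>
    <p>Article, section and aside describe the page structure.</p>
  </section>
  <section>
    <h2>Media elements</h2>
    <p>Audio and video embed media directly.</p>
  </section>
  <footer><p>Posted by the author</p></footer>
</article>
"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionParser(HTMLParser):
    """Pairs the heading of each <section> with the body text of that section."""

    def __init__(self):
        super().__init__()
        self.in_section = False
        self.in_heading = False
        self.sections = []  # list of [heading, body] pairs

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.in_section = True
            self.sections.append(["", ""])
        elif tag in HEADINGS and self.in_section:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "section":
            self.in_section = False
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.in_section:
            return  # ignore whitespace and anything outside a section
        slot = 0 if self.in_heading else 1
        current = self.sections[-1]
        current[slot] = (current[slot] + " " + text).strip()


def section_summaries(html):
    """Returns a list of (heading, body) tuples, one per section."""
    parser = SectionParser()
    parser.feed(html)
    return [tuple(pair) for pair in parser.sections]


for heading, body in section_summaries(ARTICLE):
    print(heading, "->", body)
```

Note that the text inside the article's header and footer is deliberately left out of the section summaries, mirroring the idea that those parts carry different weight from the sections themselves.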

Aside

It contains secondary information that is related to the page yet off the main topic. Just imagine the right (or left) column where related topics are generally listed.

Audio and Video

With these tags, you will be able to embed media content directly into HTML pages and have extra control over its appearance, which is currently impossible with third-party players.

Source

It is a child element of the audio and video tags, which lets you specify multiple alternative sources for the media element; it is particularly useful when a browser does not support all formats (e.g. at the time of writing, Firefox can’t play MP3 audio files). From an SEO perspective, it should help search engines index and rank different assets at once.
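The sketch below shows what such markup could look like and how a bot might list every alternative asset it offers. The file names, the MIME types chosen, the MediaSourceParser class and the media_sources function are assumptions made up for the example; only the audio, video and source elements and their src and type attributes come from HTML5 itself.

```python
from html.parser import HTMLParser

# Hypothetical markup: each media element offers two alternative sources.
MEDIA = """
<video controls width="640">
  <source src="clip.mp4" type="video/mp4">
  <source src="clip.ogv" type="video/ogg">
  Your browser does not support the video element.
</video>
<audio controls>
  <source src="talk.ogg" type="audio/ogg">
  <source src="talk.mp3" type="audio/mpeg">
</audio>
"""


class MediaSourceParser(HTMLParser):
    """Collects every <source> declared inside an <audio> or <video> element."""

    def __init__(self):
        super().__init__()
        self.current = None  # media element currently open, if any
        self.assets = []     # (media element, src, type) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("audio", "video"):
            self.current = tag
        elif tag == "source" and self.current:
            attributes = dict(attrs)
            self.assets.append(
                (self.current, attributes.get("src"), attributes.get("type"))
            )

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None


def media_sources(html):
    """Returns the list of (element, src, type) tuples found in the markup."""
    parser = MediaSourceParser()
    parser.feed(html)
    return parser.assets


for element, src, mime in media_sources(MEDIA):
    print(element, src, mime)
```

The browser picks the first source it can play, while a crawler is free to record all of them, which is why a single media element could expose several assets at once.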

So what?

SEOers can’t do much except wait. Once enough pages implement HTML 5, search engines will inevitably start considering the new syntax. At that point, the mark-up of a page will become far more important to SEO than it is currently, and so will the content.

This is not an overnight change, but it is extremely important for SEOers to stay up to date and to ensure that their web sites respect the xHTML standard, for an easy migration process.