How do Google (and other search engines) crawl and rank PDF files?

One thing links all web 2.0 users: the need to take part in this fantastic new world by contributing web pages, pictures, documents, comments and so on.


So it’s not rare to see emerging web sites full of new material, rather than a forum, at the top of the SERPs. Nor is it rare to see document types other than a standard web page there: Word files, PowerPoint presentations, PDFs and so on.

Search engines love text

If you write a document with special formatting that doesn’t fit well into a web page, or one containing graphics that must be preserved, you can convert it to a PDF file, publish it on the web and make it accessible to the entire world.

Search engines are smart enough to crawl your web pages and (normally) index all the links and documents they contain. Google started indexing PDF documents in 2001, so it is not new to this kind of material, but it recently improved the quality and the user experience by introducing the “Quick View” feature for PDFs.

Google developed it because of the poor quality of the “View as HTML” feature, which was originally built to “translate” a file into a document readable directly in the browser (for searchers who weren’t interested in downloading it and opening it in a separate application).

Unfortunately, the “View as HTML” feature isn’t perfect, and the layout it produces often doesn’t respect the original one. These kinds of problems no longer exist thanks to “Quick View”, which takes a different approach: it opens the document directly in the browser while keeping the formatting intact. Whether it is well formatted or not, though, a PDF document should be optimized before it shows up in the SERPs.

I looked into research papers for possible indicators of how to properly optimize a PDF document, but I was unable to find anything really useful.

As far as I’m aware (are you?), Google is particularly interested in details and quality, so I decided to spend some time creating a test case that evaluates many different combinations of the same PDF document, to understand which factors really influence PDF ranking.

The test has been published on a brand new domain, so I will be able to observe more genuine results. To stay updated about this test, point your web browser to my PDF ranking and indexing test.

UPDATE: Check out my final conclusion on the SEO PDF Test