How do Google (and other SEs) crawl and rank PDF files?

Web 2.0 users have one thing in common: the need to take part in this fantastic new world by contributing to it with web pages, pictures, documents, comments and so on.

So it's not unusual to see a new website packed with fresh material, or a forum, at the top of the SERPs. Nor is it unusual to see other types of document in place of a standard web page: a Word file, a PowerPoint presentation, a PDF, and so on.

Search engines love text

If you write a document with some special formatting that doesn't fit well into a web page, or that contains graphics which must be preserved, you can convert it into a PDF file and make it accessible to the entire world.

Search engines crawl your web pages and index all the links they contain. Similarly, Google started indexing PDF documents back in 2001, so PDFs are nothing new to it. More recently, though, it has improved the quality and the user experience by introducing the "Quick View" feature for PDFs.

My guess is that Google developed this feature because of the poor quality of "View as HTML", which was originally built to "translate" a file into a document readable directly in the browser (for users who didn't want to download it and open it in a separate application).

Unfortunately, the "View as HTML" feature isn't perfect and the layout it produces often doesn't match the original. These kinds of problems no longer exist thanks to "Quick View", which has changed the approach to opening PDF files, displaying the documents directly in the browser while keeping the formatting intact. Whether it is well formatted or not, a PDF document should be optimised before being shown in the SERP.

I looked through research papers for any indication of how to optimise a PDF document properly, but I was unable to find anything really useful.

Since Google is particularly interested in details and quality, I decided to spend some time creating a test case to evaluate many different combinations of the same PDF document, in order to understand which factors really influence PDF rankings.
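One way to organise a test like this is to enumerate every combination of the factors under study, so that each variant of the PDF differs from the others in a known way. The sketch below is only an illustration. The factor names and values are hypothetical, since this post does not list the factors the test actually varies.

```python
from itertools import product

# Hypothetical factors and values, chosen for illustration only;
# they are not the factors used in the actual test.
FACTORS = {
    "title_metadata": ["keyword", "empty"],
    "file_name": ["keyword.pdf", "document.pdf"],
    "body_text": ["selectable", "image_only"],
}


def build_variants(factors):
    """Return one dict per combination of factor values."""
    names = list(factors)
    value_lists = [factors[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


variants = build_variants(FACTORS)
for index, variant in enumerate(variants, start=1):
    # Each variant would correspond to one PDF file published for the test.
    print(f"variant-{index:02d}", variant)
```

With two values for each of three factors this produces eight variants. Changing one factor at a time against a fixed baseline would need fewer files, but it would hide any interaction between factors.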

The test has been published on a brand new domain, so I will be able to gather more genuine results. To stay updated about this test, point your web browser to my PDF ranking and indexing test.

UPDATE: Check out my final conclusion on the SEO PDF Test