Google to stop supporting data-vocabulary.org on 6 April 2020

Google announced today that, as of April 6, 2020, “data-vocabulary.org markup will no longer be eligible for Google rich result features.”

Starting immediately, Google “Search Console will issue warnings for pages using the data-vocabulary.org schema so that you can prepare for the sunset in time.”

A brief history of data-vocabulary.org

The domain data-vocabulary.org was registered by Google on 9 May 2009, just three days before Google announced, for the first time, the availability of rich snippets in the search engine’s results.

The markup formats Google supported at that time were microformats and RDFa, and the documented RDFa implementation used the subdomain rdf.data-vocabulary.org. This was the sample markup Google provided for a review:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:review">
   <p><strong><span property="v:itemreviewed">Blast 'Em Up</span> 
Review</strong></p>
   <p>by <span property="v:reviewer">Bob Smith</span></p>
   <p><span property="v:dtreviewed">March 20, 2009</span></p>
   <p><span property="v:description">This is a great game. I 
enjoyed it from the opening battle to the final showdown 
with the evil aliens.</span></p>
   <p><span property="v:rating">4.5</span> out of 5 stars</p>
</div>

In the rich snippet announcement post the authors (among them Ramanathan V. Guha, who was to play a major role in the development of schema.org) had this to say about vocabulary:

We do believe that it is important to have a common vocabulary: the language of object types, object properties, and property types that enable structured data to be understood by different applications. We debated how to address this vocabulary problem, and concluded that we needed to make an investment. Google will, working together with others, host a vocabulary that various Google services and other websites can use. We are starting with a small list, which we hope to extend over time.

The “vocabulary” mentioned here is presumably data-vocabulary.org, and it did indeed grow from a “small list” to become more expressive over time.

data-vocabulary.org truly came into its own in March 2010, when Google announced support for microdata – a structured data markup standard well-suited to using data-vocabulary.org (and which the vocabulary may in part have been designed to work with). While Google continued to support rdf.data-vocabulary.org, microdata allowed supported types to be declared directly with data-vocabulary.org URIs, as per this support page example captured in March 2010.

  <div itemscope itemtype="http://data-vocabulary.org/Review-aggregate">
    <span itemprop="itemreviewed">L’Amourita Pizza</span>
    <span itemprop="rating" itemscope itemtype="http://data-vocabulary.org/Rating">
      <span itemprop="average">9</span>
      out of <span itemprop="best">10</span>
    </span>
    based on <span itemprop="votes">24</span> ratings.
    <span itemprop="count">5</span> user reviews.
  </div>

data-vocabulary.org was also used as the go-to example for microdata in Mark Pilgrim’s classic work Dive Into HTML5.

The death knell for data-vocabulary.org sounded on 2 June 2011, when Google, Bing and Yahoo jointly announced the availability of schema.org.

Over the course of time, as more schema.org types and properties were developed, Google retired its recommendations for employing data-vocabulary.org in favor of schema.org equivalents as they became available.

Of the previously-employed data-vocabulary.org types it was Breadcrumb that persisted the longest, due to initial implementation problems with schema.org’s BreadcrumbList. It was not until June 2015 that Google lent its official support to BreadcrumbList.

Likely impact on publishers

While schema.org has largely supplanted its data-vocabulary.org equivalents in actively-maintained markup, there are still millions of URLs on which the older vocabulary can be found (much of this due to unrevised templates that incorporate data-vocabulary.org markup).

Google’s relatively late embrace of schema.org/BreadcrumbList, as noted above, means that this is the most common implementation of data-vocabulary.org seen today. Here, from the Web Data Commons structured data extraction of the November 2019 release of the Common Crawl, are the top ten data-vocabulary.org classes in the microdata extraction by domain count (rank is the overall rank of the class in the microdata extraction, which is dominated by schema.org).

Top data-vocabulary.org classes by domain count from the Web Data Commons structured data extraction of the November 2019 release of the Common Crawl

And here are the top ten data-vocabulary.org classes by URL count.

Top data-vocabulary.org classes by URL count from the Web Data Commons structured data extraction of the November 2019 release of the Common Crawl

Again, breadcrumb markup looms large in the list of data-vocabulary.org markup that publishers may need to update, as do reviews and ratings (which reflects Google’s incremental support of rich snippets, starting with people and reviews).

But ultimately there are likely to be relatively few publishers impacted by the sunsetting of data-vocabulary.org. Even when it comes to breadcrumbs, while something like 800,000 domains is obviously, well, a lot, this is out of a sample of more than 32,000,000 domains.
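
For publishers who do need to revise breadcrumb templates, the change is largely a matter of swapping type URIs and property names. What follows is a minimal sketch rather than anything taken from Google's announcement. The example.com URLs and the "Dresses" trail are hypothetical. The schema.org version assumes the commonly documented BreadcrumbList and ListItem microdata pattern, with the properties item, name and position.

```html
<!-- Before: data-vocabulary.org Breadcrumb (hypothetical example trail) -->
<div itemscope itemtype="http://data-vocabulary.org/Breadcrumb">
  <a href="https://www.example.com/dresses" itemprop="url">
    <span itemprop="title">Dresses</span>
  </a>
</div>

<!-- After: an assumed schema.org equivalent using BreadcrumbList -->
<ol itemscope itemtype="https://schema.org/BreadcrumbList">
  <!-- Each step in the trail becomes a ListItem with an explicit position -->
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
    <a itemprop="item" href="https://www.example.com/dresses">
      <span itemprop="name">Dresses</span>
    </a>
    <meta itemprop="position" content="1" />
  </li>
</ol>
```

The main structural difference is that data-vocabulary.org marks up each breadcrumb as a separate item, whereas schema.org wraps the whole trail in a single list and numbers the steps. Any updated template is worth checking against Google's current structured data documentation before it is rolled out.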