W3C Best Practices for Publishing Linked Data

Earlier this month the Government Linked Data Working Group (GLD) at the W3C released their Best Practices for Publishing Linked Data.

Their goal was to “compile the most relevant data management practices for the publication and use of high quality data published by governments around the world as Linked Open Data.” And although this report is directed at “developers, government information management staff, and Web site administrators,” it provides good guidance for anyone involved with the care and feeding of information resources on the Web.

In all, the GLD outlines 10 best practices:

  1. Prepare Stakeholders
  2. Select a Dataset
  3. Model the Data
  4. Specify an Appropriate License
  5. The Role of “Good URIs” for Linked Data
  6. Standard Vocabularies
  7. Convert Data to Linked Data
  8. Provide Machine Access to Data
  9. Announce to the Public
  10. Social Contract of a Linked Data Publisher

These are all important areas to consider. Here, however, I’ll touch on just two of them: the guidance on URIs and the use of standard vocabularies.

They emphasize that organizations should endeavour to provide persistent URIs. This is particularly important as we start to cast our nets out into the linked data ocean: persistent URIs ensure that links, and the web applications built on them, continue to work when called on in the future.

Contributors to Slaw have talked about link rot and the importance of PURLs (Persistent URLs) in the past. Simon, for example, has recently reported on the wonderful Perma initiative for legal resources. So it’s good to learn here that the W3C has also addressed this problem through their Permanent Identifiers for the Web service. This service is a product of the recently established W3C Permanent Identifier Community Group, chaired by semantic web advocate Manu Sporny.
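To make the idea behind such a service a little more concrete, here is a minimal Python sketch of a prefix-based redirect table. It is not how Permanent Identifiers for the Web is actually implemented, and the w3id.org paths and example.org targets in it are hypothetical; it simply shows why a persistent identifier keeps working when the data behind it moves.

```python
from __future__ import annotations

# Hypothetical redirect table for illustration only: each persistent prefix
# maps to wherever the resources currently live. When the data moves, only
# the right-hand side changes; the published (left-hand) URIs stay the same.
REDIRECTS: dict[str, str] = {
    "https://w3id.org/example-court/": "https://data.example.org/court/v2/",
    "https://w3id.org/example-court/vocab/": "https://vocab.example.org/court/",
}


def resolve(persistent_uri: str, table: dict[str, str] = REDIRECTS) -> str | None:
    """Return the current location for a persistent URI, or None if unknown.

    The longest matching prefix wins, so more specific rules (such as the
    vocab/ rule above) take precedence over general ones.
    """
    matches = [prefix for prefix in table if persistent_uri.startswith(prefix)]
    if not matches:
        return None
    prefix = max(matches, key=len)
    # Keep the remainder of the path so individual resources still resolve.
    return table[prefix] + persistent_uri[len(prefix):]


print(resolve("https://w3id.org/example-court/cases/2013/42"))
print(resolve("https://w3id.org/example-court/vocab/Judge"))
```

A real service would answer a request for the persistent URI with an HTTP redirect to the resolved location. The design point is that publishers and applications only ever record the left-hand URI, so relocating the data means editing one table entry rather than chasing down every link.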

Another area of particular interest to me is their guidance on the use of standard vocabularies. They emphasize: “It is best practice to use or extend an existing vocabulary before creating a new vocabulary.” Encouraging the reuse of existing standard vocabularies improves the chances of interoperability between datasets.
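Here is what “use or extend” might look like in practice. The following sketch (Python, standard library only) writes a small Turtle description of a statute that reuses dcterms:title from Dublin Core and extends the vocabulary with a local property declared as a sub-property of dcterms:creator. The ex: namespace, the property name ex:enactingBody, and all example.org URIs are hypothetical, and a real publisher would more likely use an RDF library than build Turtle by hand.

```python
# Namespaces: Dublin Core terms and RDF Schema are real, widely used
# vocabularies; the ex: namespace and all example.org URIs are hypothetical.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "https://example.org/legal/terms/",
}


def turtle_literal(text: str) -> str:
    """Quote a plain string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def describe_act(act_id: str, title: str, enacting_body: str) -> str:
    """Describe a (hypothetical) statute in Turtle, reusing Dublin Core where it fits."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines += [
        "",
        # Extend rather than invent: the local property is declared as a
        # specialisation of dcterms:creator, so software applying RDFS
        # reasoning can still treat the enacting body as the act's creator.
        "ex:enactingBody rdfs:subPropertyOf dcterms:creator .",
        "",
        f"<https://example.org/legal/act/{act_id}>",
        # Reuse: no need for a home-grown "name" or "title" property.
        f"    dcterms:title {turtle_literal(title)} ;",
        f"    ex:enactingBody <https://example.org/legal/body/{enacting_body}> .",
    ]
    return "\n".join(lines)


print(describe_act("2013-17", "Example Records Act", "legislature"))
```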

To support reuse, they point to a number of search tools for finding vocabularies and other structured data represented as linked data (although, given how recent this report is, it’s interesting to note that a couple of these tools are currently unavailable):

A quick search for “legal” using Swoogle retrieves a hundred or so hits. However, it’s not entirely clear what you should take away from these results, and I sense a future post on strategies for finding useful vocabularies for legal information resources. :-)

They also provide the following quick vocabulary checklist:

  • ensure vocabularies you use are published by a trusted group or organization
  • ensure vocabularies have permanent URIs
  • confirm the versioning policy
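A checklist like this is easy to turn into a quick review step. The sketch below is one hypothetical way to do it in Python; the Vocabulary record, the function name, and the host and publisher lists are all assumptions for illustration. Whether a publisher is “trusted” is a judgement call, so that list is supplied by the caller, and checking the namespace host against known persistent-URI services is only a rough proxy for “permanent URIs.”

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Vocabulary:
    name: str
    namespace: str                        # namespace URI the terms are minted under
    publisher: str                        # group or organization responsible
    versioning_policy: str | None = None  # where the policy is documented, if known


def check_vocabulary(
    vocab: Vocabulary,
    trusted_publishers: set[str],
    persistent_hosts: set[str],
) -> list[str]:
    """Return the checklist items a vocabulary fails (an empty list means it passes)."""
    problems = []
    if vocab.publisher not in trusted_publishers:
        problems.append("publisher is not on the trusted list")
    # Rough proxy only: a known persistent-URI host suggests, but does not
    # guarantee, that the namespace URIs will stay stable.
    host = urlsplit(vocab.namespace).hostname or ""
    if host not in persistent_hosts:
        problems.append("namespace is not on a known persistent-URI host")
    if not vocab.versioning_policy:
        problems.append("no versioning policy found")
    return problems


# Illustrative, caller-chosen lists; not an official registry of any kind.
TRUSTED = {"DCMI", "W3C"}
PERSISTENT = {"purl.org", "w3id.org", "www.w3.org"}

dcterms = Vocabulary("DCMI Metadata Terms", "http://purl.org/dc/terms/", "DCMI", "DCMI Namespace Policy")
homegrown = Vocabulary("Court terms", "http://intranet.example/terms#", "Records office")

print(check_vocabulary(dcterms, TRUSTED, PERSISTENT))    # []
print(check_vocabulary(homegrown, TRUSTED, PERSISTENT))  # fails all three checks
```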

Those who cannot find a suitable vocabulary, or who need to create their own, are offered a good set of guidelines to consult. For one, they suggest using SKOS (Simple Knowledge Organization System) to “represent controlled vocabularies, taxonomies and thesauri.”
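As an illustration of what SKOS looks like, here is a minimal Python sketch that serializes a small, hypothetical hierarchy of court types as a SKOS concept scheme in Turtle. The SKOS terms (skos:ConceptScheme, skos:Concept, skos:prefLabel, skos:broader, skos:inScheme, skos:hasTopConcept) come from the W3C SKOS vocabulary; the scheme namespace, the term identifiers, and the labels are made up for the example.

```python
from __future__ import annotations

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/vocab/court-types/"  # hypothetical scheme namespace

# Hypothetical controlled vocabulary: term id -> (preferred label, broader term id).
TERMS: dict[str, tuple[str, str | None]] = {
    "courts": ("Courts", None),
    "appellate": ("Appellate courts", "courts"),
    "trial": ("Trial courts", "courts"),
    "supreme": ("Supreme courts", "appellate"),
}


def to_skos_turtle(terms: dict[str, tuple[str, str | None]], lang: str = "en") -> str:
    """Serialize a simple term hierarchy as a SKOS concept scheme in Turtle."""
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix ct: <{BASE}> .", "",
             "ct:scheme a skos:ConceptScheme ."]
    for term_id, (_, broader) in terms.items():
        if broader is None:
            # Terms with no broader term become the scheme's entry points.
            lines.append(f"ct:scheme skos:hasTopConcept ct:{term_id} .")
        elif broader not in terms:
            raise ValueError(f"{term_id!r} points to unknown broader term {broader!r}")
    for term_id, (label, broader) in terms.items():
        lines += ["", f"ct:{term_id} a skos:Concept ;", "    skos:inScheme ct:scheme ;"]
        if broader is not None:
            lines.append(f"    skos:broader ct:{broader} ;")
        # Escape backslashes and quotes so the label is a valid Turtle string.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'    skos:prefLabel "{escaped}"@{lang} .')
    return "\n".join(lines)


print(to_skos_turtle(TERMS))
```

The hierarchy is expressed only with skos:broader here; SKOS also offers skos:narrower, skos:altLabel and related properties for richer thesauri.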

If you are setting out down the path toward linked data, the W3C has provided a useful collection of best practices to help guide your efforts.


  1. Tom:

    There’s a much longer discussion to be had here, certainly. As a first step, I think I’d caution against immediate acceptance of vocabularies that “look good” (even though you will probably end up very happily using some of them). Our practice here is to try to model very close to the data, and then — once we’ve been through that exercise, and understand a good deal more about what we’re trying to model — look around to see what we can make use of. I’m not pushing the idea of “data exceptionalism” — that’s a problem in its own right, and law people are very prone to it. But we have found that there are problems with many vocabularies that you’d think would be usable in a straightforward way. For example, off-the-rack FOAF has no temporal dimension to the idea of membership in a group, and schema.org thinks lawyers are locations (because it was done from the perspective of local-business advertising). So I’d agree with the recommendation to reuse — but stress that when you do, you open a big can of Caveat Emptor.

    That said, we’ve found FOAF, Dublin Core, the W3C organizations ontology, BIBO, and the event ontology for digital music created by the University of London, among others, to be very useful, along with the work done by legislation.gov.uk, the Metalex crew, and others (the digital music ontology provides a nice approach to events that is useful in talking about legislative process).

    FWIW, you can find our work on American legislation at http://blog.law.cornell.edu/metasausage/downloads-and-related-information/ . Some of the topical papers included in the documentation may provide a little insight as to how we’ve been thinking.

    We’re currently at work on models for CFR, the US Code, and published legal scholarship. Judicial opinions are not far behind, but present greater challenges for identifier design.

  2. Hi Tom,
    Thanks for this. I appreciate your comments on the potential pitfalls of vocabulary reuse and on the need to know and model your own data before looking for existing vocabularies and metadata schemas. I guess it boils down to finding the right balance between reusing work that’s already been done and doing new work from scratch. Your FOAF example is a good one, but I think you’d agree that there is much in FOAF that serves as a basis for describing “persons” and the relationships between them, and that, used in combination with other vocabularies, it can achieve the desired results.

    I have read the work that you, Sara Frug, Diane Hillmann, John Joergensen, and Jon Phipps did on modelling legislative data (and would recommend it to anyone interested in research in this area). It discusses the importance of developing an application profile and cites the Dublin Core Metadata Initiative’s Singapore Framework as a “roadmap” for work in this area.

    The conclusion there was:

    “The Singapore Framework, with its emphasis on use cases and documentation, provides a sound methodology for the development of a legislative data model. Use cases are easy to gather, and only slightly more difficult to collate and organize. Above all, use of use cases avoids the customary pitfall of making the perfect the enemy of the good. Keeping the design very close to expressed user needs avoids analysis paralysis — the striving for a model that can never be sufficiently perfected.” (p. 197)

    Would you agree that developing interoperability through application profiles would be a useful place to focus our energies? Can you point to other recent work on application profiles for legal resources?

    Thanks again for sharing your thoughts.