Their goal was to “compile the most relevant data management practices for the publication and use of high quality data published by governments around the world as Linked Open Data.” And although this report is directed at “developers, government information management staff, and Web site administrators,” it provides good guidance for anyone involved with the care and feeding of information resources on the Web.
In all, the GLD outlines 10 best practices:
- Prepare Stakeholders
- Select a Dataset
- Model the Data
- Specify an Appropriate License
- The Role of “Good URIs” for Linked Data
- Standard Vocabularies
- Convert Data to Linked Data
- Provide Machine Access to Data
- Announce to the Public
- Social Contract of a Linked Data Publisher
All are important areas to consider. However, I’ll touch on just two of the listed practices: guidance on URIs and the use of standard vocabularies.
They emphasize that organizations should endeavour to provide persistent URIs. This is particularly important as we start to cast our nets out into the linked data ocean. Persistent URIs help ensure that web applications relying on them continue to work when called on in the future.
Contributors to Slaw have talked about link rot and the importance of PURLs (Persistent URLs) in the past. Simon, for example, has recently reported on the wonderful Perma initiative for legal resources. So it’s good to learn here that the W3C has also addressed this problem through its Permanent Identifiers for the Web service. This service is a product of the recently established W3C Permanent Identifier Community Group, chaired by semantic web advocate Manu Sporny.
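For readers wondering how a persistent identifier service keeps links working, the core idea is redirection: the public URI never changes, and the service forwards each request to wherever the resource currently lives. The Python sketch below illustrates that idea only. The paths, target URLs, `REDIRECTS` table and `resolve` function are hypothetical, and this is not how the W3C service is actually implemented. It answers with 303 See Other, a common choice in linked data publishing, though a real service may use other redirect codes.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional, Tuple

# Hypothetical mapping from persistent identifier paths to the current
# location of each resource. When a resource moves, only this table changes;
# the persistent URI that other sites link to stays the same.
REDIRECTS = {
    "/example-legal/act/1985-c-1": "https://data.example.org/v2/acts/1985/c1",
    "/example-legal/vocab/": "https://vocab.example.org/legal/current/",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return an HTTP status code and Location target for a persistent path.

    Exact matches are tried first; entries ending in "/" act as prefixes,
    and the rest of the requested path is appended to their target.
    """
    if path in REDIRECTS:
        return 303, REDIRECTS[path]
    for prefix, target in REDIRECTS.items():
        if prefix.endswith("/") and path.startswith(prefix):
            return 303, target + path[len(prefix):]
    return 404, None


class PersistentURIHandler(BaseHTTPRequestHandler):
    """Minimal request handler that answers GETs using resolve()."""

    def do_GET(self) -> None:
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()
```

To try it locally, pass `PersistentURIHandler` to `http.server.HTTPServer` and call `serve_forever()`. Keeping the lookup in a plain function makes the redirect table easy to test separately from the server.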
Another area of particular interest to me is their guidance on the use of standard vocabularies. They emphasize here that: “It is best practice to use or extend an existing vocabulary before creating a new vocabulary.” Encouraging the reuse of existing standard vocabularies improves the chances of interoperability between data sets.
To support reuse, they point to a number of search tools created to help find structured data represented as linked data (although, despite the report’s recent publication, it’s interesting to note that a couple of these tools are currently unavailable):
- Falcons (oddly not available at the time of this post)
- Semantic Web Search Engine (also not available, see their notice)
- Swoogle, and
A quick search for “legal” using Swoogle retrieves roughly a hundred hits. However, it’s not entirely clear what to make of these results, so I think a future post on strategies for finding useful vocabularies for legal information resources is in order. :-)
They also provide the following quick vocabulary checklist:
- ensure vocabularies you use are published by a trusted group or organization
- ensure vocabularies have permanent URIs
- confirm the versioning policy
Those who are unable to find a suitable vocabulary, or who need to create their own, are offered a good set of guidelines to consult. For one, they suggest using SKOS (Simple Knowledge Organization System) to “represent controlled vocabularies, taxonomies and thesauri.”
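To give a feel for what SKOS looks like in practice, here is a minimal sketch in Python (standard library only) that writes a tiny controlled vocabulary of court types out as Turtle. The base URI, the concept names and the `literal` and `to_turtle` helpers are hypothetical illustrations, not something the GLD report prescribes, and a real project would more likely use an RDF library. The SKOS terms themselves (`skos:ConceptScheme`, `skos:Concept`, `skos:inScheme`, `skos:broader`, `skos:topConceptOf`, `skos:prefLabel`) are part of the W3C SKOS vocabulary.

```python
from typing import List, Optional, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical base URI for illustration only; a real vocabulary would be
# published under a persistent URI that its owner controls.
BASE = "https://vocab.example.org/legal/"

# Hypothetical concepts: (local name, English preferred label, broader concept or None).
CONCEPTS: List[Tuple[str, str, Optional[str]]] = [
    ("court", "Court", None),
    ("appellate-court", "Appellate court", "court"),
    ("trial-court", "Trial court", "court"),
]


def literal(text: str) -> str:
    """Quote a string as an English-language Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_turtle(scheme: str, title: str,
              concepts: List[Tuple[str, str, Optional[str]]]) -> str:
    """Serialize a SKOS concept scheme and its concepts as Turtle text."""
    scheme_uri = f"<{BASE}{scheme}>"
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"{scheme_uri} a skos:ConceptScheme ;",
        f"    skos:prefLabel {literal(title)} .",
    ]
    for name, label, broader in concepts:
        lines += ["", f"<{BASE}{name}> a skos:Concept ;",
                  f"    skos:inScheme {scheme_uri} ;"]
        if broader is None:
            # Concepts without a broader concept sit at the top of the hierarchy.
            lines.append(f"    skos:topConceptOf {scheme_uri} ;")
        else:
            lines.append(f"    skos:broader <{BASE}{broader}> ;")
        lines.append(f"    skos:prefLabel {literal(label)} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle("court-types", "Court types", CONCEPTS))
```

Each concept gets its own URI, a human-readable label and a link to its broader concept, which is exactly the kind of structure that lets other publishers point at, and reuse, the same terms.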
If you are setting out on the path toward linked data, the W3C has provided a useful collection of best practices to help guide your efforts.