Citation by Shortened URL
What do the experts do to cite long URLs, especially in printed publications, where the reader will be hard pressed to type a line-long (or longer) address into a browser?
Is it acceptable to use services like www.tinyURL.com or www.bit.ly to provide a short, typable URL for one’s sources? The former at least says that the links one creates do not expire (probably as good a promise as one would get for the original site/cite).
The McGill guide says that one could cite an article just by the top level of the domain where it appears, e.g. S. Fodden, “Becoming Slaw”, July 14, 2009, http://www.globeandmail.com. Then the reader would have to search the site to find the article, or perhaps do a web search of the title of the article.
This strikes me as more work for the reader than giving a shortened URL that links directly to the article. Also, some sites still charge for content after a couple of weeks of free access. Sometimes linking directly to the article takes one past the “pay here” page.
One should, I think, provide the original long URL along with the shortened one, or the top-level URL plus the shortened one. But without the shortened URL, I suspect that most readers will never go to the electronic sources: too frustrating.
Other views? After all, is it really more work to type a long non-intuitive URL than to go to a physical library to get a periodical to find the article? Or do people just expect that following links will be easy, and not compare that effort to going to and picking up a printed source?
Perhaps most people never look at any of the sources cited in an article, though, unless they are testing the author or writing their own article on a related subject. Does that matter to the ease that one should try to create in accessing one’s sources?
I’m no expert, but I’ve used Tiny URL for citations that are going to be in print because the long, non-intuitive URLs are getting worse as more information is served up through databases, replete with session IDs and other information. Length isn’t the only issue, but a long URL in a citation can line-break in places that cause confusion to the rare person who attempts to retype them. (Using bit.ly might actually help, since you could track usage!) I would guess it is even rarer for someone to go to a physical library and pull an article, many of which may be electronically accessible.
A shortened URL would seem to best support easy access to an online resource with a long URL, particularly now that services like Tiny URL support real words – tinyurl.com/8BKyz can now be tinyurl.com/simon-fodden-is-great. My biggest concern is persistency, and it seems to me that using a shortening service is probably no worse than pointing to the top of a Web site where the document may not have been retained, or is so hard to find that it might as well not be there.
Taking into account Ranganathan’s Fourth Law, which is to “save the time of the reader,” I readily “modify” citation rules if I think I provide an improvement and I think providing a tinyurl is a good thing.
I worried about the reliability or permanence of a tinyurl. However, I just checked some tinyurls I sent to someone in July 2007 and they still work . . . .
However, even if the tinyurl works, if the underlying resource (i.e., article) is no longer at the intended URL, you still suffer from linkrot.
One of the things this problem reveals is the technological incompatibility of print and digital media. Long, ugly, database URLs are “possible” because they can be buried in the HTML, leaving only a glowing piece of link text for the reader/viewer. So the long-term “solution” is likely the disappearance of printed books and journals. That doesn’t deal with the persistence problem, of course. And I don’t think there’s much doubt that when it comes to legal and scholarly research, we’ll have to use standardized URIs (uniform resource identifiers) rather than URLs that identify a document by its network location. We could then use a “resolver” to search for a URL for a given URI.
May one safely conclude from these early comments that it is acceptable (possibly even desirable) for a legal publisher to refer to shortened URLs for online sources?
I wonder if the McGill style guide had in mind, by giving only the higher-level domain for online articles, something like what Simon calls a URI – globeandmail.com identifies a collection and indicates that the article is in there somewhere? Or is this a substitute for naming the publisher? Presumably the name of author, title of article and date of article would also be part of this ‘identifier’ information.
One could get away with that for online sources only because of the power of search engines to locate the actual text. It would not be acceptable on paper to cite only to the Canadian Bar Review, without giving a volume and page number, even with the author’s name and the title of the article.
Well, or we could all work to improve URLs. I am concerned whether tinyURL, bit.ly et al will persist. tinyURL was having problems a while back with links breaking, and URLtea which some were using just disappeared, losing many of those links.
But yes, even direct links change over time. It is challenging.
I’d rather have a direct link, and be able to pull a missing page out of the Internet Archive or Google cache.
I understand why these sites get used for print publishing, or for Twitter, etc. But I’m not a fan of the ‘relay’ website concept in general. They are slowly killing the concept of online citation and link popularity, one of the better foundations we have for measuring quality within the search engines.
I am with Steve on this one. A direct original link can be traced in the Internet Archive across a range of points in time. Using a shortening service like bit.ly or TinyURL is great for tweets, but inappropriate for more formal writing.
Very interesting post John. Thanks for making us think about this.
It seems to me that when one is citing an online source with a long URL in a printed article, there is considerable justification (including from some of the comments here) for giving both the original long version and a shortened version. Then people can try to trace the original through the Internet archive if they wish, or via a search engine, but for immediate practical purposes they have a typable alternative.
For example, I have written an article for print publication with footnotes to sources like this:
http://www.abanet.org/abastore/index.cfm?section=main&fm=Product.AddToCart&pid=5450052
and this:
https://blogs.secondlife.com/community/features/blog/2007/07/26/wagering-in-second-life-new-policy
and even these, from Slaw:
http://www.slaw.ca/2009/04/25/lawyer-twitter-practices-29-do%e2%80%99s-and-don%e2%80%99ts/
and
http://www.slaw.ca/2008/12/06/meat-on-the-bone%C2%A0-comments-on-the-guidelines-for-practicing-ethically-with-new-information-technologies/#comment-702028.
How about an EU legal link: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2000:178:0001:0016:EN:PDF
I did not hyperlink these because I want to give the impression that these will have on the printed page: in a word, a bit intimidating.
I doubt that any reader would want to type all that out to check what the source said, and a reader who tried might make frustrating typos and have a hard time knowing if the URL was given wrong or typed wrong or the link had rotted.
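Part of what makes the two Slaw addresses above look so forbidding is percent-encoding: punctuation in the post title has been written out as UTF-8 bytes in hexadecimal. As an illustrative sketch only (it uses Python's standard `urllib.parse.unquote`, and the two path fragments are copied from the URLs above), decoding shows what those sequences stand for:

```python
from urllib.parse import unquote

# Path fragments copied from the two Slaw URLs quoted above.
dos_and_donts = "do%e2%80%99s-and-don%e2%80%99ts"
meat_on_bone = "meat-on-the-bone%C2%A0-comments"

# %e2%80%99 is the UTF-8 encoding of a right single quotation mark (U+2019).
decoded_first = unquote(dos_and_donts)

# %C2%A0 is the UTF-8 encoding of a no-break space (U+00A0).
decoded_second = unquote(meat_on_bone)

print(decoded_first)
print(repr(decoded_second))  # repr() makes the invisible no-break space visible
```

A reader retyping from the printed page has to reproduce each of those sequences character for character, which is one more place for a typo to creep in.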
Could Steve explain “the concept of online citation” that would be killed (albeit slowly) by relay websites? I understand that relay sites could make it hard to know how many people had accessed an article, by diluting the count. But what is the concept here?
The long cites just given by John (and automatically turned into hotlinks — sorry about that, John) would not only frighten readers (I mean, imagine typing some of them out without losing interest along about “8:0001:0016”) but would daunt a publisher’s cite checkers — unless, of course, they used the electronic copy provided by the author.
Steve’s concern with whether Google etc. can disentangle the short URL to discover the real URL behind it and thus credit the document with having been linked to (at least I think that’s Steve’s concern) has to do with online publications of course and wouldn’t prevent authors headed for print from getting and using a short URL. I suppose the online copy of the article/book would have to use only the long URLs and purge the short URLs then, but that shouldn’t be too hard to arrange.
Giving credit is a big part of it, as Simon’s said. Google can obviously crawl through these services to find the sites (which can go down, or die, which is another problem), but there’s a trickle-down effect on Google PageRank (or other search engines that use link popularity as a quality measure). Only a portion of the link’s value will be passed, and these services can also choose to apply the ‘no-follow’ attribute to the relayed links, which essentially makes them a black hole, stopping the credit given entirely.
What I haven’t seen referenced in this discussion is the question of why so many URLs are so freakin’ long and complicated in the first place. I’m no tech expert, and there may be a perfectly good reason why a mainstream URL is 95 characters long and includes hyphens, ampersands and other relatively obscure symbols. But it seems to me that some sites manage to use sharp, concise addresses for their content, while others’ web page addresses go on forever. Maybe if websites did a better job with reducing their URL carbon-footprints, we wouldn’t need so many TinyURL-type shorteners?
A few notes: When you use a tinyurl.com URL, tinyurl applies a redirect from that address to the permanent/long URL. As far as Google is concerned, it is the same as typing that long URL. All of the PageRank passes to the destination page.
Google will only cache such a page by the long URL, not by the tinyurl address.
Tinyurl has complete control over that redirect. If that redirect gets removed for some reason the link will cease to work. The tinyurl is not an encoded form of the long URL that will always lead to the same place if something happens on the company’s end.
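To make the point about control concrete, here is a toy sketch in Python of how a shortening service can be thought of: a lookup table held by the service, not an encoding of the long URL. This is an assumption-laden illustration, not a description of how tinyurl is actually built; the class `ToyShortener` and its methods are hypothetical names.

```python
import secrets
import string


class ToyShortener:
    """Hypothetical in-memory shortener: short key -> long URL."""

    def __init__(self):
        self._table = {}

    def shorten(self, long_url: str) -> str:
        # The key is random, so it carries no information about the long URL.
        alphabet = string.ascii_lowercase + string.digits
        while True:
            key = "".join(secrets.choice(alphabet) for _ in range(6))
            if key not in self._table:
                break
        self._table[key] = long_url
        return key

    def resolve(self, key: str):
        # A real service would answer with an HTTP redirect; here we just
        # return the stored URL, or None if the entry is gone.
        return self._table.get(key)

    def remove(self, key: str) -> None:
        self._table.pop(key, None)


service = ToyShortener()
long_url = "https://blogs.secondlife.com/community/features/blog/2007/07/26/wagering-in-second-life-new-policy"
key = service.shorten(long_url)
resolved_before = service.resolve(key)   # the long URL
service.remove(key)                      # the service drops the entry
resolved_after = service.resolve(key)    # nothing left to follow
```

Once the entry is removed, nothing in the six-character key lets anyone reconstruct the original address, which is why the short form on its own is fragile as a citation.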
My personal opinion is that such things are not appropriate for citing in formal research. Most people just cut and paste anyway.
John, I think the question is what is convenient for people who are reading the text (with the citation) in paper format–are they going to have to type that long URL in? Especially if they want to look up several of the sources, it is a pain.
Jordan, that was my point, too. Website owners need to address the length of URLs. Usually it is on the content management system’s end.
Some websites add in (for lack of a better term) markers to the URL to track your path through their website. Amazon is notable for this, but I see this on a lot of sites. When cutting and pasting a URL for use in a citation, I try dropping off anything after the question mark in the URL (including the question mark) to see if the shortened URL still takes you to the page in question. Usually it does.
For example, if I search Amazon.ca for Ted Tjaden’s book using his name, and I click through to the book’s page, the URL is:
http://www.amazon.ca/Legal-Research-Writing-Ted-Tjaden/dp/1552210987/ref=sr_1_1?ie=UTF8&s=books&qid=1248017135&sr=8-1
If I take off everything from the question mark onward, it still works:
http://www.amazon.ca/Legal-Research-Writing-Ted-Tjaden/dp/1552210987/ref=sr_1_1
I look at this and wonder whether the part up to the title and author name would be sufficient, so I start removing anything after the last slash (/). This still works:
http://www.amazon.ca/Legal-Research-Writing-Ted-Tjaden/dp/1552210987/
although these two don’t:
http://www.amazon.ca/Legal-Research-Writing-Ted-Tjaden/dp/
http://www.amazon.ca/Legal-Research-Writing-Ted-Tjaden/
But, I’ve managed to cut the URL almost in half for the purposes of citation.
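The trimming steps above can be sketched in Python with the standard `urllib.parse` module. The helper names `strip_query` and `drop_last_segment` are made up for this illustration, and the code only produces candidate URLs; whether a given candidate still reaches the page has to be checked by hand in a browser, as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the question mark and everything after it (query and fragment)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def drop_last_segment(url: str) -> str:
    """Remove the last path segment, keeping the slash before it."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    shorter_path = path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, shorter_path, "", ""))


full_url = (
    "http://www.amazon.ca/Legal-Research-Writing-Ted-Tjaden/dp/1552210987/"
    "ref=sr_1_1?ie=UTF8&s=books&qid=1248017135&sr=8-1"
)

without_query = strip_query(full_url)
without_ref = drop_last_segment(without_query)

print(without_query)
print(without_ref)
```

Applying `drop_last_segment` again yields the two further-shortened addresses that no longer work, so the manual check after each step is the part that matters.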
Here is what a footnote with shortened URLs might look like, otherwise designed to comply with the McGill style guide for such matters. It is taken from an article on electronic legal matters to appear in the Annual Review of Civil Litigation for 2009.
[footnote] A woman obtained a divorce in England against her husband because of his virtual relationship with another woman. S. de Bruxelles, “Second Life affair leads to real-life divorce for David Pollard, aka Dave Barmy” Times Online (November 20, 2008) online: <http://women.timesonline.co.uk>, see <http://tinyurl.com/5774xs>. A woman in Japan was prosecuted for “killing” her husband’s avatar. K. Parrish, “Japanese Woman Kills Online Husband” Tom’s Guide (October 27, 2008) online: <http://www.tomsguide.com>, see <http://tinyurl.com/686m9q>. In China, someone killed another player of an online game because the victim had stolen or destroyed his online property. “Gamer gets life for murder over virtual sword” Reuters (June 9, 2005) online: <http://news.cnet.co.uk/>, see <http://tinyurl.com/qzyso>.
If the article (and publication generally) goes online, one would ideally have the full original URL for each reference. Either that link would be live, or the reader could copy it and paste it to his or her browser, as John Nymat pointed out. The purpose of giving a tiny URL is to permit the reader of the print version to have a more accessible reference.
John Nymat is correct that most of these services apply 301 permanent redirects, which currently relay link credit with Google. What I question is whether this will always be the case, and whether it is necessary to insert a variable into the credit equation.
The visual citation of main site and tinyurl direct link from John’s comment #14 looks practical and realistic. I think that this method would satisfy both the source credit and practical issues for providing links to material available on the internet.
The archival issues of the disappearance of the URL shortening service and link rot are not solved by this method though. Only the full direct URL offers a chance of retrieving the material from the Internet Archive.
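As a rough sketch of why the full URL matters here: the Internet Archive's Wayback Machine is addressed by the original URL, so a lookup address can be built from it directly. The snippet below assumes the `https://web.archive.org/web/<timestamp>/<url>` pattern, with `*` in place of the timestamp to list all captures; the function name `wayback_lookup` is hypothetical, and nothing here checks whether a capture actually exists.

```python
def wayback_lookup(original_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for an original (unshortened) URL.

    timestamp is digits in YYYYMMDDhhmmss order and may be partial
    (e.g. "2009"); if it is empty, "*" is used to ask for all captures.
    """
    stamp = timestamp if timestamp else "*"
    return f"https://web.archive.org/web/{stamp}/{original_url}"


cited = "http://www.slaw.ca/2009/04/25/lawyer-twitter-practices-29-do%e2%80%99s-and-don%e2%80%99ts/"

near_2009 = wayback_lookup(cited, "2009")
all_captures = wayback_lookup(cited)

print(near_2009)
print(all_captures)
```

With only a shortened address in the footnote, there is no original URL to feed into a lookup like this once the shortening service has gone.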
I wonder if any of these issues will be discussed, along with fair dealing, during the copyright consultations currently under way.
How about using a version of scannable codes in order to encode long URLs for print media? You may have seen these — the National Post uses them, and I posted about them a while back. In current use these QR Codes are a square of light and dark areas: see the graphic here:
The neat thing is that they bridge the gap between print and digital, because you can take a photo of one with your mobile phone and a small application will convert the graphic data into other forms, typically a URL. It ought to be possible either to use a very small square or an elongated code for published legal material. No need to copy out the URL and risk missing a digit or two.
Fun idea, Simon, but will one of these QR blocks fit into the couple of lines of a footnote in a law review article, in a form that is still scannable?
I think we are going to go with what I set out in comment 14, above, despite Shaunna’s concern about tracking something into the Internet archive. The trouble with putting the full URL into the footnotes is that it can take up a lot of space – a line or two in some cases – which makes the footnote almost impenetrable to read and makes it hard to spot the tiny URL in the middle there somewhere, especially if there is more than one URL per note.
In my experience with rotten links, one can usually find another version of an article by entering the author’s name and the title into a regular search engine. So if tinyURL.com goes out of business, there is still hope…
Does a search engine search turn up Internet archive articles, or does the archive block search engines, so you have to go to the archive and query it directly with a full link/URL? (Maybe that’s a question for a different thread…)