On WestlawNext, State of the Art & Steve Jobs: A Conversation With Peter Jackson, Chief Scientist for Thomson Reuters

For about a year I’ve wanted to talk to Peter Jackson*, the Chief Scientist and Vice President of Technology at Thomson Reuters. It started around this time last year when I stumbled across a brief video interview of him discussing Reuters Insider. Although Jackson’s comments (advancements in turning video into text) had little to do with legal publishing, I was intrigued by any possible extensions into the legal space, and I also wanted to know more about what exactly a chief scientist for one of the world’s largest publishing companies actually does. So I connected with him on LinkedIn, subscribed to his personal blog, and kept tabs on whatever work his group was doing that might pop up on the Interwebs.

When Project Cobalt broke, and I was lucky enough to receive an invitation to preview WestlawNext this past January, I thought I might have an opportunity to visit with Jackson about the project. After all, it represented a significant advancement in legal search, and who better to talk to about the brains behind the system? Unfortunately, that didn’t happen, and, like so many things, those questions were deferred. 

Until last week.

When you talk to an architect about a project, that conversation is going to be very different from the one you have with a person selling it. And so it was with Jackson. While most of the questions regarding WestlawNext and the search engine behind it, WestSearch, have been addressed by the online media, listening to Jackson talk about the project and some of the history behind it left me with a very different impression of the project’s importance to both Thomson Reuters and the profession as a whole. In the end, I look at it in two ways. First, WestSearch is to terms and connectors what terms and connectors is to whole-word searching in Acrobat. It’s that much more sophisticated. Second, it brings together both ontological and epistemological interpretations of the law and breathes new life into an old aphorism: the law is a seamless web.

I debated for several days about how to write this piece, opting to reprint (rather than summarize) much of our conversation so you can get a sense of both the why and how of WestSearch in Jackson’s own words. We also had some time to touch on a couple of other subjects, namely state of the art in legal search and Steve Jobs, the answers to which I found enlightening, and uplifting. I hope you do as well.

On WestSearch

JW: In my review of WestlawNext, I described WestSearch as “intuitive global search,” which, as I understood it at the time, meant that it combined federated search and a new algorithm for determining relevancy. Is that too simplistic of a view? I suppose, in a way, I don’t really have a good sense of what is going on behind the scenes with WestlawNext.

PJ: You know, in a sense, I think that’s normal. We use high-tech appliances all the time that we don’t really understand. And one of the things I believe is that search should be an appliance, you know, it should be something that you use without having to necessarily understand all of the ins and outs of how it works. And I actually made a distinction once in a paper between a tool and an appliance. A tool being something that requires a serious amount of skill to use, requires some training and certification, and maybe even a certain amount of caution on somebody’s part. Whereas with an appliance you just assume that none of those things are required. You don’t have to be trained to use a hairdryer, for example. And I think that is the direction that the power tools of our industry are going; in a sense they are almost ceasing to be tools and are becoming appliances.

JW: Much like the iPhone and the iPad.

PJ: Absolutely, the iPad is the perfect example of that because there is no manual, yet a three-year-old can pick it up, poke around on it, and make things happen. And I think that is what is so magical about it, to use Steve Jobs’ word. With WestSearch, we wanted to come up with a better search experience for our users, who are primarily legal researchers. We really felt the time was right to take legal search to another level, and I think one of the things about being in business is that you don’t want to be behind the market, yet at the same time you don’t necessarily want to be massively ahead of the market in terms of giving people whiz-bang technology that they’re not really ready for. I think in the case of legal research, a large portion of the market was ready for something new and was willing to try something different.

JW: Mike Dahn, Vice President of New Product Development, described it as “ending the tyranny of the keyword.”

PJ: Yes, I think that is entirely accurate. I think that legal search, as it has been practiced on online databases, has really required the user to play a kind of guessing game that says “Oh, I’m interested in such and such a topic, what words would bring documents back having that topic?” You know, in some ways it is like a game. And I think like any game it is a hit and miss process and the longer you play the game, the better you get at it. But the game leaves quite a lot to chance. So that was the state of play prior to February 10th, when WestlawNext was launched. I think the state of play now is different. I think we really baked two important new sources of knowledge into WestSearch. One was taking our editorial value addition and really all the metadata we generate around legal documents, whether by human editors or whether by some computational process. We took that metadata and baked it into the search algorithm itself. Previously, all that metadata was there, but it was really there for navigational purposes. It was there for the searcher to consume but it wasn’t actually informing the search engine.

JW: So all of the tagged data on Westlaw.com, specifically before WestSearch, was principally used for navigating through documents on the screen? That’s it?

PJ: Yes. So that was one important knowledge source that wasn’t available to the search engine. The second one was the aggregate user behavior, which obviously includes click-throughs, and includes more than that because we offer the user a pretty rich experience on Westlaw. They can print documents, mail them to themselves, and they can run KeyCite over them. There’s a lot of different things they can do that indicate they are interested in a document. And of course, we have for many years, collected queries and saved them in query logs. We’ve done that in the past primarily for quality control purposes and for helping people who call in with search problems, but we’ve never before used that query log for anything computational. Whereas now, given the aggregated user behavior, we’re able to get a pretty good grasp of what kinds of things users are searching for and what kinds of things users find and regard as valuable in that search context.

{Note: This user behavior is also reflected in the phrase “meaningful interactions.”} 

JW: Does WestlawNext’s user interface end up creating more avenues for discovering different types of user-generated metadata? For example, WestlawNext now allows the user to put documents into folders, to share those documents, to tag documents as “looked at,” copy and paste with a reference, and so forth.

PJ: We haven’t deliberately built those kinds of mechanisms into the system in the sense that we don’t make people jump through hoops just so we can collect data about them. I know there is a school of thought around that, that you should design your interface in such a way that you collect the maximum amount of data from your users by making them click on this or click on that. We didn’t deliberately do that. The functionalities in the WestlawNext UI are there because they are convenient to the user.

On ResultsPlus

JW: I think many readers would like to know whether ResultsPlus is gone?

PJ: No. ResultsPlus is still there on Westlaw.com. And of course, you can look at ResultsPlus as being a bit of WestlawNext that was sort of smuggled into Westlaw.com in the sense that ResultsPlus was a glimpse into the future of what we could do when we started using these new technologies. But there are certainly ResultsPlus-style processes that are now first-class members of WestlawNext, meaning sort of the best of ResultsPlus is now baked into WestlawNext, along with other stuff.

JW: In WestlawNext, there are three sort of primary visual collections. On the left-hand side, I have a table of contents that includes the universe of databases I’m searching across. In the middle are the principal results. And on the right-hand side, I have what appear to be related sources. Now, if I want to dig deeper, supposedly, I can look at these right-hand sources. Is that information different than what is collected on the left-hand side, or is it just the same information presented in a different way?

PJ: It’s a little different in the sense that what we wanted to do was provide people with a means to still discover things on their own. Search is good and we definitely want to improve the search experience, but at the same time we wanted people to have the liberty to wander off the beaten path and create their own browsing experience and discover things on their own. And so, typically when you’re looking at a document and things are on the right-hand side, these tend to be things that we consider to be related documents that you may or may not want to explore. But we always like to present people with a penumbra of related documents depending on what they are currently looking at. And then people can choose to explore those additional paths or not. It is more to encourage browsing.

On Relevancy

JW: How do you define, in the context of WestSearch, relevancy? Is it just that at some point relevancy is determined by the fewest number of iterations to arrive at an answer, as determined by the user of course?

PJ: Not really. We try to do two things at the same time. There is relevancy and then there is importance. I think relevancy is really a textual thing. It’s really sort of a computation that says out of all the documents that are invoked by a combination of this language and all the metadata we have associated with the documents in our store, these documents appear to be the ones about the same legal issue. And so, early in the Westlaw search process we do actually try and identify the legal issue or issues behind the query. And this is our first kind of step away from the keyword approach. So relevancy is now relevant not to the query but to the issues that we think are associated with that query. And once you’ve gotten away from those keywords, you can have a much richer notion of relevancy.

JW: Right, because relevancy up to that point was determined based on the frequency of the terms, co-occurrence, proximity, and so forth. I wonder, though, whether the vast majority of your users place a greater emphasis on relevancy than say, recall?

PJ: I don’t think that’s true. I think people do want high recall and want to go into court feeling that they’ve left no stone unturned. And I think that is one of the huge differences between legal search and say, for example, searching on the web. People typically do want to feel that they’ve found everything that is relevant and important. I think the way we thought about this when we were designing the system is that, we get higher recall by leveraging all of our metadata. And so you don’t have to say the magic words in the right combination. Using metadata, we will find the relevant documents whether they contain the magic words or not. And in doing that you are throwing the doors open very wide. You’re creating now a much broader search than you would previously when you were just restricted to keyword occurrences. So in a sense, to counter that, or at least to make sure that the user still has a good experience, we use the click through data to refine and cull that much larger search result, so that we are only bringing back what we think are the important documents. The documents that are important to cite and documents that other people have found useful. In a sense, the metadata gets you the recall and the user data gets you the precision.
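
{Note: To make the division of labor Jackson describes concrete, here is a minimal sketch in Python. It is an illustration only, not WestSearch’s actual algorithm: the Doc fields, the topic labels, the interaction counts, and the search function are all hypothetical stand-ins. Editorial metadata widens the candidate set (recall), and aggregated user behavior then ranks and culls it (precision).}

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    topics: frozenset = frozenset()  # hypothetical editorial metadata (issue labels)
    interactions: int = 0            # hypothetical count of prints, emails, citation checks


def keyword_match(query_terms, doc):
    """True only if the document contains every query word (the 'magic words')."""
    words = set(doc.text.lower().split())
    return all(term in words for term in query_terms)


def search(query_terms, query_topics, docs, top_k=2):
    # Recall step: keep a document if it has the words OR shares an editorial topic.
    candidates = [
        d for d in docs
        if keyword_match(query_terms, d) or (d.topics & query_topics)
    ]
    # Precision step: rank the wider set by how often users acted on each document.
    ranked = sorted(candidates, key=lambda d: d.interactions, reverse=True)
    return [d.doc_id for d in ranked[:top_k]]


# Toy collection; every value here is invented for illustration.
DOCS = [
    Doc("A", "debtor moved to convert the case",
        frozenset({"bankruptcy conversion"}), 40),
    Doc("B", "the court denied the request to change chapters",
        frozenset({"bankruptcy conversion"}), 90),
    Doc("C", "convert currency debtor rates",
        frozenset({"foreign exchange"}), 5),
    Doc("D", "landlord tenant dispute",
        frozenset({"leases"}), 500),
]

results = search(["debtor", "convert"], {"bankruptcy conversion"}, DOCS)
print(results)
```

{Note: In this toy collection, document B never uses the query words but is pulled in by its topic label, and document C matches the words but falls out of the top results because few users acted on it. A production system would presumably blend many more signals than this single count.}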

JW: And that user data, something that you have referred to as the “daily implicit feedback mechanism,” you’re folding that back in week after week, or is that something that is being done on the fly all the time?

PJ: I think the important thing to understand at this early stage is that user data was derived from Westlaw.com, and not WestlawNext. Obviously WestlawNext was new, so there was no user data. In the literature, they call this the cold-start problem. It’s like, if you’re Amazon, and you want to make recommendations to people, of course, someone has to buy something before you can recommend anything to someone else. And in fact, you have to have quite a lot of that data before those recommendations start to make sense. So we have a bit of a cold start problem, and we had to use click-through data from Westlaw.com, which is fine because it is the same document collections and people are searching for the same kinds of things. Once WestlawNext has been running for a few months, we will switch that over and now our user data will be derived from WestlawNext itself.

JW: Do you anticipate getting more accurate results?

PJ: We think it will. We think people will be finding more of the good stuff faster.

JW: And I guess with the federated search too, there is the opportunity of finding everything.

PJ: Yes.

JW: Which to me was one of the more significant advancements as far as Westlaw was concerned, was just being able to show you everything.

PJ: That’s right. So you won’t just be now reviewing and clicking on results from either the specific database you are in or ResultsPlus. You’ll be really working with a much larger result set.

JW: The problem of knowing where to find something in a database was significant for me as a researcher. I would assume the same holds true for most lawyers, who typically aren’t power searchers.

PJ: This was a matter of some concern to us, and we had been thinking about that problem for some time, and the answer wasn’t simply to make it a global search and throw the doors open to every database because I think if we had done that simply using keyword searching, I think we would have made the search experience worse, not better.

On Analytical Content

JW: I am curious about analytical content. Do you view analytical content more as metadata for primary law, or do they serve as metadata for each other? I ask because I create analytical content, and so I have a particular fondness for using analytical content first to find an answer. It is frustrating to hear people talk about analytical content, not as the gateway to primary law, but as the stuff underneath that helps you bring all that primary law together.

PJ: That’s an interesting question. We built ResultsPlus because we felt that people were neglecting analytical sources. I think that the provision of case law searching in the 80s, and perhaps earlier, really changed the methodology of legal research to some extent. It certainly made fishing expeditions into case law a lot easier than you could have ever done it by following a paper trail. I think that ResultsPlus was meant to encourage the use of analytical resources as an entry point into an area, particularly an area that you might not be familiar with. So I think we were promoting analytical material in ResultsPlus as being, not a second class citizen, but something that was worthwhile in its own right. I think in WestlawNext that we’ve sort of continued that philosophy, so that regardless of what you are searching for, if there is analytical material that’s on that topic or issue that we think is relevant, we do present that.

On Snippets

JW: How much do snippets affect WestSearch? I’ve got to imagine that no matter how good an algorithm is, or no matter how good a process is (because WestSearch is more of a process of both people and machines), if the snippet doesn’t provide enough information on the screen when I’m scrolling through an answer, that sort of meaningful interaction, that click through, is sort of lost at that point. Did you guys give a lot of consideration to the level of information that you provide on the screen, or did you have a good sense from user experience already that you only need to show, say, 100 terms on the page?

PJ: We gave a lot of thought to the snippets. We have a couple of people in my group who are experts in automatic summarization, and they had already sort of cut their teeth on ResultsPlus. If you think about ResultsPlus, it was a document recommendation system, and in some of those instances, say when we were recommending a brief, it doesn’t help the user to just show the title because the title doesn’t tell you anything about what’s in the brief. So, it was at that point when we loaded briefs to ResultsPlus that we realized that we had to come up with snippets that would allow the user to decide there and then whether or not the document was relevant. And the way we did that was to make the snippet that was generated sensitive to the query. So instead of just storing a snippet with the document, we would generate the snippet dynamically based upon the query. And we took that same philosophy and that same technology into WestlawNext. That sort of query sensitive summarization is really what the snippets are about. And we think they are very important.
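
{Note: The difference between a stored snippet and a query-sensitive one can be sketched in a few lines of Python. This is a deliberately simple stand-in, assuming a fixed-width window of words scored by how many query terms it contains; the function name and the sample brief are hypothetical, and the summarization technology Jackson’s group uses is certainly more sophisticated.}

```python
import re


def query_sensitive_snippet(text, query, window=6):
    """Return the first run of `window` words containing the most query terms."""
    words = text.split()
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def hits(chunk):
        # Strip punctuation so that "court." still counts as a match for "court".
        return sum(1 for w in chunk if re.sub(r"\W", "", w).lower() in terms)

    best_start, best_score = 0, -1
    for start in range(max(1, len(words) - window + 1)):
        score = hits(words[start:start + window])
        if score > best_score:  # strict comparison keeps the earliest best window
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])


# Invented sample text standing in for a brief.
BRIEF = (
    "This brief was filed in the district court. "
    "It argues at length about venue. "
    "The debtor seeks conversion from Chapter 7 to Chapter 13."
)

snippet_a = query_sensitive_snippet(BRIEF, "district court")
snippet_b = query_sensitive_snippet(BRIEF, "debtor conversion")
print(snippet_a)
print(snippet_b)
```

{Note: The same document yields a different snippet for each query, which is the point Jackson makes about generating the snippet dynamically rather than storing one with the document.}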

JW: They are important, particularly when you take the next step and eliminate sort of the irrelevant information from the snippet, say a full case citation because there is only so much space, real estate on the screen, when you present that. And particularly for ResultsPlus when you are having to put something in such a narrow text.

PJ: That’s a good point. ResultsPlus taught us the value of screen real estate in a way that perhaps nothing had done before. It’s very interesting, when you give people recommendations, and you get something like five recommendations, they only really look at the top two. And so, the ranking on the recommendations has got to be really, really good. And I think that was another learning experience for us, and a learning we carried over into WestlawNext, that screen real estate is very precious and the ranking is just extremely important.

On State of the Art

JW: I was curious about the current state of the art in legal search. In 2007, you raised the issue that extraction technology required information to be explicitly stated in the text; it couldn’t be implied. You used an example of when a debtor moves to convert from Chapter 7 to Chapter 13, and a creditor files a complaint to oppose it, the judge decides the case by “finding for the plaintiff,” which really means the conversion was denied because the plaintiff is the creditor and the defendant is the debtor. Are we any closer to achieving the inferential capability necessary to extract this kind of data? 

PJ: I think it is a very hard problem. I think that in theory you could sit down and build a very specially crafted solution for that particular kind of inference. It’s very hard to see how you do that in a way that would be scalable or would apply to similar kinds of reasoning problems. With the right amount of duct tape you can solve all of these narrow problems. You can always come up with some sort of algorithm or device or whatever, but to come up with a more general solution that you could apply to different kinds of situations, even just within a case, is much more difficult. For example, when we worked on Litigation Monitor, we wrote a program that went through the front matter in a case and figured out who the plaintiffs were, who the defendants were, what attorneys and law firms were represented, whom they were representing, and what the case was about. But that was a very specially crafted piece of code. There’s nothing in that code that would help you solve an analogous inference problem either with a different kind of document with a different kind of format or some other kind of reasoning problem like the one you described about bankruptcy.

JW: At some point the bankruptcy problem is probably solved more by the web of metadata that might go to inform that type of opinion I suppose. 

PJ: Either that or you’ve got to have a script that’s almost like a little movie script that says here’s how bankruptcy hearings normally go. The creditor is trying to pay as little as possible so they are going to try and gravitate towards this end of the spectrum, and the debtors are going to want to push things in the opposite direction, and these are the kinds of motions that get filed, and when these motions are disposed of in one way or another that means that this person has won, and this person has lost, and so on. There almost has to be a movie script that says how these kinds of cases play out. And imagine doing that for all the different kinds of cases. It wouldn’t be very pretty. To me, what this speaks to is the continuing value of our editorial resources because one of the things I’ve argued for in many of the papers written is getting the right allocation of function between person and machine that makes for a great person-machine system. I think there are many people out there that want to automate everything and then on the other hand there are those out there that say a human has to touch everything, and I think they are both wrong. We are at that stage now where you can build a very effective person-machine system to do a lot of very useful information tasks.

{Note: Jackson’s opinion on the value of the person-machine combination reminded me of a similar observation that Garry Kasparov made in a recent piece in the New York Review of Books on “freestyle” chess play.}

On Steve Jobs

JW: I know you are a fan of Steve Jobs, and rather unapologetic about it if I recall your last post about D8. What is Jobs doing now, in your opinion, if anything, that is having an effect on legal practice? Or legal research perhaps.

PJ: I don’t think he is doing anything that directly affects legal research. But I think this business of somehow creating devices that are incredibly convenient and very easy to use, and have a long battery life and are a pleasure to handle and to work with, I just think it sets the bar pretty high for everybody else. In the past, I think we just assumed that using a computer was really a kind of drudgery and required you to sit at a desk and to work on your carpal tunnel syndrome. And I think that things like the iPad show that’s not the case, the fact that usability is now finally becoming a reality. In a piece I just wrote, I asked “what was it at D8 that really grabbed me, and was there any sort of new technological development?” And I think the answer was “no.” But I think what was innovative for me at D8 was the fact that there were so many products showcased where the human was really in the center of the story. Whether it was James Cameron talking about Avatar and the fact that you still need actors and you still need to connect with the emotions of your audience. Whether it was Microsoft showing off its new game controller which doesn’t require any gadgets, it just looks at your limbs and how they are disposed and interprets those things as signals, and you’re not even wearing a device of any kind. Or whether it’s the iPad itself, that can be used by a three-year-old. What struck me very forcibly was this business about putting the human back center stage. This is something that is finally happening. Academics have been writing about this since the 80s; user-centered system design was a concept that came out of the early 80s, and it’s taken 25 years to get to a point where we can at least argue that we’ve made some progress. It’s really quite amazing.

JW: This kind of conceptual thinking, do you think it will compel users to look at their computers differently, sort of like “why can’t I do this on my computer?” Or mentally, are they just going to continue to separate it out?

PJ: No, I think it raises the bar on everybody. And this is why we brought out WestlawNext for the iPad, we just felt like we had to do it. I think if you look at the history of West, and I like to think of it as a layer cake, there are sort of strata of innovation. We started with data, we built KeyCite on top of that, we built document categorization on top of that, we built document recommendation on top of that, and we used all of that to revolutionize search. I think these are the kinds of investments we have to make and I think everyone in our business understands that. I get very excited when somebody like Steve Jobs, who is out in the consumer space, goes and does something wonderful because it raises the bar for us, it inspires us to do more. I think it is an endless task because you can always make things better. But we’re sort of up for that task, and I think that’s what my group is all about. That’s why we have an R&D group. I love it when you see these kinds of advances in the outside world and they present us with a challenge. I think it is very exciting, and in a sense, our industry is going through a very exciting time. I’ve been in the computing business for 30 years, and I don’t think I’ve ever seen the pace of change we are seeing now. Everybody says that, but it is really true. And it’s not just in processors and bandwidth and all the numbers people like to cite. It’s really around the creativity and imagination people are bringing to new forms of media, entertainment, information, social constructs. It’s really pretty amazing.


* According to a recent article, Jackson is an expert in “information retrieval (search), document categorization (automated indexing of content), machine learning (the design of algorithms that enable software to learn from and make decisions based on data patterns), and natural language processing (in which software can summarize content, convert computer language into human language and vice versa, or make a computer speak with human tones).”

Comments

  1. Congratulations on this scoop, Jason.

    Peter was interesting but necessarily much briefer in his comments in Eagan, back in the snow season.

    But there are real insights into the strategy here.

  2. Simon,

    Jackson is a tremendously nice guy, and you’re right, there are a lot of insights into where TR is steering legal search and the fact that we haven’t really tapped into the databases yet.