Data are recorded about much that we do these days. We all leave a digital trail. The resulting data are a rich source of insight, but in their raw form they don’t tell us much. We need to analyze data properly and methodically to make sense of them.
The recent poor performance of opinion polls in both the UK’s referendum on remaining in the European Union (“Brexit”) and the US Presidential election left me wondering what they tell us about our dependence on data analytics. Sometimes the models, and the assumptions underpinning them, need to be questioned.
In the case of the Brexit referendum, the polls predicted that the result would be close but that the Remain camp would prevail. The polls were wrong (again) and the UK voted to leave the European Union by a clear margin. If only the pollsters had picked up the zeitgeist, they might have amended their models. That said, the polls were not wildly wrong as they had been at the previous UK general election; the referendum polls showed the gap was closing. But it’s clear that the possibility of a Leave vote had been discounted, as the financial and betting markets favoured Remain. According to an article in the New York Times following the referendum, the lesson for the US was not to rely on polls when they show a close race (Why the Surprise Over ‘Brexit’? Don’t Blame the Polls).
The shock and awe of the Trump victory meant that media insight into the big-data failure in the US polls was cursory. However, a report in Bloomberg Businessweek after the election suggests that the Trump campaign picked up on the change of mood early and began to “reweigh” their polling results (Trump’s Data Team Saw a Different America – and They Were Right). Another post-mortem report, this time in the Huffington Post, suggested that Clinton’s focus on big data over people backfired badly: it may inadvertently have helped Trump to mobilize his vote in the Midwest states (Hillary Clinton’s Vaunted GOTV Operation May Have Turned Out Trump Voters).
According to an article in Fortune Magazine from November 2013 (When Big Data Goes Bad), part of the issue is that many data analysis projects depend on erroneous models, making mistakes inevitable. We often mistake the flawed results for omniscience, and the problems resulting from faulty analysis are compounded by our lofty expectations. The Fortune article identifies the following potential issues with big data:
- The data are often not properly normalized (no apples-to-apples comparison);
- The models are often not peer tested or reviewed; and
- Most crucially, the data are usually siloed inside large corporations instead of being democratically available and verifiable.
With all that in mind, I made sure I had a healthy dose of salt to hand when I read the recent Legal Trends Report 2016 by Clio. This report was billed as no less than “actual data-driven insights into the legal sector” for the first time in the 4,000-year history of the legal profession. Lofty expectations, indeed! It is an exciting development for sure, but I’m not sure the results were omniscient.
Clio, for those of you who have been practising in a cave for the last few years, is a Vancouver-based software company providing back-office Software-as-a-Service to solo lawyers and small law firms, predominantly in the United States. Clio boasts a subscriber base of 40,000 active users. Since the data are stored in the cloud, Clio can access certain statistics. Clio is naturally guarded about the data it accesses and releases from customers. The data for the Legal Trends Report are aggregated anonymously from subscriber law firms in the US, although firms can opt out of the data analytics. Clio doesn’t reveal the actual sample size in the report, so we don’t know how many opted out of the aggregate reporting.
The headline result of the Clio Report was “Death by a Thousand Cuts: The Lawyers Funnel”. It explains that the average American lawyer’s utilization rate amounted to a paltry 2.2 hours per day, or 28% of an assumed average eight-hour day. As you would expect, the realization and collection rates were lower: the average realization rate is just 1.8 hours per day, and collections whittle that down to 1.5 hours per day.
This is a very poor result, if it is accurate. At Clio’s reported average hourly rate for US law firms of $232 per hour, collecting 1.5 hours per day across 261 US workdays results in annual gross revenue per legal professional of just over $90,000. Remember that this is an average: in Iowa the average hourly rate is $129. At these rates, most small and medium-sized American law firms are making enough money to keep the lights on, but possibly only just.
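The arithmetic behind these figures is easy to check. Here is a minimal back-of-the-envelope sketch; the eight-hour day, 261 US workdays, and the hourly rates come from the report’s framing, while the variable names are mine:

```python
# Back-of-the-envelope check of the Clio "lawyer's funnel" figures.
# The 8-hour day and 261 US workdays are the report's assumptions.
WORKDAY_HOURS = 8
WORKDAYS_PER_YEAR = 261

utilized_hours = 2.2    # recorded as billable per day
realized_hours = 1.8    # actually billed to clients per day
collected_hours = 1.5   # actually paid for per day

avg_rate_us = 232       # Clio's reported US average hourly rate, USD
avg_rate_iowa = 129     # Clio's reported Iowa average, USD

utilization = utilized_hours / WORKDAY_HOURS                        # 0.275, i.e. ~28%
revenue_us = collected_hours * avg_rate_us * WORKDAYS_PER_YEAR      # 90,828
revenue_iowa = collected_hours * avg_rate_iowa * WORKDAYS_PER_YEAR  # 50,503.50

print(f"Utilization rate: {utilization:.0%}")
print(f"Annual collected revenue per lawyer (US avg): ${revenue_us:,.0f}")
print(f"Annual collected revenue per lawyer (Iowa): ${revenue_iowa:,.0f}")
```

At the Iowa rate the annual figure drops to around $50,000, which is why “keeping the lights on” is about right.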
The Clio Report itself draws a comparison with the self-reported results in the Lexis Nexis Law Firm Billable Hours Survey Report of 2012. The Lexis Nexis Report found a much more respectable average of 6.9 hours billed per day out of an 8.9-hour working day. Over those same 261 workdays, 6.9 hours a day scales up to an annual figure of almost exactly 1,800 hours, which in my experience is suspiciously close to the standard target for a full-time lawyer to make a good income from drawings or remain employed in BigLaw.
The Lexis Nexis Report had a much smaller sample size of 499 respondents (from 8,000 invited), who participated by email or LinkedIn. This is undoubtedly a small, self-selecting sample. Most likely they are smarty-pants lawyers from big firms who hit their targets. The Clio Report noted the sample size of the Lexis Nexis Report to be 0.03% of the [US] legal population, warning of the risk of drawing incorrect conclusions from a small sample. However, assuming a lawyer population of 1.3 million (ABA Lawyer Demographics 2015), I calculate the sample size of the Clio Report to be just 3%, and that is assuming that there are no opt-outs and that all the data are populated by lawyers themselves. So here, again, caution is justified.
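For what it’s worth, the sample-size comparison is just as easy to reproduce. This sketch assumes the ABA’s 1.3 million US lawyers cited above, and generously treats every Clio user as a lawyer with no opt-outs:

```python
# Rough sample-size comparison of the two reports, assuming the
# ABA's ~1.3 million US lawyers and no Clio opt-outs (both generous).
US_LAWYERS = 1_300_000

lexis_respondents = 499   # self-selected, of ~8,000 invited
clio_users = 40_000       # Clio's active user base; an upper bound on lawyers

lexis_share = lexis_respondents / US_LAWYERS * 100  # ~0.04%, near the 0.03% Clio quotes
clio_share = clio_users / US_LAWYERS * 100          # ~3.1% of US lawyers

print(f"Lexis Nexis sample: {lexis_share:.2f}% of US lawyers")
print(f"Clio sample (at most): {clio_share:.1f}% of US lawyers")
```

Three percent is two orders of magnitude better than 0.04%, but it is still a sample rather than a census, and a sample of one vendor’s customers at that.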
The Clio Report suggests that lawyers are significantly over-reporting the number of hours that they bill. To Clio, it is obvious that the Lexis Nexis Report results must be skewed. The data never lie, right? The Clio Report explains the anomaly between the contrasting results in the two reports by referencing a paper in the Psychological Bulletin of the American Psychological Association (Sensitive Questions in Surveys) which suggests that a phenomenon called “social desirability bias” leads to misreporting in survey results involving sensitive questions about drug use, sexual behaviours, voting, and income. The paper concludes that it is common for survey respondents to edit the information they report to avoid embarrassing themselves in the presence of an interviewer or to avoid repercussions from third parties. But even assuming that a valid comparison can be made, can social desirability bias apply to the respondents of the Lexis Nexis Report to the extent that they inflate the results by a full 4.7 hours a day – over 200%?
The Clio Report examines the disparity between the assumed eight-hour day and the low utilization rate (“The Case of the Missing Six Hours” at page 37). It cites two obvious factors behind the low utilization rate: not enough work and inefficiency. To these I would add two more. First, many Clio users may not be lawyers at all (the respondents are defined as “users”). Second, lawyers using Clio may not comprehensively capture all the time they expend on client work. I note that Alternative Fee Billing is only an option on the more expensive Clio packages, “Boutique” and “Elite”, not the “Starter” package, and there is no breakdown of these users in the Clio Report. In any event, no matter how intuitive or simple platforms like Clio make it, recording time can be a time-consuming and often demoralizing task. I recall asking a managing partner in a small firm a number of years ago why the lawyers in the firm did not record time. The reply was that they did not because a lot of the work was agreed on an alternative fee arrangement basis, and statistical analysis of the fees rendered was “frankly, rather depressing”. Worse, in Iowa, I suspect.
My gut instinct tells me that, to paraphrase Mark Twain, the reports of the death of American law firms have been greatly exaggerated. The Clio Report contains interesting insights and may become a useful tool in the future, especially if the data are verifiable and the models are tested and peer-reviewed. Meantime, the Clio Report suggests to me that we are not quite in the age of the data-driven lawyer.