Thursday Thinkpiece: Cohen on Privacy

Each Thursday we present a significant excerpt from a recently published book or journal article. In every case the proper permissions have been obtained. If you are a publisher who would like to participate in this feature, please let us know via the site’s contact form.

by Julie E. Cohen
126 Harv. L. Rev. (forthcoming 2013)

Excerpt: pp. 16-19 of online symposium paper

[Footnotes omitted. They are available in the PDF version of the article, available via the link on the title above.]

Innovation is never a neutral quantity. Technologies and artifacts are shaped by the values, priorities, and assumptions of their developers, and often their users as well. Of course, many technologies are designed or refined with particular goals in mind, but here I am referring to a different and less deliberate shaping process, through which artifacts come to reflect and reproduce beliefs about the types of functions and ways of living and working that are important. To return to a previous example, the design of an in-car GPS interface prioritizes getting from point A to point B most efficiently. The design of a child’s car seat prioritizes modularity and affordability over compact size; therefore, it promotes safety but not the purchase of smaller, more fuel-efficient cars. The techniques of Big Data are no exception to this rule of cultural constructedness. In particular, I want to highlight three distinct but mutually reinforcing problems: three ways in which the shift to Big Data now playing out within the particular context of our system of advanced informational capitalism seems likely to reinforce certain values, and favor certain kinds of knowledge, over others.

The first problem concerns hidden research agendas. Big Data may seem to update and improve upon traditional scientific modeling because its investigations are both open-ended and ongoing. They do not conform to the idea of the scientific research program as a series of limited data collections for the purpose of testing and possibly falsifying a particular hypothesis. Big Data’s relative advantage (according to some) is its ability to make sense, in real time, of an ever-changing data landscape. Decisions about research agendas need not be explicit, however. The research agendas that drive Big Data will be those of the entities that deploy it. It is at this point that a more general principle of falsifiability begins to matter. Even within academic computational science, attaining the transparency required to confirm or falsify results is Big Data’s Achilles heel; observers have begun to point to a “credibility crisis” that derives from inadequate disclosure of data sets and methods. Big Data in the private sector neither pretends nor aspires to transparency; research agendas and data sets are typically kept secret, as are the analytics that underpin them.

The second problem concerns underlying ideology. Even when private-sector research agendas are uncovered and become the subjects of exposés in the pages of The Atlantic and the New York Times Magazine, we seem unable to come to grips in any meaningful way with their epistemological bona fides. That is a very great mistake, and indicates just how deeply Big Data’s core premises are entrenched within our intellectual culture. Big Data is the ultimate expression of a mode of rationality that equates information with truth and more information with more truth, and that denies the possibility that information processing designed simply to identify “patterns” might be systematically infused with a particular ideology. But the denial of ideology is itself an ideological position. Information is never just information; even pattern-identification is informed by values about what makes a pattern and why, and why the pattern in question is worth noting. Pattern-identification also is informed by both content and categorization biases in the databases of origin; thus, for example, the Facebook dataset has particular demographics and reflects particular beliefs about what makes someone a “friend.” Big Data is the intellectual engine of the modulated society. Its techniques are techniques for locating and extracting consumer surplus and for managing, allocating, and pricing risk, and it takes datasets at face value. But the values of predictive rationality and risk management are values, and they are the values with which serious critics of Big Data need to contend.

The third problem is, once again, the problem of constructed subjectivity, and more specifically the problem of subjectivity constructed in the service of the self-interested agendas of powerful economic actors. The integrity of behavioral and preference data is a longstanding concern within social sciences research, and has led to the development of elaborate techniques of research design to minimize distortion. Big Data attacks the problem of data integrity from a different direction because it gathers behavioral data at the source (and often without the subjects’ knowledge). Even when it operates unobserved, however, Big Data cannot neutralize the problem of constructed subjectivity, and instead is more likely both to exacerbate the problem and to insulate it from public scrutiny. The techniques of Big Data subject individuals to predictive judgments about their preferences, but the process of modulation also shapes and produces those preferences. The result is “computational social science” in the wild, a fast-moving and essentially unregulated process of experimentation on unsuspecting populations. Big Data’s practitioners are never “just watching.” And here informational capitalism’s interlinked preferences for consumer surplus extraction and risk management can be expected to move subjectivity in predictably path-dependent directions.

By now it should be apparent that there are important procedural and ethical objections to some of the most common applications of Big Data. As deployed by commercial entities, Big Data represents the de facto privatization of human subjects research, without the procedural and ethical safeguards that traditionally have been required. Population studies using the techniques of Big Data typically proceed without the sorts of controls that might be instituted by, for example, an institutional review board. I tend to think this is a very bad idea. At minimum, it should be uncontroversial to suggest that the human-subjects ramifications require further study.

Other objections are more substantive, and this brings us back to privacy. As already noted, privacy is increasingly cast as the spoiler in this tale, the obstacle to the triumphant march of predictive rationalism. Privacy scholars and advocates have not fully teased out the implications of this positioning, but they are dire: If information processing is rational, then anything that disrupts information processing, including privacy protection, is presumptively irrational. In the long run, I think that a strategy of avoidance on this point is a mistake; the implicit charge of irrationality must be answered. I have argued elsewhere that this characterization of privacy misses the mark. A commitment to privacy expresses a different kind of “sound reason” that we might choose to value—one that prizes serendipity as well as predictability and idiosyncrasy as well as assimilation. The distinction between predictive rationalism and reason directs our attention to the quality of the innovation Big Data seems likely to produce, and to the sorts of innovation most likely to be lost.

Even if Big Data did not continually alter its own operands, it does not operate in a vacuum. It is a mistake to think of the techniques of Big Data as simply adding to the amount of information circulating within society. The valorization of predictive rationality and risk management inevitably displaces other kinds of knowledge that might be generated instead. Stimuli tailored to consumptive preferences crowd out other ways in which preferences and self-knowledge might be expressed, and also crowd out other kinds of motivators—altruism, empathy, and so on—that might spur innovation in different directions. In a consumption-driven economy, the innovations that emerge and find favor will be those that fulfill consumption-driven needs. Contemporary applications of Big Data extend beyond marketing and advertising to core social and cultural functions, including the study of intellectual preferences and the delivery of higher education. Systematizing those functions according to the dictates of predictive rationality threatens important social values. It crowds out the ability to form and pursue agendas for human flourishing, which is indispensable both to maintaining a vital, dynamic society and to pursuing a more just one.

In short, privacy is important both because it promotes innovative practice and because it promotes the kinds of innovation that matter most. The human innovative drive is both unpredictable and robust, but it does not follow either that all environments are equally favorable to innovation or that all environments will produce the same kinds of innovation. If privacy and serendipity are critical to innovation—by which I mean critical both to the likelihood that innovation will occur and to the substance of that innovation—there is reason to worry when privacy is squeezed to the margins, and when the pathways of serendipity are disrupted and rearranged to serve more linear, commercial imperatives. Environments that disfavor critical independence of mind, and that discourage the kinds of tinkering and behavioral variation out of which innovation emerges will, over time, predictably and systematically disfavor innovation, and environments designed to promote consumptive and profit-maximizing choices will systematically disfavor innovations designed to promote other values. The modulated society is dedicated to prediction but not necessarily to understanding or to advancing human material, intellectual, and political wellbeing. Data processing offers important benefits, but so does privacy. A healthy society needs both to thrive.
