A Research Powerhouse, a Big Data Warehouse

For a week now, users of Topsy, the social media tool and Twitter data reseller, have been able to search Twitter content back to its 2006 beginnings; that is, “every tweet ever”. (Direct messages are not included in Topsy’s data or in other such data.)

It has been widely noted that an index of this extent is more comprehensive, and more useful in practice, than anything Twitter itself or any other reseller offers. Until last week, Topsy’s reach was to 2010, the middle of the brief period once covered by Google’s real-time search. That Google feature offered some historical research capability beyond the week or so then available via Twitter itself, as the rather inelegant and not very useful Bing social search still does. And for some time the Library of Congress has also been gathering, indexing, and holding for posterity all of our 140-character illuminations.

Much discussion during the week heralded Topsy’s exciting renewed utility as a research tool—for example, for journalists, investigators, and professional researchers. And the excitement of course extends to market researchers. Topsy is, after all, in the business of reselling data.

Unless I’ve been reading the wrong corners of the Internet this week, though, the easy availability of everything any of us has ever said publicly via Twitter, and of every public mention of any of us there, doesn’t seem to bother many people. Various cautions have been raised on this site: for example, three years ago when the Library of Congress began archiving tweets, and more recently in sharing the wisdom of monitoring our footprints. About a year ago I wrote that a medium of expression that seems ephemeral is in truth persistent, and that caution in its use remains warranted.

Now this persistent content is indexed, readily searchable, and subject to aggregation and analysis for all kinds of purposes.

Yes, it’s wonderful that I can research reaction to legislation at the time of its introduction, locate timely discussion of a case when it was released, or find people knowledgeable about a research resource. But anyone can also find—in respect of an account publicly associated with a real person—the subjects that engage that person, the times of day she’s online, those she talks to most, and nearly limitless other information. In respect of that person, all of this on its own might interest only one or two other souls out there. But it is all part of the aggregate of big data forming the base of our public personae.

To me, that’s a tiny bit scary.
