More on anonymizing tweets and Internet research ethics

Twitter research ethics are complicated, and deserve a more nuanced treatment than my short post from yesterday. I’ll take a stab here at saying a bit more:

Question 1: Is analyzing Twitter “human subjects research”?

I want to start by looking at US law. (Note that this applies only in the US and only to federally funded research, though some companies choose to follow these rules voluntarily, and most universities apply them to all research whether or not it is federally funded.) The policy states that several categories of work are exempt from the rules, including:

(4) Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.

It’s pretty clear that Twitter data (on open accounts) is existing data that is already publicly available. So legally speaking, I believe researchers are well within their rights to simply use it at will. It’s public, so you can use it. But should you?

Ethical is a higher standard than legal. As Jim Hudson and I found in our study of people chatting on Internet Relay Chat (IRC), people often misunderstand the public nature of online communications. This leads to my second question:

Question 2: If people have expectations of privacy that differ from expert opinion on what is “reasonable,” does that need to be taken into account?

I don’t think there’s a simple answer to that question. It probably has to be addressed on a case-by-case basis.  And if people’s expectations are persistent and continue to differ from the written rules, maybe the rules need to evolve.

If you do consider research on Twitter to be human subjects research, then you need to apply for IRB clearance, and you probably have good grounds to request a waiver of consent.  A waiver of consent is possible in these circumstances:

(d) An IRB may approve a consent procedure which does not include, or which alters, some or all of the elements of informed consent set forth in this section, or waive the requirements to obtain informed consent provided the IRB finds and documents that:

(1) The research involves no more than minimal risk to the subjects;

(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;

(3) The research could not practicably be carried out without the waiver or alteration; and

(4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.

In such a case, an IRB might request that the tweets be anonymized, and this would contribute to making the case that the work presents minimal risk. This sounds like a great approach for research on sensitive topics, such as epidemiology.
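For readers wondering what anonymizing tweets might look like in practice, here is a minimal sketch in Python. It is only one possible approach, not a method an IRB prescribes: it replaces screen names (the author's and any @mentions in the text) with stable pseudonyms derived from a keyed hash. The names `SECRET_KEY`, `pseudonym`, and `anonymize_tweet` are hypothetical, made up for this illustration. Note the important limitation that the tweet text itself is left verbatim, so anyone could still search for the exact wording and find the author; a study on a sensitive topic would probably need to paraphrase or withhold the text as well.

```python
import hashlib
import hmac
import re

# Hypothetical project secret (an assumption for this sketch). In practice it
# would be kept out of source control so pseudonyms cannot be recomputed.
SECRET_KEY = b"replace-with-a-project-specific-secret"

# Simple pattern for @mentions; it will also match things like the domain
# part of an email address, which is acceptable for a rough sketch.
MENTION = re.compile(r"@(\w+)")


def pseudonym(screen_name: str, key: bytes = SECRET_KEY) -> str:
    """Map a screen name to a stable pseudonym using HMAC-SHA256."""
    # Lowercase first so "Alice" and "alice" get the same pseudonym.
    digest = hmac.new(key, screen_name.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:10]


def anonymize_tweet(author: str, text: str, key: bytes = SECRET_KEY) -> dict:
    """Replace the author's name and any @mentions with pseudonyms."""
    scrubbed = MENTION.sub(lambda m: "@" + pseudonym(m.group(1), key), text)
    return {"author": pseudonym(author, key), "text": scrubbed}


if __name__ == "__main__":
    print(anonymize_tweet("example_user", "Great talk today (via @some_friend)"))
```

Because the same screen name always maps to the same pseudonym, you can still count how often one account posts or is mentioned without ever storing the real name in your analysis files.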

Because part of my research is about people’s creative accomplishments online, I am more likely to encounter situations where anonymizing people is unethical because it denies them credit for their work. We name people in our published accounts only at their written request, which they indicate on a consent form. Our projects generally use mixed methods, combining analysis of people’s online postings with interviews. I believe this mixed-methods approach often gives better research results, and it necessarily makes the work human subjects research rather than merely analysis of public information.

I personally prefer to view Twitter research as human subjects research and apply for a waiver of consent. Thinking through a formal IRB application and soliciting help from IRB members can help you work out the details of how to treat your subjects in accordance with the principles of beneficence, justice, and respect for persons. Ethical is, after all, a higher standard than merely legal.

That said, the public nature of Twitter data is hard to deny.  Maybe the rule about pre-existing, public information needs to be rethought. Something more nuanced would serve us better.

Do we need to anonymize tweets in published accounts?

In this article about tweets being made available to researchers, the authors quote two epidemiologists saying ethical use of Twitter should anonymize tweets:

Caitlin Rivers and Bryan Lewis, computational epidemiologists at Virginia Tech, published guidelines for the ethical use of Twitter data in February. Among other things, they suggest that scientists never reveal screen names and make research objectives publicly available. For example, although it is considered ethical to collect information from public spaces—and Twitter is a public space—it would be unethical to share identifying details about a single user without his or her consent. Rivers and Lewis argue that it is crucial for scientists to consider and protect users’ 

I disagree. Of course it may be more often true for epidemiology, but it really depends on what kind of study you’re doing. As Kurt Luther, Casey Fiesler, and I have written, sometimes anonymizing users may be morally wrong because you are denying them credit for their work. (“That tweet was really funny–I want my name on it!”) Twitter is public, published material. The contents of private Twitter feeds are for followers only, but the contents of public feeds arguably are as public as a newspaper article.  If you want to take extra precautions to anonymize people, that’s fine.  But to say it’s always necessary is ridiculous. It depends on the type of study you’re doing.

Jim Hudson and I empirically studied how people often misunderstand how public their communications are. The complicated question that follows is: if user expectations are out of line with what experts would call “reasonable,” how should the scholarly community proceed? Dealing with things on a case-by-case basis is the best we can do for now.

Is Instant News Healthy?

December 14, 2012

Thanks to Twitter, I am learning of the school shooting in Sandy Hook, CT before it hits the major news outlets.  I’m pretty sure that’s not a good thing.

Newtown_HH: #Newtown shooting reported at sandy hook school. In lockdown. (RT’d by @acarvin)

Evolutionary biologists and sociobiologists would say that there is a selective advantage to being fascinated with danger. The more we learn about it, the more likely we are to avoid it. The early hominids who thought “Wow, what is the spider with the big fangs??” and learned all about it were more likely to survive than those who didn’t care. We are arguably biologically hardwired to be fascinated by poisonous snakes, car crashes, and school shootings.

But getting beyond selection advantages into the modern day, I wish we could turn some of those instincts off.  Is there any value to learning about these things in the heat of the moment, as they happen?  For locally affected folks of course there is.  But for the rest of us who are just gawking at danger, it seems unproductive and unhealthy.

Acarvin: Contradictory reports over whether there’s 1 or 2 shooters in #Newtown. Please not that this type of confusion is _very_ common at shootings

Reading this as it’s happening is not making my morning better informed, more productive, or more enjoyable.  I suppose it’s an important service to provide this news quickly for those directly affected, but I’m going to try harder to turn it off when it’s more distant. Give me the summary tomorrow.

