More on anonymizing tweets and Internet research ethics

Twitter research ethics are complicated, and deserve a more nuanced treatment than my short post from yesterday. I’ll take a stab here at saying a bit more:

Question 1: Is analyzing Twitter “human subjects research”?

I want to start by looking at US law. (Note that this is only applicable in the US and only applies to federally funded research, though some companies choose to follow these rules voluntarily, and most universities apply the rules to all research whether it is federally funded or not.)  The federal policy on human subjects research (the Common Rule, 45 CFR 46) states that several categories of work are exempt from the rules, including:

(4) Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects.

It’s pretty clear that Twitter data (on open accounts) is existing data that is already publicly available. So legally speaking, I believe researchers are well within their rights to simply use it at will. It’s public, so you can use it. But should you?

Ethical is a higher standard than legal. As Jim Hudson and I found in our study of people chatting on Internet Relay Chat (IRC), people often misunderstand the public nature of online communications. This leads to my second question:

Question 2: If people have expectations of privacy that differ from expert opinion on what is “reasonable,” does that need to be taken into account?

I don’t think there’s a simple answer to that question. It probably has to be addressed on a case-by-case basis.  And if people’s expectations are persistent and continue to differ from the written rules, maybe the rules need to evolve.

If you do consider research on Twitter to be human subjects research, then you need to apply for IRB clearance, and you probably have good grounds to request a waiver of consent.  A waiver of consent is possible in these circumstances:

(d) An IRB may approve a consent procedure which does not include, or which alters, some or all of the elements of informed consent set forth in this section, or waive the requirements to obtain informed consent provided the IRB finds and documents that:

(1) The research involves no more than minimal risk to the subjects;

(2) The waiver or alteration will not adversely affect the rights and welfare of the subjects;

(3) The research could not practicably be carried out without the waiver or alteration; and

(4) Whenever appropriate, the subjects will be provided with additional pertinent information after participation.

In such a case, an IRB might request that the tweets be anonymized, which would help make the case that the work presents minimal risk. This sounds like a great approach for research on sensitive topics such as epidemiology.
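For readers wondering what anonymizing tweets might look like in practice, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a procedure any IRB prescribes: the tweet is a plain dictionary with hypothetical `author` and `text` keys (not Twitter's actual data format), and the helper names `pseudonym` and `anonymize_tweet` are made up for this example. It replaces handles with keyed hashes, so the same person always gets the same pseudonym, and strips links. Keep in mind that the verbatim text of a public tweet can often still be found by searching for it, so a step like this may not be enough on its own.

```python
import hashlib
import hmac
import re

# Hypothetical patterns for this sketch: @mentions and http(s) links.
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def pseudonym(handle: str, secret_key: bytes) -> str:
    """Map a handle to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The key must be kept secret by the research team; without it, the
    mapping cannot be rebuilt simply by hashing known handles.
    """
    digest = hmac.new(secret_key, handle.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:10]


def anonymize_tweet(tweet: dict, secret_key: bytes) -> dict:
    """Return a copy of the tweet with the author, mentions, and links masked.

    Assumes `tweet` has "author" and "text" keys (an assumption of this
    sketch, not a real API schema).
    """
    # Mask links first, since a URL could itself contain an "@".
    text = URL_RE.sub("[link]", tweet["text"])
    # Replace each @mention with the pseudonym for that handle.
    text = MENTION_RE.sub(lambda m: "@" + pseudonym(m.group(1), secret_key), text)
    return {"author": pseudonym(tweet["author"], secret_key), "text": text}


if __name__ == "__main__":
    key = b"replace-with-a-long-random-secret"
    example = {"author": "Alice", "text": "Thanks @Bob, see https://example.com/post"}
    print(anonymize_tweet(example, key))
```

Using a keyed hash rather than deleting names outright keeps the data useful for analysis (you can still tell which tweets came from the same account) while keeping the real handles out of the working dataset.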

Because part of my research is about people’s creative accomplishments online, I am more likely to encounter situations where anonymizing people is unethical because it denies them credit for their work.  We name people in our accounts only at their written request, which they indicate on a consent form.  And our projects generally use mixed methods, combining analysis of people’s online postings with interviews.  I believe this mixed-methods approach often gives better research results, and it necessarily makes the work human subjects research rather than merely analysis of public information.

I personally prefer to view Twitter research as human subjects research and apply for a waiver of consent. Working through a formal IRB application and soliciting help from IRB members can help you think through the details of how to treat your subjects in accordance with the principles of beneficence, justice, and respect for persons.  Ethical is, after all, a higher standard than merely legal.

That said, the public nature of Twitter data is hard to deny.  Maybe the rule about pre-existing, public information needs to be rethought. Something more nuanced would serve us better.

