If you’ve followed me on Twitter for any length of time, you’ll know I’m someone who is very much into open access, open science, transparency, replicability and traceability. I strongly believe that we should make our research as open and transparent as possible so that people can replicate our findings (or fail to replicate, as the case may be). In fact, if you read my tweets from when the DART initiative started, I was actually pretty supportive.
@rkwrice @thosjleeper @ArthurLupia @DARTsupporters I'm a very strong supporter of DART, even for qualitative datasets! 🙂
— Dr Raul Pacheco-Vega (@raulpacheco) June 19, 2015
Look, people: I can perfectly in the same thought support DART insofar it aims to improve transparency, and still raise valid concerns.
— Dr Raul Pacheco-Vega (@raulpacheco) January 1, 2016
Today, I am not so sure anymore. I started feeling quite uneasy and I can’t quite put my finger on the reason why. Since I use my writing to reflect on issues, I decided to write a blog post.
Writing a paper based on secondary sources? That's fair. Referencing someone else's empirical work? Sure. But using someone's primary data?
— Dr Raul Pacheco-Vega (@raulpacheco) June 5, 2016
Transparency is an issue that is important to me, but not one that I spend every waking moment thinking about. So, bear that in mind when you read and critique my post. These are some reflections that may be preliminary, and that may or may not be correct. I have neither the time nor the inclination to spend much time on this reflection at this point in my life. When I submit a paper to a journal whose standards for replication require me to be completely transparent about my data, I’ll have to reflect on it a lot more. Right now, I have papers to finish, so this reflection is necessarily incomplete.
The issue that prompted my reflection and uneasiness was the following: I read a paper authored by a specific researcher; we’ll call him A. This researcher used the primary data generated by another researcher; we’ll call her B (who seemed quite knowledgeable about the topic). Despite the fact that the fieldwork seemed quite short, I gave B the benefit of the doubt as she had published quite extensively about the topic. But A hadn’t. A’s foray into the topic seemed to consist basically of grabbing B’s data, thinking about it through a different theoretical lens, and voilà, you have a peer-reviewed paper. A cited B, and mentioned he had used her data.
When I finished reading the article, I felt really uneasy. This action (A publishing a paper using B’s fieldwork data) seemed unethical to me. I don’t know why, I can’t shake the feeling, and it may actually not be the case. But that’s how I felt. I shared my thoughts on Twitter. Now, if you know me, you’ll know that I’m more than happy to share my quantitative datasets, my papers, even my conference (draft) papers. Heck, I was even willing to share raw field notes. But this particular paper stopped me in my tracks.
The question, obviously, lingers… shouldn’t qualitative data be subjected to the same standards as quantitative data? Why am I so willing to share my quantitative datasets openly, yet less excited than before to share my qualitative fieldwork data (interviews, participant observations)? I don’t know, and I’m not sure whether it’s because I’m pre-tenure and I worry about being scooped (I’ve seen 5 papers published that were exactly the pieces I was going to write, so this feeling has increased in the past few months).
WARNING – my commenting system is somehow not working as well as I would like it to, so if you have a long comment to write, I suggest you type it in Word or LaTeX and save it, and if you can’t insert it into my comments box section, email it to me, and I’ll post it directly on the WordPress interface.
I found this article whilst searching for your thoughts on working with secondary data, which I am thinking specifically about whilst preparing to access and use a secondary dataset. So I thought I would show my appreciation by replying 🙂 It’s hard to comment too much without knowing the specifics but personally I think it sounds like a good thing providing researcher B has made an “original contribution to knowledge” and there may be several reasons that this could be contested. I can see why it made you feel uneasy as primary data collection is laborious and emotional but at the same time I also think there is an ethical argument to make use of existing datasets and reduce the burden of collection. Being scooped is a big concern but doesn’t seem to be the issue here. Have your thoughts on this changed in the last years?