A publisher’s community of users includes authors, editors and readers, among others, and it is critical to discern these roles (often multiple roles per user) in designing meaningful features and services in support of the scientific publication process.
In practice, this means tracking a user’s on-site behavior and/or acquiring this data from external sources. While very common on the web, this kind of data-gathering raises many ethical and privacy concerns that require special attention. I hope this post can bring some of those concerns to light.
Whenever we handle user data, two positions must be upheld: a legal one and an ethical one.
The legal considerations are relatively straightforward. As a responsible organization, all product features and services we design must adhere to the national and international legislation that dictates how we can operate. This includes regulation like Do Not Track in the US or the General Data Protection Regulation (GDPR) taking shape in the EU. New consumer protection rules with explicit provisions on issues like Data Portability (the right to request a copy of one’s personal data) and the Right to be Forgotten (the right to have personal data deleted on request) provide solid frameworks for responsible digital data-handling, but the buck doesn’t stop there.
The ethical questions that arise from data collection transcend existing and incipient legislation. As members of the open access movement, I believe that we all hold a unique responsibility to stand by our values of openness and transparency and continue to pioneer responsible privacy policies.
We seek to answer the questions that arise at moral touch points in advance, not only because we have a moral imperative to do what’s right for our users and proponents of Open Access but also because doing so:
- Enables faster adoption of new services by reducing the fear of the unknown.
- Creates a sense of shared values, thus improving collaboration and increasing the pace of innovation.
- Reduces the risk of unintended consequences via explicit consideration of the long-term implications of our decisions.
Below are some of the issues that I believe should be considered when developing our privacy efforts.
What overarching objectives do our data collection efforts serve? In our quest to publish better, faster, cheaper, what are our priorities and when do they conflict with the needs of our stakeholders?
What solutions, technological or otherwise, will safeguard data from misuse? We understand that careers depend on the information our users share with us and we must do everything within reason to protect it.
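One common technical safeguard worth considering here is pseudonymization: replacing direct identifiers with keyed hashes so that usage data can still be counted and joined, but a leaked analytics record does not reveal who the user is. A minimal sketch in Python follows; the `PEPPER` value and the record fields are illustrative assumptions, not a description of any existing system.

```python
import hmac
import hashlib

# Hypothetical secret, stored separately from the analytics data
# (e.g. in a key vault) so a database leak alone cannot reverse the hashes.
PEPPER = b"replace-with-a-secret-from-your-key-vault"

def pseudonymize(email: str) -> str:
    """Replace a direct identifier with a keyed hash so analytics can
    count and join on users without storing who they are."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(PEPPER, normalized, hashlib.sha256).hexdigest()

# The raw address never enters the analytics record.
record = {"user": pseudonymize("reader@example.org"), "pages_viewed": 12}
```

Rotating or destroying the key is then one concrete way to honor a deletion request across derived datasets.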
A related question is central to our work: how can we ensure that the data we collect is accurate and relevant, and remains so?
Who is the owner of the data and to what extent? What can and cannot be deleted? We should consider the source of our data—is it reliable, is it reputable? How do we obtain this data? When is it okay to share this data? To sell it? To buy it? To scrape it?
What constitutes uniqueness—is it address, gender, e-mail, birth date, ORCID?
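The choice of matching key can be made concrete. The sketch below orders candidate keys by strength, trusting ORCID (which is designed to be unique per researcher) before e-mail, and refusing to merge on weak signals like name alone. The field names and the matching policy are illustrative assumptions, not a recommendation.

```python
def same_person(a: dict, b: dict) -> bool:
    """Decide whether two user records refer to the same individual,
    preferring strong identifiers over weak ones."""
    # ORCID is designed to be one-per-researcher; trust it first.
    if a.get("orcid") and a.get("orcid") == b.get("orcid"):
        return True
    # E-mail is weaker: shared and institutional addresses exist.
    if a.get("email") and a.get("email", "").lower() == b.get("email", "").lower():
        return True
    # Name (or name plus birth date) alone is not enough to merge safely.
    return False
```

Whatever policy is chosen, stating it explicitly makes it auditable, which matters when a wrong merge could attach one researcher’s activity to another’s career.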
An individual can have multiple identities—professional, social, academic. How do we balance the risks and conveniences offered by OAuth credentials that merge these identities?
The answers to these questions have meaningful ramifications for our users and the Open Access movement. I believe these decisions are never absolute and should be determined on a case-by-case basis. However, by explicitly stating our values I hope that we can create a framework that will enable us to judge when decisions align with those values and when they do not.