"The positive conditions of freedom [are] going to require data infrastructure. The bet that I’m making is that people are down to contribute to the positive conditions of their freedom if it's pitched to them that way." — Salome Viljoen
Sup y’all. Welcome to the Ideaspace.
In a post last fall called Data Is Fire, I compared our current relationship with data to the one early humans had with fire. Just as early humanity's ability to tame fire opened up the doors to civilization, our own ability to tame and understand data will largely dictate where things go from here.
That post quoted a fascinating paper called “Democratic Data” that offered a very different approach to how we measure ourselves. Soon afterward I spoke with Salome Viljoen, the brilliant author of that paper and a postdoctoral fellow at Cornell Tech and the NYU School of Law, about her ideas for data and democracy, and came away convinced. You can listen to our conversation on the web, through Apple or Spotify, or read a condensed transcript below.
This conversation is the first in the Ideaspace’s series on Data Is Fire, featuring interviews with researchers, data scientists, philosophers, founders, and others working on the frontiers of data about where we are and where we’re going.
YANCEY: I was excited to talk to you about your paper “Democratic Data: A Relational Theory of Data Governance.” Can you explain what this paper is about?
SALOME: The paper makes a conceptual point about information collection and production in a digital economy: basic economic activity is out of step with how a lot of the law regulating data production thinks about data collection. A lot of the economic activity around information production isn’t about any one individual data subject, but about what data collected from one individual can meaningfully reveal or indicate about all sorts of other people.
When I go around and live my online life and I'm subject to all sorts of surveillance, the point of all of that surveillance isn't so that companies can collect Salome data that they just stick in a Salome folder. No, they're building behavioral models and predictive systems using that information to construct relevant population-level categories that I'm a member of. I'm a millennial, I'm a woman, I'm a cat lover. And then those facts are used to make all sorts of predictions about me and others who share those features to nudge us in various ways.
But that's not how the law thinks about information production. That's not the set of basic activities that the law approaches information production with. It really does think of me as one individual data subject that has a sphere of autonomy that needs to be protected.
YANCEY: Why is that a problem?
SALOME: In the old legal story of what makes datafication wrong, I have my little sphere of my inviolate inner life. I negotiate and set the terms of who can access that inner sphere, and it's the purpose of the law to protect that sphere. From that perspective what makes datafication wrong is that we’re allowing too many companies or too many other entities to violate that sphere. It’s the commodification or legibility of inner life that's so wrong.
But so much of what's actually happening in the digital economy is more complicated than that. It's not that these data flows are wrong because they undermine my personal autonomy. It’s that they may be wrong when they materialize unjust social relations on the basis of them. When it relates me to other people in a way that contributes to social projects of oppression or marginalization. When, as a cost of me engaging in digital life, I’m drafted into projects of oppressing others in ways that violate the sense of mutual obligation I have with those people. But I have absolutely no control over those social processes that I’m drafted into as the cost of engaging in digital life.
YANCEY: You share a pointed critique of the main arguments against datafication. Can you explain those?
SALOME: Today there are two dominant legal responses to what makes datafication wrong. A property-based response and a dignity-based response.
The propertarian response says there are all of these companies that are getting so rich off this valuable resource of our data. It's unjust enrichment, they're taking this resource from us for free. The intuitive theory is that it's our data and we should be paid for it. It’s a punitive response to the incredible amounts of income inequality that have coexisted with the rise of technology companies. It makes intuitive sense.
But conceptually what that gets wrong is that my data isn't just my data. Companies can make every inference about me that they can make right now even if I never contributed any information to the digital economy. Even if I'm totally excluding myself I'm still subject to all of the forms of manipulation and undermining autonomy and subject formation that get critics like Shoshana Zuboff worked up.
The dignitarian response thinks of the appropriate legal response to datafication as providing us more protection for our information, almost as an extension of the protections that we ourselves hold as citizens. They help protect me as a data subject with a more robust suite of rights. But they don't have anything to say about forms of mass data extraction that are being used to target not me but a bunch of other people like me in a probabilistic way.
YANCEY: You talk about how there's a vertical and horizontal axis to data using a fictional tattoo AI company. Can you walk us through that?
SALOME: I have this hypothetical scenario with these two people: Adam, who is the data subject, and Ben, who isn’t a data subject. Adam is a tattoo enthusiast who uploads images of his tattoos to this social media company called TattooView AI. He's subject to all of these legal protections along the vertical relationship, which is the relation between Adam our data subject and TattooView our data collector.
But in the scenario in the piece that I walk through, Adam uploads his tattoo image data to TattooView and TattooView partners with local law enforcement who use TattooView’s image database to help them predict which tattoos are signs of likely gang membership. They use this program to identify Adam's tattoo as the tattoo of a gang, and they use this information not to detain Adam, but to detain Ben, who has that same tattoo image.
The problem is that both the propertarian solution and the dignitarian solution have nothing meaningful to say for Ben, who was detained on the basis of the tattoo that Adam uploaded, and in this way is in horizontal relation with Adam. They share this information based on Adam’s tattoo that is in meaningful ways just as much information about Ben's tattoo. But there's no way that we currently take Ben's interests into account when we govern these relationships.
YANCEY: You call these “relational data.” Part of your proposal is to argue for a democratic governance of relational data. What does that mean?
SALOME: If data about Ben's tattoo is being used to detain him, then those interests that he has in that information are of sufficient legal relevance that we should take them into account. There are a number of potential ways the law might take those interests into account. You could imagine civic data being managed by a municipality for the benefit of its citizens. You could have a public trust or a civic trust that manages transportation information or all other sorts of levels of city data on behalf of citizens. You could imagine strengthening worker protections or allowing worker unions to negotiate the algorithm, which is to say that you could imagine forms of workplace surveillance being subject to labor law protections. There are a variety of potential institutional reforms that would democratize data governance.
YANCEY: This is a moment of a real low point of trust for data, companies, and governments. How do you rebuild that trust with the public?
SALOME: I would love to have a conversation that isn't just that there's far too much data about me that's being collected, and instead getting into one about greater specificity. Conceptually getting specific and correct about what is at the root of our concern when we think about data extraction opens up all of this terrain for us to say if we have a theory of when data extraction is unjust, we also can then have a theory of when data collection is just.
There's all of this data that's being collected about my shoe preferences that helps to architect the entire backend surveillance methods around trying to nudge me into buying more shoes. Yet on the other hand we're not collecting nearly enough data about my water usage. Having a coherent theory of what makes this information collection wrong opens up the door for saying how we change the distribution of information collection, not stop information collection altogether.
YANCEY: I was recently speaking with the CEO of a company in this world and I read to them a portion of your paper. Their response was that if they understood your definition of relational data properly, this sounded like the kind of information that everyone got all freaked out about with Cambridge Analytica. I'm curious what you think about that response.
SALOME: The way I would articulate the Cambridge Analytica concern is being drafted into a project that you disagree with profoundly as a condition of living your digital life. That's not a problem of me consenting or not consenting. That's a problem of the type of data relationships that I was being put into by being part of that apparatus and disagreeing profoundly with that set of social relations.
When we can start to think about things that way we open ourselves up to being far more open to all kinds of positive projects. [The philosopher] Elizabeth Anderson talks about the positive conditions of freedom that we should be securing for one another. That’s going to require data infrastructure. The bet that I’m making is that people are down to contribute to the positive conditions of their freedom if that's how it's sold to them or that's how it's pitched to them. That’s not Cambridge Analytica. And it looks meaningfully different from Cambridge Analytica. We can only see that meaningful difference if our conceptual theory of what makes Cambridge Analytica wrong isn't just a story of consent.
YANCEY: How would we use relational data? Does it help us see inequalities that should be addressed? Does it identify new values we use to distribute goods and services?
SALOME: It could easily be all of those things. Anytime you're thinking about the sets of questions that meaningfully ask what are the conditions of contemporary life, what do we owe one another, and how do we go about distributing those goods and services fairly and responsibly in light of people's competing needs and competing interests, all of those questions will require data infrastructure. But that's not the way that we think and talk about this stuff currently.
A lot of the legal language is about “is my autonomy being violated?” But it’s also true that I live in a society and there are all kinds of ways in which my freedom is meaningfully curbed in service of what I owe other people. That's how we should start to think about the kinds of information production and data infrastructures that we want to build: working out mutually what we think we owe one another.
If we think we owe one another a planet that responds justly and responsibly to the climate crisis, then yes, maybe we owe our water data and our relational water data to one another so that our various water utilities can move into the future with a good picture of what we're dealing with and how they might respond fairly to the climate crisis. If we think what we owe one another is a more walkable city then maybe we do want to contribute our transportation data to a transportation authority so that they can help us to live in more walkable cities. But that's not the way that we think about what we owe and we are owed with respect to our information right now.
YANCEY: Last I have a lightning round of five questions. Think of them as prompts to respond to. Number one: Surveillance Capitalism.
SALOME: Not so much a problem of surveillance, more just a problem with capitalism.
YANCEY: Number two: the Facebook antitrust suit.
SALOME: Good but won’t fix most of the core problems with the digital economy. Breaking up Facebook into four companies that have every incentive in the world to extract as much data as they can in an adversarial way from their clients as opposed to one company that has an incentive to extract data in a semi-adversarial relationship with its customers doesn't really change the underlying semi-adversarial extractive relationship with its customers.
YANCEY: Number three: data portability.
SALOME: Overrated and I don't know what it means. I think there are probably ways that it could make sense. But if you, like me, take the idea of data relationality seriously, data portability quickly falls into conversations that look propertarian to me. Not to say that there may not be a version of it that doesn't, but that's my suspicion with it now. You could start to meaningfully think about entire groups that could move their information together. Sort of group data portability, that would look a lot more like a democratic data model that I'm talking about. Also if you could port your data into a collective entity that could then meaningfully negotiate the terms of that relationship on behalf of its members with a platform, that could also start to look more like a democratic version of data portability.
YANCEY: Number four: The Social Dilemma.
SALOME: I was so scared off by the horrible Twitter discourse about The Social Dilemma that I spared myself the hate-watch experience.
YANCEY: Number five. Complete this sentence: “Ten years from now, our data will be: ___.”
SALOME: If I was a salesperson I’d say democratic. [Long pause] Hopefully deleted? Ten years from now all the data about me right now hopefully won't exist anymore. Filling in the gap between those two things, I'm hoping data is collectively and more responsibly managed.
- Salome Viljoen's paper "Democratic Data: A Relational Theory for Data Governance"
- Salome Viljoen on Twitter
- Original Data is Fire post