DNA and Surveillance Capitalism

21st of April, 2019 (updated: 2020-09-21)

There has been an increasing public debate in the last few years around the use of data by states and tech companies. This has mostly been focussed on companies like Google and Facebook selling futures on human behaviour¹, mostly to an advertising market, based on the vast quantity of data they hoover up from our online activities. There is another category of data, however, namely genomic or biometric data, for which there is an increasing market [2], and which has been largely ignored by the public debate. This echoes the lack of debate ten years ago on the way large tech companies tracked us around the internet, which is troubling because it means that people are giving up their data to these companies with very little awareness of the consequences.

It could now be argued that a lot of the data the big tech companies ingest are taken by stealth, through the many websites that integrate Google AdWords, or Facebook’s Beacon and other “community engagement” functionality. At the beginning, however, most users (myself included) seemed perfectly happy, in our naivety, to supply these behemoths with our primary (e.g. photos, status updates, and web searches) and ancillary (e.g. what we clicked on, with whom we’re connected, when we perform actions and how those actions are correlated) data, gratis, in exchange for services which were ostensibly provided to us at no charge.

Since the drastic reduction in the cost of genomic sequencing [3] over the past decade, numerous companies have appeared, offering sequencing services at a low price which purport to reveal your vulnerability to a host of diseases, or to help you piece together your ancestry. Just as our browsing and social data were given to companies who have been using them for purposes to which we might well have objected had we foreseen them, countless individuals are giving up their genetic data, ostensibly for one purpose, whilst having no understanding, visibility, or control over the future purposes to which these data might be put. It is highly disturbing to see the willingness with which people will give up data which are so intrinsic to them without the slightest guarantee that these companies will not use their data for more nefarious purposes further down the line.

There are two obvious uses to which these data are likely to be put (and already are [4]). The first is an analogue of the advertising model, except that instead of selling futures on human behaviour, genomics companies will sell futures on human needs and health, most likely to insurers and healthcare providers, who can target advertising and price services according to each user’s genetic makeup. This is, in fact, a lot more sinister than the uses to which our browsing data are put. Not only is it going to be very difficult to obscure oneself from the view of such companies once these data are submitted (genetic data are not going to change substantially throughout an individual’s lifetime), but the accuracy of the models also ceases to matter, because we have no way of determining it ourselves.

That is to say that we have no way of knowing, without an intermediary (who may have a vested interest), towards which health outcomes we are predisposed, nor what solutions are effective at promoting the outcomes we want. This is because, unlike with our tastes and desires, we cannot ask our genes to tell us the answer. Whether or not an allele that is present in my genome is really indicative of my propensity to develop, for example, diabetes, the mere fact of being told by a company that it is, and then subsequently being advertised to by another that offers to sell me a solution, will influence my decision to purchase that solution, even if the true effect will be nil (which I have no way of testing).

These companies have created the perfect system where they tell the consumer what they need, and then sell them the product (or sell the propensity to buy to a company who sells the product), based on a model whose accuracy is impossible to determine by that consumer. It is a completely opaque market. This is in stark contrast to the previous advertising model of the internet, whereby if Google guessed that I was really interested in Barbie dolls, and therefore increased the number of Barbie dolls I was shown as I browsed, I would be no more likely to buy Barbie dolls — the model was simply wrong.

The second obvious use is to bolster the surveillance state, as in the example of FamilyTreeDNA opening up access to the FBI [5]. We have already seen this in action with the case of the Golden State Killer, who was traced using the DNA of two distant relatives who had uploaded their data to an online repository of genetic data. This is doubly pernicious as that individual did not need to upload their own data in order to be traceable. The genetic data of relatives are sufficiently similar to those of a given individual that much can be inferred from the relatives' data alone. This makes the question of consent a completely moot point: I no longer have the power of consent over my own data.

As is often the case, catching a serial killer or stopping a terrorist seems like a great reason to allow the state to access these data. This is why governments frequently cite these examples whenever they are trying to argue for a reduction in citizens’ rights (to privacy, fair trial, etc.). The consequence of accepting this argument is to open the floodgates to all sorts of abuses. Consider what is happening in China, where the biometric data of the Uighurs are being forcibly collected [6]. These data can be used to keep track of dissidents [7] and further harass an already persecuted minority. These arguments are already well rehearsed in the debate about communications data (and metadata), and apply to biometric data as well.

This market in genetic data raises an extremely important question which has existed for other forms of data, but never so obviously as for genetic data: who owns data when they pertain to more than one individual? A good example is in the photographs uploaded to social media platforms like Facebook, Instagram, Twitter, and so on. If a friend of mine takes a photo of a group, and I happen to be in that group, I rarely get asked whether I consent to that photo being posted online. Yet now my image can be used in facial recognition databases, my whereabouts pinned to a given location at a given time, and my associations with others inferred directly by my colocation with them².

Most people would consider the photograph to be owned by the photographer, yet the photographer does not own the likenesses (and therefore biometric data) of the individuals photographed, and should be required to gain their consent before using their data. There are exceptions to this, of course. I don’t think that for a family photo album it much matters if my photo is used without my consent, but when posted online, and subject to the sorts of data processing undertaken by large tech companies (not to mention the level of visibility to all the connections of all the people in the photo, whether I know them or not), there is a clear violation of my privacy. Likewise, an expectation of privacy is not always reasonable: one can expect to be photographed at concerts, or sporting events. It is worth noting, however, that in these counterexamples, the use to which these data are put should still be subject to strong privacy laws (they should not be sold on to advertisers, for example, though they may end up in newspapers; the difference between these scenarios is very subtle).

Ignoring the use of my biometrics from a photograph (through facial recognition), a single photo will give up a limited amount of information about my life, and is certainly open to interpretation. This is unlike genetic data, which are completely irrevocable and unchangeable — once they are given over to a company, short of forcing the company to delete all traces of them from their systems (which is hard to enforce in practice), they own data which will be relevant to my life for all time, not just for the snapshot at which they were captured. I would never consent to a company having access to my data like this, and especially not under the terms most issue for their services.

Unfortunately, I may not be given the choice. I have friends and family who have considered sending in their genetic data. Whatever the reason they might do so, whether genealogical or epidemiological, as soon as they take that decision (a decision about which they are unlikely to inform me in advance), then a lot of my own data suddenly become available to these companies (and, in short order, whichever states and companies pay or coerce them for access).

Here we run into a dilemma: how do we balance, on the one hand, the individual’s right to use their own data as they see fit, and on the other, the right of other individuals to their own privacy, in a world in which these data are not straightforwardly owned by a single individual?

It would be a mistake to try to reverse the innovation which has brought us here, especially because it has many positive consequences for human welfare. As with any innovation, the only way to prevent its abuse is to legislate to provide strong protections for individuals. This should protect both those who consent to the use of their data, and those who do not. For those who give their data to a genomics company in order to trace their genealogy, that company should not be allowed to sell their data on to third parties, even if the user consents. The idea of consent in this sort of scenario is very fraught: it’s very hard to determine that any individual really understands the true consequences in such a nascent area with such abstract consequences. For those who do not consent, and have been swept up in another’s data submission, it should be illegal to attempt to determine anything about them at all.

This approach has a weakness, however, which is that future governments can choose to override protections written into the statute book. One can easily imagine a government making the argument that they need access to data which were submitted when such access was forbidden by law. This is exactly the sort of thing that happened with the “war on terrorism” where many of our rights to privacy were violated by the state in the name of national security. As a result, a submission made in the era of rights and protections has consequences in a completely different context. It’s possible that the only way to really fix this problem is to have a technological solution, either robust anonymisation, or a time limit on how long data can be kept (assuming that other individuals cannot be wiped from the dataset without completely violating its integrity).

In the meantime, my plea to you (and especially to anyone related to me) is this: do not give your genetic data to companies as you are also violating my rights (let alone your own); do not upload pictures which contain my likeness (or that of anyone else who has not directly consented) to a social media platform (or any platform which processes the images); and finally, push for strong legislation on individual rights to privacy, as, ultimately, that’s going to be the only real protection against states and companies far more powerful than we are as individuals.

¹ Shoshana Zuboff talks about the concept of "futures on human behaviour" [1] which means that these advertising companies are not selling eyeballs, or clicks, but a certain number of buyers. They can do this because they have data on the most effective ways to make people perform certain actions. A contrived example might be noticing that men in their forties are more likely to buy an expensive car when they exhibit signs of depression. Google might notice this, and take the opportunity to serve up an advert for a new Porsche. To Porsche, Google sells a certain number of middle-aged males who will buy their cars.

² Were you to take a photo of me with your smartphone, it would contain the time and location at which the photo was taken. When it is then uploaded to, say, Facebook, the algorithms Facebook employs will match my likeness to my identity (about which Facebook already has a conception). It will therefore be able to say that I was at a given location, at a given time, using just that photo. Furthermore, anyone else in the photo (or other photos uploaded to Facebook taken at approximately the same location and approximately the same time) could then be assumed to have been there with me, inferring an association between us. Therefore, with just a few photos uploaded to Facebook, a lot of information about my life (and the lives of others) can be automatically inferred, without any of us consenting to such information being made available.