There’s too little to be gained from treating your users’ data with respect.

A (hi)story of bad incentives.

Henri Stern
Oct 28, 2020 · 11 min read

This is part 2 of 4 in a series of musings on the topic of online privacy. I don't pretend to resolve the problem; I'm simply exploring facets of the space and pulling at strings that may make the web a more wholesome place to explore, and help builders think about the moral valence of their technical decisions. View part 1, part 3.

TL;DR — Creators’ incentives on the web today are not aligned with their users’. Being transparent is hard, and being accountable is often undesirable. At best, it gets in the way of good UX; at worst, it directly threatens creators’ business models. Yet new toolsets may help resolve this tension, making it easier to tell bad actors apart from those who have just found it too hard to invest in protecting their users’ privacy.

As we saw in the last post, being privacy-sensitive as a user is an inherently personal thing. It is about how much you value a given service and how much you are willing to give up to access it. But these trade-offs are extremely hard to measure, and the consequences of the web’s lack of privacy for users can be hard to fathom until it’s too late.
This is part of why online privacy is a lot like global warming. We know it’s deeply important and associate it with a very real sense of dread, but when it comes down to it, the problem seems ethereal and it’s unclear what we can do about it as individuals without simply abandoning a lot of what makes up modern living. In everything you do, even in breathing, you put carbon out into the atmosphere; likewise, in the digital world, your every click or page view creates new data about you.

So let’s take a step back. While online privacy is a larger issue with a lot of history, it starts with you: the user. Do you even know what decisions you are implicitly making about your data?

In order for you to know, responsible online service-providers would have to be:

  • Transparent — they should clearly communicate what data they collect and create about their users, why, and what power users have over this;
  • Accountable — they should be answerable to their users as pertains to data use and any breach of the above-defined policies.

Easy enough, right? Obviously not.

Let’s start with transparency.

The Internet was clearly not built with our modern usage in mind. The expectation that the entire world would upload deeply sensitive personal information to this shared network was a reach, and this informed the system’s underlying first-class concerns. Resiliency? Yes. Flexibility? Yes. Privacy? Security? Not so much.

And so, as corporations started building all the tools that make our lives easier today, there were no standards for how to deal with user data or for what counted as personal or sensitive. Standards have since arisen, like HIPAA, SOC 2, CJIS, FedRAMP and PCI, but they are largely industry-specific and most often deal with the legal liability of the creator (read: corporation), rather than with strict data-usage rules or recourse for the user. These standards often conflate security, infrastructure and privacy concerns, making it that much harder for a creator to think carefully about the privacy trade-offs they are asking of their users. The response is to simply find ways to check boxes into compliance, rather than to think more qualitatively about what data they are collecting, why, and how they are treating it.

Now, the standards are hardly the root cause here. But combine them with a historical insensitivity (both technical and cultural) to user privacy, and the fact that talking about data policies is complex (and pretty boring for users), and it’s no wonder that privacy considerations are a question of compliance rather than one of open communication for most online services today.

The standards for transparency have been set, and they are abysmally low. The only language we have for discussing data rights and privacy is drab legalese. Online privacy is a no-win topic for companies for the most part. It’s not about establishing trust with a user; it’s about hoping the subject never comes up, and using dense legal language users are unable to (and shouldn’t have to) understand.

For the most part, privacy is the realm of the privacy policy and service-providers get away with shoving it out-of-sight and forcing their users into a “take it or leave it” posture: “by using this service, you agree to…”

Here’s an example of GDPR-handling for you. Enjoy.

Going to npr.org from France.

Clicking anything but the shiny blue button above is no fun at all. Giving users a choice to view a modern website in plain text is just another way to say “agree to whatever we want or get the hell out of here.” And so, while GDPR is a user-centric policy, in this instance it plays out by having the creator simply privacy-wall their product. Again, this is an abysmal privacy standard.

Mind you, communicating data policies is as difficult as data usage is nuanced, so it’s no wonder a one-size-fits-all solution doesn’t work well. But this state of affairs makes it very hard for the end-user to tell a well-meaning creator apart from an exploitative one. Only those companies willing to make privacy a central part of their brand take any time to talk about it.

There’s a deeper issue beyond that: users shouldn’t be expected to adjudicate competing technical claims from their service providers. There should be a privacy floor to what companies can do. Yes, over-the-counter drugs have labels listing side-effects, but that doesn’t mean drug manufacturers can sell anything OTC. There’s a ceiling to how much harm they can cause, beyond which “you should have read the label” is no longer an excuse when mistakes are made. Anything relying on transparency alone to mitigate privacy harms has already, almost by default, abandoned a “privacy-by-default” approach.

On a quest for clearer labels, and privacy policies.

Now, we don’t yet have the language to discuss these issues or smoothly enforce pre-launch trials, but there are a few noteworthy attempts to set new standards on software releases. Chief amongst them are the iOS and Android ecosystems (and their accompanying permission systems). In mediating apps’ access to users through locked-in operating systems, Apple and Google get to create a language for user privacy. These standardized privacy menus and permission prompts allow (very motivated) users to get a sense of a creator’s intent and privacy mindset.

Yet, as we see below (looking at you Amtrak), this only works insofar as the creator is willing to play the game.

iOS location permissions at work

These ecosystems give users some language through which to make privacy decisions, and they put a floor under app creators’ privacy behavior through the app store review cycle. But, as evidenced in recent years, there are many issues with this model for privacy standardization.

  • First, these privacy standards are often applied arbitrarily and always beholden to the ecosystem creator’s economic interests. What if Google or Apple’s privacy stance is not the right one for you? What if you do have something to hide?
  • Second, what about those devices outside the ecosystem? Should you have to learn to navigate multiple sets of standards across your devices?

Finally, why should you content yourself with Apple or Google’s mediation? Why shouldn’t a creator be able to establish trust with its users directly (and potentially avoid a 30% rake)? Why aren’t there more companies vying to connect with their users directly on this point, rather than letting Apple do it on their behalf? Alas, here we are forced to acknowledge that users simply don’t care enough today. Or at least, if they care about privacy in the abstract, they’re unsure about how to act on it, and their revealed preference ends up looking a whole lot like indifference (again, think global warming here).

Still, yes, it’s hard and the payoff is unclear… But couldn’t well-meaning creators do better?

And this brings us to accountability.

Accountability here takes two forms. Weak-accountability means users have clear recourse when it comes to their data. They may trust a custodian with this data, but they can always request to examine it or delete it. Strong-accountability implies strong guarantees around what the service provider can do with user data. Weak-accountability means a user trusts but can verify; strong-accountability means a service provider simply can’t do harm.

To illustrate this, let’s take a simple example of strong-accountability. Consider the Sonos One speaker. Sonos sells the Sonos One (below left) and the Sonos One SL (below right). It’s the same speaker but one has a microphone and one doesn’t.

Three claps for user choice!

Notice that in this case, your privacy decision is embedded in the hardware. Sonos has no choice but to respect your decision since they can’t change their terms of service and turn on a microphone that isn’t there. Why isn’t this the standard for more products online?

In part, this is because data is much closer to software than hardware. Software is fluid: its use, capture and distribution can change easily, unlike whether or not your speaker has a microphone. And there are many good reasons for a creator to change what data of yours they’re ingesting. Data helps creators learn about how you use their products so they can improve them. It helps them make sure things are working for you and improve your experience over time by seeing what works and what doesn’t. It allows them to customize the product to your needs. And so, as the product evolves, it’s fair to expect that the data it tracks about you will too.

And this is the crux of the argument for weak-accountability: data privacy is a spectrum along which you trade off privacy in exchange for certain product experiences. But is this dichotomy a real invariant or just a product of some historical path-dependency? Does building product necessarily entail a choice between good UX and respecting your users?

Given the current state of privacy-preserving tooling and the lack of simple standards around consumer privacy, it’s tempting to say yes. For most startups, it’s better to focus on great UX and finding Product-Market Fit than to reinvent data practices when there seems to be little to gain from it.

Because communicating data policy is hard (lack of transparency), weak-accountability quickly becomes meaningless. Users end up having no real recourse anyway. Since they don’t need to explain their decisions, creators are free to decide what data to harvest from their users, regardless of how useful the data actually is. They need only update their privacy policy, which no one reads. The question is no longer “will this data allow me to build a better product? Can I justifiably harvest and store it? Is there no alternative to this? Is the risk-reward worth it?” Instead, the thinking is “let’s grab everything we can… could be useful at some point.” The vague but widely accepted value of data in an ad-based digital ecosystem or in productized machine learning makes it that much easier not to think twice about what data you harvest.

For many companies, this isn’t a question of malicious intent, but simply one of where to spend resources in an industry that has not rewarded doing right by your users and doesn’t punish wrongdoing.

Talking to Pete Snyder from Brave, it’s clear that some builders think this tension between great UX and respecting your users can and must be resolved, and that any standard short of privacy-by-default (i.e. strong-accountability) means you’ve already given up on your users’ privacy.

Brave, for instance, has taken a privacy-by-default approach to building their browser, skinning Chromium for a private-by-default experience. But in order to do so, they’ve had to entirely rethink the browser’s relationship to advertisers.

The Brave headline — notice privacy is just a third of the pitch here.

In this sense, adapting the standard business model is what enables Brave to resolve the perceived conflict between privacy and great UX. Not only is there no incentive for ad-driven businesses today to resolve this tension, it runs directly counter to their interests. Their real users are advertisers, to whom they resell ingested data, and their products are the bait with which they catch user attention and data.

Full control over the data is key here. This business model only works if the business controls your data; it can’t be opt-in for the user. Why doesn’t Facebook offer a paid, ad-free tier? They had revenues of around $70 billion in 2019, and around 2.5 billion users at the end of the year. They could conceivably offer an ad-free experience for $35 a year. Why don’t they? I tend to think that Facebook couldn’t monetize their users as well in aggregate if any could opt out of having their data sold (there might be a preponderance of rich, hence ad-desirable, users opting out, for instance). Their business only works if users fully hand over their data. No take-backs.
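
A rough back-of-envelope, using only the approximate figures above (an illustrative sketch, not Facebook’s actual accounting), shows why $35 is roughly the right order of magnitude:

```python
# Back-of-envelope: could Facebook sell an ad-free tier?
# Figures are the rough 2019 numbers cited above, not exact financials.
annual_revenue = 70e9   # ~$70B in revenue, overwhelmingly from ads
users = 2.5e9           # ~2.5B users at the end of 2019

arpu = annual_revenue / users  # average revenue per user, per year
print(f"Average ad revenue per user: ~${arpu:.0f}/year")  # ~$28/year

# A $35/year subscription would more than cover the *average* user's ad
# value. The catch: the users most likely to pay are often the ones
# advertisers value most, so the average understates what opt-outs cost.
```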

Beyond that, the cost of a creator’s mistake will be borne mostly by their users. It’s their private information that gets exposed; precise attribution (tracing any leakage back to a specific negligent action) is hard; the industry as a whole operates under these norms; and existing law does little to prosecute offenders. Take Equifax’s data breach from 2017. An unpatched service led to the theft of personal data from close to 150 million Americans, none of whom had agreed to be profiled by Equifax in the first place. This business, with some $3 billion a year in revenue, paid $575 million in fines and went about its business. The consumer had comically bad recourse. Cynicism won: from a pure cost perspective, Equifax probably made the right investment. While it may be hard to legislate left of breach in such a fast-moving industry, users should at least be able to hope for recourse when faced with such criminal negligence. The FTC may be a better model than the FDA in the digital world (per the drug analogy above).

But here we’ve moved beyond a lack of incentive to protect your users, toward business models that prey on users. Some online businesses are simply data peddlers. They hold and resell your data as their own, becoming a “trusted” provider without which you are cut off from friends, modern financial services, etc. For those companies, anything close to letting you read/write your own data threatens the business itself.

Many creators settle for weak-accountability because there’s simply not much of an incentive to do better. Given the lack of language to describe privacy and data policies, this typically means users are simply kept in the dark. This is good for bad actors around the web, whose businesses depend on fully owning their users’ data. It can be very hard for a user to tell the two apart, which has made the Internet a dangerous place.

This may simply be path-dependency (i.e. we didn’t have to get to this model, but we did), yet we’re clearly locked into a system that favors opacity around user privacy and data rights. The question, then, is where do we go from here? Is there some fundamental reason we should have to negotiate the terms under which we hand over our data to companies, or could we expect real, provable privacy-by-default?

Part of the answer may lie with the increasing number of privacy-preserving primitives being developed, like federated learning, zero-knowledge proofs or forms of multiparty computation. While traditional tech companies had no incentive to invest in these primitives, a growing number of new businesses that have espoused privacy-by-default are now building them. As it happens, many operate in the blockchain space. Accordingly, one can reasonably ask: is Web 3.0, the decentralized web, our only way out of a world of bad privacy incentives?
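
To make at least one of these primitives a bit less abstract, here is a minimal sketch of additive secret sharing, one of the simplest building blocks of multiparty computation. The helper functions are illustrative only (not any particular library’s API): each user splits a private value into random-looking shares, and a set of servers can then compute an aggregate without any single server ever seeing an individual’s raw value.

```python
import secrets

PRIME = 2**61 - 1  # a large prime defining the field the shares live in

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n_parties random shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine shares; fewer than all of them reveals nothing about the value."""
    return sum(shares) % PRIME

# Three users each secret-share a private value across three servers.
private_values = [42, 17, 99]
all_shares = [share(v, n_parties=3) for v in private_values]

# Server i only ever sees share i from each user, and sums its shares locally.
server_totals = [
    sum(user_shares[i] for user_shares in all_shares) % PRIME
    for i in range(3)
]

# Combining the per-server totals yields the sum of everyone's inputs,
# yet no server learned any individual user's value along the way.
assert reconstruct(server_totals) == sum(private_values)  # 42 + 17 + 99 = 158
```

Federated learning and zero-knowledge proofs are far more involved, but they share the same shape: the computation happens without the raw data ever being handed over.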
