Rock, Paper, Statistics: Mass screening for rare problems endangers society

Dr. Vera Wilde applies universal mathematical laws to problematise mass screening for rare problems in society, and offers policy recommendations to better safeguard the public against the dangers of its potential for harm.

What’s the Problem?

We can’t protect everyone from everything — and we can do harm trying. Armed with new technologies to promote public interests including health, safety, and truth, we run into old limitations from universal mathematical laws — laws whose implications are not widely recognised. Thus, as technologies advance, proposed programs keep hitting the same wall, known in probability theory as Bayes’ theorem. Bayes’ theorem says that the probability of an event depends on the group being examined: when a condition is rare, even an accurate test will mostly flag people who don’t have it.

When applied to mass screening for low-prevalence problems (MaSLoPP), Bayes’ rule implies that these programs backfire under certain conditions: 

  • Rarity. Even with high accuracy and a low false positive rate, the sheer volume involved in screening entire populations for low-prevalence problems generates either many false positives or many false negatives (the accuracy-error trade-off). 
  • Uncertainty. When real-world scientific testing cannot validate results, inferential uncertainty persists. 
  • Harm. When secondary screening to reduce uncertainty carries serious costs and risks, resultant harm may outweigh benefit for the overwhelming majority of affected people. 
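The rarity point above can be made concrete with Bayes’ theorem. The following is an illustrative sketch with hypothetical numbers — a 1-in-10,000 base rate, 99% sensitivity, and a 1% false positive rate — not figures from any actual screening program:

```python
# Illustrative: how a rare problem defeats an apparently accurate screen.
# All parameters below are hypothetical, chosen only to show the effect.

def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(actual problem | flagged), by Bayes' theorem."""
    true_pos = prevalence * sensitivity                  # real cases caught
    false_pos = (1 - prevalence) * false_positive_rate   # innocents flagged
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(prevalence=1e-4,       # 1 in 10,000
                                sensitivity=0.99,       # 99% of cases caught
                                false_positive_rate=0.01)
print(f"P(problem | flagged) = {ppv:.2%}")  # prints 0.98%
```

Even with a test that sounds excellent — 99% sensitive, only 1% false positives — fewer than one flagged case in a hundred is real, because the innocent population is ten thousand times larger than the affected one.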

MaSLoPP occurs across diverse domains including security, health, (mis)information, and education. Although rarely recognised as a single class, these programs share a common mathematical structure that may endanger society. A currently controversial case illustrates the problem: proposed mass scanning of digital communications for child sexual abuse material (CSAM).

Case Study: Chat Control 

Political Drama

Proposed laws across Europe (Chat Control, aka the Child Sexual Abuse Regulation or CSAR), the UK (the Online Safety Bill), and the U.S. (the EARN IT Act) would require client-side scanning for CSAM. Imagine that every time you used email, a messenger, or a chat app, governments required the companies running the infrastructure to scan your communications for evidence of abuse – and to report hits to the police on the basis of an algorithm that analysts themselves don’t understand well. Implementing these programs would destroy end-to-end encryption, create mass surveillance infrastructure, and, according to the implications of probability theory, backfire – endangering children. 

The stage is set for a battle between policymakers who understand this and policymakers who don’t. Decades in the making, this transnational initiative made headway when the UK Parliament passed the OSB on September 19, and the U.S. Senate Judiciary Committee sent the EARN IT Act to the full Senate for a third time in May. But Chat Control met strong opposition in the European Parliament last month. 

Anticipated votes were twice postponed amid ongoing controversy before the Parliament’s Civil Liberties, Justice and Home Affairs Committee (LIBE) rejected several problematic aspects of the Commission’s proposed CSAR. LIBE’s counterproposal draws a red line against client-side scanning, and protects end-to-end encryption and potential whistleblower anonymity by design. It also proposes to protect young people better than the Commission’s version in two ways. First, it mandates that the new EU Child Protection Centre crawl the web for known CSAM, that law enforcement agencies report it to providers, and that providers remove it. Second, it enhances privacy protections for users of Internet services and apps in ways designed to mitigate grooming risks. 

It now remains for LIBE to vote to adopt this negotiated compromise proposal on November 14 and prepare a mandate for the trilogues. The full EU Parliament could then oppose the committee’s compromise within 24 hours. Next, the EU Council is expected to adopt its own approach on 4 December, with subsequent trilogues between the EU Commission, Council, and Parliament expected to finalise the legislation’s text either by early February or after the June 2024 elections. 

Crucially, the final CSAR text could still include mass scanning. Here’s what policymakers need to know about the empirical implications of this controversial provision. 

Statistical Reality

Consider how public statistician Stephen Fienberg’s classic application of Bayes’ theorem in the National Academy of Sciences’ polygraph report transfers to mass digital communications scanning. Assuming a 1/1000 CSAM base rate, an 80% detection threshold, and a .90 accuracy index, scanning 10 billion messages – a fraction of daily European emails – would generate over 1.5 billion false positives. A message flagged as abusive would be innocent with 99.5% probability (1,598,000,000/1,606,000,000). An innocent message would be flagged with almost 16% probability (1,598,000,000/9,990,000,000). Yet 20% of abusive messages would evade detection. Meanwhile, there would be almost 200 messages mistakenly flagged for each abusive message found (1,598,000,000 false positives / 8,000,000 abusive messages).
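The arithmetic behind these figures can be reproduced in a few lines. This is a minimal sketch, not Fienberg’s actual model: the 1,598,000,000 false positive count implied by the .90 accuracy index is taken directly from the figures above rather than derived from an ROC curve.

```python
# Reproducing the Chat Control worked example: 10 billion messages,
# a 1/1000 base rate, and 80% sensitivity. The false positive count
# (implied by the report's .90 accuracy index) is taken from the text.

messages = 10_000_000_000
abusive = messages // 1000                  # 1/1000 base rate -> 10,000,000
innocent = messages - abusive               # 9,990,000,000

true_positives = int(abusive * 0.80)        # 80% sensitivity -> 8,000,000
missed = abusive - true_positives           # 2,000,000: 20% of abuse evades detection
false_positives = 1_598_000_000             # from the text, per the .90 accuracy index

flagged = false_positives + true_positives  # 1,606,000,000 messages flagged in all
p_innocent_given_flag = false_positives / flagged   # ~0.995
p_flag_given_innocent = false_positives / innocent  # ~0.16
fp_per_tp = false_positives / true_positives        # ~200

print(f"P(innocent | flagged) = {p_innocent_given_flag:.1%}")
print(f"P(flagged | innocent) = {p_flag_given_innocent:.1%}")
print(f"False positives per abusive message found: {fp_per_tp:.0f}")
```

Note that the dominant ratio, false positives per true positive, is driven almost entirely by the base rate: with 999 innocent messages for every abusive one, even a small per-message error rate produces a flood of mistaken flags.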

Technology can’t escape math. Raising the detection threshold to reduce false positives leaves too much abuse undetected; lowering it to catch more abuse generates too many false positives. That’s why the National Academy of Sciences concluded that mass polygraph screening of National Lab scientists offered “an unacceptable choice”. Mass digital communication screening faces the same dilemma. 

Subjecting entire populations to screenings like Chat Control implies following up many uncertain results with investigations – potentially traumatising a large number of innocents, including minors. Will society jeopardise many children’s well-being to save a few? How did minors consent to their private communications’ use in training this AI? How will minors’ sexual images and texts, no longer protected by end-to-end encryption, be secured against misuse? What about privacy? Will resources needed for investigating false positives overstretch law enforcement capacities? Will reporting take longer due to the proposed shift from hotlines working closely with law enforcement to a centralised bureau? Who will measure possible harms from this program, and how? 

A Matter of Digital Governance

As digitalisation and AI increase infrastructural capacities to deliver public services, new MaSLoPPs may appear to improve on old ways of advancing the public interest. Their high accuracy and low false positive rates – probabilities – may sound dazzling. But translating the identical statistical information into frequency formats – body counts – shows otherwise. The common (false positives) overwhelms the rare (true positives) – with serious possible consequences. Ignoring this fact is known as the base rate fallacy.

Nontransparency and perverse incentives often further trouble MaSLoPPs’ reported accuracy rates. For example, Chat Control’s accuracy estimates are based on unpublished analyses of unknown data from commercially interested sources. Chat Control has been heavily promoted by Thorn, a group registered in the EU lobby database as a charity that has made millions from U.S. Department of Homeland Security contracts for its AI tools. Across domains, quoted MaSLoPP accuracy rates are often generated by researchers with economic incentives to inflate them – and cannot be verified when the tests cannot be validated in the real world. 

Chat Control exemplifies a broader class of signal detection problems where MaSLoPP produces uncertain benefits and massive possible damages. Other such programs across diverse domains all face the same accuracy-error trade-off under the same universal mathematical laws. Rare problem incidence, persistent inferential uncertainty, and secondary harms doom many well-intentioned programs to backfire. 

This case study highlights the need for better information-security and statistics education to support international cooperation on cybersecurity. Experts from industry and academia agree that the proposed program would create more digital infrastructure vulnerability than it would mitigate. What is not being discussed, however, is how the same mathematical structure also characterises the mass screenings for misinformation that tech companies conduct on their digital platforms at the behest of governments. 

In a step toward recognising MaSLoPP’s risks, the EU Parliament’s recent draft AI Act recommends banning several AI systems that share this structure: emotion-recognition AI (recently piloted in the border-crossing screening tool iBorderCtrl), real-time biometrics and predictive policing in public spaces, and social scoring. The implications of probability theory apply equally to all of these programs. But we need a new legal and ethical framework, beyond AI alone, that regulates the whole class. 

Policy Recommendation

The European Commission’s Better Regulation agenda is meant to ensure evidence-based policy. Policymakers should apply scientific evidentiary rules to evaluate the likely impact of proposed interventions, including MaSLoPPs, before implementation: 

  • Proponents bear the burden of proof to establish that new interventions will do more good than harm. 
  • Independent reviewers evaluate the evidence of claimed costs and benefits. 
  • Relevant data must be public, including information about its production, storage, analysis, and interpretation. 

Applying these rules to policymaking may prevent massive societal damage from well-intentioned programs that unknowingly try to break universal mathematical laws we cannot escape.

This text is adapted in part from previous posts (e.g., on Chat Control, Chat Control again, the AI Act, how the usual liberty versus security framing is bogus, and why the regulatory challenge isn’t just about AI).
