On Value Alignment in AI
Joint work with Matteo Cargnelutti, Tyna Eloundou and Greg Leppert.
Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training reward models (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values, namely feature imprint, alignment resistance and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler features (undesired concepts). By regressing RM scores against these features, we quantify the extent to which RMs reward them – a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we assess alignment robustness by analyzing RM responses to perturbed inputs. Our experiments, utilizing open-source components like the Anthropic/hh-rlhf preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment dataset. These findings underscore the importance of scrutinizing both RMs and alignment datasets for a deeper understanding of value alignment.
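As a rough illustration of the feature-imprint idea (a minimal sketch on synthetic data; the feature names, coefficients, and regression setup are assumptions for exposition, not the paper's actual pipeline), one can regress RM scores on binary feature indicators and read each coefficient as that feature's imprint.

```python
# Hypothetical sketch of "feature imprint": regress reward-model scores on
# binary feature indicators and read off the coefficients. Column names and
# data are illustrative only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000

# Toy data: each row is one completion scored by the RM, annotated with
# target features (e.g., harmlessness, helpfulness) and a spoiler feature
# (e.g., sycophancy).
df = pd.DataFrame({
    "harmless":  rng.integers(0, 2, n),
    "helpful":   rng.integers(0, 2, n),
    "sycophant": rng.integers(0, 2, n),
})
# Simulated RM scores that reward the target features and leak some reward
# to the spoiler feature.
df["rm_score"] = (1.5 * df["harmless"] + 1.0 * df["helpful"]
                  + 0.4 * df["sycophant"] + rng.normal(0, 0.5, n))

features = ["harmless", "helpful", "sycophant"]
reg = LinearRegression().fit(df[features], df["rm_score"])

# Each coefficient is the feature's "imprint": how many reward units the RM
# assigns to that feature, holding the others fixed.
for name, coef in zip(features, reg.coef_):
    print(f"{name}: imprint ~ {coef:.2f}")
```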
On Democratic Innovations
Our collective imagination has been captured by the mirage that democracy equals election. Representative democracy (as a collective decision-making process) happens in stratified layers: the filtering of candidates, the selection of representatives, the sense-making and deliberation among the representatives, and the final decision. If democracy is characterized by the inclusion and equal treatment of all group members, election concentrates these democratic qualities at the selection stage: one-person-one-vote does not guarantee that the filtering, deliberation, or decision-making happen democratically. Alternative models of democracy propose repurposing democratic institutions toward a holistic account of equality and inclusion across the decision-making pipeline. Most prominently, scholars have discussed re-introducing democracy by lot, whereby randomly selected citizens deliberate to reach highly consensual decisions.
This system (called lottocracy, or sortition) was famously in place in the Greek city-state of Athens, which accommodated up to a thousand randomly selected officials among some 30,000 citizens of age. In modern societies, my chances of being selected in my lifetime may be less than 5%: should I feel more included as an episodic voter or as a hypothetical decision-maker? In How to Open Democratic Representation to the Future?, I argue that representation in democracy is due for an upgrade: we need innovative representative mechanisms as well as renewed democratic theories that account for the novel societal and technological conditions under which we live. If there is no such thing as an ultimate form of representation, there is no such thing as a static democracy.
Work presented at the Harvard International Reimagining Democracy Workshop, the Workshop on Long Term Risks and Future Generations, and Harvard University's Ash Center for Democratic Innovation.
Selecting Experts Democratically
Joint work with Adam Berinsky, Daniel Halpern, Joe Halpern, Ali Jadbabaie, Elchanan Mossel and Ariel Procaccia
Can we tap into collective intelligence to improve decision-making? Mathematicians have been interested in this question since (at least) the late 18th century, developing theoretical frameworks to benchmark different collective decision protocols. One framework assumes a correct outcome (a priori unknown) and searches for the aggregation rules most likely to find it. Of course, this imperfect model does not pretend to be an exact model of reality; rather, it aims to formalize or challenge common intuitions.
Most prominently, Nicolas de Condorcet formalized mathematically, through this lens, Aristotle's profoundly democratic intuition that, under mild conditions, groups achieve better outcomes when more people participate (the short simulation after this paragraph makes the intuition concrete). This result falls short when voters are not minimally informed about the decision at stake. Note, however, that this model neglects the information voters have about one another (second-order knowledge): I know little about environmental science and may not be informed enough to know how a carbon tax bill should be drafted, but I may know people I would trust to represent me in shaping environmental regulations. And this information matters both epistemically (enhancing collective intelligence and leading to better outcomes) and procedurally (generating an intrinsically fair and legitimate process). Can selection rules that tap into first- and second-order knowledge allow for the democratic selection of experts?
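Here is a minimal simulation of Condorcet's point; the competence level and group sizes are illustrative assumptions, not drawn from any of the papers discussed here. Independent voters who are each correct with probability just above one half yield a majority that is almost surely correct as the group grows.

```python
# Minimal simulation of Condorcet's jury theorem: independent voters who are
# each correct with probability p > 1/2 form a majority whose accuracy
# approaches 1 as the group grows. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def majority_accuracy(n_voters: int, p: float = 0.55, trials: int = 20_000) -> float:
    votes = rng.random((trials, n_voters)) < p   # True = correct vote
    return float((votes.sum(axis=1) > n_voters / 2).mean())

for n in (1, 11, 101, 1001):
    print(f"{n:>5} voters -> majority correct {majority_accuracy(n):.3f}")
```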
What if voters could decide between participating actively in governance or delegating their decision (transitively) to an agent they trust on a particular question? This procedure is called liquid democracy. While the literature had thus far only exhibited worst-case scenarios (which exist for any decision rule), I wanted to answer a more ambitious and interesting question: how likely is liquid democracy to succeed in different scenarios? To answer it, my co-authors and I developed a mathematical theory that maps local delegation behaviors to macro-level delegation-graph dynamics. Along the way, we proved a new result on infinite Pólya urn processes and a new (and weak) law of large numbers for weighted majority voting (In Defense of Liquid Democracy). I further ran experiments with 12 groups and found striking alignment between the theory and the experiments (Liquid Democracy in Practice: An Empirical Analysis of its Epistemic Performance).
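As a toy illustration of the setup (a minimal sketch under my own assumptions, not the model analyzed in In Defense of Liquid Democracy), the following simulation lets voters either vote directly or delegate, transitively, to a more competent voter, and compares the resulting weighted majority with direct democracy.

```python
# Hedged sketch of a liquid-democracy experiment: voters either vote directly
# or delegate, transitively, to a more competent voter; we compare the
# resulting weighted majority against direct democracy. All parameters and
# delegation rules here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def trial(n: int = 201, delegate_prob: float = 0.5) -> tuple[bool, bool]:
    comp = rng.uniform(0.45, 0.65, n)            # individual competences
    order = np.argsort(comp)                     # delegate only "upward"
    delegate_to = np.full(n, -1)
    for rank, i in enumerate(order[:-1]):        # most competent never delegates
        if rng.random() < delegate_prob:
            delegate_to[i] = rng.choice(order[rank + 1:])

    # Resolve transitive delegations to find each voter's final representative.
    def guru(i: int) -> int:
        while delegate_to[i] != -1:
            i = delegate_to[i]
        return i

    weight = np.zeros(n)
    for i in range(n):
        weight[guru(i)] += 1

    votes = rng.random(n) < comp                 # True = correct vote
    direct_ok = votes.sum() > n / 2
    liquid_ok = (weight * votes).sum() > n / 2
    return direct_ok, liquid_ok

results = np.array([trial() for _ in range(2_000)])
print("direct majority correct:", results[:, 0].mean())
print("liquid democracy correct:", results[:, 1].mean())
```

Restricting delegations to run toward more competent voters is only what keeps this toy delegation graph acyclic; it is not a claim about the delegation behaviors studied in the paper.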
Work presented at Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), the COMSOC Seminar, University of Zurich, Harvard University, Massachusetts Institute of Technology, University of Groningen, Google X, Debating Europe, bluenove, Hypermind, Datascientest...
On the Optimal Congress Size
Joint work with Daniel Halpern and Tao Lin
However small the Republic may be, the Representatives must be raised to a certain number, in order to guard against the cabals of a few; and however large it may be, they must be limited to a certain number, in order to guard against the confusion of a multitude. (James Madison, Federalist No. 10)
Nitzan and Paroush (1984) proved that the optimal decision rule weights each voter's vote in proportion to the log-odds of their competence, log(p/(1-p)), where p is the voter's probability of being correct. If re-weighting is impossible, the question becomes: what is the optimal number of experts needed to maximize the probability that a direct (and unweighted) majority vote is correct? In How Many Representatives Do We Need? The Optimal Size of an Epistemic Congress, we answer this precise question. Against previous conjectures that this number should grow sub-linearly with the population size, we prove that the optimal congress size is a constant fraction of the population size.
Mathematically, we assume that we can seat the most competent members of the population (the top order statistics of the expertise distribution) in an epistemic congress, and we find that the optimal committee size is linear in the population size. This result is striking because it holds even when the top experts can be accurate with arbitrarily high probability.
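For intuition, here is a small, hedged simulation of that question; the expertise distribution (a Beta(4, 3)) and the population size are assumptions for illustration, not the paper's model. We seat the top k most competent voters and estimate how often their unweighted majority is correct.

```python
# Illustrative simulation of the epistemic-congress question: out of a
# population of n voters with heterogeneous competences, seat the top k as a
# congress and estimate how often its unweighted majority is right.
# Distribution and sizes are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(3)

def congress_accuracy(n: int, k: int, trials: int = 5_000) -> float:
    correct = 0
    for _ in range(trials):
        comp = rng.beta(4, 3, n)                 # population competences
        top = np.sort(comp)[-k:]                 # the k most competent voters
        votes = rng.random(k) < top              # True = correct vote
        correct += votes.sum() > k / 2
    return correct / trials

n = 1_001
for k in (1, 11, 101, 501, n):
    print(f"congress of {k:>4} out of {n}: majority correct {congress_accuracy(n, k):.3f}")
```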
However, if we assume that the underlying distribution of expertise varies with the population size, such that its mean decreases too fast (e.g., the cost of education and information infrastructure makes it harder to keep competence constant over time), then a single expert could asymptotically outperform a majority vote.
If you would like to learn more about the maths of democracy, have a look at Professor Procaccia's fantastic class, Optimized Democracy.
Work presented at the 36th AAAI Conference on Artificial Intelligence, and by Tao at WINE, the Conference on Web and Internet Economics.
Mapping the Space of Social Media Regulation
Joint work with Nate Lubin, Kalie Mayberry, Dylan Moses, Luke Thorburn and Andrew West.
Social media platforms mediate a significant fraction of human communication and attention. The impact of social media on society has been under increased scrutiny, and concerns over its effects have motivated varied and sometimes contradictory government regulation around the world. In this review article, we offer two ways of mapping the space of social media regulation: viewing social media either (i) as an architecture impacted by design choices, or (ii) as a market governed by incentives. We survey the most prominent regulatory approaches globally (both enacted and proposed), with an emphasis on the United States and the European Union, and position these options within the two maps. We conclude by discussing the fundamental trade-offs associated with different interventions, comparing jurisdictions, and highlighting paths forward in the context of the potentially conflicting rights and interests of relevant stakeholders.
Native Ads and the Credibility of Online Publishers
Joint work with Adam Berinsky, Dean Eckles, Ali Jadbabaie and Amir Tohidi
The digitization of news publishing has resulted in new ways for advertisers to reach readers, including additional native advertising formats that blend in with news. However, native ads may redirect attention off-site and affect readers' impressions of the publishers. Using a combination of observations of ad content across many publishers and two large randomized experiments, we investigate the characteristics of a pervasive native ad format and compare the impact of different native ad characteristics on perceived news credibility. Analyzing 1.4 million collected ad headlines, we found that over 80% of these headlines use a clickbait style and that politics is among the most common ad topics (The effects of native advertisement on the US news industry).
In two randomized experiments (combined n=9,807), we varied the style and content of native ads embedded in news articles and asked people to assess the articles' credibility (Native advertising and the credibility of online publishers). Experiment 1 (n=4,767) suggested that different publishers were impacted differently by the ads and motivated the more detailed design of Experiment 2 (n=5,040). This latter experiment used hundreds of unique combinations of ads, articles, and publishers to study the effects of clickbait and political ads. Findings from this pre-registered experiment provide evidence that clickbait ads and, to a lesser extent, political ads substantially reduce readers' perception of the articles' credibility: publishers using clickbait native ads may trade short-term revenues for audience trust.
Work presented at the International Conference on Computational Social Science (IC2S2), MIT Schwarzman College of Computing Launch, and Technology, Management, and Policy (TMP) Consortium.
Covered in MIT News: Understanding how people make sense of information in the information age
Varieties of Resonance: The Subjective Interpretations and Utilizations of Media Output in France
Joint work with Adrien Abecassis and Bo Yun Park
The resonance of media output plays an important role in the age of misinformation and fake news. While scholars have extensively studied resonance, they have mostly focused on whether and why particular messages align with the predispositions of their intended audience rather than systematically analyzing how they are interpreted by the wider population. Based on a computational text analysis of the media output of more than a hundred different outlets in France and weekly surveys of what people retained from the news during the same period, this paper investigates the ways in which media coverage triggers different types of resonance in accordance with people's diverse interpretations and utilizations of the messages to which they have been exposed (Varieties of resonance: The subjective interpretations and utilizations of media output in France). We argue theoretically that resonance is not just an objective alignment between a message and one's predispositions, but also a subjective interpretation and utilization of the message heard. We empirically identify three types of subjective resonance: one used for problem-solving, one that is problem-aggravating, and one that is problem-generating. This research contributes to a better understanding of the mechanisms of resonance by expanding on previous work on the cognitive, emotional, and interactional dimensions of resonance.
Work presented at the American Sociological Association (ASA) Communication, Information Technologies, and Media Sociology (CITAMS) by Bo Yun Park.
Learn about Tinnitus from Social Media
Joint work with Ryan Boyd, Aniruddha Deshpande, Alain Londero, Vinaya Manchaiah, Guillaume Palacios and Pierre Ratinaud
Individuals with tinnitus are highly heterogeneous in terms of etiology, the manifestation of symptoms, and the way they manage their condition. Most of these patients are likely to seek hearing health information and social support online via various websites or social media platforms. Indeed, information is easily accessible online. Further, in the absence of evidence-based tinnitus care, patients with similar symptoms can regroup, share experiences, and exchange tips. Even after consulting healthcare providers, some patients continue seeking information online when they feel they did not receive satisfactory information about treatment options and/or their prognosis. The present study examined the discussions around tinnitus in Reddit posts from 12,000 users over 8 years, using various Natural Language Processing (NLP) techniques (Online discussions about tinnitus: What can we learn from natural language processing of Reddit posts?). We examined the free-text posts to understand the types of conversation about tinnitus in an online forum and the ways in which people with tinnitus reach out to others for support (informational, emotional, etc.) when coping with their condition. We hope that this can provide insights, complementary to those collected in clinical environments, to reflect on new ways to support tinnitus patients.
Work presented at the Virtual Conference on Computational Audiology (Best Video Pitch Awards).
Covered in Alter Ago Le Mag, Manon Revel: En deça et au-delà des algorithmes
Alternative Realities in Troubled Democracies
Alternative realities are troubling democracy. Eager to discuss these issues beyond academia, I have organized conferences and written about them for broader audiences. Together with Zivvy Epstein and Maurice Jakesch, I organized a workshop on understanding, measuring, and mitigating the spread of alternative realities, featuring Renee DiResta and David Rand. For more information, click here. I moderated a panel, Data weaponized, data scrutinized: a war on information, at the Women in Data Science Conference, featuring Camille Francois, Joan Donovan, and Bo Yun Park. I also wrote Internet, you lie! (in French) for the French parliamentary journal L'Hémicycle.