Propaganda is Already Influencing Large Language Models: Evidence From Training Data, Audits, and Real-world Usage

Webinar

18 December 2025

14:15 – 15:45

Online


Recent debate has focused on the question of who directly controls large language models. We show through six studies that coordinated propaganda from powerful global political institutions already indirectly influences the output of U.S. large language models (LLMs) via their training data, a pattern that is easiest to see in China. First, we demonstrate that material originating from China's Publicity Department appears in large quantities in open-source pre-training datasets. Second, we connect this to U.S.-based commercial LLMs by showing that they have memorized sequences of propaganda, suggesting that it appears in their training data as well. Third, we use an open-weight LLM to show that additional pre-training on Chinese state propaganda generates more positive answers to prompts about Chinese political institutions and leaders, evidence that propaganda itself, not mere differences in culture and language, can be a causal factor in the behavioral differences we observe across languages. Fourth, we show that prompting commercial models in Chinese generates more positive responses about China's institutions and leaders than the same queries in English. Fifth, we show that this language difference holds in the prompts of actual Chinese-speaking users. Sixth, we extend our findings with a cross-national study indicating that the languages of countries with lower media freedom show a stronger pro-regime valence than those of countries with higher media freedom. Finally, we present results demonstrating that the phenomenon described here is broader than propaganda and state media alone. Our findings join ample recent work demonstrating the persuasive power of LLMs. Together, these results suggest the troubling conclusion that states and powerful institutions will have increased strategic incentives to disseminate propaganda in the hope of poisoning LLM training data.

Eddie Yang is an Assistant Professor of Political Science and a faculty member in the Institute for Physical Artificial Intelligence at Purdue University. He received his Ph.D. in political science from the University of California San Diego. Yang studies the politics of innovation and technology. His research has been published in the Proceedings of the National Academy of Sciences and Political Analysis, among other outlets.

——

Lecture Series: AI Governance in China

Artificial intelligence is rapidly transforming societies, economies, and political systems worldwide, and China is emerging as a central actor in shaping the governance of these technologies. This lecture series explores the multiple dimensions of AI governance in China, from state regulation and the role of AI in public administration to societal engagement with AI technologies.

Join us online and in person for six lectures featuring leading scholars, including Jinghan Zeng (City University of Hong Kong), Hui Zhou and Genia Kostka (Freie Universität Berlin), Angela Huyue Zhang (USC Gould School of Law), Eddie Yang (Purdue University), David Yang (Harvard University), and Jeffrey Ding (George Washington University).

Hosted by the Berlin Contemporary China Network (BCCN), the China Competence Training Center (CCTC), and SCRIPTS, the 2025/26 winter term lecture series is conceptualized by Prof. Dr. Genia Kostka and Anton Bogs from Freie Universität Berlin.

Register here
