Could Artificial Intelligence undermine constructive disagreement?

Artificial intelligence is no longer a futuristic concept. It’s already reshaping education and research. Recent studies suggest that large language models (LLMs), trained on vast internet datasets, tend to default in their outputs to prevailing Western political and cultural perspectives (see here, here, and here).

But what happens when users push back against LLMs’ default responses?

While LLMs are shaped by the patterns contained in their training data, they are also further refined by their developers to align with human preferences. AI companies, seeking widespread adoption and a return on their substantial investments, have powerful incentives to design systems that maximize user satisfaction and/or retention.

This creates a dynamic partially reminiscent of social media platforms, where algorithms optimize for engagement by showing users content that either confirms their views or caricatures their opponents. The result is a proliferation of filter bubbles, clickbait, outrage, and flattering content at the expense of substance and rigor.

Similarly, AI systems engineered to please may prioritize affirming user beliefs, avoiding disagreement and sidestepping challenges to users’ views. As competition intensifies among AI labs, developers may feel compelled to prioritize engagement metrics and market share over epistemic integrity and ethical safeguards.

To investigate this dynamic, I conducted an informal experiment simulating 2,400 conversations between leading AI models (OpenAI’s GPTs, Google’s Geminis, Anthropic’s Claudes, xAI’s Groks, and others) and simulated human users on everyday topics such as travel, mental health, relationships, and writing advice. After each AI’s initial response to a typical prompt, the simulated user would push back or challenge the response. The AI would then reply to this challenge. I used another LLM (gpt-4.1-mini) to assess whether the AI’s follow-up response displayed hints of flattery.

An interesting pattern emerged: when confronted with user pushback, AI systems often responded obsequiously, frequently praising the user’s “insightful” comments. This stands in stark contrast to most human conversations, where disagreement often elicits debate or indifference rather than default praise.

These dynamics may have subtle psychological consequences. If people become accustomed to constant validation from AI and if AI-human interactions continue to increase, humans may begin to prefer AI conversations over real-life exchanges, which are often more challenging and less affirming. Over time, this could diminish tolerance for disagreement and erode the capacity for honest, robust dialogue. AI could also inadvertently train humans to behave as it does, teaching users to respond with automatic polite praise rather than substance.

The degree of flattery in the models’ comebacks when confronted with user pushback varied markedly between different models. In some systems, it appeared in around 10% of responses; in others, it exceeded 50%. Notably, newer models, such as GPT-4.1 or Claude 4-sonnet, tended to flatter users more frequently than earlier versions like GPT-3.5, or Claude 3.5-sonnet. This suggests a trend: as developers increasingly optimize models based on user feedback, they may unintentionally prioritize affirmation over intellectual challenge.

Could Artificial Intelligence undermine constructive disagreement?

Get HxA In Your Inbox

Related Articles