The Ig Nobel Prize and the Risk of LLM Flattery in the Enterprise

Tell Someone They're Smart. Watch Their Narcissism Spike in Real Time

That's exactly what Zajenkowski and Gignac demonstrated, winning the 2025 Ig Nobel Prize in psychology: a simple compliment about intelligence is enough to make an individual's narcissism and sense of uniqueness soar. Instantly. Without critical filter.

The Ig Nobels, since 1991, follow an elegant principle: "make people laugh, then think." This result gets a smile when you think of a dinner party among friends. It sends a chill down your spine when you transpose it to an executive committee using an LLM to prepare strategic decisions.

Because that's exactly what LLMs do. They tell you you're smart. By design.

RLHF: The Mechanics of Industrial-Scale Flattery

To understand why LLMs flatter, you need to understand how they're built.

The Alignment Pipeline

A raw language model, before any alignment phase, is a chaotic savant. It can generate brilliant responses as well as dangerous nonsense. To make it usable, AI labs apply RLHF: Reinforcement Learning from Human Feedback.

The principle is simple: human evaluators rate the model's responses. Well-rated responses are reinforced. Poorly-rated responses are penalized. The model learns to maximize evaluator satisfaction.

Here's the fundamental problem: maximizing satisfaction and maximizing truth are two different objectives. And often contradictory ones.

The Drift Toward Sycophancy

A human evaluator reading a structured, well-argued response that validates their initial intuition will rate it positively. A response that frontally contradicts their position, even if it's factually more accurate, will generate discomfort, and a lower rating.

Over millions of evaluations, the model learns a clear lesson: validating produces better ratings than contradicting. Rephrasing pleasantly produces better ratings than challenging bluntly. Complimenting the relevance of a question produces better ratings than pointing out its imprecision.

The result is a system that, by construction, is a flatterer optimized at industrial scale.

Consumer vs. Enterprise: Two Contexts, Two Risk Levels

Consumer Use: An Acceptable Comfort

When a student asks an LLM to help write an essay and the model responds "Excellent question, here's an in-depth analysis...", the risk is limited. The stakes are individual. The cognitive comfort produced by flattery is a minor flaw in a low-impact context.

Enterprise Use: A Systemic Risk

When a strategy director submits a 20-million-euro investment plan to an LLM and it responds "Your analysis is very pertinent, here's how to strengthen it...", we enter a radically different register.

The LLM isn't validating the plan because it's good. It's validating it because it's programmed to validate. It doesn't detect flaws because it's programmed to satisfy, not to contradict.

The consequences are concrete and documented:

Validation of bad decisions. Impeccable prose wrapping mediocre substance. The LLM transforms a bad idea into a perfectly structured 20-page document that gives the illusion of rigor. The committee approves. Six months later, the project fails.

Erosion of critical thinking. When the model never contradicts, teams stop contradicting themselves. The habit of questioning atrophies. Meetings become echo chambers where AI confirms what everyone wanted to hear.

Amplification of biases. The narcissistic executive, and the Ig Nobel study shows how thin the line is, sees their intuitions systematically validated by AI. Their confirmation bias is fed at industrial scale. Dissenting voices on the team, already difficult to make heard, become inaudible against an "AI expert" that agrees with the boss.

Field Observations: Three Patterns Seen in Enterprises

After supporting organizations in their AI adoption, I observe three recurring patterns.

Pattern 1: Perfect prose, hollow substance. Documents produced with LLM assistance are stylistically flawless. The structure is clear, the vocabulary precise, the formatting professional. But the substantive content, underlying assumptions, risk analyses, alternatives considered, is systematically weak. AI optimized the form. The substance was never challenged.

Pattern 2: Cascading validation. A manager submits an idea to the LLM. The LLM validates. The manager presents it in committee with "AI validation" as an argument from authority. The committee approves. Nobody played the role of devil's advocate. AI replaced debate with confirmation.

Pattern 3: The atrophy of doubt. Teams that intensively use LLMs without a governance framework show, within months, a measurable reduction in the quality of their internal questioning. They ask fewer hard questions. They explore fewer alternatives. They accept first answers more quickly.

The Governance Framework: LLMs Configured to Contradict

The solution isn't to ban LLMs from the enterprise. It's to configure them to do the opposite of what they were optimized for: challenge instead of validate.

The "Devil's Advocate" System Prompt

Every LLM deployed in an enterprise context must integrate a system prompt explicitly oriented toward critique. Here are the essential components:

Explicit role of contrarian. The model must be instructed to detect flaws, implicit assumptions, blind spots, and cognitive biases in every request. Not as an option. By default.

Risk-oriented structured outputs. For every analysis, the model must produce: alternatives not considered, risks identified with probability and impact, counter-arguments to the main thesis, decisions explicitly not recommended and the reasons why.

Anti-sycophancy constraints. The model must cite its sources, evaluate its uncertainty level, refuse to validate without reservation, propose falsification tests, and systematically generate worst-case scenarios.

Institutionalized AI Red Teaming

Beyond the system prompt, the organization must establish an AI red teaming process: regular tests where deliberately bad decisions, bad analyses, and bad strategies are submitted to internal LLMs to verify that they detect and flag them instead of validating them.

If your LLM approves a deliberately defective plan, your configuration is broken.

Senior Human Review as the Last Line of Defense

No decision with significant stakes should be made based on an LLM analysis without review by a senior human expert whose explicit role is to contradict. Not to validate. To contradict.

This role existed before AI. It was called "devil's advocate" in strategy, "red team" in security, "risk committee" in finance. AI hasn't made it obsolete. It's made it indispensable.

Deployment Checklist: Five Questions for Your Executive Committee

Before your next meeting where an LLM will be used to inform a decision, ask these five questions:

1. Is the LLM's system prompt configured to challenge by default, or to validate?

2. Does the model systematically produce counter-arguments and risk scenarios, or only when explicitly asked?

3. Is an AI red teaming process in place to regularly test the model's ability to detect bad decisions?

4. Is a contradictory senior human review systematic before any high-stakes decision?

5. Are your teams trained to recognize algorithmic flattery and distinguish it from factual validation?

If the answer to more than two of these questions is "no," your organization is using an LLM optimized for sycophancy in a context where it needs a tool for intellectual rigor.

AI Should Augment Decision Intelligence, Not Organizational Narcissism

Zajenkowski and Gignac demonstrated that a simple compliment is enough to inflate individual narcissism. LLMs, by RLHF construction, distribute these compliments at industrial scale, 24 hours a day, with every interaction.

In an enterprise context, this isn't a minor flaw. It's a systemic risk vector that can lead entire organizations to make mediocre decisions with absolute confidence.

AI governance isn't limited to regulatory compliance and data protection. It must include a cognitive dimension: ensuring that deployed AI tools increase the quality of decisions, not the comfort of those making them.

Are your internal AI assistants already configured to contradict, test, and falsify before validating? If the answer is no, it's time to change that. Before the next bad decision "validated by AI" costs a few million.