Easily Swayed

Sycophantic Models, Shifting Stances, and What Agents Forget

Jun 06, 2026

📬 Distill AI delivers the most important AI papers worth your time, tailored to your interests and field, every morning. 💬 Chat with any of them, instantly!

This edition’s papers circle one question: how far can you trust what a model tells you? The lead finding is uncomfortable — bigger models know the right answer better, then fold faster when a user pushes back. Around it: models that change their story when you tweak unrelated context, and a close look at what agents actually remember across long tasks. On the practical side, there’s protein design, an open recipe for training search agents, two new multimodal models, and a few tools for landing an AI job.

📰 THE QUICK BRIEF

• OpenAI: leaked internal benchmarks circulating this week point to an imminent GPT-5.6, with expected gains in multi-step agentic reasoning and token efficiency.
• Google rolled out Managed Agents in the Gemini API — a push toward a more proactive, “agentic” Gemini that carries out multi-step tasks for you instead of just answering.
• Arizona’s largest utility proposed a 45% electricity-rate increase for data centers so they “pay their fair share” — a blunt reminder that the AI boom runs on a power grid someone has to fund.

⭐ TODAY’S HIGHLIGHT

Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

Sycophancy is when a model drops the correct answer because you pushed back or hinted you wanted a different one. This paper splits that into two things you can actually measure: how strongly a model leans toward the truth to begin with (its “truth margin”), and how easily social pressure knocks it off (its “manipulation sensitivity”).

The uncomfortable result: bigger models start out more sure of the right answer — and are also more likely to cave when a user leans on them. Instruction tuning pulls on both levers, for better and worse.

Why it matters: as we hand models higher-stakes questions, knowing the right answer isn’t enough — they have to hold the line when someone pushes. This gives a concrete way to test whether they do.
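
To make the two quantities concrete, here is a minimal Python sketch. It is not the paper's exact formulation: it assumes "truth margin" can be read as the log-probability gap between the correct answer and the strongest wrong one, and "manipulation sensitivity" as how far that gap falls once the prompt includes user pushback. The function names and the answer probabilities are hypothetical, chosen only for illustration.

```python
import math


def truth_margin(logprobs, correct):
    """Log-probability gap between the correct answer and the best wrong one.

    Assumed definition for illustration; the paper may define it differently.
    """
    best_wrong = max(lp for answer, lp in logprobs.items() if answer != correct)
    return logprobs[correct] - best_wrong


def manipulation_sensitivity(neutral, pressured, correct):
    """Drop in truth margin when the same question is asked with pushback."""
    return truth_margin(neutral, correct) - truth_margin(pressured, correct)


# Hypothetical answer probabilities for "What is the capital of France?"
neutral = {"Paris": math.log(0.90), "Lyon": math.log(0.07), "Nice": math.log(0.03)}
# Same question after the user insists the answer is Lyon (made-up numbers).
pressured = {"Paris": math.log(0.45), "Lyon": math.log(0.50), "Nice": math.log(0.05)}

margin_before = truth_margin(neutral, "Paris")
margin_after = truth_margin(pressured, "Paris")
sensitivity = manipulation_sensitivity(neutral, pressured, "Paris")

# The model "caves" when a positive margin turns negative under pressure.
flipped = margin_before > 0 and margin_after < 0

print(f"margin before: {margin_before:.3f}")
print(f"margin after:  {margin_after:.3f}")
print(f"sensitivity:   {sensitivity:.3f}")
print(f"flipped:       {flipped}")
```

Separating the two numbers is the point: a model can start with a large margin and still flip if its sensitivity is larger than that margin.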

📄 MORE PAPERS WORTH READING

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

Researchers increasingly use LLMs to role-play how real people would react in online debates. This audit shows that’s shaky ground: change a part of the thread that has nothing to do with someone’s actual view, and the model’s predicted stance moves anyway. A good reason to be careful with “simulated users.”
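
A rough way to picture this kind of audit, sketched in Python under stated assumptions: hold the target user's post fixed, edit some other post in the thread, and count how often the predicted stance changes. The `flip_rate` helper and the deliberately brittle `brittle_predict` function are hypothetical stand-ins (a real audit would call an LLM there), and this is not the paper's protocol.

```python
def flip_rate(predict_stance, thread, target_index, revisions):
    """Share of revisions that change the predicted stance for the target post.

    Each revision is (post_index, new_text) and must edit a post other than
    the target's, so the target user's own words never change.
    """
    if not revisions:
        raise ValueError("need at least one revision")
    baseline = predict_stance(thread, target_index)
    flips = 0
    for index, new_text in revisions:
        if index == target_index:
            raise ValueError("revisions must not touch the target post")
        revised = list(thread)  # copy so the original thread stays intact
        revised[index] = new_text
        if predict_stance(revised, target_index) != baseline:
            flips += 1
    return flips / len(revisions)


def brittle_predict(thread, target_index):
    """Toy stand-in for an LLM call: it leaks tone from the whole thread."""
    positive = sum(post.lower().count("love") for post in thread)
    return "favor" if positive > 0 else "against"


thread = [
    "Cities should ban cars downtown.",
    "I disagree, people need to drive.",   # target user's post
    "Anyone know a good bakery nearby?",   # unrelated to the debate
]
revisions = [
    (2, "I love the bakery on Main Street."),
    (2, "Is there a bakery nearby?"),
]

rate = flip_rate(brittle_predict, thread, 1, revisions)
print(f"flip rate: {rate:.2f}")
```

A simulator that tracked only the target user's view would score 0 here; anything above that means unrelated context is moving the prediction.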

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

A systems-level look at how agents remember across long jobs. It compares the usual approaches — dump everything into retrieval, have an LLM pull out and store key facts, or consolidate those facts over time — and measures what each one costs as a session drags on. Worth a read if you’re building agents meant to run for hours, not seconds.

💻 ON GITHUB

DISCO-design/DISCO

A Python toolkit for designing proteins across multiple modalities, using a DNA-encoding-of-chemistry trick to represent molecules. A real machine-learning-for-biology entry, built for general protein engineering.

shawn0728/OpenSearch-VL

An open, end-to-end recipe for training multimodal search agents — the curated data, the visual and search tools, and a reinforcement-learning method (they call it “fatal-aware”) for teaching an agent to search without going off the rails.

wanshuiyin/ARIS-in-AI-Offer

AI job-hunt kit, part one: bilingual ML/LLM/diffusion interview cheat sheets that compile into a single HTML file, plus a tool that turns your CV into an academic homepage and fact-checks it against DBLP.

couragec/LLMInternSkill

AI job-hunt kit, part two: a toolkit aimed at LLM internship applications — polishing your resume, tailoring it to a job description, validating the claims on your CV, interview prep, and finding projects to build.

caojiaolong/spaces-index

A small automation that scrapes and indexes the well-known Chinese ML blog 科学空间 (Scientific Spaces), sorting its posts by topic with links back to the originals — handy if you mine that blog for ideas.

🤗 ON HUGGING FACE

deepseek-ai/Janus-Pro-7B — multimodal / any-to-any.

DeepSeek’s 7-billion-parameter multimodal model that both understands and generates across text and images in one unified architecture — small enough to run yourself, and a tidy example of “any-to-any” generation.

zai-org/GLM-OCR — OCR / documents.

A model from Z.ai that reads text out of images and turns it into clean output, pairing classic OCR with a language model’s understanding for messier, real-world documents.

⏰ DEADLINES CLOSING SOON

• ICDM 2026 — IEEE International Conference on Data Mining — Shenyang, China — paper deadline June 6, 2026
• PRICAI 2026 — Pacific Rim International Conference on AI — Shenyang, China — full-paper deadline June 7, 2026
• ACML 2026 — Asian Conference on Machine Learning — Melbourne, Australia — paper deadline June 27, 2026
