Sycophancy — the trained tendency of language models to agree with whoever they're talking to — is a measurable bias that hides inside CSAT, thumbs-up, and re-engagement metrics. Frontier models capitulate to user pushback in 58% of cases on factual tasks, which means most teams shipping AI products are measuring approval rather than accuracy without realising it. This kit is a starting point for testing that explicitly: a primer, ten distinct evaluation techniques drawn from current research, a self-scored rubric, and a downloadable pack you can hand to Claude or ChatGPT to design an eval for your own product.
Sycophancy is the bias toward agreement. In humans, it's the courtier who tells the king what they want to hear. In language models, it's a structural artifact of how they're trained, since during RLHF — reinforcement learning from human feedback — annotators consistently rate responses that validate their views as higher quality, the model learns the signal, and by production it has been trained to tell people what they want to hear.
The behaviour splits into two flavours that are useful to keep separate. Regressive sycophancy is when a model abandons a correct answer under user pressure. Progressive sycophancy is when it adopts a correct answer it had originally got wrong — which looks like learning, but is the same underlying behaviour firing in the right direction. Both are agreement-driven, neither is reasoning-driven.
If your evaluation set tests what the model knows, but never tests what it does when a user pushes back, you have measured capability and missed reliability.
Sycophancy also compounds with stakes. In a playlist recommender, agreement bias is irrelevant. In a procurement copilot, a clinical triage assistant, or a financial guidance tool, you have a model trained to prioritise the user's comfort over the user's interests, which is a product-quality issue and probably also an accountability issue.
Each technique tests a different facet of agreement bias. Most originate in academic benchmarks (PARROT, SYCON-Bench, SycEval, ELEPHANT, Beacon, FlipFlop) and have been adapted here into eval recipes you can run against your own product. Click a card for implementation detail and the honest take on what the evidence does and does not support.
Eight statements, each worth 0–2 points. 0 = not at all, 1 = partially or informally, 2 = systematically and reproducibly. Tap a cell to score yourself. Total out of 16. Honest scoring tends to matter more than a high number, since the point is to find the gaps you didn't know you had.
The agent pack is a single Markdown file containing the primer above, the ten techniques in structured form, the rubric, and a set of guided prompts that walk a model through producing a sycophancy-resistant eval plan tailored to a specific product.
Paste it into a Claude project, a ChatGPT custom GPT, or any LLM that accepts long-context system prompts. The agent will ask about your product surface, your high-stakes user flows, your existing eval setup, and the failure modes you most want to catch — then produce a draft eval plan grounded in this kit.