| Source | Year | Key Finding | Cited In |
|---|---|---|---|
| Vaswani et al., "Attention Is All You Need" | 2017 | Transformer architecture: self-attention mechanism with n² pairwise relationships; foundational to context window constraints (see the attention formula after this table) | P2 |
| Zamfirescu-Pereira et al. (CHI), "Why Johnny Can't Prompt" | 2023 | Non-experts prefer "Do not X" framing; positive and negative constraints together are the strongest prompt structure | P5 |
| Hong et al., MetaGPT | 2023 | Structured artifacts reduce errors ~40% vs. free dialogue in multi-agent systems | P4, P7 |
| Liu et al., "Lost in the Middle" | 2024 | 30%+ accuracy drop when critical information is placed in mid-context positions | P2, P10 |
| Ranjan et al., "One Word Is Not Enough" | 2024 | LLM vocabulary acts as a routing signal in embedding space, activating domain-specific knowledge clusters; superlatives and flattery ("world's best") route to motivational/marketing clusters rather than domain expertise | P6, P10 |
| PRISM Persona Framework | 2026 | Accuracy damage from personas scales with length: shorter identities cause less degradation, so identities should be the minimum length required (under 50 tokens in practice); alignment-accuracy tradeoff: personas improve instruction-following while degrading factual accuracy on knowledge tasks | P6, P8, P10 |
| MAST, "Why Do Multi-Agent LLM Systems Fail?" | 2024–2025 | 14 failure modes catalogued across communication (4), coordination (5), and quality (5) categories; rubber-stamp approval is the #1 quality failure | P7, P8, P10 |
| Captain Agent, "Adaptive In-Conversation Team Building" | 2024 | Adaptive team composition outperforms static composition by 15–25% across benchmarks | P9 |
| Du et al., "Improving Factuality and Reasoning through Multiagent Debate" | 2024 | Multi-agent debate improves reasoning accuracy on structured problems with verifiable answers | P6 |
| LangChain few-shot prompting research | 2024 | 3 well-chosen examples match 9 in effectiveness; diminishing returns are real for few-shot prompting | P3, P10 |
| Anthropic, "Building Effective Agents" | Dec 2024 | Agent vs. workflow distinction; structured handoffs as the default pattern | P7 |
| Chroma Research, "Context Rot" | 2025 | Recall degrades as context length increases; retrieval quality is a function of context window fill | P2 |
| Wu et al. (MIT), "On the Emergence of Position Bias in Transformers" | 2025 | Causal masking and RoPE are architectural causes of the U-shaped attention curve, not patchable by prompting | P2 |
| DeepMind et al., "Towards a Science of Scaling Agent Systems" | 2025 | 45% threshold for single-agent sufficiency; effectiveness saturates at 3–4 agents; a 5-agent team costs 7x for 3.1x output | P9, P10 |
| He et al., "Does Prompt Formatting Have Any Impact on LLM Performance?" | 2025 | Prompt structure accounts for up to 40% of performance variance, independent of content | P3, P4 |
| Anthropic, "Effective Context Engineering for AI Agents" | Sep 2025 | Attention budget concept; progressive disclosure as a strategy; context as a finite resource with real cost | P2, P9 |
| Anthropic, Skill Creator guidance | 2025 | "Explain why things are important in lieu of heavy-handed MUSTs"; BECAUSE clauses outperform imperatives | P5, P10 |
| Anthropic, "Harness Design for Long-Running Application Development" | Mar 2026 | Self-evaluation fails because a generator shares its evaluator's biases; separating generation from evaluation dramatically improves output quality | P6, P10 |
| Vaarta Analytics, "Prompt Engineering Is System Design" | 2026 | Structured atomic checks reduce false negatives; accuracy when checking n=19 requirements at once drops below the accuracy seen at n=5 | P5, P10 |
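For reference, the n² term in the Vaswani et al. row comes from the scaled dot-product attention formula in that paper, reproduced below with its standard notation (Q, K, V are the query, key, and value matrices for a sequence of n tokens; d_k is the key dimension). The QKᵀ product is an n × n matrix of pairwise scores, which is why compute and memory grow with the square of the context length.

```latex
% Scaled dot-product attention (Vaswani et al., 2017).
% Q, K in R^{n x d_k}, V in R^{n x d_v} for a sequence of n tokens.
% Q K^T is the n x n matrix of pairwise scores: the source of the
% quadratic cost in context length noted in the table above.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```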