# Research Citation Index

Every principle in this series traces to published research. This index collects all 19 sources cited across the 10 articles.

| Source | Year | Key Finding | Cited In |
|---|---|---|---|
| Vaswani et al., “Attention Is All You Need” | 2017 | Transformer architecture — self-attention computes O(n²) pairwise token relationships; foundational to context window constraints | P2 |
| Zamfirescu-Pereira et al. (CHI), “Why Johnny Can’t Prompt” | 2023 | Non-experts prefer “Do not X” framing; positive + negative constraints together are the strongest prompt structure | P5 |
| Hong et al., MetaGPT | 2023 | Structured artifacts reduce errors ~40% vs. free dialogue in multi-agent systems | P4, P7 |
| Liu et al., “Lost in the Middle” | 2024 | 30%+ accuracy drop when critical information is placed in mid-context positions | P2, P10 |
| Ranjan et al., “One Word Is Not Enough” | 2024 | LLM vocabulary acts as a routing signal in embedding space, activating domain-specific knowledge clusters; superlatives and flattery (“world’s best”) route to motivational/marketing clusters rather than domain expertise | P6, P10 |
| PRISM Persona Framework | 2026 | Accuracy damage from personas scales with length — shorter identities cause less degradation; identities should be the minimum length required, under 50 tokens in practice; alignment-accuracy tradeoff: personas improve instruction-following while degrading factual accuracy on knowledge tasks | P6, P8, P10 |
| MAST, “Why Do Multi-Agent LLM Systems Fail?” | 2024–2025 | 14 failure modes catalogued across communication (4), coordination (5), and quality (5) categories; rubber-stamp approval as the #1 quality failure | P7, P8, P10 |
| Captain Agent, “Adaptive In-Conversation Team Building” | 2024 | Adaptive team composition outperforms static composition by 15–25% across benchmarks | P9 |
| Du et al., “Improving Factuality and Reasoning through Multiagent Debate” | 2024 | Multi-agent debate improves reasoning accuracy on structured problems with verifiable answers | P6 |
| LangChain few-shot prompting research | 2024 | 3 well-chosen examples match 9 in effectiveness; diminishing returns are real for few-shot prompting | P3, P10 |
| Anthropic, “Building Effective Agents” | Dec 2024 | Agent vs. workflow distinction; structured handoffs as the default pattern | P7 |
| Chroma Research, “Context Rot” | 2025 | Degradation of recall as context length increases; retrieval quality as a function of context window fill | P2 |
| Wu et al. (MIT), “On the Emergence of Position Bias in Transformers” | 2025 | Causal masking and RoPE as architectural causes of the U-shaped attention curve — not patchable by prompting | P2 |
| DeepMind et al., “Towards a Science of Scaling Agent Systems” | 2025 | 45% threshold for single-agent sufficiency; effectiveness saturates at 3–4 agents; a 5-agent team costs 7x for 3.1x the output | P9, P10 |
| He et al., “Does Prompt Formatting Have Any Impact on LLM Performance?” | 2025 | Prompt structure accounts for up to 40% of performance variance independent of content | P3, P4 |
| Anthropic, “Effective Context Engineering for AI Agents” | Sep 2025 | Attention budget concept, progressive disclosure as a strategy, context as a finite resource with real cost | P2, P9 |
| Anthropic, Skill Creator guidance | 2025 | “Explain why things are important in lieu of heavy-handed MUSTs” — BECAUSE clauses outperform imperatives | P5, P10 |
| Anthropic, “Harness Design for Long-Running Application Development” | Mar 2026 | Self-evaluation fails — a generator shares its evaluator’s biases; separating generation from evaluation dramatically improves output quality | P6, P10 |
| Vaarta Analytics, “Prompt Engineering Is System Design” | 2026 | Structured atomic checks reduce false negatives; evaluation accuracy at n=19 requirements drops below accuracy at n=5 | P5, P10 |