Research Citation Index

Author(s)	Year	Key Finding	Cited In
Liu et al.	2024	“Lost in the Middle” — 30%+ accuracy drop for information placed in mid-context positions	P2, P10
Wu et al. (MIT)	2025	U-shaped attention curve caused by causal masking and RoPE — architectural, not patchable by prompting	P2
Hong et al.	2023	MetaGPT structured artifacts reduce errors ~40% vs. free dialogue in multi-agent systems	P4, P7
Voyce	2025	Prompt structure accounts for up to 40% of performance variance independent of content	P3, P4
Ranjan et al.	2024	LLM vocabulary acts as a routing signal in embedding space, activating domain-specific clusters	P6, P10
Zamfirescu-Pereira et al. (CHI)	2023	“Why Johnny Can’t Prompt” — positive example + negative example + reason is the strongest prompt structure	P5
LangChain	2024	3 well-chosen examples match 9 in effectiveness for few-shot prompting	P3
PRISM	2024	Brief personas (<50 tokens) outperform elaborate ones; flattery actively degrades output quality	P6, P8, P10
DeepMind	2025	Multi-agent scaling: effectiveness saturates at 3–4 agents; with 7+ agents, performance degrades below the 4-agent level	P9
Captain Agent	2024	Adaptive team composition outperforms static composition by 15–25% across benchmarks	P9
MAST Framework	2024–2025	14 failure modes catalogued across communication (4), coordination (5), and quality (5) categories	P7, P8
Anthropic	2026	Self-evaluation fails because a generator acting as its own evaluator shares the same biases; separate reviewer agents are required	P6
Anthropic Skill Creator	2025	“Explain why things are important in lieu of heavy-handed MUSTs” — BECAUSE clauses outperform imperatives	P5, P10
Vaarta Analytics	2026	Structured atomic checks reduce false negatives vs. free-text review at scale	P10
Rasheed et al.	2024	“Can LLMs Replace Data Scientists?” — a single expert agent outperforms panels for analytics tasks	P9
Du et al.	2024	Multi-agent debate improves reasoning accuracy on structured problems with verifiable answers	P6
Anthropic Harness Design	2026	Separation of generator and evaluator prevents shared-bias failures in quality assurance workflows	P6