Context Engineering on 卓琪的开发笔记

Claude's Tool Calling Paradigm Shift: A Deep Dive into Programmatic Tool Calling and Dynamic Filtering

Sat, 13 Jun 2026 00:00:00 +0000

Background: The Cost Problem in Agent Tool Calling
#

In traditional agent tool-calling, every tool invocation requires a full cycle of “model inference → tool execution → result return → model re-inference.” This seemingly natural loop breaks down at scale in three ways:

Context Pollution: Every tool result is injected verbatim into the context window. Fetch expense reports for 20 employees, and 2,000+ line items enter context — even though you only need to know “which 3 people exceeded their budget.”
Inference Overhead: Each tool call demands a full model inference pass. Five tools = five inference passes, each costing hundreds of milliseconds to seconds.
Noise Degrades Accuracy: When the context window is packed with intermediate results, the model must find signal in noise. Context Rot research shows LLM performance on complex tasks drops 50-70% as context grows.

As Florian Bruniaux puts it in the Claude Code Architecture Guide: “The Outer Loop — everything outside the model: context management, tool invocation, verification, memory consolidation — increasingly determines system quality more than model inference itself.”

RAG vs LLM Wiki vs Plain Text — A Decision Framework for Agent Long-Term Memory

Mon, 11 May 2026 00:00:00 +0000

Every Agent builder hits this question eventually: where do I store user data so the agent remembers it next session?

Three approaches dominate the landscape: RAG (vector retrieval), LLM Wiki (structured knowledge injection), and plain-text context memory (the CLAUDE.md / Cursor Rules pattern). Each has vocal advocates. But picking wrong is expensive — do RAG too light and it’s a noise generator; do plain text too heavy and it’s a token incinerator.

Why LLMs Have No Memory — A Research Report Covering 67 Primary Sources

Mon, 04 May 2026 00:00:00 +0000

This is not AI科普. This is a cross-validated research sprint backed by 67 primary sources — vendor docs, arXiv papers, and researcher interviews — on a question every Agent builder hits: why don’t LLMs remember anything?

→ Full report: 14-product comparison table, 9 engineering takeaways, 3-year paradigm roadmap

The One-Liner
#

Four independent constraints — O(n²) attention + KV cache VRAM + catastrophic forgetting + GDPR right-to-be-forgotten — stacked together leave “stateless” as the only viable engineering solution. Every “Memory” feature you’ve seen (ChatGPT, Claude, Cursor) is structured text injected into the system prompt. Zero weight modification. The next 1–3 years belong to stateless LLM kernels + stateful Agent memory layers.