How to talk to AIs: Advanced Context Engineering
In Context Engineering 101, we covered five core principles: explicit goals, minimal context, clear labeling, smart examples, and short memory. This guide extends those into structured data, retrieval, long-horizon reasoning, and evaluation.
These articles aim to reflect where the field is going: designing the entire context state each turn, not just the instruction string, and treating tokens like a scarce resource with diminishing returns.
Structured outputs & constrained decoding
Plain “please return JSON” helps, but you’ll get higher reliability by guaranteeing shape.
Why it matters: Your downstream graders, dashboards, or scripts won’t break on malformed outputs. For policy/syllabus Q&A, this also encourages short, sourced answers consistent with 101’s “be explicit about goal and format”
When to force a tool call vs. free text
- Require a tool call when correctness depends on structure (e.g., function args, SQL, JSON) or when you need to prevent free-text improvisation.
- Use response prefill or “required function call” modes to constrain decoding without touching tool definitions.
Retrieval that actually holds up
Good RAG is just-in-time: provide the smallest set of high-signal tokens that makes the next step likely to succeed
Six-step loop:
- Chunk & title sources into coherent snippets with stable IDs.
- Query expansion: generate 3–5 variants, then search.
- Rerank and keep the top 2–6 snippets.
- Pack abstracts + IDs, not bodies (title + 2–3 sentence abstract + ID).
- Require faithful citing: answer must reference supplied IDs; otherwise return “insufficient evidence.”
- On demand: if the model asks for a specific ID, fetch that chunk and re-ask with the new snippet.
Long-horizon patterns (when tasks exceed one window)
When work spans tens of minutes or longer (literature synthesis, governance reports), use these:
- Compaction: Summarize decisions, open todos, and key refs; restart with the compact state + last few artifacts.
- Structured notes: Keep a separate note or small JSON state (objectives, open questions, key file IDs). Recite the essentials at the tail of context every turn.
- Focused sub-agents: Delegate deep work to specialized workers if available.
KV-cache playbook (cost and latency)
High cache hit rate = faster loops and lower spend.
- Keep the system prefix stable; avoid per-turn timestamps at the top.
- Treat context as append-only; don’t edit prior turns.
- Serialize deterministically (stable key ordering for tools/JSON).
- Prefer masking tools over adding/removing them mid-run. If you must break cache, do it intentionally
Tooling hygiene
- One purpose per tool with clear names/params.
- Keep the action space stable; mask invalid actions instead of deleting tools.
- Use prefill/required-call modes to force selection when needed.
Why it matters: Prevents schema drift and reduces hallucinated actions. It’s the tooling equivalent of keeping a syllabus template consistent
Minimal evals you’ll actually run
Don’t ship blind; build simple evals you can maintain.
Metrics: accuracy, faithfulness (cites supplied IDs), usefulness (Likert 1–5), token cost, latency, retries.
Protocol: Compare your context variants (baseline vs. new packing). Log outputs and costs. Promote only if accuracy and cost improve
Academic lens: Favor traceability (where did this answer come from?) and interpretability alongside accuracy/cost
Putting it together (pattern recap)
- Stable system prompt with a small schema and a citation rule.
- Minimal abstracts + IDs from retrieval.
- One varied example to teach the output shape.
- Small notes block (objectives, open questions) recited into the tail.
- Mask tools by step; avoid dynamic adds/removes.
- Append-only transcript; deterministic serialization for cache.
- Eval each change against your gold set; promote winners
Advanced context work doesn’t have to be complex. Start with one workflow (e.g., syllabus Q&A), add the small schema and “insufficient evidence” rule, switch your retrieval to abstracts + IDs, keep a tiny NOTES.md, and measure with a page-sized gold set. These steps reflect the core ideas in the 101 guide and current best practices from Anthropic and DAIR.AI—design the whole context each turn, and keep it tight
Further reading
Anthropic – Effective Context Engineering (concepts and strategy)
DAIR.AI – Context Engineering Guide (definitions, structure, and a concrete walkthrough)
Manus – Lessons from Building Agents (practical patterns that affect reliability, speed, and cost)