How to talk to AIs: Advanced Context Engineering

Spencer Ames

November 20, 2025

A collage that merges circuit board patterns with textile motifs in a grid-like background of alternating black, grey, and white. Two hand-drawn arms are on each side of the image, positioned as if gently pulling on thin, white strings that cross the image diagonally. The hands appear soft and somewhat translucent, contrasting with the rigid lines of the circuit board patterns behind them. The strings are woven through both the hands and the background, symbolising the connection between traditional weaving — Textiles and Tech 1 by Hanna Barakat & Archival Images of AI + AIxDESIGN

In Context Engineering 101, we covered five core principles: explicit goals, minimal context, clear labeling, smart examples, and short memory. This guide extends those into structured data, retrieval, long-horizon reasoning, and evaluation.

These articles aim to reflect where the field is going: designing the entire context state each turn, not just the instruction string, and treating tokens like a scarce resource with diminishing returns.

Structured outputs & constrained decoding

Plain “please return JSON” helps, but you’ll get higher reliability by guaranteeing shape.

Why it matters: Your downstream graders, dashboards, or scripts won’t break on malformed outputs. For policy/syllabus Q&A, this also encourages short, sourced answers consistent with 101’s “be explicit about goal and format”

When to force a tool call vs. free text

Require a tool call when correctness depends on structure (e.g., function args, SQL, JSON) or when you need to prevent free-text improvisation.
Use response prefill or “required function call” modes to constrain decoding without touching tool definitions.

Retrieval that actually holds up

Good RAG is just-in-time: provide the smallest set of high-signal tokens that makes the next step likely to succeed

Six-step loop:

Chunk & title sources into coherent snippets with stable IDs.
Query expansion: generate 3–5 variants, then search.
Rerank and keep the top 2–6 snippets.
Pack abstracts + IDs, not bodies (title + 2–3 sentence abstract + ID).
Require faithful citing: answer must reference supplied IDs; otherwise return “insufficient evidence.”
On demand: if the model asks for a specific ID, fetch that chunk and re-ask with the new snippet.

Long-horizon patterns (when tasks exceed one window)

When work spans tens of minutes or longer (literature synthesis, governance reports), use these:

Compaction: Summarize decisions, open todos, and key refs; restart with the compact state + last few artifacts.
Structured notes: Keep a separate note or small JSON state (objectives, open questions, key file IDs). Recite the essentials at the tail of context every turn.
Focused sub-agents: Delegate deep work to specialized workers if available.

KV-cache playbook (cost and latency)

High cache hit rate = faster loops and lower spend.

Keep the system prefix stable; avoid per-turn timestamps at the top.
Treat context as append-only; don’t edit prior turns.
Serialize deterministically (stable key ordering for tools/JSON).
Prefer masking tools over adding/removing them mid-run. If you must break cache, do it intentionally

Tooling hygiene

One purpose per tool with clear names/params.
Keep the action space stable; mask invalid actions instead of deleting tools.
Use prefill/required-call modes to force selection when needed.

Why it matters: Prevents schema drift and reduces hallucinated actions. It’s the tooling equivalent of keeping a syllabus template consistent

Minimal evals you’ll actually run

Don’t ship blind; build simple evals you can maintain.

Metrics: accuracy, faithfulness (cites supplied IDs), usefulness (Likert 1–5), token cost, latency, retries.

Protocol: Compare your context variants (baseline vs. new packing). Log outputs and costs. Promote only if accuracy and cost improve

Academic lens: Favor traceability (where did this answer come from?) and interpretability alongside accuracy/cost

Putting it together (pattern recap)

Stable system prompt with a small schema and a citation rule.
Minimal abstracts + IDs from retrieval.
One varied example to teach the output shape.
Small notes block (objectives, open questions) recited into the tail.
Mask tools by step; avoid dynamic adds/removes.
Append-only transcript; deterministic serialization for cache.
Eval each change against your gold set; promote winners

Advanced context work doesn’t have to be complex. Start with one workflow (e.g., syllabus Q&A), add the small schema and “insufficient evidence” rule, switch your retrieval to abstracts + IDs, keep a tiny NOTES.md, and measure with a page-sized gold set. These steps reflect the core ideas in the 101 guide and current best practices from Anthropic and DAIR.AI—design the whole context each turn, and keep it tight

How to talk to AIs: Advanced Context Engineering

Anthropic – Effective Context Engineering (concepts and strategy)

DAIR.AI – Context Engineering Guide (definitions, structure, and a concrete walkthrough)

Manus – Lessons from Building Agents (practical patterns that affect reliability, speed, and cost)

AICoP: CAiSEY an AI-powered Course Tool

AICoP: How SPS Is Embedding AI School-Wide

AICoP: OpenAI Codex: Data, Development, and Decision-Making

AICoP - Local Compute, Real-World Impact for AI in Higher Education

Data Privacy and Security for AI Platforms

Contact Us

How to talk to AIs: Advanced Context Engineering

Structured outputs & constrained decoding

Retrieval that actually holds up

Long-horizon patterns (when tasks exceed one window)

KV-cache playbook (cost and latency)

Tooling hygiene

Minimal evals you’ll actually run

Putting it together (pattern recap)

Further reading

Anthropic – Effective Context Engineering (concepts and strategy)

DAIR.AI – Context Engineering Guide (definitions, structure, and a concrete walkthrough)

Manus – Lessons from Building Agents (practical patterns that affect reliability, speed, and cost)

Contact Us