28 June 2026

Getting an LLM to return JSON you can actually trust

The first time you wire an LLM into real code, you hit the same wall everyone does. You ask for JSON, and most of the time you get JSON - and then, occasionally, you get this:

Sure! Here's the data you asked for:
```json
{ "name": "Ada", "role": "engineer" }
```
Let me know if you'd like anything else!

JSON.parse throws, your request 500s, and a user sees a broken page because the model felt chatty. If you're putting an LLM in production, "usually valid" isn't good enough. Here's the stack of techniques I use to make structured output boringly reliable.

1. Never parse hope - validate

The core mistake is treating the model's output as trusted data. It's untrusted input, exactly like a form field from a stranger. So the first rule: define a schema and validate against it.

```ts
import { z } from "zod";

const Person = z.object({
  name: z.string().min(1),
  role: z.enum(["engineer", "designer", "founder"]),
  years: z.number().int().nonnegative(),
});

type Person = z.infer<typeof Person>;
```

Now the model's reply has to earn its way into your types. Anything malformed gets caught at the boundary instead of detonating three functions deep.
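Here is a self-contained sketch of the boundary doing its job. It runs the schema against the chatty reply from the top of this post and against two cleaner ones. `parseReply` is a hypothetical helper, not part of Zod. The schema is repeated so the snippet runs on its own.

```ts
import { z } from "zod";

// Same schema as above, repeated so this snippet is self-contained.
const Person = z.object({
  name: z.string().min(1),
  role: z.enum(["engineer", "designer", "founder"]),
  years: z.number().int().nonnegative(),
});
type Person = z.infer<typeof Person>;

// Hypothetical boundary helper: raw model text in, typed Person or an error message out.
function parseReply(raw: string): { ok: true; person: Person } | { ok: false; error: string } {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "reply is not valid JSON" };
  }
  const result = Person.safeParse(data);
  if (!result.success) {
    // Flatten Zod's issues into "path: message" pairs.
    const error = result.error.issues.map((i) => `${i.path.join(".")}: ${i.message}`).join("; ");
    return { ok: false, error };
  }
  return { ok: true, person: result.data };
}

parseReply(`Sure! Here's the data you asked for: { "name": "Ada", "role": "engineer" }`); // ok: false, not JSON
parseReply(`{ "name": "Ada", "role": "engineer" }`); // ok: false, valid JSON but no "years"
parseReply(`{ "name": "Ada", "role": "engineer", "years": 7 }`); // ok: true
```

Note the middle case. Even with the chatter stripped, the JSON from the intro would still be rejected because it has no `years`. That is exactly the kind of gap you want caught here, not downstream.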

2. Use the platform's structured mode

Before you engineer around the problem, check what the API already gives you. Most providers now offer some form of guaranteed structure:

| Feature | What it does |
| --- | --- |
| JSON mode | Forces output to be syntactically valid JSON |
| Structured outputs | Constrains output to your schema, not just "JSON" |
| Tool / function call | Model returns arguments matching a typed signature |

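Structured outputs and tool calling are the big ones. When the provider enforces your schema during decoding (OpenAI's `strict` mode is one example), the model can't emit a field you didn't define. Not every provider or mode enforces the schema that strictly, so check the docs and keep validating on your side. Reach for these first; they remove whole categories of failure for free.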
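Structured-output and tool-calling APIs generally take a JSON Schema, not a Zod object. Below is a minimal sketch of the `Person` schema written out by hand, assuming your provider accepts standard JSON Schema. Setting `additionalProperties: false` and listing every field in `required` is what OpenAI's strict mode expects; other providers may differ. The wrapper follows the shape of OpenAI's Chat Completions `response_format` parameter. The variable names are illustrative.

```ts
// Hand-written JSON Schema mirroring the Zod `Person` schema; keep the two in sync.
// Finer rules (non-empty name, non-negative years) are left to Zod, since not
// every structured-output mode supports keywords like minLength or minimum.
const personJsonSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    role: { type: "string", enum: ["engineer", "designer", "founder"] },
    years: { type: "integer" },
  },
  required: ["name", "role", "years"],
  additionalProperties: false,
} as const;

// Shape of OpenAI's Chat Completions `response_format` for structured outputs.
// Other providers name these fields differently.
const responseFormat = {
  type: "json_schema",
  json_schema: { name: "person", strict: true, schema: personJsonSchema },
} as const;
```

The split is deliberate. Constrained decoding handles the shape, and the Zod check from step 1 still runs on the result to enforce the rules the schema leaves out.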

3. Turn the temperature down

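Sampling temperature controls how random the model's token choices are. For creative writing you want it high. For structured extraction you want it low, often 0 to 0.2. You're not asking the model to be imaginative; you're asking it to fill in a form correctly. Low temperature makes the output's shape more stable and repeatable. Even 0 doesn't guarantee identical output on every call, though, which is one more reason to validate.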
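One way to keep that range from drifting is to route extraction calls through a small settings helper. The sketch below assumes your SDK takes a `temperature` field, as the major providers' APIs do. `extractionSampling` and the 0.2 ceiling are illustrative choices taken from the range above, not a provider rule.

```ts
// Upper end of the 0 to 0.2 range for extraction; an illustrative policy.
const MAX_EXTRACTION_TEMPERATURE = 0.2;

// Hypothetical helper: sampling settings for an extraction call. It rejects
// temperatures high enough to invite the model to improvise.
function extractionSampling(temperature = 0): { temperature: number } {
  if (!Number.isFinite(temperature) || temperature < 0 || temperature > MAX_EXTRACTION_TEMPERATURE) {
    throw new RangeError(
      `extraction temperature must be between 0 and ${MAX_EXTRACTION_TEMPERATURE}, got ${temperature}`,
    );
  }
  return { temperature };
}

// Spread into your SDK's request object, e.g. { ...extractionSampling(), /* model, messages */ }
```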

4. Retry - and feed the error back

Even with all of the above, you'll get the occasional miss. Don't just retry blindly; tell the model what broke. The validation error is a perfect correction signal.

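```ts
// App-specific helpers; these signatures are assumptions.
declare function callModel(prompt: string): Promise<string>;
declare function buildPrompt(input: string, lastError: string): string;

// Never throws: malformed JSON becomes undefined, which the schema then rejects.
function safeJson(raw: string): unknown {
  try { return JSON.parse(raw); } catch { return undefined; }
}

async function extract(input: string, tries = 2): Promise<Person> {
  let lastError = "";
  for (let i = 0; i < tries; i++) {
    const raw = await callModel(buildPrompt(input, lastError));
    const parsed = Person.safeParse(safeJson(raw));
    if (parsed.success) return parsed.data;
    // Feed the exact validation issues into the next attempt's prompt.
    lastError = JSON.stringify(parsed.error.issues);
  }
  throw new Error(`Model could not produce valid output: ${lastError}`);
}
```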

On the second pass the prompt now includes "your last answer failed validation with these issues: …, fix them." Models are surprisingly good at correcting themselves when you hand them the exact complaint.
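The correction step lives in the prompt builder. Below is one possible shape for the `buildPrompt(input, lastError)` helper called in `extract`. It is a hypothetical sketch; the wording and the `<<<`/`>>>` delimiters are assumptions, not a required format.

```ts
// Hypothetical prompt builder matching the buildPrompt(input, lastError) call in extract().
function buildPrompt(input: string, lastError: string): string {
  const lines = [
    "Extract a person from the text between the markers.",
    'Reply with only a JSON object with keys "name", "role" (one of "engineer", "designer", "founder")',
    'and "years" (a non-negative integer). No prose, no code fences.',
    "<<<",
    input,
    ">>>",
  ];
  // On a retry, hand the model the exact validation complaint.
  if (lastError) {
    lines.push(
      `Your last answer failed validation with these issues: ${lastError}. Fix them and reply with corrected JSON only.`,
    );
  }
  return lines.join("\n");
}
```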

Treat the LLM like an unreliable junior who's brilliant but careless. You don't trust the first draft - you give precise feedback and check the work.

5. Ground it, don't let it invent

A model asked to produce a field it doesn't know will often make one up rather than leave it blank - that's hallucination. Two cheap defenses:

Allow "unknown." Make the field nullable (`.nullable()` in Zod) or add an explicit "unknown" enum value, and tell the model to use it when the input doesn't say. Giving it a legal way to say "I don't know" beats forcing a guess.
Keep it extractive. When you can, frame the task as "pull these fields from this text" rather than "tell me about X." Extraction from provided context hallucinates far less than open-ended generation.
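Both defenses fit in a few lines. The sketch below uses a Zod schema where every field has a legal "don't know" value, plus an extractive prompt that confines the model to the supplied text. `PersonFromText` and `extractivePrompt` are hypothetical names, and the prompt wording is only an example.

```ts
import { z } from "zod";

// Same fields as Person, but each one has a legal "I don't know" value.
const PersonFromText = z.object({
  name: z.string().min(1).nullable(), // null when the text gives no name
  role: z.enum(["engineer", "designer", "founder", "unknown"]), // explicit escape hatch
  years: z.number().int().nonnegative().nullable(), // null when no tenure is stated
});
type PersonFromText = z.infer<typeof PersonFromText>;

// Extractive framing: pull fields out of the given text rather than "tell me about X".
function extractivePrompt(text: string): string {
  return [
    "Using ONLY the text between the markers, fill in name, role and years.",
    'If the text does not state a field, use null (or "unknown" for role). Do not guess.',
    "<<<",
    text,
    ">>>",
  ].join("\n");
}
```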

The mental model

Reliable LLM output isn't one trick; it's a posture: constrain what you can, validate everything, and build a correction loop for the rest.

constrain (schema/tools) → low temperature → validate → retry with the error → fall back gracefully
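The last step, falling back gracefully, is the only one without code so far. This sketch ties the pipeline together in one hypothetical wrapper. It assumes `call` makes a single schema-constrained, low-temperature request and `validate` runs your schema check. When retries run out, it returns a fallback value with the collected errors instead of throwing.

```ts
// Either a validated value, or a fallback plus the errors that led to it.
type Extraction<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; fallback: T | null; errors: string[] };

type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical wrapper. `call` receives the previous validation error ("" on the first try).
async function reliableExtract<T>(
  call: (lastError: string) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  opts: { tries: number; fallback: T | null },
): Promise<Extraction<T>> {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= opts.tries; attempt++) {
    const raw = await call(errors[errors.length - 1] ?? "");
    const result = validate(raw);
    if (result.ok) return { ok: true, value: result.value, attempts: attempt };
    errors.push(result.error); // retry with the error
  }
  // Fall back gracefully: return instead of throwing, and let the caller decide what to show.
  return { ok: false, fallback: opts.fallback, errors };
}
```

In this post's terms, `call` would wrap `callModel(buildPrompt(input, lastError))` and `validate` would wrap `Person.safeParse`. Returning a result instead of throwing forces every caller to handle the fallback path.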

None of this is exotic. It's the same defensive engineering you'd apply to any flaky external dependency - which, for now, is exactly what a language model is. Treat it that way and the "magic" becomes just another well-behaved part of your system.

Building something with LLMs and want to compare approaches? Say hi.