AI · LLM

LLM Structured Output: JSON Mode vs Function Calling vs Constrained Decoding

Comparing three approaches to getting reliable JSON from LLMs, with practical guidance on which to choose for production in 2026.

Technical information was last verified on April 2026. The AI/LLM field moves fast — re-check official docs if more than 6 months have passed.

Summary: There are three ways to get structured data from an LLM. Structured Output (constrained decoding) guarantees 100% schema compliance. JSON Mode guarantees valid JSON but not schema compliance. Function Calling is designed for tool invocation. In 2026, Structured Output is supported by every major provider, making it the right answer for most cases.

Who should read this

This article is for backend developers who need to consume structured, machine-readable data from LLM API calls.

Comparing the 3 approaches

| Criteria | JSON Mode | Function Calling | Structured Output |
| --- | --- | --- | --- |
| JSON syntax guarantee | Guaranteed | Guaranteed | Guaranteed |
| JSON Schema compliance | Not guaranteed | Not guaranteed (without strict) | 100% guaranteed |
| Primary use case | Simple JSON extraction | Agent tool invocation | Schema-based data extraction |
| Overhead | Near zero | Slight (tool definition tokens) | Near zero (XGrammar) |
| OpenAI support | GPT-3.5+ | GPT-4+ | GPT-4o, 4.1 (strict) |
| Anthropic support | — | Tool use | Claude 3.5+ (Nov 2025-) |
| Open-source support | vLLM, TGI | Limited | vLLM + XGrammar |
As of April 2026. When Structured Output is available, always choose it.

JSON Mode — simplest but incomplete

Setting response_format: { type: "json_object" } makes the LLM return syntactically valid JSON. However, you cannot specify a schema. You might expect { "name": "Jane Doe" } but get { "user": "Jane Doe", "extra": true } instead.

Using JSON Mode alone in production means your response parsing code needs defensive logic, and schema mismatches cause runtime errors. Suitable for quick prototyping only.
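What that defensive logic looks like in practice: a minimal sketch, assuming a hypothetical expected shape of `{ name: string }` (the `UserRecord` type and `parseUserRecord` helper are invented for illustration, not part of any SDK).

```typescript
// Defensive parsing for JSON Mode output. Only JSON *syntax* is
// guaranteed, so every expected field must be checked before use.
interface UserRecord {
  name: string;
}

function parseUserRecord(raw: string): UserRecord | null {
  let data: unknown;
  try {
    data = JSON.parse(raw); // syntax is guaranteed, but stay safe anyway
  } catch {
    return null;
  }
  // The model may have nested, renamed, or added fields.
  if (
    typeof data === 'object' &&
    data !== null &&
    typeof (data as Record<string, unknown>).name === 'string'
  ) {
    return { name: (data as Record<string, unknown>).name as string };
  }
  return null; // schema mismatch → caller must retry or fail
}
```

A response like `{ "user": "Jane Doe" }` returns `null` here, which is exactly the runtime failure mode Structured Output eliminates.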

Function Calling — agent tool invocation

Function Calling is a mechanism where “the LLM decides to call an external function.” Its purpose is action selection, not data extraction.

function-call.ts TypeScript
// OpenAI Function Calling
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Tell me the weather in Seoul' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
});
// → model decides to call get_weather({ city: "Seoul" })

Adding strict: true guarantees schema compliance, but at that point it is effectively the same mechanism as Structured Output.
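Note that the model only *selects* the tool; your code still has to execute it. A minimal dispatch sketch, where `getWeather` is a hypothetical local implementation and `ToolCall` mirrors the shape of `response.choices[0].message.tool_calls[0]` in the OpenAI SDK:

```typescript
// Dispatch a tool call returned by the model.
type ToolCall = { function: { name: string; arguments: string } };

// Hypothetical local implementation of the declared get_weather tool.
function getWeather(args: { city: string }): string {
  return `Weather lookup for ${args.city}`;
}

function dispatchToolCall(toolCall: ToolCall): string {
  // Arguments arrive as a JSON string; without strict: true they may
  // deviate from the declared schema, so parse defensively.
  const args = JSON.parse(toolCall.function.arguments);
  switch (toolCall.function.name) {
    case 'get_weather':
      return getWeather(args);
    default:
      throw new Error(`Unknown tool: ${toolCall.function.name}`);
  }
}
```

The result would then be sent back to the model as a `tool` role message to continue the conversation.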

Structured Output — the 2026 production standard

Constrained Decoding: When the LLM generates tokens, it sets the probability of any token that violates the JSON Schema to zero, making schema violations structurally impossible. This is not “validate then retry” but “invalid tokens can never be selected in the first place.”

structured-output.ts TypeScript
// OpenAI Structured Output
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Extract sentiment and keywords from this review' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'review_analysis',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
          keywords: { type: 'array', items: { type: 'string' } },
          confidence: { type: 'number', minimum: 0, maximum: 1 },
        },
        required: ['sentiment', 'keywords', 'confidence'],
      },
    },
  },
});
// → 100% schema compliance guaranteed

Engines like XGrammar and llguidance have reduced the performance overhead of constrained decoding to near zero. As of 2026, there is no reason not to use Structured Output in production.
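To make the masking mechanism concrete, here is a toy illustration (this is not XGrammar's actual API — the vocabulary, logits, and function names are invented for the sketch). The "grammar" only permits the three enum values from the schema above, and any token that cannot legally extend the output has its logit set to negative infinity, i.e. probability zero after softmax:

```typescript
// Toy constrained decoding over the enum ["positive","negative","neutral"].
const allowed = ['positive', 'negative', 'neutral'];

function maskLogits(prefix: string, vocab: string[], logits: number[]): number[] {
  return logits.map((logit, i) => {
    const candidate = prefix + vocab[i];
    // Keep a token only if some allowed value still starts with the new prefix.
    const legal = allowed.some((v) => v.startsWith(candidate));
    return legal ? logit : -Infinity; // -Infinity → probability 0 after softmax
  });
}

// Greedy decoding under the mask: an illegal value can never be emitted,
// no matter what the raw logits prefer.
function constrainedDecode(vocab: string[], rawLogits: number[][]): string {
  let out = '';
  for (const logits of rawLogits) {
    const masked = maskLogits(out, vocab, logits);
    const best = masked.indexOf(Math.max(...masked));
    out += vocab[best];
    if (allowed.includes(out)) break; // a complete legal value was produced
  }
  return out;
}
```

Even when the raw logits strongly favor an out-of-grammar token, the mask forces the decoder onto a legal path — which is exactly why "validate then retry" is unnecessary.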

What to avoid

Further reading