Why structure matters more than ever#
At a time when prompts keep growing larger and AI models keep getting more powerful, one question comes up again and again: how do we keep costs and processing times low?
When working programmatically with LLMs, structured outputs have already become a standard approach: you tell the AI to respond in a certain format, for example JSON. By defining a model schema, and ideally describing what each field means, the AI tries to understand the context and fill in the output “to the best of its knowledge and belief” (or however close an LLM gets to that).
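To make that concrete, here's a minimal sketch of the consuming side: a hypothetical field list (the schema name and fields are illustrative, not from any specific API) and a parser that checks the model's JSON reply actually contains every field we described.

```python
import json

# Hypothetical schema for a structured-output request: each field gets a
# description so the model knows what belongs where (names are illustrative).
schema = {
    "name": "employee_summary",
    "fields": {
        "department": "Name of the department being summarized",
        "average_salary": "Mean salary across all employees in it",
    },
}

def parse_structured(reply: str) -> dict:
    """Parse a model reply and verify it carries every schema field."""
    data = json.loads(reply)
    missing = [f for f in schema["fields"] if f not in data]
    if missing:
        raise ValueError(f"model omitted fields: {missing}")
    return data

# A reply shaped the way the schema asks for:
reply = '{"department": "Engineering", "average_salary": 95000}'
print(parse_structured(reply))
```

Real SDKs do this validation for you, but the principle is the same: the output side of the pipeline is a solved problem.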
This has made it easier than ever to work with the results of an AI.
But what about the input?
Even though we can make the output neat and structured, most of us still dump huge JSON, YAML, or even plain text datasets into prompts. That’s not only slow and expensive, it’s also far from token-efficient.
So it was only a matter of time before a new format appeared to fix exactly that.
And here comes TOON 🎉
Meet TOON — The token-friendly cousin of JSON#
TOON is a new file format that sits somewhere between JSON and CSV. It’s still human-readable, but optimized for LLMs and tokenization efficiency. According to its creators, it can reduce token counts by 30–60%, which, given how token pricing works, can translate into serious cost savings.
Here’s what makes TOON so interesting:
- 💸 Token-efficient: typically 30–60% fewer tokens than JSON
- 🤿 LLM-friendly guardrails: explicit lengths and fields enable validation
- 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
- 📐 Indentation-based structure: like YAML, uses whitespace instead of braces
- 🧺 Tabular arrays: declare keys once, stream data as rows
JSON#
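As an illustration (a small employee dataset, not the exact payload from the benchmark below), the classic JSON shape repeats every key on every row:

```json
{
  "employees": [
    { "id": 1, "name": "Alice", "department": "Engineering", "salary": 95000 },
    { "id": 2, "name": "Bob", "department": "Marketing", "salary": 70000 }
  ]
}
```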
TOON#
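The same data in TOON, following the format's documented tabular-array syntax, declares the keys (and the row count) once and then streams plain rows:

```text
employees[2]{id,name,department,salary}:
  1,Alice,Engineering,95000
  2,Bob,Marketing,70000
```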
If you look at it closely, it feels a bit like YAML met CSV for a coffee and decided to raise a structured baby together.
Benchmarks from the creators already show impressive results:
👉 Toon Benchmarking and Key Features
So… why should I care?#
If you’re building anything that regularly feeds structured data into LLMs, for example, chatbots, AI-assisted code generation, or multi-step workflows, TOON can significantly reduce prompt size.
It’s not just about the money (although saving around 50% on token usage isn’t bad at all).
It’s also about speed. Fewer tokens mean faster inference and potentially lower latency, especially in real-time systems or when using streaming APIs.
And the best part: it’s already available for multiple languages, including:
🟦 .NET: ToonSharp
🐍 Python: python-toon / pytoon
🦫 Go: gotoon
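If you just want to see the mechanics without pulling in one of these libraries, here's a toy converter for the simplest case, a uniform list of flat objects. This is a minimal sketch, not python-toon's API: it assumes every row has the same scalar fields and ignores quoting, nesting, and escaping, which the real libraries handle.

```python
def to_toon(key: str, rows: list[dict]) -> str:
    """Encode a uniform list of flat dicts as a TOON tabular array.

    Minimal sketch only: assumes identical scalar fields in every row,
    no commas inside values, no nesting.
    """
    fields = list(rows[0].keys())
    # Declare the key, row count, and field names once...
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [header]
    # ...then stream each record as a bare comma-separated row.
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Marketing", "salary": 70000},
]
print(to_toon("employees", employees))
```

The explicit `[2]` length and `{...}` field list are what the guardrails bullet above refers to: a consumer (or the model itself) can validate that the row count and columns match the declaration.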
Real-world evaluation#
So I built a small benchmark tool to see how TOON actually performs compared to JSON.
Using a simple dataset of employees, I asked GPT to analyze the data and calculate the average salary by department. The tool measures prompt size, completion token count, and overall response time.
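As a rough sanity check of the prompt-size gap, you can compare the two encodings of the same data before ever calling an API. Character counts are only a crude proxy for tokens (real counts depend on the model's tokenizer), and the employee values here are assumed, but the gap is visible even on a tiny payload:

```python
import json

employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Marketing", "salary": 70000},
]

# The JSON prompt repeats every key per record.
json_prompt = json.dumps(employees, indent=2)

# The equivalent TOON prompt declares keys once, then streams rows.
toon_prompt = (
    "employees[2]{id,name,department,salary}:\n"
    "  1,Alice,Engineering,95000\n"
    "  2,Bob,Marketing,70000"
)

for label, text in [("JSON", json_prompt), ("TOON", toon_prompt)]:
    print(f"{label}: {len(text)} chars")
```

In the actual benchmark the prompt and completion token counts come straight from the API's usage report, which is the number that matters for billing.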
A big thanks to Sebastian Jensen for the neat ConsoleHelper class, it made the output look way more professional than I expected from a console app! 😄
Here are the results from my run:
| Type | Prompt Tokens | Completion Tokens | Duration |
|---|---|---|---|
| JSON | 1344 | 3475 | 00:00:28.3932721 |
| TOON | 589 | 2928 | 00:00:23.4953152 |
That’s roughly a 56% reduction in prompt tokens and a noticeable 5-second speed improvement, with the same output quality from the model. So yes, TOON doesn’t just look good on paper. It’s actually faster, cheaper, and still easy to read.
Closing thoughts#
It’s fascinating to see how we’ve come full circle: we spent years teaching AIs to output structured data, and now we’re optimizing our input to speak their language better.
Whether TOON becomes the new standard or just another clever niche idea, it’s definitely worth keeping an eye on. Especially if you care about performance, cost, and efficiency (and let’s be honest, who doesn’t?).
