Some thoughts on Interleaved and Preserved thinking modes in GLM
A path to better, more efficient multi-step agentic tool calling at the model layer.
GLM-5 from Z.ai was released recently. I have been meaning to read the white paper that was published, but an X article on the subject drew my attention to the terms Interleaved and Preserved thinking.
When a traditional reasoning model has to make multiple tool calls to answer a question, it will often think once at the start of the session and then issue subsequent tool calls without thinking again. As a result, the decision of which tool to call next (after the first one), and the outputs of those tool calls, never pass through the thinking process. That is sufficient for single-turn Q&A style answers or single tool-call use cases, but accuracy and consistency degrade noticeably over multi-step agentic workflows.
Without interleaved thinking, the model thinks at the start of the session and does not generate new thinking blocks after each tool call.
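To make the contrast concrete, here is a rough sketch of what such a multi-tool transcript looks like. The field names (thinking, tool_call, and so on) are illustrative, not any particular provider's schema; the point is the single thinking block at the top.

```python
# Hypothetical transcript shape (field names are illustrative only).
# The model reasons once up front, then issues later tool calls without
# any further thinking blocks.
transcript = [
    {"role": "user", "content": "How many open issues mention 'timeout' in repo X?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "text": "Search the issues, then count the open ones."},
        {"type": "tool_call", "name": "search_issues", "arguments": {"query": "timeout"}},
    ]},
    {"role": "tool", "name": "search_issues", "content": "[...120 results...]"},
    # The next tool call is emitted directly: there is no new thinking block
    # in which the model could weigh the 120 results before deciding what to do.
    {"role": "assistant", "content": [
        {"type": "tool_call", "name": "filter_by_state", "arguments": {"state": "open"}},
    ]},
]
```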
For an agentic workflow, thinking through each tool call and its response, and making finer-grained decisions about which tool to call next, substantially improves the quality of the output.
Interleaved thinking makes exactly this possible. By interleaving thinking blocks before and after every tool call, and putting the responsibility on the caller to pass the reasoning blocks back in each turn, the model can reason through every step on its way to the final answer.
With interleaved thinking, the model can think after receiving each tool call result, allowing it to reason about intermediate results before continuing.
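Using the same illustrative schema as above, the interleaved transcript gains a fresh thinking block after each tool result, before the next call is chosen:

```python
# Same task, same hypothetical field names, but now a new thinking block
# follows every tool result before the next tool call is issued.
transcript = [
    {"role": "user", "content": "How many open issues mention 'timeout' in repo X?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "text": "Search the issues first, then narrow down."},
        {"type": "tool_call", "name": "search_issues", "arguments": {"query": "timeout"}},
    ]},
    {"role": "tool", "name": "search_issues", "content": "[...120 results...]"},
    {"role": "assistant", "content": [
        # The model reasons about the intermediate result before acting again.
        {"type": "thinking", "text": "120 hits is too broad; filter to open issues next."},
        {"type": "tool_call", "name": "filter_by_state", "arguments": {"state": "open"}},
    ]},
]
```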
Finally, with preserved thinking, when the reasoning blocks are provided in each subsequent call to the model, the model is able to retain the reasoning context and build on each step. This differs from the traditional approach, where any previous reasoning context is discarded and each tool call result is viewed in isolation. Instead of the caller building a scaffold to track this context and feed it back to the model in summarized form, the model now has an innate ability to do so, as long as the reasoning blocks from previous steps are provided in order. The end result is a model that excels at coding and agentic multi-step workflows. I suppose that's why the paper is titled "from Vibe coding to Agentic engineering."
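In practice, preserved thinking mostly places a requirement on the caller's agent loop: append each assistant turn back into the conversation verbatim, thinking blocks included, rather than stripping or summarizing them. A minimal sketch, assuming a hypothetical client.chat() call and an execute_tool() dispatcher of your own (neither is a real SDK function):

```python
import json

def run_agent(client, model, tools, user_message):
    """Minimal agent loop that preserves reasoning blocks across turns.

    client.chat(), execute_tool(), and the block/field names are placeholders,
    not a specific SDK. The key idea: assistant turns go back into `messages`
    verbatim, thinking blocks and all, so later steps can build on them.
    """
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = client.chat(model=model, tools=tools, messages=messages)
        # Keep the whole assistant turn: thinking blocks, tool calls, text.
        messages.append({"role": "assistant", "content": reply.content})
        tool_calls = [block for block in reply.content if block["type"] == "tool_call"]
        if not tool_calls:
            # Final answer, produced with the full reasoning chain still in context.
            return reply
        for call in tool_calls:
            result = execute_tool(call["name"], call["arguments"])  # caller-provided dispatcher
            messages.append({
                "role": "tool",
                "name": call["name"],
                "content": json.dumps(result),
            })
```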
Interestingly, the latest Anthropic models also support interleaved thinking. Surprisingly, though, I don't see any documentation for it from OpenAI, which leads me to believe they don't yet expose it for public use, even if they have it internally.

