
IBM Research: Abstract Chain-of-Thought Cuts Reasoning Token Usage 11.6x with Comparable Performance

2026-04-30 07:06

IBM Research AI published Abstract Chain-of-Thought (Abstract-CoT), a post-training framework that replaces verbose natural-language reasoning traces with a short sequence of tokens drawn from a reserved abstract vocabulary before generating the final response. The method uses a three-phase training loop — masked supervised fine-tuning, constrained self-distillation, and reinforcement learning — and was evaluated on Qwen3-8B, Qwen3-4B, and Granite-4.0-Micro (3B), using up to 11.6x fewer reasoning tokens while maintaining comparable scores on mathematical reasoning, instruction-following, and multi-hop reasoning benchmarks. The abstract vocabulary was found to follow a power-law distribution similar to that of natural language, suggesting the model develops structured latent representations.
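The core idea of reasoning in a reserved abstract vocabulary can be sketched with a toy constrained-decoding loop. This is a minimal illustration, not IBM's implementation: the vocabulary sizes, the `constrained_sample` helper, and the two-phase generation are all hypothetical, showing only how a logit mask could restrict the reasoning phase to abstract token ids appended after the normal vocabulary.

```python
import numpy as np

# Illustrative sizes only (not from the paper): a toy natural-language
# vocabulary followed by a small block of reserved abstract token ids.
VOCAB_SIZE = 100
NUM_ABSTRACT = 16
ABSTRACT_IDS = range(VOCAB_SIZE, VOCAB_SIZE + NUM_ABSTRACT)


def constrained_sample(logits, reasoning_phase, rng):
    """Sample one token id from softmax(logits).

    During the reasoning phase, every non-abstract token is masked to -inf,
    so the model can only emit tokens from the reserved abstract vocabulary.
    """
    logits = np.asarray(logits, dtype=float).copy()
    if reasoning_phase:
        mask = np.full_like(logits, -np.inf)
        mask[VOCAB_SIZE:VOCAB_SIZE + NUM_ABSTRACT] = 0.0
        logits += mask
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))


def generate(rng, num_reasoning=8, num_response=4):
    """Emit a short abstract reasoning trace, then an unconstrained response."""
    trace = [constrained_sample(rng.normal(size=VOCAB_SIZE + NUM_ABSTRACT),
                                reasoning_phase=True, rng=rng)
             for _ in range(num_reasoning)]
    response = [constrained_sample(rng.normal(size=VOCAB_SIZE + NUM_ABSTRACT),
                                   reasoning_phase=False, rng=rng)
                for _ in range(num_response)]
    return trace, response
```

Because the reasoning trace is capped at a handful of abstract tokens instead of a long natural-language chain, the token budget for the "thinking" phase shrinks by construction, which is the lever behind the reported 11.6x reduction.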
