What is Chain of Thought?
Most chat interfaces hide the model’s reasoning. You send a message, the model processes it internally, and you get an answer — with no visibility into how it got there. For simple questions that’s fine. For complex ones — multi-step math, debugging logic, structured analysis — the answer alone isn’t always enough.
Chain of Thought (CoT) is a prompting technique that makes the reasoning process explicit. Instead of jumping straight to a conclusion, the model works through the problem step by step, producing a visible reasoning trace before the final response.
Without CoT:

User: "Is 97 prime?"
Model: "Yes."

With CoT:

User: "Is 97 prime?"
Model: [thinking]
97 ÷ 2 = 48.5 (not an integer)
97 ÷ 3 ≈ 32.3 (not an integer)
97 ÷ 5 = 19.4 (not an integer)
97 ÷ 7 ≈ 13.9 (not an integer)
√97 ≈ 9.8, so checking every prime ≤ 9 is enough
[/thinking]
Yes, 97 is prime.
The thinking trace is visible in the UI but visually separated from the final answer, so you get the transparency without it getting in the way.
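The arithmetic in the trace above is ordinary trial division, and it can be reproduced in a few lines. This is only an illustrative sketch: `isPrime` and `trialDivisors` are hypothetical helpers, not part of the project, and for simplicity the loop tries every integer up to √n rather than only the primes the trace lists.

```typescript
// Trial division: n is prime if no integer d with 2 <= d <= sqrt(n) divides it.
function isPrime(n: number): boolean {
  if (!Number.isInteger(n) || n < 2) return false;
  // d * d <= n is the same bound as d <= sqrt(n), without floating point.
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

// List the divisors the loop tries, to compare against the reasoning trace.
function trialDivisors(n: number): number[] {
  const tried: number[] = [];
  for (let d = 2; d * d <= n; d++) tried.push(d);
  return tried;
}

console.log(isPrime(97)); // true
console.log(trialDivisors(97)); // [2, 3, 4, 5, 6, 7, 8, 9]
```

The bound matters for the same reason it does in the trace: if 97 had a factor larger than √97, the matching cofactor would be smaller than √97 and would already have been found.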
The Project
Reasoning Assistant is a Next.js chat interface built on top of Groq’s inference API. Groq provides fast inference for reasoning-capable open models, making the streaming latency low enough to feel responsive even when the thinking trace is long.
Users can switch between models and toggle thinking mode on or off per conversation:
| Model | Provider | Thinking |
|---|---|---|
| DeepSeek R1 | Groq | Yes |
| Qwen3 32B | Groq | Yes |
| Gemma2 9B | Groq | No |
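One way to represent this table in code is a small typed registry that the model picker and the thinking toggle both read from. The sketch below is an assumption about how such a registry might look, not the project's actual source: `ModelOption`, `MODELS`, and `thinkingEnabled` are hypothetical names, and the `id` strings are assumed Groq model identifiers that should be checked against Groq's current model list.

```typescript
interface ModelOption {
  id: string; // identifier sent to the API (assumed value, verify before use)
  label: string; // name shown in the model picker
  provider: "Groq";
  supportsThinking: boolean;
}

const MODELS: ModelOption[] = [
  { id: "deepseek-r1-distill-llama-70b", label: "DeepSeek R1", provider: "Groq", supportsThinking: true },
  { id: "qwen/qwen3-32b", label: "Qwen3 32B", provider: "Groq", supportsThinking: true },
  { id: "gemma2-9b-it", label: "Gemma2 9B", provider: "Groq", supportsThinking: false },
];

// A trace is shown only when the user turned thinking on AND the model can produce one.
function thinkingEnabled(modelId: string, userToggle: boolean): boolean {
  const model = MODELS.find((m) => m.id === modelId);
  return Boolean(model?.supportsThinking) && userToggle;
}

console.log(thinkingEnabled("qwen/qwen3-32b", true)); // true
console.log(thinkingEnabled("gemma2-9b-it", true)); // false
```

Keeping the capability flag next to the model entry means the UI can disable the toggle for models without reasoning support instead of silently ignoring it.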
Features
Toggleable thinking mode — reasoning traces are opt-in. Turn it on for complex questions, off when you just want a quick answer.
Real-time streaming — both the thinking trace and the final response stream token by token via the Groq SDK’s streaming API, so there’s no waiting for a complete response before anything appears.
Markdown rendering — responses render with full markdown support including syntax-highlighted code blocks, useful for programming and technical questions.
Model switching — swap between models mid-conversation without losing the chat history.
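Streaming the trace and the answer separately means the client has to decide, chunk by chunk, which part of the output it is looking at. The sketch below shows one way to do that, assuming the model wraps its reasoning in `<think>` and `</think>` tags inside the streamed text. That tag format is an assumption about the raw model output, and `ThinkingSplitter` is a hypothetical helper rather than the project's implementation. The tricky case is a tag split across two chunks, so the splitter holds back any trailing text that could be the start of a tag.

```typescript
type Segment = { kind: "thinking" | "answer"; text: string };

const OPEN = "<think>";
const CLOSE = "</think>";

// Length of the longest suffix of `text` that is a proper prefix of `tag`.
function partialTagLength(text: string, tag: string): number {
  const max = Math.min(text.length, tag.length - 1);
  for (let len = max; len > 0; len--) {
    if (text.endsWith(tag.slice(0, len))) return len;
  }
  return 0;
}

class ThinkingSplitter {
  private buffer = "";
  private inThinking = false;

  // Feed one streamed chunk; returns the segments that are safe to render now.
  push(chunk: string): Segment[] {
    this.buffer += chunk;
    const out: Segment[] = [];
    while (this.buffer.length > 0) {
      const tag = this.inThinking ? CLOSE : OPEN;
      const idx = this.buffer.indexOf(tag);
      if (idx !== -1) {
        // Complete tag found: emit the text before it, then switch mode.
        this.emit(out, this.buffer.slice(0, idx));
        this.buffer = this.buffer.slice(idx + tag.length);
        this.inThinking = !this.inThinking;
        continue;
      }
      // No complete tag: keep back a suffix that might be a tag cut in half.
      const keep = partialTagLength(this.buffer, tag);
      this.emit(out, this.buffer.slice(0, this.buffer.length - keep));
      this.buffer = this.buffer.slice(this.buffer.length - keep);
      break;
    }
    return out;
  }

  // Call once the stream ends to release any text still held back.
  flush(): Segment[] {
    const out: Segment[] = [];
    this.emit(out, this.buffer);
    this.buffer = "";
    return out;
  }

  private emit(out: Segment[], text: string): void {
    if (text.length === 0) return;
    out.push({ kind: this.inThinking ? "thinking" : "answer", text });
  }
}

// Usage: the tags arrive split across chunk boundaries.
const splitter = new ThinkingSplitter();
for (const chunk of ["<thi", "nk>97 / 2", " = 48.5</th", "ink>Yes, 97 is prime."]) {
  for (const seg of splitter.push(chunk)) console.log(seg.kind, JSON.stringify(seg.text));
}
```

In a real client the chunks would come from the streaming response instead of a hard-coded array; with the Groq SDK's OpenAI-style streaming that is typically the `delta.content` of each chunk, though the wiring here is left out to keep the example self-contained.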
Tech Stack
| Layer | Tech |
|---|---|
| Framework | Next.js 15 (App Router), React 19, TypeScript |
| Styling | Tailwind CSS v4, Radix UI |
| AI inference | Groq SDK |
| Markdown | React Markdown with syntax highlighting |
| Icons | Lucide React |
Setup
git clone https://github.com/akdevv/reasoning-assistant.git
cd reasoning-assistant
bun install
Add your Groq API key to .env.local:
GROQ_API_KEY=your_groq_api_key_here
bun dev
Key Decisions
Why Groq instead of calling OpenAI or Anthropic directly? Groq runs inference on its own custom hardware, which generates tokens noticeably faster than typical hosted endpoints for the models it supports. In a chat interface that streams a long reasoning trace before the answer, that speed decides whether the trace scrolls past or leaves you waiting. DeepSeek R1 and Qwen3 are also available on Groq’s free tier, so the demo needs only a free API key rather than a paid plan.
Why make thinking mode a toggle? A visible reasoning trace is useful for some prompts and noise for others. Forcing it on for every message would make simple conversations tedious. Letting users opt in per conversation treats CoT as a tool to reach for, not a default.
Why keep the thinking trace visually distinct from the answer? The reasoning trace can be long — sometimes longer than the answer itself. Mixing it inline with the response would make the output hard to read. Separating it visually (collapsible or dimmed) lets you reference the thinking without it dominating the conversation view.
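As a rough illustration of the collapsible option, the sketch below renders a message to an HTML string with the trace inside a collapsed `<details>` element above the answer. It is a framework-free stand-in for what would be a React component in the real app: `AssistantMessage`, `escapeHtml`, `renderMessage`, and the class names are all hypothetical, and the line count in the summary is just one possible way to hint at the trace's length.

```typescript
interface AssistantMessage {
  thinking: string; // reasoning trace, empty when thinking mode is off
  answer: string;
}

// Escape text so it can be placed safely inside HTML element content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render the trace in a collapsed <details> block, followed by the answer.
function renderMessage(msg: AssistantMessage): string {
  const answerHtml = `<div class="answer">${escapeHtml(msg.answer)}</div>`;
  const trace = msg.thinking.trim();
  if (trace === "") return answerHtml; // nothing to collapse

  const lineCount = trace.split("\n").length;
  return (
    `<details class="thinking">` +
    `<summary>Reasoning (${lineCount} lines)</summary>` +
    `<pre>${escapeHtml(trace)}</pre>` +
    `</details>` +
    answerHtml
  );
}

console.log(renderMessage({ thinking: "97 / 2 = 48.5\n97 / 3 = 32.3", answer: "Yes, 97 is prime." }));
```

Because `<details>` is closed by default, the answer stays the first thing a reader sees, and the trace is one click away when it is needed.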
Outcome
A minimal but complete reasoning-aware chat interface. The project explores how to present model transparency — making the AI’s thinking process visible — without overwhelming the user or adding friction to everyday use.
