The Problem

Read-it-later apps are either cloud-locked, behind a paywall, or both. Pocket, Instapaper, and their equivalents store your articles on their servers, require accounts, and offer text-to-speech only on paid tiers — and even then it sounds robotic and unnatural.

If you want to save an article and listen to it on your commute, you’re paying someone else to own your reading list and your audio.

Most read-it-later apps treat TTS as a premium upsell. But the models to do it locally have been available for years — they just weren’t packaged accessibly.

The gap was clear: a clean, private, local-first app that could extract articles from the web and generate natural-sounding audio without touching a single external API.

The Solution

Simple Reader runs entirely on your machine. One docker compose up starts the Next.js app, PostgreSQL, and the TTS service together. No accounts. No API keys. No data leaving your network.

Article extraction uses Mozilla Readability — the same library Firefox Reader View uses — to strip away ads, popups, and navigation and return just the content. The app also handles paywall detection and removes cookie banners before parsing, so you get clean text even from noisy pages.

Text-to-speech is powered by Kokoro-82M, a small ONNX model (~92MB) that auto-downloads on first use and runs via a Node.js child process. It produces natural-sounding speech with sentence-level alignment so the reader can highlight each sentence as it plays.

Running It

docker compose up -d --build

Open localhost:3000. PostgreSQL, the Next.js app, and TTS all start from this single command.

Development mode

cp .env.example .env
docker compose up -d postgres       # start just the database
pnpm install
pnpm prisma generate
pnpm prisma migrate deploy
pnpm dev

How It Works

User submits a URL or pastes text directly
URLs are fetched, cleaned (popup removal, paywall detection), and parsed with Readability
Content is split into typed sections — paragraphs, headings, code blocks, tables, lists, images, videos, blockquotes
Sections are stored as JSON in PostgreSQL
TTS spawns a Node.js child process running Kokoro-82M locally — the ONNX model auto-downloads on first use
Audio is saved as WAV and served statically with sentence-level alignment for highlighted playback

Tech Stack

Layer	Tech
Framework	Next.js (App Router), React 19, TypeScript
Styling	Tailwind v4, shadcn/ui
Database	PostgreSQL 16, Prisma
Article extraction	Mozilla Readability, JSDOM
Text-to-Speech	Kokoro-82M via kokoro-js + onnxruntime-node
Code highlighting	Shiki
Deployment	Docker Compose

TTS Configuration

All TTS options are environment variables — set them in .env or pass them to Docker:

Variable	Default	Description
`KOKORO_VOICE`	`af_heart`	Voice name
`KOKORO_SPEED`	`1`	Speech speed multiplier
`KOKORO_DTYPE`	`q8`	Model precision (q8 = quantized, faster)

Key Decisions

Why Kokoro-82M instead of a cloud TTS API? Cloud APIs are fast and easy to integrate but they require API keys, send your text to a third-party server, and cost money at scale. Kokoro-82M is small enough to ship in Docker and produces output that is genuinely good — not the robotic quality you get from older local models. The ~92MB download happens once and is cached.

Why store content as typed sections instead of raw HTML? Raw HTML is fragile to render and hard to align TTS to. Splitting content into typed sections (paragraph, heading, code, etc.) gives the frontend a clean data model to render and lets the TTS pipeline operate on discrete text units, which is exactly what’s needed for sentence-level playback alignment.

Why Docker Compose with three services? PostgreSQL and the TTS service have different resource profiles and lifecycle needs from the Next.js app. Separating them makes it straightforward to scale or replace any one piece — and the single docker compose up command keeps the local setup to one line.

Why JSDOM + Readability instead of a scraping API? Scraping APIs add latency, cost, and a network dependency. Running Readability locally is faster, keeps all content on-device, and means the app works on any URL without rate limits or third-party outages.

Outcome

A fully local read-it-later app that works offline once the model is downloaded. Articles are saved, readable, and listenable without any external dependency after the initial setup.

The project reinforced something I keep finding true: the hardest part of local-first software is packaging, not the algorithms. Kokoro-82M has been available for a while. The work was in wiring it into a Docker service, handling the ONNX runtime, and making the sentence alignment feel natural in the UI.