Hacker News, unpacked: a 22 GB dataset in a single SQLite file

A Show HN project packages a large swath of Hacker News into a 22 GB SQLite database, turning the site’s history into a single-file dataset you can query locally. What’s notable here isn’t just the volume but the format: SQLite means no server to set up, no API rate limits, and immediate compatibility with the tools developers already use: the sqlite3 command-line shell, DuckDB’s sqlite_scanner extension, Datasette, pandas, and any language with a SQLite driver. For anyone running ad hoc analyses, building dashboards, or testing ranking ideas without spinning up infrastructure or paying for BigQuery scans, this is the lowest-friction path.
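To make the "query locally" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The article does not document the dataset's schema, so the table name `items` and the columns `type`, `title`, and `score` are assumptions for illustration only; `list_tables` is there so you can check the real names first.

```python
import sqlite3

def open_readonly(path):
    """Open a SQLite file read-only via a URI so queries cannot modify it."""
    # A path containing '?' or '#' would need URL-escaping before use here.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)

def list_tables(conn):
    """Return the table names actually present in the file."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

def top_stories(conn, limit=10):
    """Highest-scoring stories. 'items' and its columns are assumed names."""
    return conn.execute(
        "SELECT title, score FROM items WHERE type = 'story' "
        "ORDER BY score DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

With the downloaded file saved under a hypothetical name such as `hn.db`, usage would look like `conn = open_readonly("hn.db")` followed by `list_tables(conn)`, adjusting the query in `top_stories` to whatever schema that reports.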

Under the hood, it’s “just” SQLite, which is the point. You can inspect the schema, add your own indexes, or layer on FTS to explore threads and titles. Worth noting: at 22 GB, heavy joins will be slow without suitable indexes, and both index builds and large sorts need memory and temporary disk space. Keep the downloaded file read-only as a pristine original, add indexes to a working copy, and run ANALYZE after creating them so the query planner has up-to-date statistics. The bigger picture is a quiet endorsement of SQLite as a distribution format for medium-to-large public datasets: portable, reproducible, and easy to join against other local tables. The practical upside: HN analyses that used to require cloud warehouses now fit on a laptop SSD, making experiments faster and more repeatable.
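As a rough sketch of the indexing step, the helpers below create an index, refresh the planner statistics with ANALYZE, and show the query plan so you can confirm the index is used. The names `add_index` and `query_plan` are hypothetical helpers, and any table or column you pass in depends on the dataset's real schema, which the article does not specify. Run this against a working copy of the file, not the original download.

```python
import sqlite3

def add_index(conn, table, column):
    """Create an index on table(column) if missing, then refresh planner stats."""
    # Identifiers cannot be bound as parameters, so they are quoted inline;
    # only pass trusted table and column names.
    name = f"idx_{table}_{column}"
    conn.execute(f'CREATE INDEX IF NOT EXISTS "{name}" ON "{table}" ("{column}")')
    conn.execute("ANALYZE")
    conn.commit()
    return name

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]
```

Comparing `query_plan` output before and after `add_index` is a cheap way to see whether a filter or join moved from a full table scan to an index search before committing to a long-running query.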
