Hacker News, unpacked: a 22 GB dataset in a single SQLite file
A Show HN project packages a large swath of Hacker News into a 22 GB SQLite database, turning the site’s history into a single file you can query locally. What’s notable isn’t just the volume but the format: SQLite means no server to set up, no API rate limits, and immediate compatibility with tools developers already use, including the sqlite3 CLI, DuckDB’s SQLite extension (sqlite_scanner), Datasette, pandas, and any language with a SQLite driver. For anyone running ad hoc analyses, building dashboards, or testing ranking ideas without spinning up infrastructure or paying for BigQuery scans, this is about as low-friction as it gets.
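As a rough illustration of the "zero setup" claim, here is a minimal Python sketch that opens a SQLite file read-only and lists its tables and columns using only the standard library. The file name in the usage note and the helper names (open_read_only, list_tables, table_columns) are placeholders for illustration; the project's actual file name and schema are not stated here, so inspect before you query.

```python
import sqlite3
from pathlib import Path


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only (hypothetical helper)."""
    # mode=ro makes SQLite reject writes and fail if the file does not exist,
    # instead of silently creating an empty database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def table_columns(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Return (column name, declared type) pairs for one table."""
    # pragma_table_info is the table-valued form of PRAGMA table_info,
    # which lets the table name be passed as a bound parameter.
    rows = conn.execute("SELECT name, type FROM pragma_table_info(?)", (table,))
    return rows.fetchall()
```

Usage would look something like `conn = open_read_only("hackernews.db")` followed by `list_tables(conn)`, where `hackernews.db` stands in for wherever you saved the download.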
Under the hood, it’s “just” SQLite, which is the point. You can inspect the schema, add your own indexes, or layer on full-text search (FTS5) to explore threads and titles. Worth noting: at 22 GB, heavy joins will be slow without suitable indexes, so plan for them and for the extra disk space they take. Open the file read-only for pure querying and add indexes on a working copy; after creating indexes, run ANALYZE so the query planner has statistics to work with. VACUUM rewrites the entire file and mainly pays off when reclaiming space after deletions, so it is rarely needed here. The bigger picture is a quiet endorsement of SQLite as a distribution format for medium-to-large public datasets: portable, reproducible, and easy to join against other local tables. The practical upside: HN analyses that used to call for a cloud warehouse now fit on a laptop SSD, making experiments faster and more repeatable.
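To make the indexing advice concrete, the sketch below adds an index, runs ANALYZE so the planner has statistics, and compares the query plan before and after. It runs against a tiny in-memory stand-in table; the table name items and the columns author and title are assumptions for illustration, not the dataset's actual schema, and the helper names are hypothetical.

```python
import sqlite3


def build_stand_in() -> sqlite3.Connection:
    """Create a small in-memory table standing in for the real dataset."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema for illustration only; check the real one first.
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, author TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (author, title) VALUES (?, ?)",
        [(f"user{i % 20}", f"Story {i}") for i in range(200)],
    )
    conn.commit()
    return conn


def add_index_and_analyze(conn: sqlite3.Connection, table: str, column: str) -> str:
    """Create an index on table(column), refresh planner statistics, return its name."""
    index_name = f"idx_{table}_{column}"
    # Identifiers cannot be bound as parameters, so they are quoted inline;
    # only pass trusted table and column names here.
    conn.execute(
        f'CREATE INDEX IF NOT EXISTS "{index_name}" ON "{table}" ("{column}")'
    )
    conn.execute("ANALYZE")
    conn.commit()
    return index_name


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail text of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)  # 4th column holds the detail text


conn = build_stand_in()
sql = "SELECT title FROM items WHERE author = ?"
plan_before = query_plan(conn, sql, ("user3",))  # full table scan
index_name = add_index_and_analyze(conn, "items", "author")
plan_after = query_plan(conn, sql, ("user3",))   # should now mention the index
print(plan_before)
print(plan_after)
```

On the real 22 GB file, building an index can take a while and grows the file, so working on a copy is the safer default.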