Data, where information lives

AI asks: Where should this data live?

Before we answer the question, let's name the trap: when someone says "data," most people instantly think of databases, and databases sound complicated. Mostly they are not, but the word triggers a freeze response that gets in the way of the actual question.

So let's reframe. All data lives in some structure, somewhere. The question is which structure and which somewhere.

The four shapes of "where data lives"

Almost every piece of data in every piece of software in the world lives in one of these four places.

A file on disk. The simplest possible answer. Your photos live in files on your phone, your essay drafts live in files on your laptop, and exported reports often live in files in cloud storage. Files are fine for things that are read often and written rarely, by a small number of clients.
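To make the "file on disk" option concrete, here is a minimal sketch using only Python's standard library. The file name notes.json and the helpers save_notes and load_notes are made up for illustration; nothing here is a required pattern.

```python
import json
import tempfile
from pathlib import Path


def save_notes(path: Path, notes: list[dict]) -> None:
    """Overwrite the whole file with the current notes (the rare write)."""
    path.write_text(json.dumps(notes, indent=2), encoding="utf-8")


def load_notes(path: Path) -> list[dict]:
    """Read every note back from disk (the frequent read)."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


# Use a throwaway folder so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    notes_file = Path(folder) / "notes.json"  # hypothetical file name
    save_notes(notes_file, [{"title": "Groceries", "body": "Eggs, coffee"}])
    loaded = load_notes(notes_file)
```

Notice that every save rewrites the whole file. With one writer that is fine. With two writers saving at the same moment, one of them can silently lose their change, which is the point where a database starts to earn its keep.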

A spreadsheet. A specific shape of file: rows and columns, plus simple lookups and calculations. Google Sheets and Airtable have made this a perfectly respectable backend for many small apps, and Day 16 will return to this.

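A database. A purpose-built system for storing and querying structured data, usually run as a service and optimized for many concurrent reads and writes. Postgres, MySQL, SQLite, MongoDB, and DynamoDB are all databases. They differ in shape and trade-offs. (SQLite is the odd one out: it is a library that keeps the whole database in a single file, not a separate service.)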

A data warehouse. A specific kind of database optimized for analytics rather than for serving an app. Snowflake, BigQuery, and Redshift are the usual names. You probably do not need one until you have a fair amount of data and a real analytics question.

For most new builds, the answer is "a database, almost certainly Postgres, hosted by someone else." The reasons are: Postgres is excellent, hosted Postgres providers (Supabase, Neon, Render, AWS RDS) have made setup trivial, and you can grow into much larger workloads without changing tools.

Relational versus document, said in English

You will see this distinction in every "which database" article. It usually goes like this.

Relational (Postgres, MySQL, SQLite) means the data lives in tables (rows and columns), like a spreadsheet, with strict structure. Every row in the users table has the same columns. You can ask questions across tables ("show me every comment from every user in San Francisco who signed up after January") in a single query.
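The "question across tables" can be sketched in a few lines. This example uses SQLite through Python's built-in sqlite3 module only because it runs with no setup; the same SQL idea carries over to Postgres. The table layout, names, and dates are invented for illustration, and "after January" is assumed to mean after 2024-01-31.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two tables with a strict structure: every row has the same columns.
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL,
    signed_up TEXT NOT NULL  -- ISO date, e.g. 2024-02-10
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL
);
INSERT INTO users (id, name, city, signed_up) VALUES
    (1, 'Ana', 'San Francisco', '2024-02-10'),
    (2, 'Ben', 'San Francisco', '2023-11-05'),
    (3, 'Caro', 'Austin', '2024-03-01');
INSERT INTO comments (user_id, body) VALUES
    (1, 'Love this feature'),
    (2, 'Found a bug'),
    (3, 'Works on my machine'),
    (1, 'Second thoughts');
""")

# One query that spans both tables: comments by San Francisco users
# who signed up after January.
rows = conn.execute(
    """
    SELECT users.name, comments.body
    FROM comments
    JOIN users ON users.id = comments.user_id
    WHERE users.city = ?
      AND users.signed_up > ?
    ORDER BY comments.id
    """,
    ("San Francisco", "2024-01-31"),
).fetchall()
```

Here rows holds Ana's two comments and nothing else: Ben is in San Francisco but signed up too early, and Caro signed up in time but lives in Austin.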

Document (MongoDB, DynamoDB, Firestore) means each piece of data is its own little JSON-shaped object, with little or no structure enforced by default. Different documents in the same collection can have different fields.

For most things, pick relational. The strictness is a feature, not a bug, because it keeps your data clean. Choose document when you have a clear reason (massive scale with simple access patterns, or wildly varying record shapes).
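Here is a rough sketch of why the strictness is a feature. A plain Python list of dictionaries stands in for a document collection (an assumption for illustration; real document databases can add validation if you configure it), and an in-memory SQLite table stands in for a relational one.

```python
import sqlite3

# Document style: each record is a free-form, JSON-shaped object.
documents = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "emial": "ben@example.com"},  # typo in the key, accepted silently
    {"name": "Caro", "email": "caro@example.com", "plan": "pro"},  # extra field
]
# The bad record only shows up later, when something goes looking for it.
missing_email = [doc["name"] for doc in documents if "email" not in doc]

# Relational style: the table refuses a row that breaks the schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, email TEXT NOT NULL)")
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("Ana", "ana@example.com"),
)

try:
    # No email supplied, so the NOT NULL rule on the column is violated.
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Ben",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The document side happily stores Ben's misspelled field, and the problem surfaces whenever someone next tries to email him. The relational side complains at the moment of the mistake, which is the cheapest time to find out.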

How to answer the "where should this data live" question

Two questions, in order.

Is this data read by other users, or only by the user who created it? If only by the creator and you do not need to sync across their devices, a local file is fine. If shared across users or synced across devices, you need a database.

How much of it, and how often? If it is a small amount (fewer than a million rows) and accessed casually, almost any database will do, and a spreadsheet might even be enough. If it is millions or billions of rows and accessed constantly, you need a specific kind of database tuned for your workload, and that is a real decision worth getting right.
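The two questions can be written down as a tiny decision function. This is a sketch, not a rule engine: the names DataNeeds and where_should_it_live are hypothetical, and the fallback for in-between cases (large but rarely touched, or small but hammered) is an assumption that simply reuses the default.

```python
from dataclasses import dataclass


@dataclass
class DataNeeds:
    shared_across_users: bool
    synced_across_devices: bool
    expected_rows: int
    accessed_constantly: bool


def where_should_it_live(needs: DataNeeds) -> str:
    # Question 1: who reads it, and from how many devices?
    if not needs.shared_across_users and not needs.synced_across_devices:
        return "local file"

    # Question 2: how much of it, and how often?
    small = needs.expected_rows < 1_000_000
    if small and not needs.accessed_constantly:
        return "almost any database (a spreadsheet may be enough)"
    if not small and needs.accessed_constantly:
        return "a database tuned for the workload"

    # In-between cases are not spelled out in the lesson; assume the default.
    return "hosted Postgres"


journal = DataNeeds(False, False, 400, False)
team_tracker = DataNeeds(True, True, 20_000, False)
event_firehose = DataNeeds(True, True, 2_000_000_000, True)
```

Calling where_should_it_live on the three examples gives a local file for the private journal, "almost any database" for the team tracker, and a workload-tuned database for the event firehose.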

For Week 3 builds, your answer is almost always: Postgres, hosted by Supabase or Neon. Move on.

A small vocabulary sweep

Schema. The structure of your data. "Users have a name, an email, a created-at timestamp, and an avatar URL" is a schema.
Migration. A change to the schema, applied in a controlled way. ("Add a 'verified' column to the users table.")
Query. A request to the database. "Give me all users who signed up in the last week."
Index. A behind-the-scenes structure the database keeps to make certain queries fast. Forgetting to add an index is one of the most common reasons a fast app becomes a slow app at scale.
Backup. A copy of your data taken at a point in time. You should have one, and you should test that you can restore from it. You probably don't.
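All five words fit in one short script. This sketch uses Python's built-in sqlite3 module so it runs anywhere; on hosted Postgres the migration would normally go through a migration tool and the backup through your provider, so treat the details below as illustrative assumptions, not the way your stack will do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the structure of the data.
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        created_at TEXT NOT NULL,
        avatar_url TEXT
    )
""")
conn.execute(
    "INSERT INTO users (name, email, created_at) VALUES (?, ?, ?)",
    ("Ana", "ana@example.com", "2024-05-01T09:00:00"),
)

# Migration: a controlled change to the schema.
conn.execute("ALTER TABLE users ADD COLUMN verified INTEGER NOT NULL DEFAULT 0")

# Index: lets the database find rows by created_at without scanning them all.
conn.execute("CREATE INDEX idx_users_created_at ON users (created_at)")
conn.commit()

# Query: a request to the database.
recent = conn.execute(
    "SELECT name, verified FROM users WHERE created_at >= ?",
    ("2024-04-25T00:00:00",),
).fetchall()

# Backup: copy the data, then prove the copy can actually be read.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
restored_count = backup_conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The last two lines are the part most people skip: the backup only counts once something has read the data back out of it.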

Forward references

Day 12 covers where the database itself runs, which is mostly a question of which hosting provider you use. Day 13 covers who is allowed to read and write what, which is the authorization side of data. Day 20 returns to the privacy and compliance obligations that kick in the moment you store anything about real people.

Day 11 wrap

The thing you can now say plainly. All data lives in some structure, somewhere. For most builds the answer is a hosted relational database (Postgres) and the choice barely matters within that bucket.

The thing you can now do. When AI asks "where should this data live," skip the database-shopping anxiety and pick the default (Postgres, hosted), unless your build has a specific reason to differ.

The guardrail to remember. Don't store data you don't need. The data you don't have can't get leaked, subpoenaed, or accidentally emailed.

See you on Day 12, where we finally answer the question that broke the scientist.