Back to blog

How We Cache Your Schema (And Why Sending It to the AI Every Time Is a Trap)

Spencer Pauly
Spencer Pauly
6 min read
How We Cache Your Schema (And Why Sending It to the AI Every Time Is a Trap)

When QueryBear's AI writes SQL against your database, it has to know your schema — every table, every column, every type, every foreign key. Without that context, it's guessing.

The naive way to do this is to dump the whole schema into the prompt every time. That gets expensive and slow fast. The smarter way is to cache. This post is about how we do that, what we cache, what we don't, and the few things that go wrong if you're sloppy about it.

Why caching is non-optional past a certain database size

A blank Postgres has a couple dozen information_schema rows. A real production database has hundreds. I've onboarded customers with 700+ tables and 30k+ columns, and that's not the upper end.

Concretely: serializing the schema for one of those databases produces ~250KB of text. If you send that to Claude with every question, you pay token costs on every request, you blow through context limits, and the model spends more attention on schema discovery than on the actual question.

The fix is the same fix every database tool has used for thirty years. Cache the metadata. Send only the relevant subset to the model.

What we pull on initial sync

When you connect a database, QueryBear runs a small set of read-only queries against the metadata catalog. For Postgres that's the pg_catalog and information_schema views. For MySQL it's INFORMATION_SCHEMA. SQLite has sqlite_master.

Specifically, per database we pull:

  • Table list, with schema/namespace and a comment if present.
  • Column list per table: name, type, nullable flag, default expression, position.
  • Primary keys and foreign keys, including the join columns and ON DELETE/ON UPDATE behavior.
  • Index list per table, with column composition and uniqueness.
  • For Postgres: pg_enum values for any enum types that columns use.

That's enough to write a correct query for almost everything. We don't pull constraints we can't enforce in our gateway, and we don't pull triggers (the AI can't reliably reason about them anyway).

What we explicitly do not pull on sync:

  • Row data. Not even sample rows.
  • Column statistics or histograms. We don't need them and they leak distributions.
  • Query results from past sessions.

If a database is sensitive enough that table names themselves are sensitive (some healthcare and finance schemas treat them this way), the workspace owner can scope the cache to a per-environment role or move to the desktop app, where the cache lives on the engineer's machine.

What gets sent to the model on each request

The cache is local to QueryBear. It is not the prompt. The prompt is a much smaller, request-specific subset.

When a user asks a question, the flow is:

  1. The question + a tiny "table directory" (table names + one-line summary) goes to the model in a first pass.
  2. The model picks the candidate tables it thinks it needs.
  3. We pull the full column-level metadata for those tables out of the cache.
  4. That goes into a second pass with the question, and the model writes the SQL.
  5. We parse the SQL, validate it against allowlists, and run it.

For a 700-table database, that means a question about "users who placed orders this week" might end up with users and orders and order_items in the model's context — three tables out of seven hundred — instead of the full schema. Latency drops from ~6 seconds to under 1.5. Token cost drops about 100x. The model is more accurate because it isn't drowning in irrelevant tables.

This two-pass shape is also what makes the gateway's allowlists do real work. The first pass can be told "these tables are off-limits to this connection" and they never enter the model's awareness, period.

How we keep the cache fresh

Schema drift is the silent killer of schema caches. You ran your migration last night. The cache still thinks users.email is non-nullable. Now the model writes a query that joins on email assuming uniqueness. The query runs. The numbers are wrong.

We deal with this three ways.

Manual refresh from the dashboard. Anyone with workspace access can hit "refresh schema" and we re-read the metadata. Fast. Useful when you just shipped a migration.

Soft auto-refresh on column-not-found errors. If the model writes a query referencing a column that's not in the cache, we hit the catalog for just that table, refresh it, and retry once. This catches "I added a column ten minutes ago" without a full re-sync.

Periodic background refresh. Every workspace has a low-frequency background job that re-pulls the catalog. The interval depends on plan. The point is that even if you forget to hit refresh, the cache eventually catches up.

The thing we deliberately don't do is poll the catalog every minute. That's a real load on heavily-used databases and doesn't actually help — schema changes don't happen on a one-minute schedule, deploys do.

The bug class to watch out for

There is a specific failure mode worth naming: the model is confident, the cache is stale, the query runs, and the answer is wrong. There's no exception. There's just a number.

The mitigations matter because of this:

  • We hash the schema we use for each query and store it with the query log. If you go back later and ask "did this query run against the new schema or the old one?", the answer is in the log.
  • We refresh proactively after the user clicks a "save query" button. A saved query needs to reflect current reality, not yesterday's.
  • The two-pass design means stale tables don't poison unrelated questions. Stale orders doesn't hurt a question about users.

Why this matters for MCP

The two-pass approach also happens to be how QueryBear's MCP server works under the hood. When Claude Code or Cursor calls get_schema, it gets the table directory. When it calls run_query, the column metadata for the referenced tables is resolved at validation time.

This is the right primitive for agents. They don't want to hold the whole schema in context any more than the model does in a single-turn UI session. They want to ask "is orders.amount_cents a real column" and get a yes or no.

The cache is what makes that cheap.

The boring conclusion

Schema caching isn't a flashy feature. It's the difference between an AI database tool that feels responsive and one that doesn't. It's also the reason QueryBear can run cheaply enough to charge $49 a month flat instead of pricing per token.

The goal of the whole subsystem is simple: the model should know exactly enough about your database to write a correct query, and zero about anything else.

4 comments

  • engineering_today

    Two-pass with the table directory first is exactly how I'd architect this. Big DBs would otherwise blow context out of the water. Have you measured the accuracy delta vs sending the full schema?

  • tjones_dba

    Smart not to pull column statistics. People underestimate how much info column histograms leak about distributions even if you don't pull rows.

  • redis_apologist

    Curious whether you cache the schema in Redis or in your app DB. The latter is simpler but the invalidation story is sketchier.

  • skeptical_dba

    Soft auto-refresh on column-not-found errors is clever but I worry about the case where the column never existed. Do you cap retries?

Database Access

Give Your AI Agents
Database Access. Securely.

Connect any database. Control permissions. Audit every query. All running locally on your machine.