Back to blog

Building a Production-Grade Postgres MCP Allowlist

Spencer Pauly
Spencer Pauly
7 min read
Building a Production-Grade Postgres MCP Allowlist

If you're going to let an AI agent touch a real Postgres, the smallest thing you can build that actually keeps you safe is an allowlist. Not a database role — a parsed-SQL-against-a-list check, before the query runs.

This post is about how I'd build that today. It's roughly what QueryBear does internally, simplified down to the parts you'd actually need if you wrote it yourself.

What "allowlist" actually means here

The word means three different things at different levels of the stack:

  • Database role grants. What GRANT SELECT ON users TO agent_role gives you. Useful but blunt — you can't say "this column yes, this column no" except via views or row-level security.
  • Schema visibility. What search_path and REVOKE ... FROM PUBLIC shape. Fine, but agents can ignore search_path by qualifying names.
  • Application-level allowlist. What your gateway enforces by parsing the SQL and refusing anything that references off-list objects. This is the level that actually maps to how you think about permissions.

Roles and schema visibility are the floor. The application-level allowlist is what does the work.

The data model

The smallest useful allowlist has four kinds of records:

  • tables — qualified names (public.users, analytics.events) that the agent can SELECT FROM.
  • columns_blocked — qualified column names the agent can never see (public.users.password_hash, public.users.ssn_last_4).
  • functions — qualified function names the agent can call (public.lower, public.coalesce). Empty by default, with a curated list of safe builtins.
  • row_filters — per-table SQL fragments injected into every SELECT against that table. e.g. on support_tickets, force tenant_id = $current_user.tenant_id.

That's the whole schema. ~100 rows of config covers a typical small app.

The parse step

The thing that separates "allowlist" from "wishful thinking" is a real SQL parser. You cannot do this with regex. You will get the regex wrong on day one and someone will figure out how to slip past it on day two.

For Postgres, your options are:

  • libpg_query. A C library extracted from Postgres's own parser, with bindings for almost every language. This is what I use.
  • pgsql-ast-parser (TypeScript), pglast (Python), pg_query (Ruby). All wrap libpg_query.

What you do with the parsed AST:

  1. Walk every RangeVar (table reference). For each, check it's in the tables allowlist with the right schema. If not, reject.
  2. Walk every ColumnRef. For each, check that no part of the path matches a columns_blocked entry. If a query selects users.password_hash even via a wildcard, reject.
  3. Walk every FuncCall. For each, check the name resolves to an allowlisted function. Anything else, reject. This catches pg_read_file, dblink_*, custom functions with side effects, etc.
  4. Reject all DML and DDL outright. InsertStmt, UpdateStmt, DeleteStmt, CreateStmt, etc. — the agent doesn't issue these in your gateway.
  5. Reject statements with a WITH RECURSIVE longer than a configurable depth. Catches accidental infinite recursion.
  6. Reject queries that include subqueries against pg_catalog.* unless the user explicitly opted in. Most agents don't need to introspect catalog directly; you've already exposed schema metadata via get_schema.

Each rejection is a 4xx-style error returned to the agent with a one-sentence reason. Good agents read the reason and adjust. Bad ones don't, and you've still kept them out.

The wildcard problem

SELECT * FROM users is the obvious headache. The user might have meant "every column the role can see," but that's exactly the column allowlist's job. The simplest thing is to refuse * outright and force the agent to enumerate columns. This sounds annoying. It is annoying. It is also worth it.

A nicer middle ground: rewrite SELECT * FROM t into SELECT col1, col2, col3 FROM t where col1..col3 is the allowed-column list for t. The model gets back its data. Blocked columns silently never make it into the model context. Costs a bit more parser work but avoids agent friction.

Row filters

A row filter is "this user can only see rows where tenant_id = X." You implement it by injecting an additional WHERE clause into every SELECT that touches the filtered table.

Three places this can live:

  • Postgres row-level security (RLS). The native Postgres mechanism. Solid. Requires that you set per-session GUCs from the gateway with SET app.current_user = '...'. Powerful, but tying agent identity to a Postgres session is more plumbing than it's worth for most setups.
  • Gateway-injected WHERE clause. You add AND <filter> to the query before running it. Easier to maintain. Less foolproof — if you accidentally let a query through unparsed, the filter doesn't apply.
  • Materialized views. Pre-filter the data into views the agent can see. Performance is great. Refresh cost is real.

I usually recommend gateway-injected WHERE for small apps, RLS for anything regulated.

What an allowlist file actually looks like

Here's a real shape, slightly simplified, that we use as the config for a typical QueryBear-style gateway:

tables:
  - public.users
  - public.orders
  - public.order_items
  - public.products
  - public.support_tickets
  - public.support_messages
 
columns_blocked:
  - public.users.password_hash
  - public.users.password_reset_token
  - public.users.totp_secret
  - public.users.ssn_last_4
 
functions_allowed:
  # date / number / string builtins; full list usually ~60 entries
  - now
  - date_trunc
  - count
  - sum
  - avg
  - lower
  - coalesce
 
row_filters:
  - table: public.support_tickets
    filter: "tenant_id = current_setting('app.tenant_id')::uuid"
  - table: public.support_messages
    filter: "ticket_id IN (SELECT id FROM support_tickets WHERE tenant_id = current_setting('app.tenant_id')::uuid)"
 
limits:
  default_row_limit: 1000
  statement_timeout_ms: 5000
  max_explain_cost: 100000

Three readable, hand-maintainable kinds of records. No magic. The output of every parse against this is "yes, run it" or "no, here's why."

The agent-facing surface

Here's the trick that makes the whole thing usable by agents: the agent doesn't see the allowlist directly, but the schema you expose to it is filtered through the allowlist.

When the agent calls get_schema, you return:

  • Only the tables in the tables allowlist.
  • For each table, only the columns not in columns_blocked.
  • An annotation noting that WHERE tenant_id = ? is already applied to relevant tables.

The agent's mental model becomes "these are the tables and columns I can use." It writes queries against that model, the parser rejects the rare violation, and most of the time the agent stays inside the lines because the lines are visible.

This part is critical. If you tell the agent "you can read everything" but then reject queries, the agent thrashes — generating, getting rejected, retrying, getting rejected. If you tell the agent "here's exactly what's available," it writes queries that work the first time.

What you gain over a database role

Could you do all of this with grants, views, RLS, and REVOKE? In theory yes. In practice no, for two reasons.

One: granular column privacy in vanilla Postgres requires either column-level grants (which exist and are clunky) or one view per access pattern (which is a lot of views). Application-level allowlists are easier to reason about and easier to audit.

Two: most of the value of an allowlist is the part that isn't expressible as a grant — the function-call allowlist, the row-limit injection, the statement-timeout enforcement, the audit log of which queries hit. None of those are SQL primitives.

So the database role is the floor. Make it tight. The allowlist is the part that does the actual permission work.

What this looks like in QueryBear

QueryBear's gateway is a more featured version of the above. The config lives in a UI rather than a YAML file, the parser is libpg_query-backed, and the audit log is a database table per workspace. The thing the MCP-connected agent sees is identical in shape to a vanilla Postgres MCP, except every query goes through this stack first. About 95% of "I need an allowlist" implementations end up with this shape eventually.

The ones who write it themselves get to learn libpg_query. The ones who don't, use ours. Either way, the shape is the same. Just please put a parser in front of your AI's database connection. Read-only-mode is not it.

4 comments

  • engineering_today

    The wildcard rewrite trick (`SELECT *` → `SELECT col1, col2, ...`) is great. Avoids the 'agent thrashes against rejection' failure mode while keeping column blocking enforced.

  • tjones_dba

    RLS for regulated workloads is the correct call. Gateway-injected WHERE is fine for SaaS but compliance auditors want it at the database layer.

  • wfh_dev

    100-row YAML file vs writing a permission system in code — strongly endorse the YAML approach. Auditing is so much easier when the policy is hand-readable.

  • skeptical_dba

    The function allowlist is the part most people skip. They allowlist tables and columns and forget that `SELECT pg_read_file(...)` will happily exfiltrate anything the postgres user can read.

Database Access

Give Your AI Agents
Database Access. Securely.

Connect any database. Control permissions. Audit every query. All running locally on your machine.