The Cohort Retention SQL Query I Actually Use

Retention is the metric people most want and least want to write the SQL for. The chart is simple to picture: group users by the month they signed up, then track how many came back in each later month. Writing it from scratch every time is annoying enough that most people just don't, which is why so few teams actually look at retention.
Here's the query I keep around, with the parts explained so you can bend it to your tables.
The shape of the problem
You need two facts about each user: when they first showed up, and which months they were active. Almost every schema has these somewhere. Signup is usually a created_at on a users table. Activity is usually a timestamp on whatever event means "this person used the product" — an order, a session, a login.
Two CTEs, then a join. That's the whole thing.
with cohorts as (
select
id as user_id,
date_trunc('month', created_at) as cohort_month
from users
),
activity as (
select
user_id,
date_trunc('month', occurred_at) as active_month
from events
group by 1, 2
)
select
c.cohort_month,
-- months since signup: 0, 1, 2, ...
(extract(year from a.active_month) - extract(year from c.cohort_month)) * 12
+ (extract(month from a.active_month) - extract(month from c.cohort_month)) as month_number,
count(distinct c.user_id) as active_users
from cohorts c
join activity a on a.user_id = c.user_id
where a.active_month >= c.cohort_month
group by 1, 2
order by 1, 2;That gives you a long table: for each cohort, how many users were active 0, 1, 2, … months after signing up. Month 0 is your cohort size.
Turning counts into percentages
Raw counts are hard to compare across cohorts of different sizes. You want the percentage that came back. Wrap the result and divide by each cohort's month-0 count:
with retention as (
-- the query above goes here, as a CTE
...
),
cohort_size as (
select cohort_month, active_users as size
from retention
where month_number = 0
)
select
r.cohort_month,
r.month_number,
r.active_users,
round(100.0 * r.active_users / s.size, 1) as pct_retained
from retention r
join cohort_size s on s.cohort_month = r.cohort_month
order by r.cohort_month, r.month_number;Now pct_retained is the number you actually plot. Month 0 is always 100%. The shape of the dropoff after that tells you whether your product sticks.
The two mistakes I always see
The first is counting activity that includes month 0 as if it were retention. Of course people are active the month they signed up. That's not retention, that's the denominator. Month 1 onward is the real story.
The second is using a "last seen" timestamp instead of monthly activity buckets. If you only track each user's most recent activity, you can't reconstruct which months they were present. You need the event-level data, bucketed by month. If your events table is huge, this query will be slow, so run it against a read replica or materialize the monthly activity rollup once a day.
Or just ask for it
I'll be honest about why this query lives in a snippet file and not in my head: I write it maybe once a quarter, and every time I have to re-derive the month_number arithmetic. These days I mostly just ask in plain English and read the SQL it produces to check the join. That's the workflow QueryBear is built around, but the query above works fine pasted straight into psql. Keep it somewhere. Future you will want it the next time someone asks "is retention getting better or worse" and expects an answer before the meeting ends.