A Simpler Way to Calculate NPS in SQL
Skip the percentage arithmetic — map scores to -100, 0, and 100, and let AVG() do the rest. A mathematical proof and practical ClickHouse benchmark.

If you’ve ever computed Net Promoter Score at scale, you know the standard formula:

NPS = \%Promoters - \%Detractors

It’s simple on paper — but in SQL, it usually turns into multiple conditional aggregates and repeated division logic.

(countIf(score >= 9) * 100.0 / count(*))
- (countIf(score <= 6) * 100.0 / count(*))

This works — but it’s repetitive, easy to get wrong, and not cheap at scale.

We found a simpler way¹: assign each response a numeric value and take the average.

  • detractors → -100
  • passives → 0
  • promoters → 100

That’s it. One AVG() call, same result.

Here’s why it works.

The Proof in 30 Seconds

Say you have N total responses: P promoters, V passives, and D detractors, where P + V + D = N.

The traditional formula expands to:

NPS = \frac{P}{N} \times 100 - \frac{D}{N} \times 100 = \frac{100 \times (P - D)}{N}

Now take the average of our assigned values:

AVG = \frac{P \times 100 + V \times 0 + D \times (-100)}{N} = \frac{100P - 100D}{N} = \frac{100 \times (P - D)}{N}

Identical. No tricks, no approximations — it’s algebraically the same expression.
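If you prefer an empirical check to an algebraic one, here is a quick sketch in Python (rather than SQL, purely for illustration) that computes NPS both ways over random scores. The function names and the random data are ours, not from any library:

```python
import random

def nps_traditional(scores):
    """% promoters minus % detractors."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters * 100.0 / n) - (detractors * 100.0 / n)

def nps_average(scores):
    """Average of mapped values: promoter -> 100, passive -> 0, detractor -> -100."""
    mapped = [100 if s >= 9 else 0 if s >= 7 else -100 for s in scores]
    return sum(mapped) / len(mapped)

random.seed(0)
scores = [random.randint(0, 10) for _ in range(100_000)]

# The two formulas agree up to floating-point error.
assert abs(nps_traditional(scores) - nps_average(scores)) < 1e-9
```

Run it with any score distribution you like; the two numbers never diverge.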

Why This Matters in Practice

The equivalence isn’t just a neat math fact. It changes when the heavy lifting happens: once, at write time, instead of on every query.

With the traditional approach, every query has to classify each row, count groups, and compute percentages on the fly. In ClickHouse (or any column-oriented database), that means scanning the score column, evaluating conditionals per row, and aggregating multiple times.

With the averaging approach, you precompute a single integer column — call it nps_value — at insert time or in a materialized view. The value is derived once from the original score:

CASE
    WHEN score >= 9 THEN 100
    WHEN score >= 7 THEN 0
    ELSE -100
END AS nps_value

For example, given these five responses:

| response | score | category  | nps_value |
|----------|-------|-----------|-----------|
| #1       | 10    | promoter  | 100       |
| #2       | 9     | promoter  | 100       |
| #3       | 9     | promoter  | 100       |
| #4       | 8     | passive   | 0         |
| #5       | 3     | detractor | -100      |

The traditional way: 3 promoters (60%) minus 1 detractor (20%) = NPS 40.

The average way: (100 + 100 + 100 + 0 + (-100)) / 5 = 200 / 5 = NPS 40.

From that point on, every NPS query is just:

SELECT avg(nps_value) FROM responses

That’s a single pass over a single column — no branching, no multi-phase aggregation — just a straight average over a well-compressed column or a dictionary lookup. In our benchmarks on ~63 million rows in ClickHouse, the avg() approach ran about 1.5x faster than the traditional countIf() method on grouped dashboard queries, while producing identical results.

The Bonus: Free Slicing and Dicing

The real payoff shows up when you need NPS broken down by segment, time period, or any other dimension. Here’s the traditional approach:

SELECT
    organization_id,
    toStartOfMonth(created_at) AS month,
    (countIf(score >= 9) * 100.0 / count(*))
      - (countIf(score <= 6) * 100.0 / count(*)) AS nps
FROM responses
WHERE score IS NOT NULL
  AND score BETWEEN 0 AND 10
GROUP BY organization_id, month
ORDER BY month

Every time you add a dimension — venue, survey, country — you carry along the full countIf / count(*) / countIf / count(*) expression. It’s correct, but noisy and easy to get wrong.

With the precomputed column, the same query becomes:

SELECT
    organization_id,
    toStartOfMonth(created_at) AS month,
    avg(nps_value) AS nps
FROM responses
GROUP BY organization_id, month
ORDER BY month
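The GROUP BY shape is easy to mirror outside the database, too. Here is a minimal Python sketch of the same grouped average, using hypothetical in-memory rows (not our production schema):

```python
from collections import defaultdict

# Hypothetical (organization_id, month, nps_value) rows, where nps_value
# was already mapped to -100 / 0 / 100 at "insert" time.
rows = [
    (1, "2024-01", 100), (1, "2024-01", 0), (1, "2024-01", -100),
    (1, "2024-02", 100), (2, "2024-01", 100), (2, "2024-01", -100),
]

# GROUP BY organization_id, month ... avg(nps_value)
groups = defaultdict(list)
for org, month, value in rows:
    groups[(org, month)].append(value)

nps = {key: sum(values) / len(values) for key, values in groups.items()}
# e.g. nps[(1, "2024-01")] == 0.0
```

Adding a dimension means widening the grouping key; the aggregate itself never changes.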

Same result, no matter how many dimensions you slice by.

Wrapping Up

Next time you’re computing NPS in SQL, skip the percentage arithmetic. Map your scores to -100, 0, and 100, precompute the column, and let AVG() do the rest. It’s mathematically identical, trivially verifiable, and cleaner to write — especially when you’re slicing by multiple dimensions.

Footnotes

  1. This mathematical equivalence has been noted before, e.g. by CustomerGauge. The focus of this post is on the practical SQL and database performance implications of the approach.