dbt Isn't Declarative — And That's a Problem
dbt looks declarative — but it’s not. That lie costs your team time, safety, and trust.
📐 This post is for:
- Data engineers
- Platform builders
- Technical leaders
…who feel like the modern data stack just isn’t cutting it anymore — and are ready to ask why.
TL;DR
- dbt looks declarative — but it’s not. Under the hood, it’s an imperative DAG of side-effecting SQL.
- That breaks correctness, velocity, and reproducibility.
- Why? Because the underlying database model mutates state instead of versioning it.
- dbt can’t fix this. No framework can, not without changing the foundation.
- AprioriDB is that foundation. We’re building a database that’s append-only, cell-versioned, artifact-based, and declarative all the way down.
📢 If you’re tired of pipelines that forget what changed, you’re not crazy. We felt it too. AprioriDB is the new foundation. Let’s fix it at the root.
📚 What’s in this post
- 🧩 dbt Looks Declarative — But It’s Not
- ✅ Why This Matters: Correctness
- 🚀 Why This Matters: Velocity
- 🔁 Why This Matters: Reproducibility
- 🧱 The Real Problem Is the Foundation
- 🔮 The Future Is Declarative — All The Way Down
🧩 dbt Looks Declarative — But It’s Not
dbt claims to be declarative.
It’s not.
A dbt project is executed by running `dbt run`. At runtime, `dbt run` takes a set of selectors, compiles your models, and executes a DAG of side-effecting SQL and Python tasks.
If you care about correctness, velocity, reproducibility, performance, or lineage — a system built on imperative mutations will fail you.
Let’s look more closely at how dbt actually works and why this design choice matters.
How dbt Actually Works
It looks declarative.
- You define models as SQL `SELECT`s.
- You declare dependencies between models.
- You write tests, exposures, and documentation.
This seems declarative. SQL `SELECT` statements are declarative. Model names become the resulting tables and views.
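For instance, a model is just a `SELECT` with `ref()` calls declaring its upstream dependencies (a minimal sketch; the model and column names are made up):

```sql
-- models/customer_orders.sql (hypothetical model)
-- ref() declares dependencies; dbt materializes the result
-- as a table or view named customer_orders.
SELECT
    c.customer_id,
    COUNT(o.order_id) AS order_count
FROM {{ ref('customers') }} AS c
LEFT JOIN {{ ref('orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id
```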
Here’s what actually happens when you run `dbt run --select my_model+`. dbt:
- Resolves selectors to a subgraph of the DAG.
- Compiles your models into SQL and Python tasks.
- Executes them topologically.
- Mutates the database by creating, replacing, or updating tables and views.
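That last step is where the declarative veneer ends. For a table-materialized model, dbt executes imperative DDL against the warehouse. The exact statements are adapter-specific, but the shape is roughly this (an illustrative sketch, not dbt’s literal output):

```sql
-- Roughly what a table materialization runs: the model's SELECT,
-- wrapped in DDL that replaces whatever was there before.
CREATE OR REPLACE TABLE analytics.customer_orders AS
SELECT
    c.customer_id,
    COUNT(o.order_id) AS order_count
FROM analytics.customers AS c
LEFT JOIN analytics.orders AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```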
This isn’t declarative data modeling. It’s Makefiles for database objects.
✅ Why This Matters: Correctness
Inconsistency Is Inevitable
No Input Freezing
dbt doesn’t freeze input state while its long-running DAGs execute.
If source tables change mid-run, downstream models see inconsistent inputs.
You, the engineer, must ensure upstream inputs are stable.
And if your project mixes views and tables?
- Views reflect live data immediately.
- Tables only change when you re-run the model.
Your data is now split-brain.
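A two-line config difference is all it takes. Here are two hypothetical models reading the same source:

```sql
-- models/orders_live.sql: a view, reflects live source data on every query
{{ config(materialized='view') }}
SELECT * FROM {{ source('shop', 'orders') }}

-- models/orders_frozen.sql: a table, frozen until the next `dbt run`
{{ config(materialized='table') }}
SELECT * FROM {{ source('shop', 'orders') }}
```

Point two dashboard tiles at these and they can disagree about the very same upstream rows.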
Partial DAG Runs Are Dangerous
Because full DAG runs are slow, teams often use selectors to run subgraphs.
But this only works if you track what changed.
Declarative systems should know this. dbt doesn’t.
Incremental Models Are a Time Bomb
Incremental models compile to `INSERT` statements that depend on the current state of the target table.
If your inputs are changing, you get inconsistent rows inserted at different times based on different upstream states.
This is operational debt waiting to explode.
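Here’s the canonical incremental pattern, to make that concrete (hypothetical source and columns):

```sql
-- models/events_incremental.sql (hypothetical)
{{ config(materialized='incremental') }}

SELECT
    event_id,
    user_id,
    event_time
FROM {{ source('app', 'raw_events') }}

{% if is_incremental() %}
  -- This filter reads the current state of the target table itself,
  -- so which rows get inserted depends on mutable runtime state.
  WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}
```

If a failed or partial run leaves the target table in an unexpected state, the next run quietly inserts the wrong slice.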
🧠 Root Problem
dbt doesn’t protect you from yourself.
A truly declarative system would:
- Freeze inputs during execution
- Prevent partial runs from corrupting state
- Make incremental models safe by default
But dbt isn’t built that way.
🚨 The Consequence: Data Drift
Databases are long-lived. And drift accumulates.
Testing in dbt is light, optional, and post-hoc.
After a few months, your database is a mess.
You will have published wrong data.
🔒 Lack of Safety
Here’s how dbt handles failure:
- Replaces tables and views while running.
- Runs tests after changes are made.
- If tests fail, the run halts.
What’s left?
Partially updated state. And no rollback.
dbt doesn’t back up the previous state. Once tables are overwritten — they’re gone.
- Can you guarantee no one is reading those broken tables?
- Can you roll back to last known good state?
No. Because dbt already destroyed it. And not even time travel can save you.
✅ What Safety Should Look Like
- Keep the last-known-good objects live.
- Build new models in isolation.
- Test before promotion.
- Promote only if all tests pass.
This is standard in software deployment.
Why should data be any different?
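Here’s a minimal sketch of that pattern in Postgres-flavored SQL, assuming transactional DDL. The tables, columns, and test are illustrative:

```sql
BEGIN;

-- 1. Build the new version in isolation; readers keep hitting analytics.orders.
CREATE TABLE analytics.orders__next AS
SELECT * FROM raw.orders WHERE order_total >= 0;

-- 2. Test before promotion. A failure aborts the whole transaction,
--    leaving the last-known-good table live and untouched.
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM analytics.orders__next WHERE order_id IS NULL) THEN
        RAISE EXCEPTION 'promotion blocked: null order_id in orders__next';
    END IF;
END $$;

-- 3. Promote only if the tests pass: an atomic name swap.
--    (Retire or drop orders__previous out of band.)
ALTER TABLE analytics.orders RENAME TO orders__previous;
ALTER TABLE analytics.orders__next RENAME TO orders;

COMMIT;
```

Rollback is the same swap in reverse, because the previous table still exists.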
🚀 Why This Matters: Velocity
2AM Debugging Isn’t Velocity
You get paged:
- It’s 2AM.
- dbt tests failed.
- Your DAG halted.
- Your warehouse is in a broken state.
- Dashboards will break by 8AM.
You scramble.
But you don’t know why it failed.
- Bad upstream data?
- Schema change?
- Logic bug?
You tweak some code. Nudge an input. Re-run.
You wait.
Maybe it works in 2–3 hours. Maybe not.
This is the opposite of velocity.
Only Experts Can Fix It
Most of your team can’t fix these issues.
Only your most senior engineers know enough of the system’s quirks to debug it quickly.
Everyone else is stuck guessing.
This isn’t sustainable.
Long Iteration Loops
dbt re-runs models even if:
- Inputs haven’t changed.
- Code hasn’t changed.
- Outputs are still correct.
This leads to slow, wasteful cycles:
- Change a model.
- Run dbt (wait).
- See a bug.
- Fix it.
- Run dbt (wait).
Repeat.
🧠 Why Should Humans Track This?
Computers should know what changed. Humans shouldn’t guess.
A true declarative system would:
- Track input and code changes.
- Skip unnecessary work.
- Rebuild only the required parts.
🔁 Why This Matters: Reproducibility
No Guaranteed Rebuilds
dbt can’t guarantee that the same inputs and code will produce the same output.
Why?
Because dbt builds an imperative DAG that mutates state.
Your models:
- Depend on live upstream data.
- Depend on intermediate tables from prior runs.
- Perform incremental `INSERT`s based on runtime state.
This means:
- Full rebuilds may differ from previous ones.
- Incremental logic can’t verify prior state.
- Partial runs can silently corrupt results.
Reproducibility becomes best effort — not guaranteed.
No Artifact-Based State Management
True reproducibility means versioning:
- Inputs
- Execution context
- Transformations
- Outputs
- Execution history
dbt doesn’t do this.
- No input versioning.
- No persistent intermediate results.
- No fine-grained lineage.
✅ What Reproducibility Should Look Like
- Versioned, immutable inputs.
- Deterministic transformations.
- Versioned, immutable outputs.
- Artifact-based history.
You don’t overwrite old data until you know the new version is good.
You promote a proven artifact — you don’t re-run and hope.
You get fast, safe rollbacks and clear diffs.
That’s reproducibility.
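To make the shape concrete, here is the promote-an-artifact flow in plain SQL. This is a sketch of the pattern, not AprioriDB’s actual mechanics; the version suffixes are illustrative:

```sql
-- Each run writes a new, immutable, versioned output; nothing is overwritten.
CREATE TABLE analytics.orders_v42 AS
SELECT * FROM raw.orders WHERE order_total >= 0;

-- Consumers read through a stable name; promotion is a metadata-only change.
CREATE OR REPLACE VIEW analytics.orders AS
SELECT * FROM analytics.orders_v42;

-- Rollback: repoint to the previous artifact. Fast, safe, nothing lost.
CREATE OR REPLACE VIEW analytics.orders AS
SELECT * FROM analytics.orders_v41;

-- A clear diff is an ordinary query over two retained versions.
SELECT * FROM analytics.orders_v42
EXCEPT
SELECT * FROM analytics.orders_v41;
```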
🧱 The Real Problem Is the Foundation
None of this is dbt’s fault.
It’s a great tool. It brought data-as-code into the mainstream.
But dbt is built on top of mutable, side-effecting SQL databases.
And that’s the real problem.
You can’t build a declarative system on top of an imperative foundation.
If:
- Your storage engine mutates state…
- Your transformations overwrite prior results…
- Your lineage is inferred post-hoc…
- Your outputs are side-effected blobs…
Then:
Everything built on top of that will eventually rot.
That’s why we’re building AprioriDB:
- Append-only storage
- Cell-level versioning
- Branching + merging
- Fully declarative semantics
Not just a better dbt. A better foundation.
🔮 The Future Is Declarative — All The Way Down
- If you’ve been on-call for a broken dbt run…
- If you’ve lost sleep debugging DAGs…
- If your pipelines feel slow, fragile, and unpredictable…
You’re not crazy.
This is what happens when we fake declarative workflows on top of imperative engines.
We believe the future of data:
- Isn’t DAGs of mutations.
- Isn’t pipelines you have to babysit.
- Is versioned, immutable, reproducible — by default.
Declarative all the way down.
📣 We’re looking for early adopters, collaborators, and people who feel the pain we’ve felt. If that’s you — we’d love to hear from you.
Written by Jenny Kwan, co-founder and CTO of AprioriDB.
Follow me on Bluesky and LinkedIn.