dbt Isn't Declarative — And That's a Problem
dbt looks declarative — but it’s not. That lie costs your team time, safety, and trust.
📐 This post is for:
- Data engineers
- Platform builders
- Technical leaders
…who feel like the modern data stack just isn’t cutting it anymore — and are ready to ask why.
TL;DR
- dbt looks declarative — but it’s not. Under the hood, it’s an imperative DAG of side-effecting SQL.
- That breaks correctness, velocity, and reproducibility.
- Why? Because the underlying database model mutates state instead of versioning it.
- dbt can’t fix this. No framework can, not without changing the foundation.
- AprioriDB is that foundation. We’re building a database that’s append-only, cell-versioned, artifact-based, and declarative all the way down.
📢 If you’re tired of pipelines that forget what changed, you’re not crazy. We felt it too. AprioriDB is the new foundation. Let’s fix it at the root.
📚 What’s in this post
- 🧩 dbt Looks Declarative — But It’s Not
- ✅ Why This Matters: Correctness
- 🚀 Why This Matters: Velocity
- 🔁 Why This Matters: Reproducibility
- 🧱 The Real Problem Is the Foundation
- 🔮 The Future Is Declarative — All The Way Down
🧩 dbt Looks Declarative — But It’s Not
dbt claims to be declarative.
It’s not.
A dbt project is executed by running `dbt run`. At runtime, `dbt run` takes a set of selectors, compiles your models, and executes a DAG of side-effecting SQL and Python tasks.
If you care about correctness, velocity, reproducibility, performance, or lineage — a system built on imperative mutations will fail you.
Let’s look more closely at how dbt actually works and why this design choice matters.
How dbt Actually Works
It looks declarative.
- You define models as SQL `SELECT`s.
- You declare dependencies between models.
- You write tests, exposures, and documentation.
This seems declarative. SQL `SELECT` statements are declarative. Model names become the resulting tables and views.
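For instance, a model is just a `SELECT` with `ref()` calls declaring its upstream dependencies (a minimal sketch; the model and column names are made up):

```sql
-- models/customer_orders.sql (hypothetical model)
-- ref() declares dependencies; dbt materializes the result
-- as a table or view named customer_orders.
SELECT
    c.customer_id,
    COUNT(o.order_id) AS order_count
FROM {{ ref('customers') }} AS c
LEFT JOIN {{ ref('orders') }} AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id
```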
Here’s what actually happens when you run `dbt run --select my_model+`. dbt:
- Resolves selectors to a subgraph of the DAG.
- Compiles your models into SQL and Python tasks.
- Executes them topologically.
- Mutates the database by creating, replacing, or updating tables and views.
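That last step is where the declarative veneer ends. For a table-materialized model, dbt executes imperative DDL against the warehouse. The exact statements are adapter-specific, but the shape is roughly this (an illustrative sketch, not dbt’s literal output):

```sql
-- Roughly what a table materialization runs: the model's SELECT,
-- wrapped in DDL that replaces whatever was there before.
CREATE OR REPLACE TABLE analytics.customer_orders AS
SELECT
    c.customer_id,
    COUNT(o.order_id) AS order_count
FROM analytics.customers AS c
LEFT JOIN analytics.orders AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```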
This isn’t declarative data modeling. It’s Makefiles for database objects.
✅ Why This Matters: Correctness
Inconsistency Is Inevitable
No Input Freezing
dbt doesn’t freeze input state while its long-running DAGs execute.
If source tables change mid-run, downstream models see inconsistent inputs.
You, the engineer, must ensure upstream inputs are stable.
And if your project mixes views and tables?
- Views reflect live data immediately.
- Tables only change when you re-run the model.
Your data is now split-brain.
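A two-line config difference is all it takes. Here are two hypothetical models reading the same source:

```sql
-- models/orders_live.sql: a view, reflects live source data on every query
{{ config(materialized='view') }}
SELECT * FROM {{ source('shop', 'orders') }}

-- models/orders_frozen.sql: a table, frozen until the next `dbt run`
{{ config(materialized='table') }}
SELECT * FROM {{ source('shop', 'orders') }}
```

Point two dashboard tiles at these and they can disagree about the very same upstream rows.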
Partial DAG Runs Are Dangerous
Because full DAG runs are slow, teams often use selectors to run subgraphs.
But this only works if you track what changed.
Declarative systems should know this. dbt doesn’t.
Incremental Models Are a Time Bomb
Incremental models compile to `INSERT` statements that depend on the current state of the target table.
If your inputs are changing, you get inconsistent rows inserted at different times based on different upstream states.
This is operational debt waiting to explode.
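Here’s the canonical incremental pattern, to make that concrete (hypothetical source and columns):

```sql
-- models/events_incremental.sql (hypothetical)
{{ config(materialized='incremental') }}

SELECT
    event_id,
    user_id,
    event_time
FROM {{ source('app', 'raw_events') }}

{% if is_incremental() %}
  -- This filter reads the current state of the target table itself,
  -- so which rows get inserted depends on mutable runtime state.
  WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}
```

If a failed or partial run leaves the target table in an unexpected state, the next run quietly inserts the wrong slice.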
🧠 Root Problem
dbt doesn’t protect you from yourself.
A truly declarative system would:
- Freeze inputs during execution
- Prevent partial runs from corrupting state
- Make incremental models safe by default
But dbt isn’t built that way.
🚨 The Consequence: Data Drift
Databases are long-lived. And drift accumulates.
Testing in dbt is light, optional, and post-hoc.
After a few months, your database is a mess.
You will have published wrong data.
🔒 Lack of Safety
Here’s how dbt handles failure:
- Replaces tables and views while running.
- Runs tests after changes are made.
- If tests fail, the run halts.
What’s left?
Partially updated state. And no rollback.
dbt doesn’t back up the previous state. Once tables are overwritten — they’re gone.
- Can you guarantee no one is reading those broken tables?
- Can you roll back to last known good state?
No. Because dbt already destroyed it. And not even time travel can save you.
✅ What Safety Should Look Like
- Keep the last-known-good objects live.
- Build new models in isolation.
- Test before promotion.
- Promote only if all tests pass.
This is standard in software deployment.
Why should data be any different?
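Here’s a minimal sketch of that pattern in Postgres-flavored SQL, assuming transactional DDL. The tables, columns, and test are illustrative:

```sql
BEGIN;

-- 1. Build the new version in isolation; readers keep hitting analytics.orders.
CREATE TABLE analytics.orders__next AS
SELECT * FROM raw.orders WHERE order_total >= 0;

-- 2. Test before promotion. A failure aborts the whole transaction,
--    leaving the last-known-good table live and untouched.
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM analytics.orders__next WHERE order_id IS NULL) THEN
        RAISE EXCEPTION 'promotion blocked: null order_id in orders__next';
    END IF;
END $$;

-- 3. Promote only if the tests pass: an atomic name swap.
--    (Retire or drop orders__previous out of band.)
ALTER TABLE analytics.orders RENAME TO orders__previous;
ALTER TABLE analytics.orders__next RENAME TO orders;

COMMIT;
```

Rollback is the same swap in reverse, because the previous table still exists.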
🚀 Why This Matters: Velocity
2AM Debugging Isn’t Velocity
You get paged:
- It’s 2AM.
- dbt tests failed.
- Your DAG halted.
- Your warehouse is in a broken state.
- Dashboards will break by 8AM.
You scramble.
But you don’t know why it failed.
- Bad upstream data?
- Schema change?
- Logic bug?
You tweak some code. Nudge an input. Re-run.
You wait.
Maybe it works in 2–3 hours. Maybe not.
This is the opposite of velocity.
Only Experts Can Fix It
Most of your team can’t fix these issues.
Only your most senior engineers know enough of the system’s quirks to debug it quickly.
Everyone else is stuck guessing.
This isn’t sustainable.
Long Iteration Loops
dbt re-runs models even if:
- Inputs haven’t changed.
- Code hasn’t changed.
- Outputs are still correct.
This leads to slow, wasteful cycles:
- Change a model.
- Run dbt (wait).
- See a bug.
- Fix it.
- Run dbt (wait).
Repeat.
🧠 Why Should Humans Track This?
Computers should know what changed. Humans shouldn’t guess.
A true declarative system would:
- Track input and code changes.
- Skip unnecessary work.
- Rebuild only the required parts.
🔁 Why This Matters: Reproducibility
No Guaranteed Rebuilds
dbt can’t guarantee that the same inputs and code will produce the same output.
Why?
Because dbt builds an imperative DAG that mutates state.
Your models:
- Depend on live upstream data.
- Depend on intermediate tables from prior runs.
- Perform incremental `INSERT`s based on runtime state.
This means:
- Full rebuilds may differ from previous ones.
- Incremental logic can’t verify prior state.
- Partial runs can silently corrupt results.
Reproducibility becomes best effort — not guaranteed.
No Artifact-Based State Management
True reproducibility means versioning:
- Inputs
- Execution context
- Transformations
- Outputs
- Execution history
dbt doesn’t do this.
- No input versioning.
- No persistent intermediate results.
- No fine-grained lineage.
✅ What Reproducibility Should Look Like
- Versioned, immutable inputs.
- Deterministic transformations.
- Versioned, immutable outputs.
- Artifact-based history.
You don’t overwrite old data until you know the new version is good.
You promote a proven artifact — you don’t re-run and hope.
You get fast, safe rollbacks and clear diffs.
That’s reproducibility.
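To make the shape concrete, here is the promote-an-artifact flow in plain SQL. This is a sketch of the pattern, not AprioriDB’s actual mechanics; the version suffixes are illustrative:

```sql
-- Each run writes a new, immutable, versioned output; nothing is overwritten.
CREATE TABLE analytics.orders_v42 AS
SELECT * FROM raw.orders WHERE order_total >= 0;

-- Consumers read through a stable name; promotion is a metadata-only change.
CREATE OR REPLACE VIEW analytics.orders AS
SELECT * FROM analytics.orders_v42;

-- Rollback: repoint to the previous artifact. Fast, safe, nothing lost.
CREATE OR REPLACE VIEW analytics.orders AS
SELECT * FROM analytics.orders_v41;

-- A clear diff is an ordinary query over two retained versions.
SELECT * FROM analytics.orders_v42
EXCEPT
SELECT * FROM analytics.orders_v41;
```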
🧱 The Real Problem Is the Foundation
None of this is dbt’s fault.
It’s a great tool. It brought data-as-code into the mainstream.
But dbt is built on top of mutable, side-effecting SQL databases.
And that’s the real problem.
You can’t build a declarative system on top of an imperative foundation.
If:
- Your storage engine mutates state…
- Your transformations overwrite prior results…
- Your lineage is inferred post-hoc…
- Your outputs are side-effected blobs…
Then:
Everything built on top of that will eventually rot.
That’s why we’re building AprioriDB:
- Append-only storage
- Cell-level versioning
- Branching + merging
- Fully declarative semantics
Not just a better dbt. A better foundation.
🔮 The Future Is Declarative — All The Way Down
- If you’ve been on-call for a broken dbt run…
- If you’ve lost sleep debugging DAGs…
- If your pipelines feel slow, fragile, and unpredictable…
You’re not crazy.
This is what happens when we fake declarative workflows on top of imperative engines.
We believe the future of data:
- Isn’t DAGs of mutations.
- Isn’t pipelines you have to babysit.
- Is versioned, immutable, reproducible — by default.
Declarative all the way down.
📣 We’re looking for early adopters, collaborators, and people who feel the pain we’ve felt. If that’s you — we’d love to hear from you.
Written by Jenny Kwan, co-founder and CTO of AprioriDB.
Follow me on Bluesky and LinkedIn.