Your Metrics Haven't Changed. Why Is Your AI Re-deriving Them?

AOV, ROAS, churn rate, LTV. These calculations haven't changed in decades. So why is every AI agent re-deriving them from scratch on every query?

The most stable thing in your business

Think about what changes in your company every quarter. Your product roadmap shifts. Your team grows. Your tech stack evolves. Your go-to-market strategy pivots. Your competitive landscape reshapes.

Now think about what hasn't changed: how you calculate revenue.

Average Order Value is total sales divided by number of orders. It was calculated this way at your company's founding. It's calculated this way today. It will be calculated this way in ten years.

Return on Ad Spend is revenue attributed to a campaign divided by the cost of that campaign. This formula predates the internet.

Customer Lifetime Value is the total revenue a customer generates over their relationship with your company. The SQL might vary (some companies discount future value, some don't), but once your finance team defines it, that definition is stable for years.

Churn rate is customers lost divided by customers at the start of the period. This hasn't changed since the concept was invented.
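All four formulas fit in a few lines. A quick sketch with made-up sample numbers (the figures here are illustrative, not from any real company):

```python
# The four formulas, verbatim; the sample numbers are made up.
aov = 94_460 / 2_000        # total sales / number of orders -> 47.23
roas = 50_000 / 10_000      # attributed revenue / campaign cost -> 5.0
churn = 25 / 500            # customers lost / customers at period start -> 0.05
ltv = sum([120, 80, 99])    # total revenue over the relationship -> 299

print(aov, roas, churn, ltv)
```

That's the entire computational content of four of the most important numbers in your business.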

These are not ambiguous. They are not open to interpretation. They are settled, tested, and agreed upon. Your data team defined them once. Your dashboards have been running them correctly for years.

So here is the question nobody seems to be asking: why is your AI agent re-deriving these calculations from scratch every single time someone asks a question?

What re-derivation looks like

When an AI agent uses context engineering to answer "what's our AOV this quarter," here is what actually happens:

  1. The system retrieves your database schema (tables, columns, types)
  2. It injects column descriptions, documentation, maybe example queries
  3. The LLM reads all of this context and reasons about what "AOV" means
  4. It generates SQL that it believes will calculate AOV
  5. The SQL executes against your database
  6. A number comes back

Steps 1 through 4 are the re-derivation. The model is starting from raw schema information and working its way to a SQL query that computes average order value. Every single time.
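Steps 1 through 4 can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; `call_llm` and the schema are stand-ins:

```python
# Sketch of the context-engineering loop: every question pays the full
# cost of schema retrieval and prompt assembly before any SQL exists.
# `call_llm` is a placeholder for whatever model API you use (assumption).

SCHEMA = {
    "orders": ["id INTEGER", "total_cents INTEGER", "status INTEGER"],
}

def build_prompt(question: str) -> str:
    # Steps 1-2: inject the raw schema (and, in practice, docs and
    # few-shot examples) into the prompt.
    schema_text = "\n".join(
        f"TABLE {table} ({', '.join(cols)})" for table, cols in SCHEMA.items()
    )
    return (
        "You are a SQL analyst. Given this schema:\n"
        f"{schema_text}\n"
        f"Write SQL to answer: {question}"
    )

def answer(question: str, call_llm) -> str:
    prompt = build_prompt(question)  # steps 1-3: context in, reasoning begins
    return call_llm(prompt)          # step 4: the model guesses the SQL

prompt = build_prompt("what's our AOV this quarter")
print(prompt)
```

Note what's missing: nothing in this loop ever consults the AOV query your dashboard already runs.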

This is equivalent to handing a new analyst your database credentials and saying "figure out how to calculate AOV" on every question, every day, forever. No institutional knowledge. No established definitions. Just raw inference from schema context.

Your company already solved this problem. The AOV calculation lives in a dashboard, a dbt model, a SQL file, a Looker explore, or a spreadsheet. Someone wrote it, tested it, validated it against known numbers, and it's been running correctly ever since.

The AI agent ignores all of that and starts from scratch.

The cost of re-derivation

Inconsistency

Ask the agent "what's our AOV" three times. You might get three slightly different SQL queries. Maybe one includes cancelled orders. Maybe another filters to a specific product line because the schema was ambiguous. Maybe the third rounds differently.

Your dashboard shows one number: $47.23. It's been $47.23 all day. The agent gives you $44.10, $49.87, and $47.23. One of them is right. You don't know which one without checking the dashboard, which defeats the purpose of asking the agent.
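The divergence is easy to reproduce. A toy example (illustrative data, not the numbers above): two equally plausible readings of "AOV" that return different values from the same table, depending only on whether cancelled orders are filtered out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, total_cents INTEGER, status INTEGER);
    INSERT INTO orders VALUES
        (1, 5000, 2),   -- paid
        (2, 3000, 2),   -- paid
        (3, 9000, 3);   -- cancelled
""")

# Reading 1: the agent includes cancelled orders.
aov_all = conn.execute(
    "SELECT SUM(total_cents) * 1.0 / COUNT(*) / 100.0 FROM orders"
).fetchone()[0]

# Reading 2: the agent filters to paid orders (status = 2).
aov_paid = conn.execute(
    "SELECT SUM(total_cents) * 1.0 / COUNT(*) / 100.0 "
    "FROM orders WHERE status = 2"
).fetchone()[0]

print(aov_all, aov_paid)  # two different "AOV"s from one table
```

Both queries are syntactically valid, both look reasonable in isolation, and only one matches your dashboard.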

Wasted compute

Every re-derivation costs tokens. The schema injection, the documentation retrieval, the few-shot examples, the chain-of-thought reasoning about how to construct the query. For a complex warehouse with hundreds of tables, this can be thousands of tokens per question.

Meanwhile, the answer is a 3-line SQL query that's been written and tested. You're spending dollars on inference to avoid looking up something that already exists.

Institutional knowledge loss

Your data team spent months defining what "revenue" means at your company. Does it include refunds? Pending transactions? Which product lines? What about credits?

These decisions are encoded in the SQL that powers your dashboards. They represent real business logic, debated in meetings, validated against financial reports, and agreed upon by stakeholders.

When an agent re-derives "revenue" from schema context, it loses all of that institutional knowledge. It doesn't know about the refund policy change in Q3. It doesn't know that one product line reports in EUR and needs currency conversion. It doesn't know that "status = 2" means "paid" because a developer chose that enum value four years ago.

Context engineering tries to solve this by injecting more documentation. But documentation is always incomplete, often outdated, and fundamentally disconnected from the actual SQL that runs in production.

Unauditable decisions

When the CFO asks "where did this revenue number come from?" you need an answer. With a dashboard, the answer is clear: it came from this SQL query, which was last modified on this date, by this person, and validated against the Q3 financial close.

With an AI agent re-deriving the query, the answer is: the model generated SQL based on schema context and documentation. The specific SQL was never reviewed by a human. It was produced by probabilistic inference. The same question tomorrow might produce different SQL.

No CFO wants to hear that.

Why the industry defaults to re-derivation

If re-derivation is so clearly wasteful, why does everyone do it?

It demos well

Text-to-SQL is a spectacular demo. Ask a natural language question, watch SQL appear, see results. It feels like magic. Investors love it. Product managers love it. Conference audiences love it.

Nobody demos the failure mode: the subtle JOIN error that produces a number that's close enough to look right but wrong enough to mislead a business decision.

It's general purpose

Context-based SQL generation can theoretically answer any question about any table. Pre-defined metrics can only answer questions about metrics that exist. Generality feels more valuable than reliability, at least until the first production incident.

The tooling makes it easy

Every AI framework, every RAG pipeline, every agent toolkit is built around the "inject context, generate output" pattern. Building a text-to-SQL agent takes an afternoon. Building a governed metrics layer takes a week.

The industry has optimized for speed of development, not reliability of output.

Nobody framed the alternative

Until recently, the conversation was entirely about making context engineering better. Better embeddings. Better retrieval. Better schema descriptions. Better few-shot examples.

The alternative, simply using the metric definitions that already exist, was so obvious that nobody thought to propose it as an architecture.

The metric definitions already exist

This is the part that makes re-derivation so absurd. The definitions your agent is trying to reconstruct already exist in your organization. They live in:

  • dbt models: Your data team has already defined metrics as tested, versioned SQL transformations
  • Dashboard queries: Every Looker explore, Tableau workbook, and Metabase question contains a curated SQL query
  • Spreadsheets: Your finance team has formulas that define exactly how revenue, margins, and growth rates are calculated
  • Data dictionaries: If your organization is mature enough to have one, the metric definitions are literally written down
  • Tribal knowledge: Your senior analyst knows that "revenue" means SUM(total_cents) / 100.0 WHERE status = 2 because they wrote that query three years ago

All of this institutional knowledge exists. It's tested. It's trusted. Your company makes real decisions based on it every day.

Context engineering tries to help the LLM re-derive this knowledge from schema metadata. That is like asking a new employee to reconstruct your entire business logic from the database schema, ignoring the documentation, dashboards, and domain experts sitting right next to them.

What happens when you stop re-deriving

The alternative is straightforward: take the metric definitions that already exist and make them directly accessible to agents.

metrics:
  - name: average_order_value
    description: Average order value across all paid orders
    sql: |
      SELECT SUM(total_cents)::FLOAT
        / COUNT(DISTINCT id) / 100.0 AS aov_usd
      FROM orders WHERE status = 2
    tags: [revenue, sales]
    canonical_questions:
      - what is our AOV
      - average order value
      - typical order size

  - name: roas
    description: Return on ad spend by campaign
    sql: |
      SELECT campaign,
        SUM(revenue_cents) / 100.0 AS revenue,
        SUM(spend_cents) / 100.0 AS spend,
        CASE WHEN SUM(spend_cents) > 0
          THEN SUM(revenue_cents)::FLOAT / SUM(spend_cents)
          ELSE 0 END AS roas
      FROM campaigns
      GROUP BY campaign
    tags: [marketing, finance]
    canonical_questions:
      - return on ad spend
      - which campaigns are profitable
      - marketing ROI

These are the exact same calculations your dashboards use. The same SQL. The same logic. The same tested, validated, trusted definitions.

When an agent needs AOV, it doesn't re-derive it. It calls query_metric("average_order_value") and gets the number. The SQL is pre-defined. The result is deterministic. The answer matches your dashboard. Every time.

The agent still uses AI for what AI is good at: understanding the user's intent ("what's our typical order size" maps to the average_order_value metric), applying filters ("for Q3" becomes a time range filter), and reasoning over results ("AOV is $47.23, up 8% from Q2, driven by the enterprise tier").

The AI reasons. The metrics compute. Each layer does what it does best.
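A minimal sketch of that lookup, assuming a catalog like the YAML above has been loaded into memory (the structure and the `query_metric` name are illustrative, not a fixed API):

```python
import sqlite3

# Illustrative in-memory catalog; in practice this would be loaded from
# the YAML file above. Structure is an assumption for the sketch.
CATALOG = {
    "average_order_value": {
        "description": "Average order value across all paid orders",
        "sql": (
            "SELECT SUM(total_cents) * 1.0 "
            "/ COUNT(DISTINCT id) / 100.0 AS aov_usd "
            "FROM orders WHERE status = 2"
        ),
    },
}

def query_metric(name: str, conn) -> float:
    # No inference, no generation: look up the vetted SQL and run it.
    return conn.execute(CATALOG[name]["sql"]).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, total_cents INTEGER, status INTEGER);
    INSERT INTO orders VALUES (1, 5000, 2), (2, 3000, 2), (3, 9000, 3);
""")
aov = query_metric("average_order_value", conn)
print(aov)  # same SQL, same number, every call
```

The LLM's only job is picking the metric name; the SQL itself is never generated at query time.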

The stability argument

Here is the strongest case against re-derivation: metrics are the most stable artifact in your data stack.

Your application code changes daily. Your infrastructure changes monthly. Your data models change quarterly. But the formula for AOV? That hasn't changed since your company started selling things.

When something is this stable, the engineering response should be to codify it, not to re-derive it. You don't recalculate the value of pi on every API request. You don't re-derive your tax rates from first principles on every invoice. You look them up.

Metrics deserve the same treatment. They are solved problems. Treat them like solved problems.

But what about new questions?

The most common objection to pre-defined metrics is: "What happens when someone asks a question that doesn't match any existing metric?"

This is a valid concern, and the answer is revealing: the system tells them.

When an agent searches the metric catalog and doesn't find a match, it says: "I don't have a metric for that. Here are the closest metrics I do have. Would you like me to request this as a new metric?"

Compare this to the context engineering approach, where the agent would generate SQL for an undefined metric, potentially produce a wrong answer, and present it with full confidence.

Which failure mode would you prefer? An honest "I don't know" or a confident wrong answer?

The "I don't know" response also creates a feedback loop. Every unanswered question is a signal to the data team that a new metric is needed. Over time, the catalog grows to cover the questions people actually ask, not the questions someone imagined they might ask.
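A sketch of that fallback, using stdlib fuzzy matching as a stand-in for whatever retrieval the real system uses (the matching strategy and catalog contents here are assumptions):

```python
import difflib

CATALOG = ["average_order_value", "roas", "churn_rate", "ltv"]

def find_metric(question: str) -> dict:
    # Normalize and match against known metric names; anything else gets
    # an honest "I don't know" plus the closest candidates.
    key = question.lower().replace(" ", "_")
    matches = difflib.get_close_matches(key, CATALOG, n=3, cutoff=0.6)
    if matches and matches[0] == key:
        return {"metric": key}
    return {
        "metric": None,
        "suggestions": matches,
        "message": "I don't have a metric for that. "
                   "Want to request it as a new one?",
    }

print(find_metric("churn rate"))    # known metric: resolves to churn_rate
print(find_metric("gross margin"))  # unknown: honest refusal + suggestions
```

The unmatched questions are exactly the backlog your data team should be working from.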

The path forward

If you're building AI agents that access data, here's the practical path:

  1. Inventory your existing metrics. Look at your dashboards, dbt models, recurring reports. The definitions are already there.
  2. Extract them into a catalog. YAML, JSON, whatever format works. Name, description, SQL, tags.
  3. Make them accessible to agents. Through MCP, an API, or a direct integration.
  4. Let the feedback loop work. When agents can't find a metric, that's a signal to define a new one.

This is not a six-month project. If you have 20 dashboard queries, you can have a working metric catalog in an afternoon. The SQL already exists. You're just giving it a name and making it accessible.
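Step 2 really is that small. A sketch of the extraction, using JSON to stay stdlib-only (the format, field names, and placeholder queries are illustrative; YAML works the same way):

```python
import json

# SQL that already exists, pulled from dashboards. These queries are
# illustrative placeholders, not real dashboard SQL.
DASHBOARD_QUERIES = {
    "average_order_value": (
        "SELECT SUM(total_cents) * 1.0 / COUNT(DISTINCT id) / 100.0 "
        "FROM orders WHERE status = 2"
    ),
    "churn_rate": "SELECT lost * 1.0 / starting FROM churn_monthly",
}

# Wrap each query in a catalog entry: name it, make room for metadata.
catalog = {
    "metrics": [
        {"name": name, "sql": sql, "description": "", "tags": []}
        for name, sql in DASHBOARD_QUERIES.items()
    ]
}

print(json.dumps(catalog, indent=2))
```

No new SQL is written anywhere in this process; the only new artifacts are names and metadata.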

Stop asking your AI to re-derive what your data team already solved. Your metrics haven't changed. Use them.

Don't help AI guess your data. Remove the guess.

OnlyMetrix turns your existing metric definitions into a deterministic data access layer for AI agents. Import from YAML, dbt, or define them inline. MCP-native. Try the beta.
