Semantic Layer · 9 min read · Mar 24, 2026

What Does It Mean to Formally Define a Metric? (And Why It Matters for AI Agents)

Most teams think they have defined metrics. They don't. Here's what formal metric definition actually requires and why AI agents will expose every gap.

Ask almost any data team whether they have defined metrics and you'll get a confident yes. Ask them to show you the grain, the filter logic, and the assigned owner for their top five metrics, and the confidence disappears. What most teams have isn't a metric definition — it's a shared understanding. And shared understandings fall apart the moment a new analyst joins, a dashboard is rebuilt, or an AI agent starts querying your data without the benefit of institutional knowledge.

The difference matters enormously right now. When humans work with metrics, they can ask clarifying questions, read surrounding context, and apply judgment. When an AI agent queries a metric, it can only work with what's explicitly codified. The agent cannot infer that "revenue" means recognized revenue for North America, excluding refunds, at a monthly grain. If that isn't written down somewhere a machine can read, the agent will compute something that looks like revenue but isn't.

This article walks through what formal metric definition actually requires: the five components that separate a true definition from a shared assumption, and how governed semantic layer systems encode that definition in a form both humans and agents can reliably use.

The Gap Between "Everyone Knows" and a Formal Definition

Shared understanding is surprisingly durable inside a stable team. When the same five analysts have been working on the same product for three years, "monthly active users" doesn't need to be written down. Everyone has absorbed the same edge cases through osmosis — that trial users don't count, that the 30-day window starts on the first of the month, that mobile and web sessions are unified. The definition lives in the heads of the people who built it.

The fragility of this arrangement only becomes visible under stress. When someone new joins and produces different numbers. When two dashboards in different tools show different monthly active user counts and nobody can figure out why. When an executive asks a natural language question to an AI assistant and gets a number that contradicts the dashboard they reviewed yesterday. Each of these is the same problem: the definition existed only in people's heads, not in the system.

A formal definition is one that can be read and executed by a system without human interpretation. That means every assumption that currently lives in someone's head needs to be made explicit: the grain of the underlying table, the filter conditions, the aggregation method, the owner responsible for maintaining accuracy, and the version history that lets you understand how the definition has changed over time.

The 5 Components of a Complete Metric Definition

01

Grain

Grain specifies the level of detail at which the underlying data is stored and at which the metric is computed. Is this a metric computed from an event-level table (one row per event), a session-level table (one row per session), or an account-level table (one row per account)? The grain determines what filters are valid, what joins are possible, and whether aggregation will produce double-counting. A revenue metric computed from an order-line table needs to aggregate to order before summing to avoid inflating revenue when the same order appears across multiple line items.
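The double-counting risk is easy to see in a small sketch. The rows below are hypothetical order lines where the order-level total repeats on every line — not a real schema, just an illustration of grain-aware aggregation:

```python
# Hypothetical order-line rows: the order-level total is repeated on every
# line, so summing it at line grain double-counts multi-line orders.
order_lines = [
    {"order_id": "A1", "line_id": 1, "order_total": 100.0},
    {"order_id": "A1", "line_id": 2, "order_total": 100.0},  # same order, second line
    {"order_id": "B2", "line_id": 1, "order_total": 50.0},
]

# Wrong: aggregating at the line grain counts order A1 twice.
naive_revenue = sum(row["order_total"] for row in order_lines)  # 250.0

# Right: collapse to order grain first, then sum.
by_order = {row["order_id"]: row["order_total"] for row in order_lines}
revenue = sum(by_order.values())  # 150.0
```

Both numbers look plausible in isolation, which is exactly why the grain has to be written into the definition rather than rediscovered by each consumer.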

02

Filters

Filters define which rows are included and excluded. This is where most informal definitions fail. "Revenue" sounds like a self-evident concept until you specify: does it include internal test transactions? Refunded orders? Orders in non-live markets? Subscription renewals or just new contracts? Every one of these is a filter condition that needs to be explicitly stated. In a formal definition, filters are written as SQL WHERE clause logic or an equivalent semantic layer expression — not as English prose, which is ambiguous and not machine-executable.
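Those exclusions can be sketched as executable predicates rather than prose. The field names here (`is_test`, `status`, `market_live`) are assumptions for illustration, not a real schema:

```python
# Toy order rows covering the three exclusions discussed above.
orders = [
    {"amount": 120.0, "is_test": False, "status": "completed", "market_live": True},
    {"amount": 80.0,  "is_test": True,  "status": "completed", "market_live": True},   # internal test
    {"amount": 200.0, "is_test": False, "status": "refunded",  "market_live": True},   # refunded
    {"amount": 60.0,  "is_test": False, "status": "completed", "market_live": False},  # non-live market
]

# Each English-language exclusion becomes an explicit, machine-executable predicate.
def in_revenue(row):
    return (
        not row["is_test"]               # exclude internal test transactions
        and row["status"] != "refunded"  # exclude refunded orders
        and row["market_live"]           # exclude non-live markets
    )

revenue = sum(r["amount"] for r in orders if in_revenue(r))  # 120.0
```

The point is not the language — in a semantic layer this would be a WHERE clause — but that every exclusion is stated as logic a machine can execute, not prose a human has to interpret.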

03

Aggregation Method

The aggregation method specifies how individual row values are combined: SUM, COUNT, COUNT DISTINCT, AVG, MEDIAN, or a more complex expression. This matters more than it seems. Revenue is a SUM, not an AVG. Monthly active users is a COUNT DISTINCT over user IDs, not a COUNT of events. A session-weighted average engagement score requires a different aggregation than a simple average. Specifying the wrong aggregation produces plausible-looking numbers that are systematically wrong — exactly the kind of error that's hardest to catch.
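The monthly-active-users distinction can be shown in a few lines (toy event data, not a real schema):

```python
# Event-grain rows: one row per event, so the same user appears many times.
events = [
    {"user_id": "u1"}, {"user_id": "u1"}, {"user_id": "u2"},
    {"user_id": "u3"}, {"user_id": "u2"},
]

event_count = len(events)                           # COUNT: 5 — plausible, wrong for MAU
active_users = len({e["user_id"] for e in events})  # COUNT DISTINCT: 3 — correct
```

Five is a perfectly believable MAU number for this data, which is the trap: the wrong aggregation doesn't fail, it just answers a different question.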

04

Owner

Every metric definition needs a designated owner: a person or team responsible for ensuring the definition is accurate, keeping it updated when the underlying data changes, and resolving disputes when two dashboards produce different numbers. Without an owner, metric definitions decay silently. The underlying tables change, the definition isn't updated, and the metric continues to run without error — producing wrong numbers that nobody notices because nobody is watching. Owner assignment is an organizational act, not a technical one, but it needs to be encoded in the system.

05

Version

Metric definitions change over time, and those changes need to be tracked. When a business decides to exclude a new category of internal transactions from revenue, that changes the number retroactively if the filter applies to historical data, or prospectively if it applies from the change date. Version history allows you to understand what a metric meant at a given point in time, reproduce historical analyses correctly, and communicate to stakeholders when a number changed because the definition changed — not because the business changed.
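One common pattern, sketched here with hypothetical version records, is to store each version with an effective date and resolve whichever definition was active at a given point in time:

```python
from datetime import date

# Hypothetical version history for a "revenue" metric. Each version records
# when it took effect and the filter logic it used at the time.
versions = [
    {"version": 1, "effective": date(2023, 1, 1),
     "filters": "status != 'refunded'"},
    {"version": 2, "effective": date(2025, 6, 1),
     "filters": "status != 'refunded' AND NOT is_internal"},
]

def definition_as_of(versions, as_of):
    """Return the latest version whose effective date is on or before as_of."""
    applicable = [v for v in versions if v["effective"] <= as_of]
    return max(applicable, key=lambda v: v["effective"])

# Reproducing a Q1 2024 analysis resolves to version 1, not today's definition.
assert definition_as_of(versions, date(2024, 3, 31))["version"] == 1
```

This is the mechanism that lets you tell a stakeholder "the number changed because the definition changed on June 1" and prove it.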

Why Informal Definitions Work for Humans But Break Agents

Human analysts are extraordinarily good at filling in gaps. When a dashboard shows "Q3 Revenue" with no further specification, a senior analyst can draw on years of context to know which revenue definition that dashboard uses, what its known quirks are, and when it tends to lag or lead against other revenue reports. This tacit knowledge is valuable and hard-won, but it is entirely non-transferable to a machine.

AI agents have no context inheritance. An agent querying your data warehouse has no idea that the orders table includes a test_account flag that should always be filtered out. It has no idea that amount_usd is nullable for orders placed before 2022 and that amount_local should be used for those records. It will write the most plausible SQL for the question it was asked, and if your data has undocumented edge cases, the SQL will be wrong in ways that are hard to detect.

This is the fundamental reason that semantic layers matter for agentic systems. A semantic layer is the mechanism for encoding all of that tacit knowledge — all those filters, grain specifications, and aggregation rules — into a form the agent can read before it generates a query. Without it, agents are effectively doing exploratory data analysis on every query, guessing at the right definition rather than executing a known one.
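The pattern can be shown in miniature: the agent resolves the metric name against a governed registry and renders SQL from the stored definition instead of guessing. The registry contents below are illustrative, not a real semantic layer API:

```python
# Minimal sketch of a governed metric registry. In practice this would be a
# semantic layer the agent queries; field names here are assumptions.
METRICS = {
    "revenue": {
        "table": "orders",
        "expression": "SUM(amount_usd)",
        "filters": ["NOT is_test", "status != 'refunded'"],
    }
}

def render_sql(metric_name):
    """Render SQL from the governed definition rather than from a guess."""
    m = METRICS[metric_name]
    where = " AND ".join(m["filters"])
    return f"SELECT {m['expression']} FROM {m['table']} WHERE {where}"

print(render_sql("revenue"))
# SELECT SUM(amount_usd) FROM orders WHERE NOT is_test AND status != 'refunded'
```

Every consumer that goes through the registry gets the same filters applied, which is the whole point: the definition is executed, not re-derived.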

Governed Semantic Layer Systems: dbt Metrics, LookML, Cube, AtScale

Several tools now provide the infrastructure to encode formal metric definitions. Each takes a different approach to the problem and is better suited to different organizational contexts.

dbt Semantic Layer

Defines metrics in YAML alongside your transformation models. Metrics are version-controlled with your dbt project, tested with dbt's testing framework, and documented inline. The main advantage is that metric definitions live alongside the transformations that produce the underlying data, making inconsistency harder to introduce. The limitation is that dbt metrics are relatively new and the tooling ecosystem for querying them is still maturing.

LookML (Looker)

LookML's measure and dimension system is one of the most mature metric definition frameworks available. Every metric in LookML has an explicit type, an explicit SQL expression, an optional filter, and inherits the grain from its view definition. Looker's semantic layer has been queryable by agents via its API for years, making it one of the most agent-ready options available today.

Cube

Cube provides a semantic layer as standalone infrastructure, separate from both the transformation layer and the BI layer. Its measures and dimensions can be queried via a REST API, GraphQL, or SQL, making it particularly flexible for multi-tool environments where you want a single definition accessible by many different consumers. Cube also supports pre-aggregation, which helps keep response times acceptable under the high query volumes agents tend to generate.

AtScale

AtScale sits between your data warehouse and your BI tools, providing a universal semantic layer that multiple tools can query simultaneously. Its strength is in enterprises with heterogeneous BI environments where multiple tools need to share the same metric definitions without duplicating them.

The right choice depends on your stack and your starting point. If you're already using dbt, the dbt Semantic Layer is the natural place to start. If you're on Looker, LookML is already handling this. If you need a standalone semantic layer that works across multiple BI tools, Cube is worth evaluating.

Metric Ownership: Why It Matters and How to Assign It

Metric ownership is the organizational mechanism that keeps definitions accurate over time. Without ownership, even a perfectly specified metric definition will decay as the underlying tables change and nobody updates the specification. The owner is the person who gets paged when the metric starts producing unexpected values, who reviews proposed changes to the definition, and who signs off on new versions.

Assigning ownership is harder than it sounds because many metrics are genuinely cross-functional. Revenue is defined by finance but computed from data managed by engineering and surfaced in dashboards maintained by the analytics team. In practice, the best ownership model is to designate a primary owner (the person with decision-making authority over the definition) and a list of stakeholders (people who need to be consulted on changes). The primary owner doesn't need to be the person who maintains the code — they need to be the person who can authoritatively answer "is this the right number?"

In dbt, ownership is encoded using the owner meta field on metric definitions. In Looker, it's a label or description field. In Cube, it's a metadata field on the measure. The specific mechanism matters less than the practice: every metric definition should have a named human who is responsible for its accuracy.

The Single Source of Truth Problem: What Creates Duplicates and How to Eliminate Them

Metric duplication is endemic in analytics organizations. It happens gradually and always for good reasons: an analyst needs a slightly different version of revenue for a new analysis, so they create a new calculated field rather than modifying the shared one. A second team doesn't know the first team's metric exists and builds their own. A Tableau workbook defines a metric slightly differently from the Looker dashboard that's supposed to show the same number. Over time, five different systems each have their own definition of the same concept, and they all produce different numbers.

The root cause is almost always the absence of a discovery mechanism. If an analyst can't quickly find an existing metric definition that meets their needs, they'll build a new one. The solution isn't enforcement — it's discoverability. A metric catalog (whether that's dbt docs, Looker's field browser, or a standalone tool like Atlan or DataHub) makes existing definitions findable before someone creates a duplicate.

Eliminating existing duplicates requires a deliberate rationalization pass: enumerate all metric definitions across all tools, identify semantic overlaps, decide which definition is authoritative, and deprecate the others. This is organizational work, not technical work. The technical piece (pointing everything at the authoritative definition) is usually much simpler than the process of deciding which definition is authoritative and getting all stakeholders to agree.
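The enumeration step can be partly mechanized. A rough sketch, using an illustrative inventory: group definitions by a normalized signature (expression plus filters) so semantic overlaps surface even when the names differ:

```python
from collections import defaultdict

# Illustrative inventory of metric definitions gathered from several tools.
definitions = [
    {"tool": "looker",  "name": "revenue",     "expr": "sum(amount)", "filters": "not is_test"},
    {"tool": "tableau", "name": "net_revenue", "expr": "sum(amount)", "filters": "not is_test"},
    {"tool": "dbt",     "name": "revenue",     "expr": "sum(amount)", "filters": ""},
]

# Group by a whitespace-normalized signature of the actual logic.
groups = defaultdict(list)
for d in definitions:
    signature = (d["expr"].replace(" ", ""), d["filters"].replace(" ", ""))
    groups[signature].append(f"{d['tool']}.{d['name']}")

# Signatures shared by more than one definition are candidate duplicates.
duplicates = {sig: names for sig, names in groups.items() if len(names) > 1}
# Note: dbt.revenue does NOT match — its filters differ, which is itself a finding.
```

Real-world matching is fuzzier than this (dialect differences, column aliases), but even crude signature grouping turns "find the duplicates" from an interview exercise into a list to review.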

How to Audit Your Current State: A Step-by-Step Approach

Before you can improve metric definition coverage, you need to know where you stand. A metric definition audit has four steps: inventory, classification, gap analysis, and prioritization.

  • Inventory: List every metric used in production dashboards and reports. Include the tool it's defined in, the underlying data source, and the last time someone verified the definition.
  • Classification: For each metric, assess which of the five components (grain, filters, aggregation, owner, version) are explicitly documented. A metric that has all five is formally defined. A metric missing any of them is informally defined, regardless of how widely it's used.
  • Gap analysis: Identify the most critical gaps. Revenue metrics with missing filter documentation are higher risk than internal operational metrics with missing owner assignment. Rank gaps by the impact of a definition error.
  • Prioritization: Start with the metrics that are most frequently queried, most often the subject of disputes, or most likely to be accessed by AI agents in the near term. You don't need to formally define every metric before deploying agents — you need to formally define the ones agents will actually query.

The output of this audit is a coverage score: what percentage of your production metrics have all five components formally defined. This is the core metric for Dimension 1 of the Semantic Layer Readiness Scorecard.
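A minimal sketch of the coverage calculation, using an illustrative inventory:

```python
# The five components every formal definition needs.
COMPONENTS = {"grain", "filters", "aggregation", "owner", "version"}

# Illustrative audit output: which components each metric has documented.
inventory = [
    {"name": "revenue", "documented": {"grain", "filters", "aggregation", "owner", "version"}},
    {"name": "mau",     "documented": {"grain", "aggregation"}},
    {"name": "churn",   "documented": {"grain", "filters", "aggregation", "owner"}},
]

# A metric is formally defined only if all five components are present.
formally_defined = [m for m in inventory if COMPONENTS <= m["documented"]]
coverage = len(formally_defined) / len(inventory)
print(f"coverage: {coverage:.0%}")  # coverage: 33%
```

The all-or-nothing test is deliberate: a metric missing only its owner still decays, and a metric missing only its filters still misleads agents.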

The Agentic Risk

An executive asks your AI analytics agent: "What was our revenue last quarter?" The agent queries the orders table, sums the amount_usd column, and returns $14.2M. The finance team's dashboard shows $12.8M. The difference: the agent didn't know to exclude test accounts, internal orders, and refunds — three filter conditions that exist as tribal knowledge but aren't encoded anywhere in the data.

This is not a failure of the AI system. The AI did exactly what it was asked with the information it had. It's a failure of metric definition. When your metrics are formally defined in a semantic layer the agent can read, the agent queries the semantic layer's revenue metric — with all filters applied — instead of constructing its own definition from scratch. The number matches the dashboard because both are reading the same definition.
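The mismatch is easy to reproduce on toy data (column names are illustrative):

```python
# The same table, queried naively versus through the governed definition.
orders = [
    {"amount_usd": 100, "is_test": False, "is_internal": False, "refunded": False},
    {"amount_usd": 40,  "is_test": True,  "is_internal": False, "refunded": False},
    {"amount_usd": 30,  "is_test": False, "is_internal": True,  "refunded": False},
    {"amount_usd": 20,  "is_test": False, "is_internal": False, "refunded": True},
]

# Agent without a semantic layer: sum the obvious column.
agent_number = sum(o["amount_usd"] for o in orders)  # 190

# Governed definition: the three tribal-knowledge filters, made explicit.
finance_number = sum(
    o["amount_usd"] for o in orders
    if not (o["is_test"] or o["is_internal"] or o["refunded"])
)  # 100
```

Both queries ran without error; only one of them is revenue. The gap between them is exactly the set of filters that never got written down.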

See how your metric definitions score

The Semantic Layer Readiness Scorecard assesses all five dimensions of agentic readiness, including metric definition coverage. Takes 5 minutes.

Take the Scorecard →
Justin Leu

Data & BI Consultant · San Francisco

17+ years helping companies like Google, Pinterest, Salesforce, and United Healthgroup turn raw data into actionable business intelligence. I write about BI strategy, data infrastructure, and the practical side of analytics.

Work with me →