Semantic Layer·8 min read·Mar 24, 2026

Is Your Data Stack Built for 500 Queries in 10 Minutes? Designing for Agentic Query Patterns

Human analysts run 5–10 queries per session. Agents run hundreds. Your warehouse, your cost governance, and your schema stability assumptions were not designed for this.

Every data warehouse, semantic layer tool, and BI platform was designed around a human usage model. Human analysts are thoughtful, deliberate, and slow. They run a few queries per session, they read results before running the next query, and they work business hours in a single time zone. Your cost monitoring, concurrency limits, and schema stability assumptions were all calibrated for this.

AI agents are none of these things. A single agent can run hundreds of queries in minutes, operate continuously around the clock, and generate query patterns that no human would ever produce: broad exploratory queries, repeated queries with small parameter variations, and multi-step chains where each query's result feeds the next. The infrastructure that worked perfectly for your human analysts may be completely inadequate for agents, and the failures won't look like errors. They'll look like unexpected cost spikes, degraded performance for human users, and subtly wrong results that nobody catches.

This article covers the four areas where agentic query patterns create new requirements: schema abstraction, rate limiting and cost governance, semantic layer API stability, and query observability.

How Agentic Query Patterns Differ from Human BI Usage

The differences between human and agentic query behavior are significant enough to require different infrastructure assumptions. Understanding the contrast is the starting point for designing systems that tolerate agentic workloads.

Frequency
Human: 5–10 queries/session
Agent: 100–1,000 queries/session

Timing
Human: business hours, deliberate
Agent: continuous, 24/7

Query scope
Human: focused, known datasets
Agent: exploratory, broad

Cost per query
Human: predictable, human-reviewed
Agent: unpredictable, automated

Error handling
Human: sees the error and stops
Agent: may retry in a loop

Schema sensitivity
Human: adapts to changes
Agent: breaks silently

The most dangerous difference is error handling. A human who encounters a bad query result stops and investigates. An agent in a poorly designed system may interpret an error response as "no data" and continue to the next step, or may enter a retry loop that hammers the warehouse with identical failing queries. Without rate limiting and circuit breakers, a single malfunctioning agent can degrade performance for everyone.
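A circuit breaker is straightforward to sketch. The version below is a minimal illustration, not any particular library's implementation; the failure threshold, cooldown, and the `execute` callable are all assumptions for the example.

```python
import time

class CircuitBreaker:
    """Stops an agent's query loop after repeated failures instead of
    letting it hammer the warehouse with identical failing queries."""

    def __init__(self, max_failures=5, cooldown_seconds=300):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None

    def allow(self):
        # Once open, reject all queries until the cooldown elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_seconds:
                return False
            self.opened_at = None  # half-open: permit a trial query
            self.failures = 0
        return True

    def record(self, success):
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

breaker = CircuitBreaker(max_failures=3)

def run_agent_query(sql, execute):
    """Wrap every agent query so a malfunctioning agent trips the
    breaker instead of degrading the warehouse for everyone."""
    if not breaker.allow():
        raise RuntimeError("circuit open: agent queries suspended")
    try:
        result = execute(sql)
        breaker.record(success=True)
        return result
    except Exception:
        breaker.record(success=False)
        raise
```

In practice the breaker state would live in the query gateway or semantic layer, keyed per agent service account, so one agent tripping its breaker doesn't affect the others.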

Schema Abstraction: Why Agents Need a Semantic Layer, Not Raw Table Access

When an agent queries raw warehouse tables directly, it's exposed to every schema change, every table rename, and every column modification that happens in the underlying data. A dbt migration that renames amount to amount_usd will break every agent query that references the old column name. The agent has no way to adapt. It will continue using the old column name until a human notices the errors and updates the agent's configuration.

A semantic layer provides schema abstraction: the agent queries logical metrics and dimensions (stable, business-friendly names like revenue or customer_segment) rather than physical table columns (fragile, technical names that change frequently). When the underlying physical column changes, only the semantic layer definition needs to be updated. Every agent query continues to work because it's querying the stable logical name.

This abstraction benefit is compounded when multiple agents query the same metrics. If five different agents each query the warehouse directly for revenue, each needs to be updated when the revenue calculation changes. If all five agents query the semantic layer's revenue metric, the change is made once in the semantic layer definition and all five agents automatically pick up the updated calculation. The semantic layer is the single point of change management.
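The indirection can be sketched in a few lines. The metric names, column names, and join below are illustrative assumptions, not any real tool's API; the point is that the physical column appears exactly once.

```python
# Logical metric/dimension names -> physical SQL fragments. When a dbt
# migration renames a column, this dict is the only place that changes.
SEMANTIC_LAYER = {
    "revenue": "SUM(orders.amount_usd)",      # was SUM(orders.amount)
    "customer_segment": "customers.segment",
}

def compile_metric_query(metric, dimension):
    """Translate a logical metric/dimension request into warehouse SQL."""
    measure = SEMANTIC_LAYER[metric]
    dim = SEMANTIC_LAYER[dimension]
    return (
        f"SELECT {dim}, {measure} "
        "FROM orders JOIN customers USING (customer_id) GROUP BY 1"
    )

# Agents request logical names only; none of them ever references amount_usd.
sql = compile_metric_query("revenue", "customer_segment")
```

All five agents in the example above would call `compile_metric_query("revenue", ...)`, so a calculation change lands in one dict entry rather than five agent configurations.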

Semantic Layer Tools as Abstraction: Cube, LookML, and dbt Metrics

Each major semantic layer tool provides abstraction in a slightly different way, with different trade-offs for agent accessibility. Understanding the agent-specific capabilities of each tool is important for evaluating fit.

Cube

Cube is designed as a standalone semantic layer specifically for programmatic access. Its REST API, GraphQL API, and SQL API make it the most agent-friendly option available. Agents can query Cube using familiar SQL syntax while getting all the benefits of semantic abstraction. Cube translates the agent's SQL into optimized warehouse queries, applies pre-aggregations for common patterns, and returns results without the agent ever touching the underlying tables. Cube also supports pre-aggregation (materialized views of common query patterns), which dramatically reduces cost for high-frequency agent queries.
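A hedged sketch of what an agent's request to Cube's REST API looks like. The host, token, and cube/member names are assumptions; the `/cubejs-api/v1/load` endpoint and the query shape (measures, dimensions) follow Cube's documented REST API.

```python
import json
import urllib.request

def build_cube_request(host, token, measures, dimensions):
    """Build (but don't send) a POST to Cube's /load endpoint.
    The agent expresses intent as logical measures/dimensions; Cube
    compiles it to warehouse SQL behind the API."""
    query = {"measures": measures, "dimensions": dimensions, "limit": 100}
    return urllib.request.Request(
        f"{host}/cubejs-api/v1/load",
        data=json.dumps({"query": query}).encode(),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

req = build_cube_request(
    "https://cube.example.com",  # illustrative host
    "<api-token>",
    measures=["orders.revenue"],
    dimensions=["customers.segment"],
)
# urllib.request.urlopen(req) would execute the query; the agent never
# touches the underlying warehouse tables.
```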

LookML (Looker)

Looker's semantic layer is queryable by agents via the Looker API, which supports both query creation (generating queries against LookML explores) and data retrieval. The Looker API is mature and well-documented, and many AI/agent frameworks have existing Looker integrations. The limitation for agents is that LookML explores can be complex to navigate programmatically: the agent needs to understand the explore structure to construct valid queries, which requires either careful documentation or a wrapper layer.

dbt Semantic Layer

dbt's semantic layer is queryable via the dbt Cloud Semantic Layer API, which supports JDBC and REST interfaces. dbt Semantic Layer queries use a custom query language (MetricFlow) rather than SQL, which provides strong semantic guarantees but requires agents to generate MetricFlow queries rather than SQL. The main advantage for agent use cases is the tight integration with dbt's transformation layer: the same project that defines your transformations also defines your metrics, and changes to both are version-controlled together.
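As a sketch, an agent targeting the JDBC interface sends a query wrapping a `semantic_layer.query()` call rather than plain SQL. The metric and grain names below are assumptions; the wrapper syntax follows dbt's documented Semantic Layer JDBC interface.

```python
def build_sl_query(metrics, group_by):
    """Build the string an agent would send over the dbt Semantic Layer
    JDBC connection; MetricFlow compiles it to warehouse SQL."""
    metrics_arg = ", ".join(f"'{m}'" for m in metrics)
    group_arg = ", ".join(f"'{g}'" for g in group_by)
    return (
        "select * from {{ semantic_layer.query("
        f"metrics=[{metrics_arg}], group_by=[{group_arg}]"
        ") }}"
    )

q = build_sl_query(["revenue"], ["metric_time__month"])
```

For an agent, the trade-off is exactly as described: it must emit this query language instead of SQL, but in exchange it can only ever request governed, version-controlled metric definitions.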

Rate Limiting and Cost Governance: Snowflake, BigQuery, and Databricks

Without rate limiting, a single agent in a query loop can consume the warehouse compute equivalent of your entire analytics team's monthly usage in a few hours. This is not hypothetical. It has happened to organizations that deployed agents without cost governance. The fix is unglamorous but critical: every agent service account needs explicit resource limits that prevent runaway queries from affecting cost or performance.

In Snowflake, resource monitors allow you to set credit usage limits on warehouses and suspend them automatically when limits are reached. Dedicate a separate warehouse for agent queries (sized appropriately for the expected load) and set a monthly credit limit that you're comfortable with. When the limit is hit, the warehouse suspends and sends an alert: agent queries stop, and human queries on separate warehouses continue unaffected. In BigQuery, project-level quotas and slot reservations provide similar protection. In Databricks, cluster policies and job cluster limits control compute usage.

Rate limiting should also be applied at the semantic layer level, not just the warehouse level. If your agents query through Cube or the dbt Semantic Layer, configure rate limits there as well. Semantic layer rate limits provide a faster response to runaway queries (they reject requests before they reach the warehouse) and can be more granular: you can limit specific agent service accounts independently rather than applying limits to the entire agent warehouse.
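A per-account token bucket is one common way to implement that granularity. This is a minimal sketch; the account name, rate, and burst size are illustrative assumptions.

```python
import time

class TokenBucket:
    """Per-service-account rate limiter applied at the semantic layer,
    rejecting requests before they ever reach the warehouse."""

    def __init__(self, rate_per_minute, burst):
        self.rate = rate_per_minute / 60.0   # tokens refilled per second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per agent service account, with limits chosen independently.
limits = {"sales_ops_agent": TokenBucket(rate_per_minute=60, burst=10)}

def admit(account):
    bucket = limits.get(account)
    return bucket is not None and bucket.try_acquire()
```

Because the bucket is keyed by service account, you can throttle one noisy agent without pausing the entire agent warehouse the way a Snowflake resource monitor would.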

Stable APIs and Versioning: What "Fully Stable + Versioned" Actually Means

A stable semantic layer API is one that agents can rely on not to change unexpectedly. This sounds simple but requires deliberate work: API versioning, a deprecation process, and a commitment to backward compatibility within a version. Without these, a semantic layer upgrade can break every agent that queries it simultaneously. Because agents run continuously, the breakage may happen in the middle of a critical business process.

Versioning for semantic layer APIs means that the URL or endpoint for an API includes a version identifier (e.g., /api/v2/metrics), and that breaking changes are only introduced in new versions while old versions remain available for a defined deprecation period. This allows agents to be updated to the new API version on a controlled schedule rather than being forced to update simultaneously with the API change.

Beyond API versioning, metric definition stability matters. A metric that changes its calculation without a version bump will silently change agent outputs. The solution is to treat metric definitions as versioned contracts: every change to a metric definition increments its version, and agents can be configured to use a specific version of a metric definition, insulating them from changes until they've been explicitly updated to use the new version.
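The versioned-contract idea can be sketched as a lookup keyed by metric name and version. The definitions and version numbers below are illustrative assumptions.

```python
# Every change to a metric definition increments its version; old
# versions remain resolvable until agents are explicitly migrated.
METRIC_VERSIONS = {
    ("revenue", 1): "SUM(orders.amount_usd)",
    ("revenue", 2): "SUM(orders.amount_usd) - SUM(orders.refunds_usd)",
}
LATEST = {"revenue": 2}

def resolve_metric(name, version=None):
    """Agents pass a pinned version; unpinned callers get the latest."""
    v = version if version is not None else LATEST[name]
    return METRIC_VERSIONS[(name, v)]

# A pinned agent keeps its old calculation through the definition change:
pinned = resolve_metric("revenue", version=1)
current = resolve_metric("revenue")
```

The pinned agent's output stays stable across the refund-handling change; when its owner is ready, bumping the pin to 2 is a deliberate, reviewable update rather than a silent drift.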

Query Observability: Logging, Dashboards, and Real-Time Alerting

You can't govern what you can't see. Query observability for agentic systems means having a real-time view of what queries agents are running, how much they cost, how long they take, and whether they're succeeding or failing. This is the operational foundation for catching runaway queries before they become cost incidents, identifying inefficient query patterns before they compound, and demonstrating responsible AI data use to stakeholders.

At the warehouse level, Snowflake's QUERY_HISTORY view, BigQuery's INFORMATION_SCHEMA.JOBS, and Databricks' query history tables provide the raw data for observability. Tools like Snowflake's Query Profile, the open-source dbt Artifacts, or commercial tools like re_data and Monte Carlo build dashboards on top of these raw tables. For agent-specific observability, you'll want dashboards filtered to your agent service accounts showing: queries per hour, average cost per query, P95 latency, and error rate.

Real-time alerting is the critical piece that most teams build last and should build first. A Slack alert or PagerDuty notification when an agent's hourly query count exceeds a threshold, when a single query exceeds a cost threshold, or when an agent's error rate rises above a baseline allows you to respond to incidents while they're still small. Without alerting, you discover problems in your cloud bill at the end of the month, which is exactly the wrong time.
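The three alert conditions above reduce to a simple threshold check. The limits here are illustrative assumptions, and `notify` stands in for whatever Slack or PagerDuty webhook your team uses.

```python
# Illustrative thresholds, tuned per agent service account in practice.
THRESHOLDS = {
    "queries_per_hour": 500,
    "cost_per_query_usd": 25.0,
    "error_rate": 0.10,
}

def check_agent_metrics(metrics, notify):
    """metrics: rolling stats for one agent service account, e.g. derived
    from QUERY_HISTORY or INFORMATION_SCHEMA.JOBS filtered to that account."""
    alerts = []
    if metrics["queries_per_hour"] > THRESHOLDS["queries_per_hour"]:
        alerts.append("hourly query count exceeded")
    if metrics["max_cost_per_query_usd"] > THRESHOLDS["cost_per_query_usd"]:
        alerts.append("single-query cost exceeded")
    if metrics["error_rate"] > THRESHOLDS["error_rate"]:
        alerts.append("error rate above baseline")
    for alert in alerts:
        notify(alert)  # e.g. post to a Slack webhook
    return alerts
```

Run on a short schedule (every minute or five), this catches a runaway agent within minutes instead of at month-end billing review.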

The 500-Query Scenario Across Maturity Tiers

What actually happens when an agent runs 500 queries in 10 minutes depends entirely on your infrastructure maturity. The difference between the scenarios below is not the agent's behavior. It's the infrastructure that surrounds it.

No infrastructure

500 queries hit the warehouse directly on a shared role. Performance degrades for all users. Cost spikes are not detected until the monthly bill. Some queries produce wrong results because an upstream schema change happened mid-run. Nobody knows what the agent queried or why.

Basic rate limiting

500 queries hit the warehouse on a dedicated agent warehouse with a credit limit. When the limit is hit, the agent warehouse suspends and an alert fires. Human users are unaffected. Cost is contained. But there's still no visibility into which queries failed, and schema changes can still break agent queries silently.

Semantic layer + observability

500 queries go through the semantic layer. Pre-aggregations serve common query patterns without hitting the warehouse. Rate limits at the semantic layer reject queries that exceed per-minute thresholds. Observability dashboards show real-time query volume and cost. Schema changes are absorbed by the semantic layer without breaking agent queries. All 500 queries are logged with the intent, the generated SQL, and the result.

The Cost Incident

A sales operations agent is asked to generate a market analysis report. The agent interprets the request as requiring a broad exploration of the customer database, running 800 queries over 20 minutes against raw warehouse tables with no rate limiting. Each query scans 500GB of data. The total scan: 400TB. The total Snowflake cost: $2,800, for a single request that a human analyst would have completed with 3 targeted queries totaling 5GB.

A semantic layer with pre-aggregations would have served all 800 conceptual queries from cached aggregates, at a total warehouse cost near zero. A rate limiter would have halted the run after 50 queries and sent an alert. Neither was in place. The incident was discovered at month-end billing review.

How query-tolerant is your stack?

The Semantic Layer Readiness Scorecard assesses your agent query tolerance alongside four other dimensions of agentic readiness. Takes 5 minutes.

Take the Scorecard →
Justin Leu

Data & BI Consultant · San Francisco

17+ years helping companies like Google, Pinterest, Salesforce, and United Healthgroup turn raw data into actionable business intelligence. I write about BI strategy, data infrastructure, and the practical side of analytics.

Work with me →