Semantic Layer·8 min read·Mar 24, 2026

Why Table-Level Permissions Aren't Enough for AI Agents

Granting agents table-level access is like giving someone your house key when they only need to read your mail. Here's what fine-grained access control actually requires.

When most data teams think about access control for AI agents, they think about it the same way they think about access control for human analysts: grant access to the tables the agent needs, restrict the tables it doesn't. It's a familiar model, it's easy to implement, and it's almost entirely inadequate for agents.

The problem isn't that table-level access is wrong — it's that it's far too coarse. A human analyst who has access to a customer table understands, implicitly, that certain columns contain PII that shouldn't be included in public-facing reports. They've been through compliance training. They know the rules. An AI agent has no such context. Given table-level access to a customers table, it will happily include email addresses, phone numbers, and health status fields in query results unless something explicitly prevents it from doing so.

Building appropriate access control for agentic systems requires thinking at four levels: table, schema, column, and row. Most organizations are operating at level one. This article explains what each level requires, what the risk profile looks like when you're missing a level, and how to build toward the fine-grained model that agentic workloads need.

The 4 Levels of Access Control

Level 1: Table-Level Access (Risk: High)

Table-level access grants or denies access to an entire table. This is the default model in most data warehouses and the model most teams use for both humans and agents. The risk for agents is significant: a single table often contains a mix of safe and sensitive columns. Granting table access to an agent effectively grants access to all columns in that table, including PII, financial data, and any other sensitive fields that weren't explicitly excluded. Agents will use what's available.

Level 2: Schema-Level Access (Risk: High)

Schema-level access controls which schemas (groups of tables) an agent can query. This is slightly more granular than table-level access if your schemas are organized by sensitivity (e.g., a raw schema vs. a curated schema vs. a PII schema). But in practice, most schemas are organized by domain or source rather than sensitivity level, which means schema-level access provides little additional protection over table-level access.

Level 3: Column-Level Access (Risk: Medium)

Column-level access specifies which columns within a table an agent can read. This is the level at which PII protection becomes meaningful: you can grant access to a customers table but restrict the email, phone, ssn, and health_status columns. Snowflake, BigQuery, and Databricks all support column-level security natively. The operational challenge is maintaining the column access list as schemas evolve — a new column added to a table doesn't automatically inherit the right access policy.

Level 4: Row + Column Access (Risk: Low)

Row-level access combines column-level restrictions with row-level filters, ensuring an agent can only see rows that match specific criteria. This is the appropriate model for multi-tenant data, regional data residency requirements, or cases where agents should only access data for specific customer segments. Row-level security in Snowflake is implemented via row access policies; in BigQuery via row-level security filters. Combined with column-level access, this provides the most robust protection for agentic workloads.
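The combination of the two levels can be sketched as a simple policy check. This is a minimal illustration, not any warehouse's real policy syntax; the `ALLOWED_COLUMNS` and `ROW_FILTERS` contents are hypothetical:

```python
# Sketch of combined column- and row-level policy enforcement for an agent
# query. The policy contents below are illustrative examples, not tied to
# Snowflake or BigQuery syntax.

ALLOWED_COLUMNS = {"customers": {"customer_id", "region", "signup_date"}}
ROW_FILTERS = {"customers": "region = 'EU'"}  # mandatory row-level predicate

def build_scoped_query(table: str, requested_columns: list[str]) -> str:
    """Reject disallowed columns, then append the mandatory row filter."""
    allowed = ALLOWED_COLUMNS.get(table, set())
    blocked = [c for c in requested_columns if c not in allowed]
    if blocked:
        raise PermissionError(f"columns not permitted for agent: {blocked}")
    query = f"SELECT {', '.join(requested_columns)} FROM {table}"
    if table in ROW_FILTERS:
        query += f" WHERE {ROW_FILTERS[table]}"
    return query
```

In a real deployment this logic lives inside the warehouse (row access policies, column masking), not in application code; the sketch just shows how the two restrictions compose.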

PII Tagging: What It Is and How to Implement It

PII tagging is the practice of labeling columns that contain personally identifiable information so that access policies, masking rules, and audit trails can be applied automatically. Without PII tags, enforcing column-level access requires manually identifying and listing every sensitive column — a process that doesn't scale and breaks the moment new columns are added.

In dbt, PII tagging is implemented using column-level tags in your schema.yml files. A column tagged as pii or sensitive can be referenced by downstream access policies and documentation generators. In Atlan and DataHub, PII classification can be applied automatically using pattern matching and ML classifiers that scan column names and sample data to identify likely sensitive fields. This automated approach is important at scale — manually reviewing thousands of columns is not feasible.
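The name-pattern half of that automated classification can be sketched in a few lines. The pattern list here is an illustrative starting point; production classifiers in tools like DataHub also sample actual column values, which a name match alone cannot replace:

```python
import re

# Sketch of name-pattern PII classification, in the spirit of the
# catalog-tool classifiers described above. Patterns are illustrative.
PII_NAME_RE = re.compile(r"email|phone|ssn|address|health|birth|dob", re.IGNORECASE)

def classify_column(column_name: str) -> str:
    """Return 'pii' for names matching a known-sensitive pattern, else 'unclassified'."""
    return "pii" if PII_NAME_RE.search(column_name) else "unclassified"
```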

For agents specifically, PII tags serve a dual purpose: they inform access control decisions (the agent doesn't get access to PII-tagged columns) and they inform the agent's own behavior (if an agent can read PII tags through the semantic layer, it can make better decisions about what to include in query results and how to handle sensitive data in responses). A fully mature implementation has both: PII-tagged columns are inaccessible to agents that don't have explicit PII access, and agents that do have access know to treat those columns accordingly.

Metric-Level Permissions: Access Without Table Exposure

One of the most powerful features of semantic layer tools like Cube and Looker is their ability to grant access to a metric without exposing the underlying table. An agent can query the monthly_revenue measure and receive the correct aggregated value without ever seeing the orders table, the amount_usd column, or any of the raw transaction data.

In LookML, this is implemented through access grants: you define which user attributes are required to access a specific explore, view, or measure. An agent service account can have access to specific measures without having access to the underlying views those measures are derived from. In Cube, access control is implemented at the cube definition level, with role-based access that can be scoped to specific measures and dimensions.

Metric-level access is the gold standard for agentic access because it enforces the principle of least privilege at exactly the right level of abstraction. The agent gets the answer it needs (the aggregated metric value) without the ability to do things it shouldn't (explore raw customer data, join tables in unexpected ways, or access sensitive columns that happen to be in the same table as the data it needs).
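The shape of metric-level access can be sketched as a small resolver: the agent names a metric and presents a role, and the layer returns the answer without ever exposing the SQL's source tables. The metric definition and role names below are hypothetical, not Cube or LookML syntax:

```python
# Sketch of metric-level access: the agent asks for a metric by name; the
# semantic layer resolves it to SQL the agent never sees. Definitions and
# role names are illustrative.

METRICS = {
    "monthly_revenue": {
        "sql": ("SELECT date_trunc('month', created_at) AS month, "
                "sum(amount_usd) AS monthly_revenue FROM orders GROUP BY 1"),
        "allowed_roles": {"finance_agent"},
    },
}

def resolve_metric(metric: str, role: str) -> str:
    """Return the SQL for a metric only when the role has been granted access."""
    spec = METRICS.get(metric)
    if spec is None or role not in spec["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not query {metric!r}")
    return spec["sql"]  # executed by the semantic layer, not by the agent
```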

Human vs. Agent Access: Service Accounts, Scoped Tokens, and Separate Audit Trails

Agents should never share credentials with humans. This sounds obvious but is frequently violated in practice — teams connect agents using a developer's personal access token, or grant agents the same warehouse role as the analytics team, because it's faster. The operational cost of this convenience is the loss of auditability: you can no longer distinguish agent queries from human queries in your logs, you can't apply different access policies to agents, and you can't revoke agent access without affecting human access.

The correct approach is to provision dedicated service accounts for each agent (or class of agent), with scoped credentials that have only the access the agent needs. In Snowflake, this means a dedicated role and a dedicated service user. In BigQuery, a dedicated service account. Scoped API tokens should be short-lived (hours or days, not months) and rotated automatically using a secrets manager like AWS Secrets Manager, HashiCorp Vault, or the equivalent in your cloud provider.

With separate service accounts, audit trails become meaningful. You can query your warehouse's query history filtered to the agent's service account and see exactly what queries the agent ran, at what times, and against what data. This is the foundation for the audit capability that compliance and security teams will eventually require, and it's much easier to build from the start than to retrofit after agents have been running on shared credentials for months.

What "Full Automated Audit Log" Actually Looks Like

An audit log for agent queries needs to capture more than just "the agent ran this SQL." A complete audit record includes: the natural language question the agent was asked, the SQL it generated, the semantic layer objects it accessed (metrics, dimensions, filters applied), the result set returned (or a hash of it), the user who initiated the query, the timestamp, and the latency. This creates a full chain from intent to result that can be reviewed after the fact.

In practice, most warehouse-level audit logs only capture the SQL and metadata. Capturing the full chain requires application-level logging in your agent framework. LangChain, LlamaIndex, and similar frameworks have callback hooks that can be used to log the full lifecycle of an agent query to a structured store (a database table, a log aggregation system like Datadog, or a dedicated observability tool like Arize).
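The audit record described above might look like the following, written from a framework callback to an append-only store. Field names are illustrative, and hashing the result set (rather than storing it) is one common design choice that keeps the log itself free of sensitive data:

```python
import hashlib
import json
from dataclasses import dataclass

# Sketch of a full-chain audit record for one agent query. Field names
# are illustrative, not any framework's schema.

@dataclass(frozen=True)
class AgentAuditRecord:
    question: str            # natural language question the agent was asked
    generated_sql: str       # SQL the agent produced
    semantic_objects: tuple  # metrics, dimensions, filters it touched
    result_hash: str         # hash of the result set, not the data itself
    initiated_by: str
    timestamp_utc: str
    latency_ms: int

def hash_result(rows: list) -> str:
    """Hash the result rows so the log can prove what was returned without storing it."""
    canonical = json.dumps(rows, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()
```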

The audit log should be queryable independently of the agent. If an incident occurs — a data breach, an incorrect report sent to a regulator, a privacy complaint — you need to be able to reconstruct what the agent did and why without relying on the agent itself to explain its behavior. A separate, immutable log is the only way to guarantee this.

The PII Join Problem: How Agents Inadvertently Expose Personal Data

Column-level security prevents direct access to PII columns. But it doesn't prevent a class of more subtle exposure: the inference attack via join. An agent that can query both an orders table (containing customer_id and purchase amounts) and a customers table (containing customer_id and address) can join these tables together even if it's not supposed to be doing customer-level analysis. The resulting query exposes addresses alongside purchase behavior — a PII join that circumvents the protection on either individual table.

Preventing PII joins requires either restricting the agent to a semantic layer that doesn't expose the join paths between sensitive tables, or implementing join governance rules that block specific join combinations. Cube allows you to define which joins are exposed in a given cube definition — if you don't define the relationship between orders and customers at the cube level, the agent can't join them even if it has access to both tables at the warehouse level.
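The allowlist behavior that makes an undefined join impossible can be sketched as a single check against a registry of exposed join paths. The table pairs here are hypothetical:

```python
# Sketch of join governance at the semantic layer: only explicitly exposed
# join paths are permitted, so the orders-customers join fails even when
# both tables are individually readable. Table names are illustrative.

EXPOSED_JOINS = {frozenset({"orders", "products"})}  # orders<->customers deliberately absent

def check_join(left: str, right: str) -> None:
    """Raise unless this table pair is an explicitly exposed join path."""
    if frozenset({left, right}) not in EXPOSED_JOINS:
        raise PermissionError(f"join between {left!r} and {right!r} is not exposed")
```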

This is another argument for semantic-layer-first agent access. When agents must go through the semantic layer rather than directly querying warehouse tables, the semantic layer becomes the enforcement point for join governance, not just column access. The joins available to an agent are exactly the joins the semantic layer exposes — nothing more.

Building Toward Column-Level Access: A Migration Path

Moving from table-level to column-level access is a multi-step project, not a configuration change. The work happens in four phases: inventory, classification, policy implementation, and agent migration.

  • Phase 1: Catalog every table and column an agent currently accesses. This is your attack surface. Understanding it is a prerequisite to reducing it.
  • Phase 2: Tag every column in the inventory as safe, sensitive, or PII. Use automated classifiers where possible to reduce manual effort. Treat any column with a name pattern matching email, phone, ssn, address, health, or similar as PII until proven otherwise.
  • Phase 3: Implement column-level policies in your data warehouse for all columns classified as sensitive or PII. Test that agents can no longer access these columns directly. Document exceptions and the business justification for each.
  • Phase 4: Redirect agents to query through the semantic layer for all metric access. This removes direct warehouse access and enforces the semantic layer as the single enforcement point for all column and join access policies.

This migration doesn't need to happen all at once. Start with the tables that contain the highest concentration of PII or financial data, apply column-level policies to those tables first, and expand from there. A partial implementation that protects your most sensitive data is significantly better than a perfect plan that never gets executed.

The Agentic Risk

A sales analysis agent is given table-level access to the accounts table to answer questions about pipeline and revenue. The table also contains a health_plan_type column populated from a healthcare integration. An employee asks the agent: "Show me our largest enterprise accounts and their key details." The agent returns a table that includes the health plan type for each account — information that was never intended to be accessible to sales and almost certainly violates HIPAA.

No human analyst would have made this error — they would have known not to include that column. The agent had no such context. Column-level security on health_plan_type would have prevented the agent from accessing it regardless of how the query was phrased.

How does your access control score?

The Semantic Layer Readiness Scorecard assesses your access control granularity alongside four other dimensions of agentic readiness. Takes 5 minutes.

Take the Scorecard →
Justin Leu

Data & BI Consultant · San Francisco

17+ years helping companies like Google, Pinterest, Salesforce, and United Healthgroup turn raw data into actionable business intelligence. I write about BI strategy, data infrastructure, and the practical side of analytics.

Work with me →