Agent-first GRC: when AI runs the compliance program

For three years, every GRC vendor has been adding "AI features." The pattern is familiar: a chat sidebar that answers questions about your controls, a button that drafts a policy section, a summary that condenses an evidence file into a paragraph. These are useful. They are not what we mean by agent-first GRC.

Agent-first GRC is a different architecture. The compliance program is operated by agents — software that plans work, executes it across your tools, and surfaces decisions for human approval. The human's job shifts from typing answers and chasing evidence to reviewing what the agent did and deciding what should ship. The work product looks the same. The motion is completely different.

This post is about what changes when that shift is real, what the design constraints are, and how to tell the difference between agent-first GRC and AI-feature GRC.

The four motions of a compliance program

Most compliance programs run on four core motions, regardless of framework:

Authoring. Policies, narratives, standards, runbooks. Things you write once and keep current.
Evidence operations. Pulling configuration state, screenshots, logs, attestations on a regular cadence and storing them where auditors can find them.
Assessment work. Mapping controls, answering questionnaires, walking through processes, reviewing vendors.
Approvals and oversight. Decisions a human has to make and own.

Legacy GRC platforms automated parts of evidence operations and assessment work. AI-feature GRC used chat to speed up parts of authoring and assessment. Agent-first GRC puts agents in charge of all four motions, with humans gating the parts where judgment matters.

That last bit is the load-bearing claim. It only works if the agent infrastructure can be trusted with the work — which is where most of the engineering effort goes.

What "agent" has to mean

The word "agent" has been worn smooth by overuse. For our purposes, an agent has four properties that distinguish it from a chatbot or a workflow automation:

Plans before acting. Given a goal, the agent produces a plan with discrete steps and the tools it will use. Plans are inspectable artifacts, not opaque traces.
Executes in discrete, observable steps. Each step is a unit of work with inputs, outputs, and a logged result. Steps can be replayed, retried, or rolled back independently.
Uses tools, not just generates text. Real agents call real APIs — your cloud, your ticketing system, your evidence store. The model is the planner and the writer; the tools are the hands.
Surfaces decisions for human approval. Anything that touches the outside world, costs money, or makes a binding claim is gated on human approval by default.
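To make the four properties concrete, here is a minimal Python sketch of a plan as inspectable data. Everything in it is hypothetical and illustrative, not episki's implementation or any library's API: `Step`, `Plan`, `run_plan`, the `requires_approval` flag, and the two stand-in tools are invented for this example. Each step records its inputs and output, and the run halts at the first gated step a human has not yet approved.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    name: str
    tool: Callable[..., Any]         # the "hands": in practice, a real API call
    inputs: dict[str, Any]
    requires_approval: bool = False  # gate external, costly, or binding actions


@dataclass
class StepResult:
    name: str
    inputs: dict[str, Any]
    output: Any
    status: str  # "done" or "awaiting_approval"


@dataclass
class Plan:
    goal: str
    steps: list[Step]

    def describe(self) -> list[str]:
        """Human-readable plan, inspectable before anything runs."""
        return [f"{i}. {s.name} ({s.tool.__name__})" for i, s in enumerate(self.steps, 1)]


def run_plan(plan: Plan, approved: set[str]) -> list[StepResult]:
    """Run steps in order, logging each; stop at the first gated step not yet approved."""
    log: list[StepResult] = []
    for step in plan.steps:
        if step.requires_approval and step.name not in approved:
            log.append(StepResult(step.name, step.inputs, None, "awaiting_approval"))
            break
        output = step.tool(**step.inputs)
        log.append(StepResult(step.name, step.inputs, output, "done"))
    return log


# Hypothetical stand-in tools.
def read_bucket_config(bucket: str) -> dict[str, Any]:
    return {"bucket": bucket, "encrypted": True}


def publish_policy(text: str) -> str:
    return "published"


plan = Plan(
    goal="Refresh the encryption-at-rest policy",
    steps=[
        Step("collect evidence", read_bucket_config, {"bucket": "customer-data"}),
        Step("publish policy", publish_policy,
             {"text": "We encrypt customer data at rest."}, requires_approval=True),
    ],
)
```

Calling `run_plan(plan, approved=set())` collects the evidence and then stops with the publish step marked `awaiting_approval`; passing `approved={"publish policy"}` lets both steps run. A production runner would persist the log and resume from the gated step rather than re-running earlier ones; this sketch keeps everything in memory for brevity.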

A "Draft policy" button that just calls a model and pastes the output is not an agent. A workflow that scrapes config evidence on a schedule is not an agent. The combination — a system that plans, executes, uses tools, and surfaces approvals — is.

What changes for the operator

The day-to-day experience of running a compliance program changes in three concrete ways.

You go from author to reviewer. Before agents, the GRC operator writes the first draft of everything: policies, narratives, questionnaire answers, risk descriptions, treatment plans. The first draft is the expensive part. With agents, the first draft arrives in your inbox. Your job is to read it, push back on the parts that are wrong, and approve the parts that are right. Reviewing well is a different skill than writing from scratch, and it requires fluency you build by reviewing a lot.

Your role becomes designing the program, not running it. Agents are good at executing well-defined work. They are bad at specifying which work needs doing. The operator's job moves up a level: deciding what the program covers, what counts as evidence, what the risk appetite is, which frameworks the company should pursue, which vendors are worth a deep review. The agent runs the program inside the rails you set.

Auditing the program becomes auditing the agent's work product. Auditors don't audit your team's brain. They audit the artifacts your program produced and the trail showing how those artifacts were produced. Agent-first GRC actually makes this easier: every plan, every step-run, every approval, every tool call is logged by default. The audit trail is a side effect of the architecture, not something you assemble in the two weeks before fieldwork.
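One way to get the audit trail "by default" is to make logging structural: wrap every tool so no call can happen without a record. The sketch below assumes a hypothetical `audited` decorator writing to an in-memory `AUDIT_TRAIL` list; a real system would write to durable, append-only storage. `fetch_mfa_setting` is a hypothetical stand-in for a real API call.

```python
import json
import time
from functools import wraps
from typing import Any, Callable

# Hypothetical in-memory sink; use durable, append-only storage in practice.
AUDIT_TRAIL: list[dict[str, Any]] = []


def audited(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call is recorded, whether it succeeds or fails."""
    @wraps(tool)
    def wrapper(**kwargs: Any) -> Any:  # keyword-only so inputs log with names
        record: dict[str, Any] = {"tool": tool.__name__, "inputs": kwargs, "ts": time.time()}
        try:
            result = tool(**kwargs)
            record.update(status="ok", output=result)
            return result
        except Exception as exc:
            record.update(status="error", error=repr(exc))
            raise
        finally:
            AUDIT_TRAIL.append(record)
    return wrapper


@audited
def fetch_mfa_setting(account: str) -> dict[str, Any]:
    # Stand-in for a real identity-provider API call.
    return {"account": account, "mfa_required": True}


def export_trail() -> str:
    """Serialize the trail as JSON Lines: one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in AUDIT_TRAIL)
```

Because the record is appended in a `finally` clause, failed calls land in the trail too, which is usually what an auditor wants to see.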

What changes for the auditor

The other side of the chair changes too. Auditors increasingly want to know three things about AI in your compliance program:

What is the AI authoritative on? If a policy says "we encrypt customer data at rest," who or what asserted that claim, and what evidence supports it?
Is the evidence collection reliable? If an LLM is reading your cloud configuration and reporting back, can you reproduce that read tomorrow and get the same result?
Where are the human approvals? Show me the decisions humans signed off on, and the work the agent shipped without an approval.

Agent-first platforms can answer all three. Plans are inspectable. Approvals are logged. Evidence collection runs on deterministic recipes — code, not models — that the auditor can read end-to-end. The model authored the recipe; the recipe authored the evidence; the audit trail captures both.

This matters because it dissolves the "I can't audit AI" objection. The auditor isn't auditing model outputs. They're auditing recipes, logs, and human decisions — all of which are the kind of artifacts auditors already know how to audit.
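A sketch of what such a deterministic recipe might look like, assuming the cloud read has already returned a list of bucket settings (the function name and the `name` and `default_encryption` keys are hypothetical). The recipe makes no model call, sorts its findings, and hashes a canonical JSON form, so running it tomorrow against unchanged configuration yields the same digest:

```python
import hashlib
import json
from typing import Any


def bucket_encryption_recipe(bucket_configs: list[dict[str, Any]]) -> dict[str, Any]:
    """Deterministic evidence recipe: plain code, no model call at run time.

    `bucket_configs` stands in for the raw result of a cloud API read;
    the `name` and `default_encryption` keys are hypothetical.
    """
    # Sorted by bucket name so the API's return order doesn't change the
    # digest (assuming bucket names are unique).
    findings = sorted(
        (
            {"bucket": cfg["name"], "encrypted_at_rest": bool(cfg.get("default_encryption"))}
            for cfg in bucket_configs
        ),
        key=lambda finding: finding["bucket"],
    )
    evidence: dict[str, Any] = {
        "control": "encrypt-customer-data-at-rest",
        "findings": findings,
        "passed": all(f["encrypted_at_rest"] for f in findings),
    }
    # Canonical JSON (sorted keys, fixed separators) so identical findings
    # always produce an identical digest.
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    evidence["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return evidence
```

Collection timestamps are deliberately left out of the hashed content; if they were included, every run would produce a different digest even when nothing had changed.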

What gets harder

Agent-first GRC isn't strictly easier than legacy GRC. It trades one set of problems for another.

The cold-start problem is real. An agent that doesn't yet understand your environment, your appetite, or your tone makes worse first drafts than your team does. The first month of agent-first operation feels slower because you're teaching the agent rather than getting work done. After that month, the slope flips.

Approval fatigue is a failure mode. If every step needs human approval, the human becomes the bottleneck and the agent's speed advantage disappears. Designing approval thresholds — what to gate, what to batch, what to let ride — is a meaningful operational discipline.
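Approval thresholds can be expressed as a small routing policy. The sketch below is hypothetical: `Route`, `ProposedAction`, and `route` are illustrative names, and the default `cost_threshold_usd=0.0` mirrors the default described earlier, where anything that touches the outside world, costs money, or makes a binding claim is gated.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    GATE = "gate"    # blocks until a human approves this item
    BATCH = "batch"  # queued for a periodic bulk review
    AUTO = "auto"    # proceeds without review (still logged)


@dataclass(frozen=True)
class ProposedAction:
    touches_external: bool  # e.g. sends an answer to a customer's questionnaire
    cost_usd: float
    binding_claim: bool     # e.g. publishes a policy statement
    reversible: bool


def route(action: ProposedAction, cost_threshold_usd: float = 0.0) -> Route:
    """Hypothetical routing policy; the operator tunes cost_threshold_usd."""
    if action.touches_external or action.binding_claim or action.cost_usd > cost_threshold_usd:
        return Route.GATE
    if action.cost_usd > 0 or not action.reversible:
        # Internal work that costs something or can't be undone: review in bulk.
        return Route.BATCH
    return Route.AUTO
```

With the default threshold, any spend is gated. Raising `cost_threshold_usd` to, say, 50.0 moves a $5 internal action into the batch queue instead, which is the kind of knob an operator adjusts to keep the human from becoming the bottleneck.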

Trust calibration takes time. New operators tend to either over-trust the agent (rubber-stamping approvals) or under-trust it (rewriting everything from scratch). Both failure modes are common and both undermine the value. Calibrated trust — knowing when to lean in and when to push back — is a skill you develop by reviewing a lot of agent output and watching what happens.
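Trust calibration is easier to discuss with numbers. One rough, hypothetical signal: compare each agent draft with the text the human finally approved, using the standard library's `difflib.SequenceMatcher`. A high unchanged rate paired with very short review times hints at rubber-stamping; a high rewrite rate hints at under-trust. The function names and the 0.5 rewrite cutoff are illustrative assumptions, not recommended values.

```python
from difflib import SequenceMatcher
from statistics import median


def edit_similarity(draft: str, final: str) -> float:
    """Similarity in [0, 1]; higher means the approved text stayed closer to the draft."""
    return SequenceMatcher(None, draft, final).ratio()


def calibration_summary(reviews: list[tuple[str, str, float]]) -> dict[str, float]:
    """Summarize reviews given as (agent_draft, approved_text, review_seconds) tuples."""
    if not reviews:
        raise ValueError("need at least one review")
    n = len(reviews)
    return {
        # Many unchanged approvals with short review times may signal rubber-stamping.
        "unchanged_rate": sum(draft == final for draft, final, _ in reviews) / n,
        # Many heavy rewrites may signal under-trust (0.5 cutoff is arbitrary).
        "rewrite_rate": sum(edit_similarity(d, f) < 0.5 for d, f, _ in reviews) / n,
        "median_review_seconds": float(median(t for _, _, t in reviews)),
    }
```

Neither number is a verdict on its own; the point is to give reviewers something to look at besides their own impression of how well the agent is doing.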

How to tell the difference

If you're evaluating a GRC vendor that claims to be AI-first or agent-first, here are the questions that separate architecture from marketing:

Show me a plan. When the agent does something non-trivial, can I see the plan before it runs? Or is it a black box that returns an answer?
Show me the step log. For the last thing the agent did, can I see every step, every tool call, every input and output?
Show me what runs without AI. If the answer is "everything runs through a model on every cycle," the evidence quality is going to be unstable. If the answer includes recipes, scripts, or deterministic code paths authored by the agent and executed without it, that's the architecture you want.
Show me the approvals. What requires a human approval today, and what doesn't? Can I change those thresholds?
Show me an audit trail of an agent run. Could I give this to an auditor tomorrow as evidence the program is being operated correctly?

If a vendor can't show you these, they have AI features, not agents.

What this means if you're running a program

If you're a GRC operator deciding whether to invest the calendar time to shift from AI-assisted to agent-first operation, the honest answer is: it depends on where your time goes today.

If you spend most of your week typing — drafting policies, answering questionnaires, writing narratives — the leverage is large and the payoff is fast. If you spend most of your week thinking — designing programs, navigating regulatory ambiguity, calibrating risk — the leverage is smaller, and the payoff is in scope expansion rather than time saved. You'll cover more ground than your headcount could before.

Either way, the trend isn't going back. Auditors are starting to expect this architecture. Buyers are starting to ask about it in security reviews. The compliance team that's still typing first drafts in 2027 will look the way a spreadsheet-bound team looks today.

We're betting heavily that agent-first GRC is the next default. That's the thesis behind episki. If you want to see what it looks like in practice, start a free trial or book a demo. Or read more on what an AI-first GRC platform actually does under the hood.
