
What to Actually Measure in Your AI Agent Product

Most teams building AI agent products measure the wrong things. Not because they're not paying attention — because the metrics they're using were designed for a different kind of product.

DAUs. Session length. D7 retention. These are SaaS metrics. They tell you whether users came back, but not why. And in an agent product, "why" is the only question that matters.

Here's what to actually track — and what to stop wasting time on.

The metrics that don't translate

Session length is the classic trap. In a SaaS product, longer sessions usually mean higher engagement. In an agent product, a long session might mean the user found value — or it might mean the agent gave a confusing response and the user spent ten minutes trying to recover. Session length tells you nothing about quality.

Page views and click-through rates don't exist in a chat interface. There are no pages to view. There's no menu to click. Porting these metrics into your agent analytics is a sign that you're still thinking in SaaS terms.

NPS scores arrive too late and too infrequently. A quarterly NPS survey tells you what a subset of your users thought about the product in aggregate, after the fact. It can't tell you which specific interaction caused a thumbs down, or which flow drove the most drop-off. By the time you get the score, the signal is cold.

The metrics that actually matter

1. First-session activation rate

Did the user find value in the first conversation? This is the single most predictive metric for long-term retention in agent products. Define "activation" for your product — it might be the user completing a specific flow, using a key feature, or generating an output they save or share. Then track what percentage of new users hit that moment in session one.

If your first-session activation rate is low, you have a week-one problem before you have a retention problem. Fix this first.
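
If you want this as a concrete computation, here's a minimal sketch in Python. It assumes a flat event log where each record carries a user_id, a session index, and an event name; the schema and the "activated" event name are stand-ins for whatever your product actually logs.

```python
def first_session_activation_rate(events, activation_event="activated"):
    """Share of a new-user cohort who hit the activation event in session one.

    `events` is an iterable of dicts shaped like
    {"user_id": "u1", "session": 1, "event": "activated"}.
    The schema is hypothetical; swap in your own event names.
    """
    users, activated = set(), set()
    for e in events:
        users.add(e["user_id"])
        if e["session"] == 1 and e["event"] == activation_event:
            activated.add(e["user_id"])
    return len(activated) / len(users) if users else 0.0
```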

2. Feature discovery rate

What percentage of users have discovered each of your agent's core capabilities? Not "used it once" — discovered it. Knew it existed. Tried it at least once.

This metric tells you where your onboarding is failing. If 80% of users have tried capability A and 12% have tried capability B, capability B has a discovery problem — not a quality problem. You're not surfacing it at the right moment.
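
Measuring discovery has the same shape as measuring activation, just computed per capability. A rough sketch, again over a hypothetical event log where each record carries a user_id and the capability it touched:

```python
def discovery_rate_by_capability(events, capabilities):
    """Share of all users who have tried each capability at least once.

    `events` is a list of dicts shaped like
    {"user_id": "u1", "capability": "summarize"} (hypothetical schema).
    "Tried at least once" matches the definition of discovery above.
    """
    all_users = {e["user_id"] for e in events}
    rates = {}
    for cap in capabilities:
        tried = {e["user_id"] for e in events if e["capability"] == cap}
        rates[cap] = len(tried) / len(all_users) if all_users else 0.0
    return rates
```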

3. Per-response feedback score

Not NPS. Not a post-session CSAT. A per-message like/dislike rate, broken down by flow, feature, and message type.

This is the metric that tells you which specific agent responses are causing churn. A 4% thumbs-down rate on your first welcome message is a red flag. A cluster of negative feedback on responses about a specific topic tells you the model needs work there. You can't get this from session-level data — it has to be captured at the response level.
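
Capturing feedback at the response level means each record can carry the dimensions you'll want to slice by. A minimal sketch, assuming every rated message is logged with a rating plus fields like flow, feature, or message type:

```python
from collections import Counter

def thumbs_down_rate(feedback, group_key="flow"):
    """Per-response thumbs-down rate, broken down by one dimension.

    `feedback` is one dict per rated message, e.g.
    {"flow": "onboarding", "rating": "down"}. Field names are
    illustrative, not a real API.
    """
    totals, downs = Counter(), Counter()
    for f in feedback:
        totals[f[group_key]] += 1
        if f["rating"] == "down":
            downs[f[group_key]] += 1
    return {k: downs[k] / totals[k] for k in totals}
```

Run it once per dimension (flow, feature, message type) and watch for outliers like the 4% welcome-message case above.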

4. Flow completion rate

For every structured flow you run — onboarding, capability intro, survey, skill unlock — what percentage of users who start it complete it? Where do they drop off?

Drop-off by step is the most actionable metric in your product analytics. If 70% of users complete step 1, 65% complete step 2, and 30% complete step 3, step 3 has a problem. Fix that step. You don't need to rethink the entire flow.
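
The arithmetic is simple enough to sketch directly. Here it is with the numbers from the example above, assuming you know how many users entered the flow and how many completed each step:

```python
def step_drop_off(completions, started):
    """Completion rate per step, plus the drop from the previous step.

    `completions` is an ordered list of user counts per step;
    `started` is how many users entered the flow.
    """
    rates = [c / started for c in completions]
    drops = [rates[0] - 1.0]  # step 1 is measured against everyone who started
    drops += [rates[i] - rates[i - 1] for i in range(1, len(rates))]
    return list(zip(rates, drops))

# The numbers from the example above: 1,000 users enter the flow,
# 700 finish step 1, 650 finish step 2, 300 finish step 3.
for i, (rate, drop) in enumerate(step_drop_off([700, 650, 300], 1000), start=1):
    print(f"step {i}: {rate:.0%} complete ({drop:+.0%} vs previous)")
# Step 3 loses 35 points; that's the step to fix, not the whole flow.
```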

5. Issue report rate

How frequently are users flagging problems mid-conversation? And what's the ratio of issues reported to issues resolved?

A low issue report rate doesn't necessarily mean your product is bug-free. It might mean users don't have a way to report issues — or don't bother because they don't expect it to go anywhere. A rising issue rate after a model update is an early warning signal before churn numbers move. This is one of the most underrated leading indicators in agent product analytics.
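
Both numbers fall out of a single pass over the reports. A sketch, assuming each in-conversation report is logged with a resolved flag:

```python
def issue_metrics(issues, total_conversations):
    """Issue report rate, plus the share of reported issues resolved.

    `issues` is a list of dicts like {"resolved": True}, one per
    in-conversation report. The schema is hypothetical.
    """
    reported = len(issues)
    resolved = sum(1 for i in issues if i["resolved"])
    return {
        "report_rate": reported / total_conversations if total_conversations else 0.0,
        "resolution_rate": resolved / reported if reported else 1.0,
    }
```

Track report_rate as a time series, so a jump after a model update is visible before churn moves.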

6. Capability adoption depth

Beyond discovery: how many of your agent's core capabilities does the average user actually use regularly? An agent product where most users only use one capability is fragile. Those users have no switching cost — another product does that one thing just as well.

Depth of adoption is a proxy for stickiness. Users who regularly use three or more capabilities churn at a fraction of the rate of single-capability users.
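
Depth needs a working definition of "regularly." The sketch below approximates it as a minimum use count per capability; that threshold is arbitrary, and you'd tune it to your product's usage cadence:

```python
from collections import defaultdict

def adoption_depth(events, min_uses=3):
    """Number of capabilities each user uses 'regularly'.

    'Regularly' is approximated as at least `min_uses` uses; a real
    version might require recurring use across weeks. Event schema is
    hypothetical: each record carries "user_id" and "capability".
    """
    counts = defaultdict(lambda: defaultdict(int))
    for e in events:
        counts[e["user_id"]][e["capability"]] += 1
    return {
        user: sum(1 for n in caps.values() if n >= min_uses)
        for user, caps in counts.items()
    }
```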

What a healthy metrics stack looks like

A well-instrumented agent product tracks:

First-session activation rate — are new users finding value before they leave?
Feature discovery rate by capability — is every capability being surfaced?
Per-response feedback score — which specific responses are hurting?
Flow completion rate by step — where are users dropping off in structured flows?
Issue report rate and resolution time — what's breaking and how fast is it fixed?
Capability adoption depth — how sticky is each user, really?

These six metrics give you a complete picture of activation, engagement, and quality — without any of the noise from session-length or pageview proxies.

The feedback loop that makes products better

The best agent teams don't just track these metrics — they close the loop. Per-response feedback goes to the team in real time via Slack. Issue reports arrive with full conversation context. Flow drop-off data triggers a review of that specific step. The product improves on a cycle of weeks, not quarters.

This is the difference between an agent product that plateaus and one that compounds. The teams measuring the right things can see what's working, fix what isn't, and ship with confidence. The teams measuring session length are optimizing for the wrong thing and wondering why retention doesn't improve.


Get started with Firstflow today and build in-chat experiences that help AI agents activate users within minutes.

Book a demo