What You Can Learn From Rating Every Session
Firstflow Team
Most teams understand their agent product indirectly
Most teams ship an agent product and then try to understand it through retention numbers, usage logs, and the occasional user interview. These are useful — but they're all indirect. They tell you what users did, not what they felt. They describe behavior without explaining it.
When you rate every session, something changes. You get a direct quality signal, attached to every conversation, with full context. And over time, that dataset tells you things about your product that no other source can.
Here's what it actually reveals.
Where the agent performs — and where it doesn't
Session ratings break down by use case faster than any other metric. Users asking your agent to do different things will rate their sessions differently. Some task types consistently produce high-rated sessions. Others consistently produce low ones.
This is one of the most valuable things you can learn early, and session rating surfaces it automatically. You don't need to run a user study or segment by query type manually — the pattern emerges from the rating data as soon as you have enough sessions.
What you do with it:
- Double down on the task types that generate high-rated sessions — these are your agent's strongest use cases, and they should anchor your positioning and onboarding.
- Investigate the low-rated task types — is the agent genuinely weak here, or is there a prompt or context issue that's fixable?
- Be honest in your onboarding about what the agent does best. Users who start with a use case that produces high-rated sessions retain better than users who start with one that doesn't.
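Surfacing the strong and weak task types can be a simple aggregation. Here's a minimal sketch, assuming each logged session carries a `task_type` label and a 1–5 `rating` (field names are illustrative, not a real schema):

```python
from collections import defaultdict

def ratings_by_task_type(sessions):
    """Group session ratings by task type and return (task, mean, count)
    tuples, sorted so the weakest task types surface first.

    `sessions` is a list of dicts with hypothetical keys
    'task_type' and 'rating' (1-5); adjust to your own schema.
    """
    buckets = defaultdict(list)
    for s in sessions:
        buckets[s["task_type"]].append(s["rating"])
    return sorted(
        ((task, sum(r) / len(r), len(r)) for task, r in buckets.items()),
        key=lambda row: row[1],
    )

sessions = [
    {"task_type": "summarize", "rating": 5},
    {"task_type": "summarize", "rating": 4},
    {"task_type": "code_review", "rating": 2},
    {"task_type": "code_review", "rating": 3},
]
for task, mean, n in ratings_by_task_type(sessions):
    print(f"{task}: {mean:.2f} (n={n})")
```

Sorting ascending puts the candidates for investigation at the top; in practice you'd also filter out task types with too few sessions to trust the mean.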
Which users are getting value and which aren't
Segment your session ratings by user cohort and the picture gets more specific. New users vs. returning users. Users from different channels. Users who completed the capability introduction flow vs. those who skipped it.
A few patterns that almost always emerge:
Users who completed onboarding rate sessions higher. If this is true for your product (and it usually is), it's a strong argument for investing in onboarding effectiveness. The rating data makes the ROI concrete — not "we think onboarding helps" but "users who complete onboarding rate sessions 1.4 points higher on average."
Users from certain acquisition channels rate sessions lower. This usually means there's a mismatch between what those channels promise and what the product delivers. It's an acquisition problem that shows up first in session quality, then in retention. Catching it in session ratings gives you time to fix the messaging before the retention numbers make it undeniable.
Power users rate sessions inconsistently. High-engagement users sometimes rate sessions lower than casual users — not because they're getting less value, but because their expectations are higher and they're pushing the agent into more challenging territory. These low ratings from power users are often your most valuable feedback. They're showing you the ceiling.
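Making a cohort gap like the onboarding one concrete is a one-pass computation. A sketch, assuming each session record carries a hypothetical `completed_onboarding` flag alongside its `rating`:

```python
def onboarding_rating_gap(sessions):
    """Average rating difference between sessions from users who completed
    onboarding and those who didn't.

    Field names ('completed_onboarding', 'rating') are illustrative,
    not a real schema.
    """
    done, skipped = [], []
    for s in sessions:
        (done if s["completed_onboarding"] else skipped).append(s["rating"])
    mean = lambda xs: sum(xs) / len(xs)
    return mean(done) - mean(skipped)

sessions = [
    {"completed_onboarding": True, "rating": 5},
    {"completed_onboarding": True, "rating": 4},
    {"completed_onboarding": False, "rating": 3},
    {"completed_onboarding": False, "rating": 3},
]
print(onboarding_rating_gap(sessions))  # positive gap: onboarding correlates with higher ratings
```

The same function works for any binary cohort split (acquisition channel, power user vs. casual) by swapping the flag.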
How quality changes over time
If you have session ratings for every conversation over a period of months, you have a quality history for your product. This is one of the most underrated analytical advantages of tracking session ratings continuously.
Before and after model updates. Did your last model update improve or degrade session quality? The retention data takes weeks to show the effect. Session ratings show it within days. Teams with session rating data can roll back an update that degrades quality before users churn. Teams without it find out when churn spikes.
Before and after flow changes. You redesigned your capability introduction flow. Did sessions that included the new flow rate higher or lower? Session rating gives you a direct answer. You don't need a controlled experiment — you can compare the rating distributions before and after the change.
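One way to make that before/after comparison concrete without a controlled experiment is to compare mean ratings and bootstrap a confidence interval on the difference. A sketch with made-up numbers (real datasets will be much larger):

```python
import random

def bootstrap_mean_diff(before, after, n_resamples=2000, seed=0):
    """Bootstrap a 95% CI for the difference in mean session rating
    (after - before). Purely illustrative; a fixed seed keeps it reproducible."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        b = [rng.choice(before) for _ in before]
        a = [rng.choice(after) for _ in after]
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    lo = diffs[int(0.025 * n_resamples)]
    hi = diffs[int(0.975 * n_resamples)]
    return lo, hi

before = [3, 3, 4, 2, 3, 4, 3, 2, 3, 3]  # ratings before the flow change
after = [4, 4, 5, 3, 4, 5, 4, 3, 4, 4]   # ratings after
lo, hi = bootstrap_mean_diff(before, after)
print(f"95% CI for rating change: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, the change almost certainly moved quality; if it straddles zero, you need more sessions before concluding anything.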
Seasonal and contextual patterns. Some products see session quality drop on certain days or during certain periods. This often reflects user context — users who are rushed, stressed, or trying to accomplish something under time pressure interact differently and rate differently. Understanding these patterns helps you design flows that match user context, not just user intent.
The early warning system for churn
The most immediately actionable thing session rating data tells you is which users are at risk right now.
A user who rates their most recent session poorly is significantly more likely to churn than a user who rates it well — even when their other behavioral signals look similar. The rating is a direct expression of dissatisfaction that often precedes the behavioral signals (shorter sessions, fewer messages, longer gaps between visits) by several days.
This gives you a window to act. A user who just rated a session poorly is still in the product, still engaged enough to give feedback. A well-timed follow-up — a reactivation flow, a direct question about what went wrong, a capability suggestion based on what they were trying to do — can recover that user.
Set up a simple rule: any user who rates a session below a threshold triggers a follow-up flow in their next session. Not an email. Not a Zendesk ticket. A conversational flow, inside the product, that acknowledges the experience wasn't great and offers something specific to improve it.
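A minimal version of that rule might look like the following. The threshold, field names, and flow names are all placeholders, not a real API:

```python
RATING_THRESHOLD = 3  # hypothetical: ratings are 1-5; below 3 triggers recovery

def next_session_flow(user):
    """Decide which flow to open at the start of a user's next session.

    `user` is a dict with a hypothetical 'last_session_rating' field;
    the returned flow names are illustrative.
    """
    rating = user.get("last_session_rating")
    if rating is not None and rating < RATING_THRESHOLD:
        # Conversational recovery inside the product, not an email or ticket.
        return {"flow": "recovery", "context": {"last_rating": rating}}
    return {"flow": "default"}

print(next_session_flow({"last_session_rating": 2}))
print(next_session_flow({"last_session_rating": 5}))
```

The decision runs at session start, so the follow-up lands while the user is still present and still engaged enough to respond.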
The recovery rate on proactive outreach triggered by session ratings is substantially higher than on reactive outreach triggered by churn. You're talking to users who are still there, not ones who've already left.
The compound effect
Session ratings are most valuable as a longitudinal dataset. A single session's rating tells you that session went well or didn't. A hundred sessions' ratings tell you where your agent consistently performs. A thousand tell you how quality trends over time, which users get the most value, and where the ceiling is.
Teams that start tracking session ratings on day one have a data asset that compounds. Every product change can be evaluated against a quality baseline. Every user segment can be understood in terms of session experience, not just behavior. Every churn event can be traced back to a session rating pattern.
The teams that don't track it are making product decisions based on behavior data and intuition. The teams that do have a direct line to what their users actually feel about every conversation their agent has.
Get started with Firstflow today and start building in-chat experiences that help AI agents activate users within minutes.