Most session replay tools stop at recording. They capture what users do, store it, and leave you to watch the recordings one by one. LocusAI takes a different approach: we record sessions the same way everyone else does, but then we actually analyze them so you don’t have to scrub through hours of playback.
Here’s how the system works from capture to insight.
Recording with rrweb
Like most modern session replay tools, Locus uses rrweb as the foundation for recording. When a user lands on your site, rrweb serializes the DOM—the entire structure of the page, including HTML elements, attributes, and styles—into a compact format. From that point forward, it uses MutationObserver to capture every change: elements appearing or disappearing, text updates, attribute changes, the works.
User interactions get recorded as separate events. Mouse movements, clicks, scrolls, form inputs, window resizing—each one is timestamped and added to the session timeline. The result is a complete reconstruction of what the user saw and did, without any actual video being captured.
This approach keeps payload sizes small. You’re storing a set of instructions for rebuilding the page, not a screen recording. A typical session might be a few hundred kilobytes rather than tens of megabytes.
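As a toy illustration (simplified shapes of our own, not rrweb's actual event schema), a stored session is just a serializable snapshot plus a list of timestamped instructions:

```javascript
// Simplified sketch: a session is one serialized snapshot plus
// timestamped instructions for every change (not rrweb's real schema).
const session = {
  snapshot: { tag: "body", children: [{ tag: "button", id: "cta", text: "Buy" }] },
  events: [
    { t: 1200, type: "click", target: "#cta" },
    { t: 1350, type: "mutation", target: "#cta", attrs: { disabled: "true" } },
    { t: 1400, type: "mutation", target: "body", addText: "Order placed" },
  ],
};

// The payload is just the serialized instructions, not video frames.
const payloadBytes = JSON.stringify(session).length;
console.log(payloadBytes); // a few hundred bytes for this toy session
```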
Masking sensitive data
Before any data leaves the browser, we apply masking rules to strip out sensitive information. Form fields with passwords, credit card numbers, personal details—these get redacted at capture time, not during playback. The actual values never hit our servers.
This works by configuring rrweb’s serialization to replace sensitive content with placeholder text or to skip certain elements entirely. You define what counts as sensitive (specific input types, elements with certain classes, entire sections of the page), and the recording respects those rules from the start.
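Conceptually, the capture-time decision looks something like this sketch. The rule names and element shapes here are illustrative, not rrweb's actual configuration options (rrweb exposes similar knobs, such as masking all inputs or blocking elements by class):

```javascript
// Illustrative masking rules; the names are ours, not rrweb's API.
const rules = {
  maskedInputTypes: ["password", "email", "tel"],
  maskedClasses: ["sensitive"],
  blockedClasses: ["no-capture"],
};

// Decide at serialization time what the recorder emits for an element:
// its real content, a same-length placeholder, or nothing at all.
function serializeContent(el, rules) {
  const classes = el.classes ?? [];
  if (rules.blockedClasses.some((c) => classes.includes(c))) {
    return null; // skip the element entirely
  }
  const masked =
    (el.tag === "input" && rules.maskedInputTypes.includes(el.type)) ||
    rules.maskedClasses.some((c) => classes.includes(c));
  if (masked) {
    return "*".repeat((el.value ?? "").length); // placeholder; the value never leaves the browser
  }
  return el.value ?? el.text ?? "";
}

console.log(serializeContent({ tag: "input", type: "password", value: "hunter2" }, rules)); // "*******"
console.log(serializeContent({ tag: "div", classes: ["no-capture"], text: "SSN 123" }, rules)); // null
```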
The distinction matters for compliance. Masking during capture means there’s no window where sensitive data exists in your recording pipeline. It’s excluded before the data is ever transmitted.
Storing recordings in the backend
Once a session is captured, the serialized DOM snapshot and all the subsequent mutation events get sent to our backend and stored. This is the raw material—everything you’d need to play back the session exactly as it happened.
We store this data in a format optimized for retrieval and replay. When you want to watch a session, we send the initial snapshot plus the event stream to the player, which reconstructs the page and steps through the timeline. External resources like images and fonts get fetched at playback time from their original URLs (or from cached versions when available).
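In miniature, playback is a fold over the event stream: start from the snapshot and apply mutations in timestamp order. This sketch uses simplified event shapes of our own, not rrweb's real format:

```javascript
// Toy player: start from the initial snapshot and apply each event
// in timestamp order to rebuild what the page looked like at any moment.
function replay(snapshot, events, upToMs = Infinity) {
  const state = structuredClone(snapshot); // never mutate the stored snapshot
  for (const ev of events) {
    if (ev.t > upToMs) break; // events are stored in timeline order
    if (ev.type === "text") state.texts[ev.target] = ev.value;
    if (ev.type === "attr") {
      state.attrs[ev.target] = { ...(state.attrs[ev.target] ?? {}), ...ev.attrs };
    }
  }
  return state;
}

const snapshot = { texts: { heading: "Checkout" }, attrs: {} };
const events = [
  { t: 100, type: "text", target: "heading", value: "Payment" },
  { t: 900, type: "attr", target: "payButton", attrs: { disabled: "true" } },
];

// Scrub to 500ms: only the first event has been applied.
console.log(replay(snapshot, events, 500).texts.heading); // "Payment"
```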
But here’s the thing: storing recordings is the easy part. The hard part is making sense of thousands of them without watching each one manually.
Generating optimized clickstreams and selective screenshots
Raw rrweb data is detailed, which is great for faithful replay but unwieldy for analysis. A single session might contain thousands of mutation events—most of which are irrelevant noise from a UX perspective. CSS transitions, minor DOM updates, cursor position samples at 50ms intervals. An LLM trying to process all of this would drown in tokens and cost you a fortune.
So we don’t feed it raw data. Instead, we process each session into an optimized clickstream: a compressed sequence of meaningful user actions. Page loads, clicks, form submissions, navigation events, scroll depth milestones, error states. The signal without the noise.
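A minimal version of that filtering step might look like this; the event types and field names are ours, for illustration:

```javascript
// Illustrative filter: keep only event types that matter for UX analysis,
// dropping cursor samples and cosmetic mutations.
const MEANINGFUL = new Set(["pageload", "click", "submit", "navigate", "error", "scroll-milestone"]);

function toClickstream(rawEvents) {
  return rawEvents
    .filter((ev) => MEANINGFUL.has(ev.type))
    .map(({ t, type, target }) => ({ t, type, target })); // strip per-event detail
}

const raw = [
  { t: 0, type: "pageload", target: "/checkout" },
  { t: 40, type: "mousemove", x: 10, y: 20 },
  { t: 90, type: "mousemove", x: 14, y: 22 },
  { t: 500, type: "click", target: "#pay" },
  { t: 520, type: "mutation", target: "#spinner" },
  { t: 2100, type: "error", target: "card-declined" },
];

console.log(toClickstream(raw).length); // 3 of 6 events survive
```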
Alongside the clickstream, we capture selective screenshots from the DOM replay. Not every frame—just the moments that matter. State changes, before and after a click, error screens, key decision points in a flow. These screenshots give the LLM visual context without the overhead of processing continuous video frames.
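Selecting those moments can be as simple as padding around interesting events. A hedged sketch, with illustrative event types and timings:

```javascript
// Sketch: choose which replay timestamps deserve a screenshot.
// We capture just before and after each "interesting" event
// (clicks, errors, navigations), never the frames in between.
function screenshotTimestamps(clickstream, padMs = 250) {
  const interesting = new Set(["click", "error", "navigate", "submit"]);
  const stamps = new Set();
  for (const ev of clickstream) {
    if (!interesting.has(ev.type)) continue;
    stamps.add(Math.max(0, ev.t - padMs)); // state just before the action
    stamps.add(ev.t + padMs);              // resulting state after it
  }
  return [...stamps].sort((a, b) => a - b);
}

const clickstream = [
  { t: 0, type: "pageload" },
  { t: 500, type: "click" },
  { t: 2100, type: "error" },
];

console.log(screenshotTimestamps(clickstream)); // [250, 750, 1850, 2350]
```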
The combination of structured clickstream data and targeted screenshots gives us a representation of each session that’s both compact and semantically rich.
Feeding data into the LLM for queryable summaries
With the optimized data in hand, we run sessions through an LLM to generate summaries and extract structured insights. The model reads the clickstream, looks at the screenshots, and produces a description of what happened: what the user was trying to do, where they got stuck, whether they completed their goal, what errors they encountered.
These summaries get indexed and stored alongside the raw session data. When you ask Locus a question—“how many users abandoned checkout this week?” or “what’s the most common path to the pricing page?”—you’re querying against this layer of analyzed data, not searching through raw recordings.
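For a sense of what that layer enables, imagine each summary as a structured record (this schema is hypothetical, for illustration). The checkout question reduces to a filter over records instead of a scan over recordings:

```javascript
// Hypothetical shape of the structured layer the LLM produces per session.
const summaries = [
  { id: "s1", goal: "checkout", completed: false, stuckAt: "payment form", errors: ["card-declined"] },
  { id: "s2", goal: "checkout", completed: true, stuckAt: null, errors: [] },
  { id: "s3", goal: "browse pricing", completed: true, stuckAt: null, errors: [] },
  { id: "s4", goal: "checkout", completed: false, stuckAt: "shipping step", errors: [] },
];

// "How many users abandoned checkout?" becomes a query over summaries.
const abandoned = summaries.filter((s) => s.goal === "checkout" && !s.completed);
console.log(abandoned.length); // 2
```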
You can also chat with the analysis directly. Ask follow-up questions, drill into specific segments, request comparisons across time periods. The LLM acts as a collaborator that’s already watched every session and can pull quantitative answers from qualitative behavior data.
Keeping costs under control
LLM inference isn’t free, and session volumes add up fast. A site with 10,000 daily sessions can’t afford to send megabytes of raw data per session through a language model. The math doesn’t work.
This is why the optimization layer matters so much. By compressing sessions into lean clickstreams and selective screenshots, we keep token counts low enough that analyzing thousands of sessions remains practical. The LLM stays nimble because we’re not asking it to process noise.
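To see why the compression pays off, here's a back-of-the-envelope comparison. It uses the rough rule of thumb that a token is about four characters of JSON (real tokenizers vary), and made-up session sizes:

```javascript
// Rough cost sketch: token counts approximated as characters / 4.
const tokens = (obj) => Math.ceil(JSON.stringify(obj).length / 4);

// 5,000 raw mutation/mousemove events vs. ~30 clickstream entries.
const rawSession = Array.from({ length: 5000 }, (_, i) => ({
  t: i * 50, type: "mousemove", x: i % 800, y: i % 600,
}));
const compressed = Array.from({ length: 30 }, (_, i) => ({
  t: i * 1000, type: "click", target: `#el${i}`,
}));

const ratio = tokens(rawSession) / tokens(compressed);
console.log(ratio > 50); // the compressed form is orders of magnitude cheaper
```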
We also batch and prioritize intelligently. Not every session needs the same depth of analysis. A session where someone landed on the homepage and bounced after three seconds doesn’t need the same treatment as a session where someone spent twenty minutes in a complex workflow and then hit an error. We allocate compute where it’ll generate the most insight.
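A simplified version of that triage might score sessions like this (the thresholds and tiers are illustrative, not our production values):

```javascript
// Illustrative triage: score each session, then spend deep LLM
// analysis only where it's likely to yield insight.
function analysisDepth(session) {
  let score = 0;
  if (session.durationSec > 60) score += 2; // sustained engagement
  if (session.errors > 0) score += 3;       // something went wrong
  if (session.clicks > 20) score += 1;      // complex interaction
  if (score >= 3) return "deep";    // full clickstream + screenshots
  if (score >= 1) return "summary"; // clickstream only
  return "skip";                    // e.g. a three-second bounce
}

console.log(analysisDepth({ durationSec: 3, errors: 0, clicks: 1 }));     // "skip"
console.log(analysisDepth({ durationSec: 1200, errors: 1, clicks: 35 })); // "deep"
```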
The result is a system that scales. You can throw real production traffic at it without the analysis costs becoming prohibitive.
Try Locus
If you’re tired of watching recordings one by one, or if you’ve got more session data than you know what to do with, Locus might be worth a look. We handle the recording, the analysis, and the interface for actually getting answers out of your session data.
Sign up at getlocus.ai and see what your users are actually doing—without having to watch them do it.