If you’ve ever watched a session recording in Hotjar, Clarity, FullStory, or any other analytics tool, you might have wondered what’s actually happening under the hood. The recording looks like a video of someone using your website, but it’s not a video at all. There’s no screen capture, no video compression, no massive files being uploaded from your users’ browsers.
So what is it?
It’s a DOM reconstruction, not a video
Session replay tools don’t record video. Instead, they capture the structure of your webpage—the DOM (Document Object Model)—and all the changes that happen to it over time. When you hit “play” on a recording, you’re watching a reconstruction of what the page looked like, rebuilt from that structural data.
Think of it like the difference between filming someone building with Legos versus writing down each piece they placed and where. The second approach takes up far less space, and you can recreate the build perfectly from the instructions.
Most session replay tools work by taking an initial snapshot of the entire DOM when a user lands on a page. This snapshot includes all the HTML elements, their attributes, and a copy of the CSS that styles them. From that point forward, the tool records mutations—every time something changes in the DOM, that change gets logged as an event with a timestamp.
Mouse movements, clicks, scrolls, form inputs, elements appearing or disappearing—all of these become discrete events in a timeline. Play them back in sequence against the initial DOM snapshot, and you get something that looks exactly like the user’s session.
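To make that concrete, here's a stripped-down sketch of the idea using the browser's standard MutationObserver API. The event shapes are invented for illustration; real tools use richer and far more compact formats, but the principle is the same: one snapshot up front, then a timestamped stream of changes and interactions.

```ts
// Illustrative only: one full snapshot, then every change and interaction
// logged as a timestamped event.
type RecordedEvent =
  | { type: "snapshot"; timestamp: number; html: string }
  | { type: "mutation"; timestamp: number; summary: string }
  | { type: "pointer"; timestamp: number; x: number; y: number };

const events: RecordedEvent[] = [];

// 1. Capture the initial state of the page.
events.push({
  type: "snapshot",
  timestamp: Date.now(),
  html: document.documentElement.outerHTML,
});

// 2. Log every subsequent DOM change.
const observer = new MutationObserver((mutations) => {
  for (const m of mutations) {
    events.push({
      type: "mutation",
      timestamp: Date.now(),
      summary: `${m.type} on ${m.target.nodeName.toLowerCase()}`,
    });
  }
});
observer.observe(document, {
  childList: true,
  attributes: true,
  characterData: true,
  subtree: true,
});

// 3. Interactions join the same timeline.
document.addEventListener("mousemove", (e) => {
  events.push({ type: "pointer", timestamp: Date.now(), x: e.clientX, y: e.clientY });
});
```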
Most tools use rrweb (or something like it)
The open-source library that powers much of this is called rrweb (record and replay web). If you’ve used session replay in the past few years, there’s a good chance rrweb or a derivative of it was involved. Some tools embed it directly; others have built on top of it or implemented the same approach independently. Microsoft Clarity, for instance, ships its own open-source recorder, clarity-js, that works on the same principles.
rrweb works by serializing the DOM into a custom format, then using MutationObserver to capture all subsequent changes. The serialization process assigns unique IDs to every node in the DOM tree, which lets the replay engine know exactly which elements changed and how.
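In rrweb's own API, that loop is only a few lines. This is a minimal sketch; a real integration would buffer the emitted events, compress them, and ship them to a collection endpoint:

```ts
import { record, Replayer } from "rrweb";

// Recording: rrweb emits a full-snapshot event first, then incremental
// events for every mutation, input, scroll, and pointer move.
const events: any[] = [];
const stopRecording = record({
  emit(event) {
    events.push(event); // in practice: batch and upload
  },
});

// ...later, stop capturing.
stopRecording?.();

// Replay: the Replayer rebuilds the snapshot in a sandboxed iframe
// and applies the incremental events in timestamp order.
const replayer = new Replayer(events);
replayer.play();
```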
One thing rrweb doesn’t do by default is store images. The initial snapshot captures the structure of the page and references to image URLs, but the actual image data isn’t included in the recording. When you watch a replay, the player fetches those images from their original URLs (or from cached/proxied versions, depending on the implementation). This keeps recording sizes small but means that if an image URL goes dead between recording and replay, you’ll see a broken image in the playback.
The same goes for fonts, external stylesheets loaded from CDNs, and other external resources. The recording is a set of instructions for rebuilding the page, not a complete archive of everything on it.
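As a rough illustration, a serialized image element inside a snapshot looks something like the object below. The field names are made up for readability (real formats differ by library and version), but the key point holds: only the URL is stored, never the pixel data.

```ts
// Hypothetical shape of a serialized <img> node in a DOM snapshot.
const serializedImageNode = {
  id: 42,                 // stable node ID assigned during serialization
  type: "element",
  tagName: "img",
  attributes: {
    src: "https://cdn.example.com/hero.jpg", // re-fetched at replay time
    alt: "Hero image",
    width: "1200",
  },
  childNodes: [],
};
```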
How individual tools differ (and don’t)
When you compare Clarity, Hotjar, FullStory, LogRocket, Heap, and the rest, the actual recording mechanism is remarkably similar across the board. They’re all doing some version of DOM serialization plus mutation tracking. The differences tend to show up elsewhere.
Microsoft Clarity is free and records with its own open-source clarity-js library. It’s straightforward to set up and gives you heatmaps alongside recordings. The trade-off is that you’re somewhat limited in how you can search, filter, and analyze recordings at scale. It works well for smaller sites or teams just getting started with session replay.
Hotjar also offers a free tier and targets a similar audience. The recording approach is essentially the same. Hotjar has put more investment into surveys and feedback tools that complement the replay functionality. Like Clarity, it’s not really built for engineering teams doing deep debugging.
FullStory and LogRocket are positioned more toward product and engineering teams. The recording technology is still DOM-based, but these tools have built more sophisticated search and filtering capabilities on top. You can search for sessions where a specific error occurred, where a user rage-clicked, or where a particular API call failed. LogRocket in particular integrates with error monitoring to show you console logs and network requests alongside the visual replay.
Heap takes a slightly different angle by auto-capturing events and tying replay to their broader analytics platform. The underlying recording mechanism isn’t radically different, but the value proposition is more about connecting session behavior to aggregate metrics.
The point is: from a technical standpoint, the recording itself is largely commoditized. Everyone is capturing DOM state and mutations. Everyone is reconstructing rather than recording video. The real differences are in what happens after the data is collected—how you find the sessions that matter, what you can learn from them, and how that fits into your workflow.
What this means in practice
Understanding the DOM-based approach helps explain some of the quirks you’ll encounter with session replay tools.
Recordings can break if your CSS changes between when a session was recorded and when you watch it. The replay pulls current stylesheets (or proxied snapshots, depending on the tool), and if those don’t match what was live during the session, things can look wrong.
Dynamic content loaded via JavaScript will show up in recordings as long as the mutations were captured, but content that loads in iframes often won’t unless the replay tool specifically handles cross-origin iframe recording.
Sensitive data masking works by modifying what gets serialized in the first place. Tools redact form fields, text content, or entire elements before the data ever leaves the browser. This is why masking configuration matters—it’s not applied during playback, it’s applied during capture.
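With rrweb, for example, masking is part of the record() configuration, so redaction happens before anything is serialized. A minimal sketch (option names are from rrweb's documented record options; other tools expose similar settings under different names):

```ts
import { record } from "rrweb";

record({
  emit(event) {
    /* send to your collection endpoint */
  },
  // Replace the value of every input field before it is serialized.
  maskAllInputs: true,
  // Elements with this class are captured as empty placeholders of the same size.
  blockClass: "rr-block",
  // Text inside elements with this class is masked character by character.
  maskTextClass: "rr-mask",
});
```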
And the size of your recordings depends heavily on how dynamic your pages are. A mostly static marketing page generates tiny recordings. A complex single-page app with constant state changes generates much larger ones, because every mutation is an event that needs to be stored.
The differentiation is in the analysis
If you’re evaluating session replay tools, don’t spend too much time worrying about the recording mechanism itself. The technology is well understood and implementations have converged. Spend your time thinking about what happens downstream.
How will you find the sessions worth watching? Can you filter by user frustration signals? Can you connect sessions to errors, support tickets, or conversion funnels? Can you share specific moments with teammates? Does the tool integrate with the rest of your stack?
The recording is table stakes. What you do with it is where tools actually compete.
Using Locus
This is the problem we’re working on with Locus. Rather than building another recording mechanism, we’re focused on what happens after capture. Locus uses LLMs to analyze session recordings automatically, so you’re not stuck scrubbing through hours of playback hoping to spot the moment something went wrong. The system watches the sessions and surfaces what’s actually worth your attention.
Beyond that, you can talk to the analysis directly. Ask questions like “how many users hit this error state?” or “what did people do right before abandoning checkout?”—and get actual answers, not just a list of sessions to review. It’s closer to working with a data analyst who’s already watched everything than it is to using a search filter. You collaborate with the model to pull quantitative insights out of qualitative session data, without needing to build dashboards or write queries first.
The recording tech is solved. Knowing what to do with thousands of recordings is the harder problem.