Polish through user feedback, not feature lists

Testing FlowDoc on real websites immediately exposed the gap between "it works" and "it's actually useful." The raw event capture was solid. Every click, keystroke, and navigation got recorded with pixel-perfect screenshots. But the output read like a server log, not documentation a human would want to read.

The mantus.ai test run captured 13 events that became 13 separate workflow steps. Clicking "Learn" was step 3, and the navigation to the Guides page was step 4. What should have been "Open the Guides section" became two disconnected actions. Worse, because DOM events bubble, a click on a button inside a card was also recorded as a click on the card itself, producing duplicate steps for a single user action.

The problem wasn't bugs. It was that raw browser events don't map to human mental models.

Focus on the output first

Before adding any features, we built a post-processing pipeline to clean the captured data. Four passes transformed the event log into readable documentation:

Pass 1: Remove duplicate clicks. When two click events fired within 500ms, we kept the one with the shorter text content, since that is usually the actual target element rather than its container.

Pass 2: Merge clicks with navigation. If a click led to a page change within 2 seconds, we combined them into one step: "Navigate to Guides" instead of separate "Click Learn" and "Navigation to /guides" entries.

Pass 3: Generate readable titles. Transform click button[data-target="guides"] into "Open Guides section" based on the element's text content and context.

Pass 4: Clean indexing. Renumber everything sequentially after merging and deduplication.
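
The four passes above can be sketched roughly as follows. This is a minimal illustration, not FlowDoc's actual code: the Event fields, the function names, and the title templates are all assumptions made for the example. Only the 500ms and 2 second windows come from the description above.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical captured event; field names are illustrative only."""
    kind: str            # "click" or "navigation"
    timestamp_ms: int    # milliseconds since the capture started
    text: str = ""       # visible text of the clicked element
    url: str = ""        # destination path, set for navigations
    title: str = ""      # filled in by pass 3
    index: int = 0       # filled in by pass 4


def dedupe_clicks(events: List[Event], window_ms: int = 500) -> List[Event]:
    """Pass 1: of two clicks within window_ms, keep the one with shorter text."""
    out: List[Event] = []
    for ev in events:
        prev = out[-1] if out else None
        if (prev is not None and prev.kind == "click" and ev.kind == "click"
                and ev.timestamp_ms - prev.timestamp_ms <= window_ms):
            if len(ev.text) < len(prev.text):
                out[-1] = ev  # the later event is the more specific element
            continue
        out.append(ev)
    return out


def merge_navigation(events: List[Event], window_ms: int = 2000) -> List[Event]:
    """Pass 2: fold a navigation into the click that caused it."""
    out: List[Event] = []
    for ev in events:
        prev = out[-1] if out else None
        if (prev is not None and prev.kind == "click" and not prev.url
                and ev.kind == "navigation"
                and ev.timestamp_ms - prev.timestamp_ms <= window_ms):
            out[-1] = replace(prev, url=ev.url)
            continue
        out.append(ev)
    return out


def page_name(url: str) -> str:
    """Turn '/guides' into 'Guides'; fall back to 'Home' for the root path."""
    last = url.rstrip("/").rsplit("/", 1)[-1]
    return last.capitalize() if last else "Home"


def add_titles(events: List[Event]) -> List[Event]:
    """Pass 3: build a readable title from element text and navigation result."""
    titled = []
    for ev in events:
        if ev.kind == "click" and ev.url:
            title = f"Navigate to {page_name(ev.url)}"
        elif ev.kind == "click":
            title = f'Click "{ev.text}"'
        else:
            title = f"Go to {page_name(ev.url)}"
        titled.append(replace(ev, title=title))
    return titled


def renumber(events: List[Event]) -> List[Event]:
    """Pass 4: assign sequential step numbers starting at 1."""
    return [replace(ev, index=i) for i, ev in enumerate(events, start=1)]


def clean(events: List[Event]) -> List[Event]:
    """Run the four passes in order."""
    return renumber(add_titles(merge_navigation(dedupe_clicks(events))))
```

Feeding this a "Learn" click, the bubbled click on its surrounding card a few milliseconds later, and a navigation to /guides yields a single step titled "Navigate to Guides". The order matters: merging before deduplication could attach the navigation to the container click instead of the real target.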

This pipeline turned 14 raw events into 12 meaningful workflow steps. The difference in readability was dramatic: the output went from technical noise to something you could actually hand to a colleague.

Test on real enterprise software

Testing on a real enterprise web application revealed two more issues the public websites had hidden:

Silent navigation changes. Modern SPAs often change URLs without triggering browser navigation events. Clicking "Explore groups" changed the URL from /start.html to /groups.html, but no popstate event fired (and pushState, being a method call rather than an event, never fires one). The post-processor only merged explicit navigation events, so these transitions stayed invisible.

We fixed this by checking whether the next captured step had a different URL pathname. If it did, we annotated the current click with the navigation result without consuming the next step.
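
A minimal sketch of that pathname check, assuming each captured step stores the page URL it was recorded on. The Step class and its navigated_to field are hypothetical names for this example, not FlowDoc's real data model.

```python
from dataclasses import dataclass, replace
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Step:
    """Hypothetical captured step; field names are illustrative only."""
    action: str                         # e.g. 'Click "Explore groups"'
    url: str                            # page URL when the step was captured
    navigated_to: Optional[str] = None  # pathname the step led to, if any


def annotate_silent_navigation(steps: List[Step]) -> List[Step]:
    """Annotate a step when the next step was captured on a different pathname.

    The next step is left in place: it is only inspected, never merged away.
    """
    annotated = []
    for i, step in enumerate(steps):
        nxt = steps[i + 1] if i + 1 < len(steps) else None
        if nxt is not None and step.navigated_to is None:
            here = urlsplit(step.url).path
            there = urlsplit(nxt.url).path
            if here != there:
                step = replace(step, navigated_to=there)
        annotated.append(step)
    return annotated
```

Comparing only the pathname means query-string or fragment changes on the same page are not reported as navigation. Whether that is the right cutoff depends on the application being documented.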

Login flows need special handling. The captured workflow started at the login screen by default. That's rarely what you want to document. Most flows assume the user is already logged in.

We made "wait for Enter" the default behavior. Every capture session now starts with a free browsing phase where you can log in, navigate to your starting point, then press Enter when you're ready to begin recording the actual workflow.
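
One way to picture the free browsing phase is a recorder that ignores events until it is armed. The Recorder class below is a hypothetical sketch for illustration; the real capture loop is driven by the browser and will look different.

```python
from typing import Callable, List


class Recorder:
    """Hypothetical recorder that drops events until the user presses Enter."""

    def __init__(self) -> None:
        self.recording = False
        self.steps: List[str] = []

    def on_event(self, description: str) -> None:
        # Events during the free browsing phase (login, setup) are discarded.
        if self.recording:
            self.steps.append(description)

    def start(self, wait: Callable[[str], str] = input) -> None:
        # Blocks until the user confirms; `wait` is injectable for testing.
        wait("Log in and go to your starting point, then press Enter: ")
        self.recording = True
```

Anything typed before start() returns, including the login password, never reaches the recorded steps.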

Let the data shape your data model

The most important insight came from watching the post-processing logic evolve. Initially, we thought in terms of a linear sequence: step 1, step 2, step 3. But real workflows branch: "From the dashboard, you can either create a new project or join an existing team."

Rather than force branching into a linear model, we rebuilt around a graph structure where each step became a node with potentially multiple edges to other nodes. This let us represent Y-forks naturally while keeping linear flows as the simple case: nodes with exactly one outgoing edge each.
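
A small sketch of such a graph, assuming steps are identified by string ids and edges are stored as an ordered list of successor ids. The class and method names are made up for this example, and paths_from assumes the workflow contains no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepNode:
    id: str
    title: str
    next_ids: List[str] = field(default_factory=list)  # outgoing edges


@dataclass
class WorkflowGraph:
    nodes: Dict[str, StepNode] = field(default_factory=dict)

    def add_step(self, step_id: str, title: str) -> None:
        self.nodes[step_id] = StepNode(step_id, title)

    def connect(self, src: str, dst: str) -> None:
        self.nodes[src].next_ids.append(dst)

    def paths_from(self, start: str) -> List[List[str]]:
        """List every linear walkthrough from `start` (assumes no cycles)."""
        node = self.nodes[start]
        if not node.next_ids:
            return [[node.title]]
        return [[node.title] + rest
                for nxt in node.next_ids
                for rest in self.paths_from(nxt)]


# The dashboard fork from the text: one node, two outgoing edges.
graph = WorkflowGraph()
graph.add_step("dashboard", "Open the dashboard")
graph.add_step("create", "Create a new project")
graph.add_step("name", "Name the project")
graph.add_step("join", "Join an existing team")
graph.connect("dashboard", "create")
graph.connect("create", "name")
graph.connect("dashboard", "join")
```

Here graph.paths_from("dashboard") returns two walkthroughs, one per branch. A workflow with no forks returns exactly one, which is the old linear sequence.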

The payoff came later when building visual exports. A graph model maps directly to flowchart tools and collaborative boards. The linear sequence becomes just one way to traverse the same underlying data.

Polish through real usage, not assumptions

Each test session revealed something we couldn't have planned for upfront:

Password fields needed masking. The first login capture showed clear-text passwords in the documentation. Fixed by detecting input[type="password"] and showing [masked] instead of the actual keystrokes.
File extensions in page names looked technical. "Login.jsp" should just be "Login" in the step titles. Small detail, big impact on readability.
Screenshots needed context. Full-page captures were often too zoomed out to see the actual click target. Added viewport highlighting to show exactly where the interaction happened.
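
The first two fixes are small enough to sketch in a few lines. The function names and the list of stripped extensions are assumptions made for this example; only the [masked] placeholder and the Login.jsp case come from the sessions described above.

```python
import re

# Assumed set of "technical-looking" extensions; adjust for the target app.
_PAGE_EXTENSION = re.compile(r"\.(jsp|html?|php|aspx?)$", re.IGNORECASE)


def display_value(input_type: str, typed: str) -> str:
    """Hide what was typed into password inputs; pass other values through."""
    return "[masked]" if input_type.lower() == "password" else typed


def page_title(filename: str) -> str:
    """Drop a trailing page extension so 'Login.jsp' reads as 'Login'."""
    return _PAGE_EXTENSION.sub("", filename)
```

Masking at display time is the simplest option, but it still leaves the raw keystrokes in the captured data. Dropping them at capture time would be the safer variant.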

The pattern was consistent: build the minimum viable version, test it on real workflows, fix the highest-friction issue, repeat. This revealed needs much faster than trying to anticipate everything upfront.