Verto

System Demonstration

Production workflow: from scanned document upload to native Google Docs reconstruction.

01. The Problem

The global translation industry has a hidden technical debt: Layout Entropy.

Standard OCR tools (AWS Textract, Tesseract) typically produce output that is consumed as an unstructured "bag of words." They extract text efficiently, but they do not preserve layout topology in an editable form: tables break, headers merge, and stamps are displaced.

Sworn translators save 60 minutes per document on the translation itself, but then lose 45 minutes manually reconstructing the formatting in Microsoft Word.

The Solution: I architected Verto, a proprietary reconstruction engine. It does not just "read" text; it rebuilds the document's DOM with high structural fidelity in native Google Docs.

The Verto Editor: A side-by-side verification environment ensuring near-perfect layout fidelity on supported document classes.

02. Architecture

The system is designed as an event-driven, stateful reconstruction pipeline. It prioritizes Durable Execution over simple request/response cycles.

A. The Auditor

Generating JSON is easy; guaranteeing legal accuracy is hard. I implemented an Adversarial Verification Loop (Agent 2) that challenges the output of the reconstruction engine (Agent 1).

Agent 1 (Builder): Generates the document DOM based on visual input.
Agent 2 (Auditor): Performs geometric and visual consistency checks against the source image.
The Loop: If the Auditor detects a missing stamp or a misaligned table border, it rejects the batch and forces a regeneration that directs the Builder's attention to the failed region.

Result: A self-correcting pipeline that prioritizes accuracy over speed, crucial for the "Sworn Translation" market.
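The loop can be sketched as a bounded retry: the Builder regenerates until the Auditor accepts, and each rejection's failed regions are passed back as focus hints. This is a minimal sketch under stated assumptions: `buildWithAudit`, `AuditIssue`, `AuditResult`, and the `maxAttempts` cap are hypothetical names, and the two agents are injected as plain async functions rather than the engine's actual model calls.

```typescript
// Hypothetical shapes: the real Builder/Auditor contracts are not shown in this write-up.
interface AuditIssue {
  region: string; // e.g. "table[2]" or "stamp@bottom-right"
  reason: string; // e.g. "missing" or "border misaligned"
}

interface AuditResult {
  passed: boolean;
  issues: AuditIssue[];
}

type Builder<Doc> = (focusRegions: AuditIssue[]) => Promise<Doc>;
type Auditor<Doc> = (doc: Doc) => Promise<AuditResult>;

// Adversarial verification loop: regenerate until the Auditor accepts the
// output, feeding the rejected regions back to the Builder on each retry.
export async function buildWithAudit<Doc>(
  build: Builder<Doc>,
  audit: Auditor<Doc>,
  maxAttempts = 3,
): Promise<{ doc: Doc; attempts: number }> {
  let focus: AuditIssue[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = await build(focus);
    const result = await audit(doc);
    if (result.passed) return { doc, attempts: attempt };
    focus = result.issues; // re-prompt with the failed regions flagged
  }
  throw new Error(`Audit still failing after ${maxAttempts} attempts`);
}
```

Bounding the attempts keeps a document that the Auditor never accepts from looping indefinitely.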

B. The Google Docs Bridge

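The Google Docs API allows for programmatic document creation but requires strictly ordered batchUpdate requests: the requests are applied in sequence, and every insertion shifts the character indexes of all content after it. It does not accept HTML or formatted text dumps.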
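One common way to respect this ordering is to plan every insertion against the original indexes and emit the requests from the highest index to the lowest, so an earlier request never shifts a later request's target. The sketch assumes that strategy (it is not confirmed as Verto's); the request object follows the Docs API `insertText` shape, while `toOrderedRequests` and `applyInOrder` (a local string simulation of sequential application) are hypothetical helpers.

```typescript
// Request shape follows the Docs API InsertTextRequest JSON.
type InsertTextRequest = {
  insertText: { location: { index: number }; text: string };
};

// A planned insertion, expressed against the ORIGINAL document indexes.
interface PlannedInsert {
  index: number;
  text: string;
}

// Inserting text shifts every index after the insertion point. Emitting the
// requests from the highest index to the lowest keeps each planned index valid
// when batchUpdate applies the requests in order.
export function toOrderedRequests(plan: PlannedInsert[]): InsertTextRequest[] {
  return [...plan]
    .sort((a, b) => b.index - a.index)
    .map(({ index, text }) => ({ insertText: { location: { index }, text } }));
}

// Simulates sequential application on a plain string. Index 1 is the first
// character, mirroring the Docs body, whose first valid index is 1.
export function applyInOrder(body: string, requests: InsertTextRequest[]): string {
  let out = body;
  for (const { insertText } of requests) {
    const i = insertText.location.index - 1;
    out = out.slice(0, i) + insertText.text + out.slice(i);
  }
  return out;
}
```

Applied in ascending order instead, the same plan against `"AC"` would produce `"ABDC"`, because inserting `"B"` first moves the end of the string.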

I engineered a custom Intermediate Representation (IR) mapper (GoogleDocsMapper) that translates the ProseMirror AST produced by the AI into specific Google Docs operations.
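A minimal sketch of such a mapping step, assuming a simplified subset of ProseMirror JSON (paragraphs and headings with plain-text children). It walks the blocks in document order and advances a running index, so each `insertText` and `updateParagraphStyle` request is valid at the moment `batchUpdate` applies it. `mapBlocks` and the simplified types are hypothetical; the real GoogleDocsMapper is not shown here.

```typescript
// Minimal ProseMirror JSON shapes (Tiptap uses the same node names).
type PMText = { type: "text"; text: string };
type PMBlock =
  | { type: "paragraph"; content?: PMText[] }
  | { type: "heading"; attrs: { level: number }; content?: PMText[] };

// Subset of Docs API request shapes used by this sketch.
type DocsRequest =
  | { insertText: { location: { index: number }; text: string } }
  | {
      updateParagraphStyle: {
        range: { startIndex: number; endIndex: number };
        paragraphStyle: { namedStyleType: string };
        fields: string;
      };
    };

// Walks blocks top to bottom, tracking the running insertion index so every
// request targets the document as it exists after the previous requests.
export function mapBlocks(blocks: PMBlock[], startIndex = 1): DocsRequest[] {
  const requests: DocsRequest[] = [];
  let cursor = startIndex;
  for (const block of blocks) {
    const text = (block.content ?? []).map((t) => t.text).join("") + "\n";
    requests.push({ insertText: { location: { index: cursor }, text } });
    const namedStyleType =
      block.type === "heading" ? `HEADING_${block.attrs.level}` : "NORMAL_TEXT";
    requests.push({
      updateParagraphStyle: {
        range: { startIndex: cursor, endIndex: cursor + text.length },
        paragraphStyle: { namedStyleType },
        fields: "namedStyleType",
      },
    });
    cursor += text.length; // UTF-16 code units, the unit Docs indexes use
  }
  return requests;
}
```

Because `text.length` counts UTF-16 code units, the running cursor stays aligned with Docs API indexes even for non-ASCII legal text.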

Core Challenge: Virtual Grid Calculation. Google Docs tables cannot be pasted; they must be constructed cell by cell. To handle complex legal tables with merged cells (rowspan/colspan), the system must "simulate" the table in memory before writing it, because a cell that spans several rows pushes the later cells in those rows one or more columns to the right.

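// src/lib/google-docs/mapper/virtual-grid.ts
// Complexity: O(n*m) where n=rows, m=cols.

// Core Logic: Resolving the Virtual Grid for Table Reconstruction
// This ensures high structural fidelity for complex legal tables.
// Minimal node shapes (ProseMirror table JSON: table > tableRow > tableCell).
type Span = { row: number; col: number; rowspan: number; colspan: number };
type CellNode = { attrs?: { rowspan?: number; colspan?: number } };
type TableNode = { content: { content: CellNode[] }[] };

export function resolveVirtualGrid(node: TableNode): Map<string, Span> {
    const grid: boolean[][] = []; // grid[r][c] is true once a slot is occupied
    const spanMap = new Map<string, Span>();
    for (let r = 0; r < node.content.length; r++) {
        let c = 0; // column cursor, reset for every row
        for (const cell of node.content[r].content) {
            const rowspan = cell.attrs?.rowspan ?? 1;
            const colspan = cell.attrs?.colspan ?? 1;
            // Check collision with previous rowspan/colspan: skip claimed slots
            while (grid[r]?.[c]) {
                c++;
            }
            // Claim every slot this cell covers so later rows skip them
            for (let dr = 0; dr < rowspan; dr++) {
                grid[r + dr] ??= [];
                for (let dc = 0; dc < colspan; dc++) grid[r + dr][c + dc] = true;
            }
            // Register span for later API merge requests
            if (colspan > 1 || rowspan > 1) {
                spanMap.set(`${r},${c}`, { row: r, col: c, rowspan, colspan });
            }
            c += colspan;
        }
    }
    return spanMap;
}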
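Once the empty table exists in the document, each registered span can become a `mergeTableCells` request. A sketch, assuming the table's start index is already known (for example, read back via `documents.get` after the `insertTable` request); `toMergeRequests` is a hypothetical helper, while the nested `tableRange` and `tableCellLocation` fields follow the Docs API request shape.

```typescript
type Span = { row: number; col: number; rowspan: number; colspan: number };

// Shape follows the Docs API MergeTableCellsRequest / TableRange objects.
type MergeTableCellsRequest = {
  mergeTableCells: {
    tableRange: {
      tableCellLocation: {
        tableStartLocation: { index: number };
        rowIndex: number;
        columnIndex: number;
      };
      rowSpan: number;
      columnSpan: number;
    };
  };
};

// Turns the spans registered by the virtual grid into merge requests.
// tableStartIndex is an assumed input: the index where the table begins.
export function toMergeRequests(
  spanMap: Map<string, Span>,
  tableStartIndex: number,
): MergeTableCellsRequest[] {
  return [...spanMap.values()].map(({ row, col, rowspan, colspan }) => ({
    mergeTableCells: {
      tableRange: {
        tableCellLocation: {
          tableStartLocation: { index: tableStartIndex },
          rowIndex: row,
          columnIndex: col,
        },
        rowSpan: rowspan,
        columnSpan: colspan,
      },
    },
  }));
}
```

Because the grid already resolved each span's true column, the `columnIndex` here is the visual column in the final table, not the cell's position in its ProseMirror row.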

Executive Metrics: Aggregated volume, revenue, and efficiency stats.

Ingestion Wizard: Auto-detection of source language and layout types.

C. Processing Pipeline

Processing 50+ page discovery files exceeds standard context windows. I implemented a Rolling Context State Machine that passes a BatchContext object between execution steps, enforcing semantic consistency across the entire file lifecycle.
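A rolling context can be modeled as a pure fold over batch results. The fields below (`pagesProcessed`, `glossary`, `openTable`) and the `advanceContext` helper are hypothetical, chosen to illustrate how state such as an agreed term translation can carry from one batch to the next; the actual contents of BatchContext are not documented here.

```typescript
// Hypothetical BatchContext: illustrative fields, not the real object's.
interface BatchContext {
  pagesProcessed: number;
  glossary: Record<string, string>; // source term -> translation already used
  openTable: boolean;               // a table continues into the next batch
}

// Hypothetical summary of what one batch produced.
interface BatchResult {
  pageCount: number;
  newTerms: Record<string, string>;
  endsInsideTable: boolean;
}

export const initialContext: BatchContext = {
  pagesProcessed: 0,
  glossary: {},
  openTable: false,
};

// Folds one batch's result into the context handed to the next batch.
// Earlier glossary decisions win, so a term keeps one translation file-wide.
export function advanceContext(ctx: BatchContext, result: BatchResult): BatchContext {
  return {
    pagesProcessed: ctx.pagesProcessed + result.pageCount,
    glossary: { ...result.newTerms, ...ctx.glossary },
    openTable: result.endsInsideTable,
  };
}
```

Letting earlier glossary entries win means a term rendered one way on page 3 keeps that rendering on page 48, even if a later batch proposes an alternative.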

The compute layer runs on Trigger.dev to achieve Durable Execution. If a job fails on Page 45, the system resumes from the last committed checkpoint rather than restarting.
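The resume-from-checkpoint behavior can be illustrated without Trigger.dev's own APIs, which this sketch deliberately does not reproduce. It assumes a hypothetical `CheckpointStore` that records the last committed page per job; the hypothetical `runJob` commits after every page, so a re-run after a crash on page 45 starts at page 45 instead of page 1.

```typescript
// Hypothetical checkpoint store; a durable backend would persist this.
interface CheckpointStore {
  load(jobId: string): number | undefined; // last committed page, if any
  commit(jobId: string, page: number): void;
}

export class MemoryCheckpointStore implements CheckpointStore {
  private pages = new Map<string, number>();
  load(jobId: string) {
    return this.pages.get(jobId);
  }
  commit(jobId: string, page: number) {
    this.pages.set(jobId, page);
  }
}

// Processes pages 1..totalPages, committing after each page. A re-run after a
// failure starts at the page after the last committed one.
export function runJob(
  jobId: string,
  totalPages: number,
  processPage: (page: number) => void,
  store: CheckpointStore,
): void {
  const start = (store.load(jobId) ?? 0) + 1;
  for (let page = start; page <= totalPages; page++) {
    processPage(page);
    store.commit(jobId, page);
  }
}
```

Committing only after a page is fully processed means the failed page is retried, while every page before it is never reprocessed.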

Job Inspector: Detailed metadata, token usage, and quality scores (95/100).

Job Visualization: Step-by-step progress from AI Vision to Google Docs Reconstruction (95% Confidence).

03. Infrastructure

The Double-Entry Ledger

To handle the "Per-Page" credit system, I rejected simple database increments. I implemented a Double-Entry Ledger in PostgreSQL.

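ACID Compliance: Every credit movement is written as balancing transaction rows (a debit and a matching credit) inside a single database transaction, never as an in-place field update.
Race-Condition Resistant: Concurrent jobs append new rows instead of performing a read-modify-write on a shared balance field, so multiple heavy documents finishing simultaneously cannot overwrite each other's credit updates.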
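The double-entry rule can be sketched in memory: every transaction is a set of entries that must sum to zero, and balances are computed by summing entries rather than stored and mutated. In PostgreSQL the same idea maps to inserting all of a transaction's entry rows inside one database transaction. Account names, the sign convention, and the `Ledger` class are hypothetical, and this sketch does not model concurrency or overdraft checks.

```typescript
// Sign convention (assumed): positive increases an account, negative decreases it.
interface Entry {
  account: string;
  amount: number; // whole credits, so sums are exact
}

interface Transaction {
  id: string;
  kind: "purchase" | "consumption";
  entries: Entry[];
}

export class Ledger {
  private transactions: Transaction[] = [];

  // Append-only: a transaction is rejected unless its entries balance to zero.
  post(tx: Transaction): void {
    const total = tx.entries.reduce((sum, e) => sum + e.amount, 0);
    if (total !== 0) throw new Error(`Unbalanced transaction ${tx.id}`);
    this.transactions.push(tx);
  }

  // Balances are derived by summing entries, never stored as a mutable field.
  balance(account: string): number {
    let sum = 0;
    for (const tx of this.transactions) {
      for (const e of tx.entries) {
        if (e.account === account) sum += e.amount;
      }
    }
    return sum;
  }
}
```

Because every posted transaction sums to zero, the balances of all accounts together always sum to zero as well, which gives a cheap invariant to audit against.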

Client Portal: Usage tracking and consumption history.

Ledger Filters: Granular audit tools for transaction types (Purchase vs Consumption).

04. Capability

Verto is fully functional and architected to solve the "Layout Entropy" problem for real-world legal documents. It is currently available as a white-label engine for translation agencies.

Metric	Standard OCR	Verto Engine
Table Accuracy	< 60% (Breaks on merges)	High Fidelity (Virtual Grid Verified)
Formatting Time	45 Minutes / Doc	< 2 Minutes (Review Only)
Job Completion Rate	~85% (Timeouts on large files)	99.9% (Durable Execution)
Output	Static PDF	Native, Editable Google Doc

05. Tech Stack

This platform required a deep integration of Event-Driven Architecture and Modern Frontend Constraints.

Domain	Core Stack	Key Libraries & Patterns
Frontend	Next.js 16	React Server Components, Server Actions, Edge Runtime.
UI System	Tailwind v4	`shadcn/ui` (Radix Primitives), `framer-motion` (Layout Constraints), `sonner` (Toast System).
Editor	Tiptap	Headless ProseMirror implementation with Custom Node Views (React NodeView).
State	TanStack Query	Optimistic UI updates synchronized via Supabase Realtime (WebSockets).
Compute	Trigger.dev	Long-running background jobs (24h max duration) on dedicated containers.
Validation	Zod	Strict schema validation for AI Structured Outputs (`openai.responses`).
Data Viz	Recharts	Interactive credit consumption and burn-down analytics.
PDF Engine	React-PDF	Virtualized PDF rendering with `react-window` for large legal files.
Payments	Stripe	Webhook-driven Ledger synchronization with Idempotency keys.
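Stripe may deliver the same webhook event more than once, so the event's `id` is a natural idempotency key for ledger synchronization. A minimal sketch, assuming hypothetical checkout metadata keys `clientId` and `credits` and in-memory stores standing in for ledger tables; in PostgreSQL, a UNIQUE constraint on the stored event id would make the duplicate check and the insert atomic.

```typescript
// Minimal subset of a Stripe event payload (real events carry many more fields).
interface StripeEventLike {
  id: string;   // e.g. "evt_...", stable across redeliveries
  type: string; // e.g. "checkout.session.completed"
  data: { object: { metadata?: Record<string, string> } };
}

// Hypothetical in-memory stand-ins for the processed-events and ledger tables.
const processedEvents = new Set<string>();
const creditsByClient = new Map<string, number>();

// The event id acts as the idempotency key: a redelivery is acknowledged
// but never applied to the ledger a second time.
export function handleStripeEvent(
  event: StripeEventLike,
): "applied" | "duplicate" | "ignored" {
  if (processedEvents.has(event.id)) return "duplicate";
  if (event.type !== "checkout.session.completed") return "ignored";
  const meta = event.data.object.metadata ?? {};
  const client = meta["clientId"]; // hypothetical metadata key
  const credits = Number(meta["credits"] ?? 0); // hypothetical metadata key
  processedEvents.add(event.id);
  creditsByClient.set(client, (creditsByClient.get(client) ?? 0) + credits);
  return "applied";
}

export function creditsFor(client: string): number {
  return creditsByClient.get(client) ?? 0;
}
```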

06. Thesis

Verto proves that solving "Layout Entropy" requires treating documents as Stateful Systems, not static assets.

The value isn't in the AI model (which is a commodity); it is in the Bridge that makes that model compatible with the rigid constraints of the real world (Google Docs, Legal Compliance, Auditors).

This project wasn't just about building a translation tool; it was about engineering a Visual Compiler that turns raw pixels into structured, legally binding data.