M13 F2.3 · Playground Crawler

Playground Crawler

When client-side crawl falls short — server-side Chromium for DOM scan, pattern detection, and screenshot review.

When you need it

Client-side crawl isn't enough

JS-heavy SPA, late mount, dynamic content, shadow DOM — that's when server-side Chromium steps in.

JS-heavy SPA
React/Vue/Angular single-page apps — when the DOM is fully rendered via JS.
Late mount
Initial HTML is empty; the real content mounts after 2-5s — Chromium can wait.
Dynamically loaded DOM
Scroll + click + lazy load — elements that need scripted interaction to appear.
Shadow DOM
Client-side access to web component shadow roots is limited; server-side Chromium can pierce them.

3 crawl modes

Single · Multi · Screenshot-only

Three modes depending on need. Multi-page pulls dynamic URL lists from CH events_canonical.

single_page

Give one URL → DOM scan + pattern detect + screenshot. Fastest mode — perfect for playground preview.

multi_page

Takes the distinct source_context['context_url'] values from CH events_canonical over the last 7 days and crawls each of those pages. Looks for the same pattern across pages and reports the total match count.

screenshot_only

Full-page + viewport screenshot, no DOM scan. For UX review / regression.
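The three modes can be read as a small planning step that decides which pages to visit and which stages to run on each. The sketch below is illustrative only: `CrawlMode`, `CrawlPlan`, and `plan_crawl` are hypothetical names, and it assumes multi_page also captures a screenshot per page and falls back to the given URL when no URLs were discovered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class CrawlMode(str, Enum):
    SINGLE_PAGE = "single_page"
    MULTI_PAGE = "multi_page"
    SCREENSHOT_ONLY = "screenshot_only"


@dataclass(frozen=True)
class CrawlPlan:
    """Hypothetical plan: pages to visit plus the stages to run on each."""
    urls: tuple
    dom_scan: bool
    pattern_detect: bool
    screenshot: bool


def plan_crawl(
    mode: CrawlMode,
    url: str,
    discovered_urls: Optional[Sequence[str]] = None,
) -> CrawlPlan:
    if mode is CrawlMode.SCREENSHOT_ONLY:
        # Full-page + viewport screenshot, no DOM scan.
        return CrawlPlan((url,), dom_scan=False, pattern_detect=False, screenshot=True)
    if mode is CrawlMode.MULTI_PAGE:
        # dict.fromkeys de-duplicates while keeping first-seen order.
        # Assumption: fall back to the single URL if discovery found nothing.
        urls = tuple(dict.fromkeys(discovered_urls or ())) or (url,)
        return CrawlPlan(urls, dom_scan=True, pattern_detect=True, screenshot=True)
    # single_page: one URL, all three stages.
    return CrawlPlan((url,), dom_scan=True, pattern_detect=True, screenshot=True)
```

In a real worker the `discovered_urls` argument would come from the events_canonical lookup described for multi_page; here it is passed in directly so the planning logic stays independent of storage.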

6-strategy selector generator

Layered selectors for auto-healing fail-safe

Each event gets 6 selector types — if one breaks, the next one kicks in (5-layer fallback healing).

Why 6 layers?

Frontend redesigns can break CSS classes, ARIA labels, or text content. Unless all 6 break at once, event tracking keeps working.

dataAttr: data-* attribute; most stable (manually added, dev-controlled)
text: Visible text content; can break on i18n changes
css: CSS class / id; risky during class refactors
xpath: XPath expression; dependent on DOM hierarchy
regex: Attribute regex match; pattern-based
aria: ARIA role + label; accessibility-grade, relatively stable

// 6 strategies, from most specific to loosest ("Sepete ekle" = "Add to cart")
{
  "selectors": [
    { "type": "data_attr", "value": "[data-testid='cart-add']" },
    { "type": "aria",      "value": "[role='button'][aria-label='Sepete ekle']" },
    { "type": "text",      "value": "button:has-text('Sepete ekle')" },
    { "type": "css",       "value": ".product-card .add-to-cart" },
    { "type": "xpath",     "value": "//button[contains(., 'Sepete ekle')]" },
    { "type": "regex",     "value": "button[name=~'add.cart']" }
  ]
}
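A minimal sketch of how the layered fallback could be resolved at runtime: walk the list in order and return the first selector that still matches something on the page. `resolve_selector` and the `count_matches` callback are hypothetical; the callback stands in for whatever actually evaluates a selector against the DOM, which the source does not specify.

```python
from typing import Callable, Optional, Sequence


def resolve_selector(
    selectors: Sequence[dict],
    count_matches: Callable[[dict], int],
) -> Optional[dict]:
    """Return the first selector with at least one match, or None.

    `selectors` uses the {"type": ..., "value": ...} shape shown above.
    `count_matches` is a hypothetical callback supplied by the caller.
    """
    for selector in selectors:
        try:
            if count_matches(selector) > 0:
                return selector
        except Exception:
            # A selector that fails to evaluate is treated like a miss,
            # so the next layer still gets its chance.
            continue
    return None


if __name__ == "__main__":
    layers = [
        {"type": "data_attr", "value": "[data-testid='cart-add']"},
        {"type": "aria", "value": "[role='button'][aria-label='Sepete ekle']"},
        {"type": "css", "value": ".product-card .add-to-cart"},
    ]
    # Simulate a redesign that removed the data attribute.
    surviving = {"aria": 1, "css": 4}
    hit = resolve_selector(layers, lambda s: surviving.get(s["type"], 0))
    print(hit["type"] if hit else "all layers broken")
```

Returning `None` when every layer misses makes the "all 6 broke at once" case explicit, so the caller can flag the event for review instead of silently dropping it.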

Pattern detection

1,248 cart buttons → one event

Auto-groups repeated elements with the same design — one event definition covers every page.

How does it group?

Clusters by selector + neighboring DOM structure + accessible label similarity. All elements in a cluster bind to a single event.

# Multi-page crawl → 1,248 add-to-cart points found
gurulu playground crawl \
  --mode multi_page \
  --base https://shop.example.com \
  --discover ch_events_last_7d \
  --max-pages 200

# Result: pattern grouping (matching selectors → bound to a single event)
# event:add_to_cart  →  matched 1,248 instances across 47 pages
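The grouping idea can be sketched as bucketing elements by a shared signature. This is a simplification under stated assumptions: the `Element` fields and `group_patterns` are hypothetical, and label "similarity" is reduced to an exact match after whitespace and case normalization, whereas the real clustering may be fuzzier.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical record for one element found during a crawl."""
    page_url: str
    selector: str      # e.g. ".product-card .add-to-cart"
    parent_path: str   # neighboring DOM structure, e.g. "div.product-card>footer"
    label: str         # accessible label


def normalize_label(label: str) -> str:
    # Collapse whitespace and ignore case so trivial variations still cluster.
    return " ".join(label.split()).casefold()


def group_patterns(elements: Iterable[Element]) -> Dict[Tuple[str, str, str], List[Element]]:
    """Bucket elements that share selector, DOM neighborhood, and label."""
    clusters: Dict[Tuple[str, str, str], List[Element]] = defaultdict(list)
    for el in elements:
        key = (el.selector, el.parent_path, normalize_label(el.label))
        clusters[key].append(el)
    return dict(clusters)


def summarize(cluster: List[Element]) -> Tuple[int, int]:
    """Return (instance count, distinct page count) for one cluster."""
    return len(cluster), len({el.page_url for el in cluster})
```

Each resulting cluster would then bind to a single event definition, and `summarize` yields the kind of "instances across pages" figure shown in the CLI output above.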

Screenshot review

MinIO presigned + thumbnail

After a crawl, each page screenshot is written to MinIO; the API returns a 302 redirect to a presigned URL.

GET /v1/playground/crawl/{crawl_id}/screenshot
  → 302 redirect → MinIO presigned URL (15 min TTL)

# Thumbnail (JPEG, for the dashboard list)
GET /v1/playground/crawl/{crawl_id}/screenshot?variant=thumb

Access control

Workspace permission is checked first; only then does the API redirect to the presigned MinIO URL, which carries a 15-minute TTL. Dashboard thumbnails are JPEG.
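A client that wants the presigned URL itself (for example to embed it in an image tag) can issue the request without following the redirect and read the `Location` header. The sketch below uses only the standard library. The bearer-token header is an assumption, since the source does not state the auth scheme, and `screenshot_path` and `presigned_screenshot_url` are hypothetical helper names.

```python
import http.client
from typing import Optional
from urllib.parse import quote, urlencode, urlsplit


def screenshot_path(crawl_id: str, variant: Optional[str] = None) -> str:
    """Build the request path for a crawl screenshot (or its thumbnail)."""
    path = f"/v1/playground/crawl/{quote(crawl_id, safe='')}/screenshot"
    if variant:
        path += "?" + urlencode({"variant": variant})
    return path


def presigned_screenshot_url(
    api_base: str,
    crawl_id: str,
    token: str,
    variant: Optional[str] = None,
) -> str:
    """Return the presigned URL from the 302 response's Location header.

    http.client never follows redirects, so the 302 is visible to us.
    Any path prefix in api_base is ignored in this sketch.
    """
    parts = urlsplit(api_base)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        conn.request(
            "GET",
            screenshot_path(crawl_id, variant),
            # Assumption: bearer-token auth; adjust to the real scheme.
            headers={"Authorization": f"Bearer {token}"},
        )
        response = conn.getresponse()
        if response.status != 302:
            raise RuntimeError(f"expected 302, got {response.status}")
        location = response.getheader("Location")
        if not location:
            raise RuntimeError("302 response without a Location header")
        return location
    finally:
        conn.close()
```

Because the URL expires after 15 minutes, callers should request a fresh one rather than caching the returned value for long.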

Performance config

Browser pool 2, timeout 30s, shm 1gb

Chromium needs a 1 GB shared-memory (SHM) segment. Docker's default shm_size of 64 MB will not work, so an override is required.

Setting             Default          Note
Browser pool size   2                Concurrent Chromium instances. 2 is stable; more eats RAM.
Page timeout        30 s             Fails after 30 s if the DOM isn't ready. Sufficient for SPAs.
shm_size            1 GB             Docker SHM of 1 GB prevents Chromium crashes. The 64 MB default will not work.
Job concurrency     1 job / worker   Caps Chromium memory leak risk.
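The pool size and page timeout can be pictured as a semaphore plus a per-page deadline. This is a rough sketch, not the crawler's actual scheduler: `crawl_all` and the `crawl_page` callback are hypothetical, and the callback stands in for whatever drives Chromium.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Sequence

BROWSER_POOL_SIZE = 2    # default from the settings above
PAGE_TIMEOUT_S = 30.0    # default from the settings above


async def crawl_all(
    urls: Sequence[str],
    crawl_page: Callable[[str], Awaitable[str]],
    pool_size: int = BROWSER_POOL_SIZE,
    timeout_s: float = PAGE_TIMEOUT_S,
) -> Dict[str, str]:
    """Crawl urls with at most pool_size pages in flight at once."""
    slots = asyncio.Semaphore(pool_size)

    async def run_one(url: str) -> str:
        async with slots:  # wait for a free browser slot
            try:
                return await asyncio.wait_for(crawl_page(url), timeout=timeout_s)
            except asyncio.TimeoutError:
                # wait_for cancels the page task before raising.
                return "timeout"

    results = await asyncio.gather(*(run_one(u) for u in urls))
    return dict(zip(urls, results))
```

A caller would run it with `asyncio.run(crawl_all(urls, crawl_page))`. Keeping the timeout inside the semaphore means a hung page frees its slot after the deadline instead of blocking the pool.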

Use cases

Onboarding + discovery

Two typical scenarios — new workspace setup and finding new patterns in an existing workspace.

Onboarding playground

New workspace setup: use the 'scan multi-page' button from the sessions list to detect patterns across every page and bulk-create events.

Pattern discovery

In an existing workspace post-launch — run a multi-page crawl from the playground to find new elements and add them to the event registry.
