M13 F2.3 · Playground Crawler

Playground Crawler

When client-side crawl falls short — server-side Chromium for DOM scan, pattern detection, and screenshot review.

When you need it

Client-side crawl isn't enough

JS-heavy SPA, late mount, dynamic content, shadow DOM — that's when server-side Chromium steps in.

  • JS-heavy SPA

    React/Vue/Angular single-page apps — when the DOM is fully rendered via JS.

  • Late mount

    Initial HTML is empty; the real content mounts after 2-5s — Chromium can wait.

  • Dynamically loaded DOM

    Scroll + click + lazy load — elements that need scripted interaction to appear.

  • Shadow DOM

    Client-side access to web component shadow roots is limited; server-side Chromium can pierce them.

3 crawl modes

Single · Multi · Screenshot-only

Three modes depending on need. Multi-page pulls dynamic URL lists from CH events_canonical.

single_page

Give one URL → DOM scan + pattern detect + screenshot. Fastest mode — perfect for playground preview.

multi_page

Takes the distinct source_context['context_url'] values seen in CH events_canonical over the last 7 days and crawls each of them. The same pattern is searched for on every page, and matches are reported as a total count.
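The URL discovery step can be sketched in a few lines. This is a minimal illustration, not the crawler's actual query: the row shape (a `ts` timestamp next to `source_context`) and the function name `discover_urls` are assumptions, and the cap of 200 mirrors the `--max-pages` flag used elsewhere on this page.

```python
from datetime import datetime, timedelta, timezone


def discover_urls(events, now, window_days=7, max_pages=200):
    """Return distinct context URLs seen inside the window, in first-seen order.

    `events` is a hypothetical list of rows already fetched from
    events_canonical; each row carries `ts` and `source_context`.
    """
    cutoff = now - timedelta(days=window_days)
    seen = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for event in events:
        if event["ts"] < cutoff:
            continue  # older than the discovery window
        url = event.get("source_context", {}).get("context_url")
        if url and url not in seen:
            seen[url] = None
    return list(seen)[:max_pages]


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
events = [
    {"ts": now - timedelta(days=1), "source_context": {"context_url": "https://shop.example.com/p/1"}},
    {"ts": now - timedelta(days=2), "source_context": {"context_url": "https://shop.example.com/p/1"}},
    {"ts": now - timedelta(days=3), "source_context": {"context_url": "https://shop.example.com/cart"}},
    {"ts": now - timedelta(days=9), "source_context": {"context_url": "https://shop.example.com/old"}},
]
urls = discover_urls(events, now)
```

Here the duplicate product URL collapses to one entry and the nine-day-old URL falls outside the window.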

screenshot_only

Full-page + viewport screenshot, no DOM scan. For UX review / regression.
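One way to picture the three modes is as a mapping from mode name to the steps run per page. The sketch below is illustrative only: the step names and the `plan` helper are hypothetical, and it assumes multi_page runs the same steps as single_page on every discovered URL.

```python
from enum import Enum


class CrawlMode(str, Enum):
    SINGLE_PAGE = "single_page"
    MULTI_PAGE = "multi_page"
    SCREENSHOT_ONLY = "screenshot_only"


# Steps per page for each mode (step names are illustrative).
STEPS = {
    CrawlMode.SINGLE_PAGE: ("dom_scan", "pattern_detect", "screenshot"),
    CrawlMode.MULTI_PAGE: ("dom_scan", "pattern_detect", "screenshot"),
    CrawlMode.SCREENSHOT_ONLY: ("screenshot",),  # no DOM scan
}


def plan(mode, urls):
    """Expand a mode and a URL list into an ordered list of (url, step) pairs."""
    mode = CrawlMode(mode)  # raises ValueError for an unknown mode name
    if mode is CrawlMode.SINGLE_PAGE and len(urls) != 1:
        raise ValueError("single_page expects exactly one URL")
    return [(url, step) for url in urls for step in STEPS[mode]]


preview = plan("single_page", ["https://shop.example.com/p/1"])
review = plan("screenshot_only", ["https://shop.example.com/", "https://shop.example.com/cart"])
```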

6-strategy selector generator

Layered selectors for auto-healing fail-safe

Each event gets 6 selector types — if one breaks, the next one kicks in (5-layer fallback healing).

Why 6 layers?

Frontend redesigns can break CSS classes, ARIA labels, or text content. Unless all 6 break at once, event tracking keeps working.

  • dataAttr: data-* attribute, the most stable (manually added, dev-controlled)
  • text: visible text content, can break on i18n changes
  • css: CSS class / id, risky during class refactors
  • xpath: XPath expression, dependent on DOM hierarchy
  • regex: attribute regex match, pattern-based
  • aria: ARIA role + label, accessibility-grade, relatively stable
// 6 strategies, from most specific to loosest
{
  "selectors": [
    { "type": "data_attr", "value": "[data-testid='cart-add']" },
    { "type": "aria",      "value": "[role='button'][aria-label='Add to cart']" },
    { "type": "text",      "value": "button:has-text('Add to cart')" },
    { "type": "css",       "value": ".product-card .add-to-cart" },
    { "type": "xpath",     "value": "//button[contains(., 'Add to cart')]" },
    { "type": "regex",     "value": "button[name=~'add.cart']" }
  ]
}
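The fallback behaviour over such a selector list can be sketched as a simple ordered search. This is an assumption about how the healing could work, not the product's implementation: `resolve` and the `query` callback are hypothetical names, and a dictionary stands in for the real DOM.

```python
def resolve(selectors, query):
    """Try selectors in listed order and return the first one that matches.

    `query(type, value)` is a hypothetical callback that returns the list of
    matching elements (empty when the selector no longer matches anything).
    """
    for position, selector in enumerate(selectors):
        elements = query(selector["type"], selector["value"])
        if elements:
            return {
                "type": selector["type"],
                "elements": elements,
                "fallbacks_used": position,  # 0 means the primary selector still works
            }
    return None  # every layer broke at once


selectors = [
    {"type": "data_attr", "value": "[data-testid='cart-add']"},
    {"type": "css", "value": ".product-card .add-to-cart"},
]

# Stand-in DOM after a redesign that dropped the data-testid attribute.
fake_dom = {".product-card .add-to-cart": ["<button #1>"]}


def fake_query(kind, value):
    return fake_dom.get(value, [])


hit = resolve(selectors, fake_query)
```

In this run the data_attr selector finds nothing, so the css selector takes over and `fallbacks_used` is 1. Recording which layer matched makes it possible to flag events that are running on a fallback.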

Pattern detection

1,248 cart buttons → one event

Auto-groups repeated elements with the same design — one event definition covers every page.

How does it group?

Clusters by selector + neighboring DOM structure + accessible label similarity. All elements in a cluster bind to a single event.

# Multi-page crawl → 1,248 add-to-cart points found
gurulu playground crawl \
  --mode multi_page \
  --base https://shop.example.com \
  --discover ch_events_last_7d \
  --max-pages 200

# Result: pattern grouping (matching selectors → bound to a single event)
# event:add_to_cart  →  matched 1,248 instances across 47 pages
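The grouping idea can be illustrated with a deliberately simplified sketch. It clusters on an exact key (selector, neighboring DOM shape, normalized label) rather than on a similarity score, so it is a stand-in for the real clustering; the element fields and function names are assumptions.

```python
from collections import defaultdict


def group_patterns(elements):
    """Cluster scanned elements that share selector, DOM neighborhood and label."""
    clusters = defaultdict(list)
    for el in elements:
        # Exact-match key; a real implementation would compare by similarity.
        key = (el["selector"], el["parent_shape"], el["label"].strip().casefold())
        clusters[key].append(el)
    return clusters


def summarize(members):
    """Count instances and distinct pages for one cluster."""
    return {"instances": len(members), "pages": len({el["url"] for el in members})}


cart = {"selector": ".product-card .add-to-cart", "parent_shape": "div>button"}
elements = [
    {**cart, "url": "https://shop.example.com/p/1", "label": "Add to cart"},
    {**cart, "url": "https://shop.example.com/p/2", "label": "add to cart "},
    {**cart, "url": "https://shop.example.com/p/2", "label": "Add to cart"},
    {"selector": ".newsletter .signup", "parent_shape": "form>button",
     "url": "https://shop.example.com/p/1", "label": "Subscribe"},
]

clusters = group_patterns(elements)
summary = {key[0]: summarize(members) for key, members in clusters.items()}
```

Each cluster would then be bound to a single event definition, which is how a count such as "1,248 instances across 47 pages" arises from one event.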

Screenshot review

MinIO presigned + thumbnail

After a crawl, each page screenshot is written to MinIO; the API returns a 302 redirect to a presigned URL.

GET /v1/playground/crawl/{crawl_id}/screenshot
  → 302 redirect → MinIO presigned URL (15 min TTL)

# Thumbnail (JPEG, for the dashboard list)
GET /v1/playground/crawl/{crawl_id}/screenshot?variant=thumb

Access control

Presigned URLs carry a 15-minute TTL. The API checks workspace permission first and only then redirects the request to the MinIO bucket. Dashboard thumbnails are JPEG.
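The check-then-redirect order can be sketched as a small handler. Everything named here is hypothetical: `screenshot_response`, the crawl record fields, and the injected `presign` callback (a stand-in for whatever MinIO client call produces the signed URL). Returning 403 on a failed permission check is also an assumption; the page only states that permission is checked first.

```python
from http import HTTPStatus

PRESIGN_TTL_SECONDS = 15 * 60  # 15-minute TTL


def screenshot_response(user_workspaces, crawl, presign, variant=None):
    """Return (status, headers) for a screenshot request.

    Permission is checked before any URL is signed, so a caller without
    access never receives a presigned link.
    """
    if crawl["workspace_id"] not in user_workspaces:
        return HTTPStatus.FORBIDDEN, {}
    key = crawl["thumb_key"] if variant == "thumb" else crawl["screenshot_key"]
    return HTTPStatus.FOUND, {"Location": presign(key, PRESIGN_TTL_SECONDS)}


def fake_presign(key, ttl_seconds):
    # Stand-in signer for the example; a real one would add a signature.
    return f"https://minio.example.com/screenshots/{key}?expires={ttl_seconds}"


crawl = {"workspace_id": "ws_1", "screenshot_key": "c42/full.png", "thumb_key": "c42/thumb.jpg"}
allowed = screenshot_response({"ws_1"}, crawl, fake_presign)
thumb = screenshot_response({"ws_1"}, crawl, fake_presign, variant="thumb")
denied = screenshot_response({"ws_2"}, crawl, fake_presign)
```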

Performance config

Browser pool 2, timeout 30s, shm 1gb

Chromium needs a 1 GB shared-memory (SHM) segment. Docker's default shm_size of 64 MB is too small, so an override is required.

| Setting           | Default        | Note                                                                      |
|-------------------|----------------|---------------------------------------------------------------------------|
| Browser pool size | 2              | Concurrent Chromium instances. 2 is stable; more eats RAM.                |
| Page timeout      | 30 s           | Fails after 30s if DOM isn't ready. Sufficient for SPAs.                  |
| shm_size          | 1 GB           | Docker SHM 1 GB prevents Chromium crashes. The 64 MB default will not work. |
| Job concurrency   | 1 job / worker | 1 job per worker caps Chromium memory leak risk.                          |
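The pool size and page timeout interact roughly as follows. This is a sketch under assumptions, using an asyncio semaphore as a stand-in for the browser pool; `crawl_all` and the `crawl_page` coroutine are hypothetical names, and a short sleep replaces real page loads.

```python
import asyncio

POOL_SIZE = 2        # concurrent Chromium instances
PAGE_TIMEOUT_S = 30  # per-page deadline in seconds


async def crawl_all(urls, crawl_page, pool_size=POOL_SIZE, timeout=PAGE_TIMEOUT_S):
    """Crawl every URL with at most `pool_size` pages in flight at once."""
    slots = asyncio.Semaphore(pool_size)

    async def run(url):
        async with slots:  # wait for a free browser slot
            try:
                return url, await asyncio.wait_for(crawl_page(url), timeout)
            except asyncio.TimeoutError:
                return url, "timeout"  # page never became ready in time

    return await asyncio.gather(*(run(url) for url in urls))


async def _demo():
    active = peak = 0

    async def fake_crawl(url):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # track how many pages ran concurrently
        await asyncio.sleep(0.01)
        active -= 1
        return "ok"

    urls = [f"https://shop.example.com/p/{i}" for i in range(5)]
    return await crawl_all(urls, fake_crawl), peak


results, peak = asyncio.run(_demo())
```

With five URLs queued, no more than two pages are ever in flight, which is the property that keeps RAM and shared-memory use bounded.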

Use cases

Onboarding + discovery

Two typical scenarios — new workspace setup and finding new patterns in an existing workspace.

Onboarding playground

New workspace setup: use the 'scan multi-page' button from the sessions list to detect patterns across every page and bulk-create events.

Pattern discovery

In an existing workspace post-launch — run a multi-page crawl from the playground to find new elements and add them to the event registry.

Related docs

Read next

Build audiences from patterns, see patterns surfaced in the AI summary, learn the architecture.

Playground Crawler — Gurulu Docs