Here Are the Best Cloudflare /crawl Settings for Any Website
Cloudflare’s Browser Rendering /crawl endpoint is one of the fastest ways to extract website content at scale, but the default settings are not optimal for most use cases. After running crawls across dozens of sites, from Shopify stores to React SPAs to documentation sites, these are the settings that consistently deliver the best results.
This guide covers what each setting does, when to change it, and the specific commands that work best for common site types.
The Most Important Decision: Render Mode
Every crawl starts with one choice: should Cloudflare load the page in a headless browser, or just fetch the raw HTML?
render: false (--no-render) fetches the HTML without executing JavaScript. It is fast, free during the beta period, and produces clean output for any site that serves content in the initial HTML response.
render: true (the default) loads each page in a headless Chromium instance, executes JavaScript, waits for the page to settle, then extracts the content. This is slower, consumes browser hours, and costs money after the free 10 hours/month.
When Each Mode Makes Sense
| Site Type | Recommended Mode | Reason |
|---|---|---|
| Shopify stores | --no-render | Products, collections, and pages are server-rendered |
| WordPress sites | --no-render | Content is in the initial HTML response |
| Static sites and blogs | --no-render | No JavaScript-dependent content |
| Hugo, Jekyll, Astro sites | --no-render | Pre-built HTML at deploy time |
| React or Vue SPAs | render: true | Content loads via JavaScript after initial page load |
| Sites with lazy-loaded data | render: true | Reviews, pricing, and recommendations may require JS |
In our testing, Shopify sites returned roughly 90% identical content between render modes. The extra content from rendering was mostly dynamic UI elements like cart drawers and recommendation widgets, not meaningful product data. We cover the full render mode comparison with head-to-head benchmarks in Pros and Cons of the Cloudflare Crawl Endpoint with Shopify Stores.
The rule of thumb: start with --no-render. If the results are missing content you need, switch to render mode.
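One quick way to apply this rule of thumb is to fetch a representative page yourself and check whether meaningful text is already present in the raw HTML. A minimal stdlib-only sketch; the `looks_server_rendered` helper and the 50-word threshold are our own choices, not part of the /crawl tooling:

```python
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def looks_server_rendered(html: str, min_words: int = 50) -> bool:
    """Heuristic: if the raw HTML already carries a reasonable amount of
    visible text, --no-render will probably capture the content."""
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"\w+", " ".join(parser.parts))
    return len(words) >= min_words
```

Fetch one product or article page with any HTTP client, pass the body to `looks_server_rendered`, and a `False` result (e.g. an empty `<div id="root">` shell) is a strong hint that you need render mode.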
URL Discovery: Sitemaps vs Links
The --source flag controls how Cloudflare finds pages to crawl.
--source sitemaps reads the site’s sitemap.xml and only crawls URLs listed there. This is predictable, covers the pages the site owner considers canonical, and avoids crawling duplicate or low-value pages.
--source links starts at the given URL and follows <a href> links it finds on each page. This discovers pages the way a search engine would, but can miss orphaned pages and may crawl into pagination, filters, or other low-value URL patterns.
--source all (the default) combines both methods.
Which to Use
Use --source sitemaps when the site has a complete, well-maintained sitemap. Most Shopify and WordPress sites do. This is the most reliable option for full-site content extraction.
Use --source links or all when the sitemap is missing, incomplete, or you specifically want to audit the site’s internal link structure.
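Before committing to --source sitemaps, it can be worth counting how many URLs the sitemap actually lists and comparing that against the site's rough page count. A small stdlib-only sketch; note that real sitemaps are often index files pointing at child sitemaps, which this simplified version does not follow:

```python
import xml.etree.ElementTree as ET

# Sitemap protocol namespace (sitemaps.org)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Extract <loc> entries from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

sample = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/shirt</loc></url>
</urlset>"""
print(len(sitemap_urls(sample)))  # 2
```

If the count comes back far below what you expect for the site, that is the signal to fall back to --source links or all.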
Resource Blocking for Render-True Crawls
This is the single most impactful optimization for render-true crawls. By default, the headless browser loads every image, font, stylesheet, and media file on every page. This is wasteful when you only need the text content.
Add --block-resources image media font stylesheet to any render-true crawl. The effect is significant:
- Speed: crawl time drops from roughly 7 seconds per page to roughly 2 seconds per page
- Cost: browser hours consumed are cut by 60-70%
- Reliability: pages that would hang indefinitely waiting for slow CDN assets now complete normally
The browser still executes JavaScript and builds the DOM. It just skips downloading assets that do not affect the text content.
The Wait Condition
The --wait-until flag tells the browser when to stop waiting and extract content. The default waits for all network activity to finish, which is slow and unnecessary for content extraction.
--wait-until domcontentloaded tells the browser to extract content as soon as the DOM is ready. For text extraction, this is almost always sufficient. JavaScript that loads content will have executed, but background analytics pings and ad network calls will not delay the crawl.
Recommended Commands by Site Type
Shopify Store (Full Site)
```shell
python crawl.py run https://example.com \
  --limit 500 \
  --format markdown \
  --no-render \
  --source sitemaps \
  -o results.json
```
Fast, free, and covers the full product catalog. Shopify sitemaps are comprehensive, so --source sitemaps gives complete coverage without crawling into paginated collections or search result pages.
Shopify Store (Products Only)
```shell
python crawl.py run https://example.com \
  --limit 1000 \
  --format markdown \
  --no-render \
  --include-patterns "https://example.com/products/**" \
  -o products.json
```
The --include-patterns flag restricts the crawl to URLs matching the given pattern. Useful when you only need product pages and want to skip collections, blog posts, and policy pages.
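If you want to preview which URLs a pattern will keep before spending a crawl on it, you can approximate the matching locally. This sketch uses Python's `fnmatch`, whose `*` matches any characters including `/`; the /crawl endpoint's exact pattern semantics may differ, so treat this as an approximation:

```python
from fnmatch import fnmatch

def filter_urls(urls, include_pattern):
    """Keep only URLs matching a glob-style pattern (local approximation
    of --include-patterns; fnmatch's * also crosses / boundaries)."""
    return [u for u in urls if fnmatch(u, include_pattern)]

urls = [
    "https://example.com/products/blue-shirt",
    "https://example.com/collections/sale",
    "https://example.com/blogs/news/launch",
]
print(filter_urls(urls, "https://example.com/products/**"))
# ['https://example.com/products/blue-shirt']
```

Running this against a sitemap URL list (see the sitemap discussion above) tells you roughly how many pages the pattern-restricted crawl will cover.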
WordPress or Static Blog
```shell
python crawl.py run https://example.com \
  --limit 500 \
  --format markdown \
  --no-render \
  --source sitemaps \
  -o results.json
```
Same settings as Shopify. WordPress sites are server-rendered and have reliable sitemaps. Static site generators (Hugo, Jekyll, Eleventy, Astro) produce pre-built HTML, so render-false captures everything.
React or Vue SPA
```shell
python crawl.py run https://example.com \
  --limit 500 \
  --format markdown \
  --source sitemaps \
  --block-resources image media font stylesheet \
  --wait-until domcontentloaded \
  -o results.json
```
Render-true is the default, so no flag is needed. The critical additions are --block-resources and --wait-until domcontentloaded. Without these, the crawl will be slow and expensive.
If the SPA does not have a sitemap, switch to --source links.
Documentation Site
```shell
python crawl.py run https://docs.example.com \
  --limit 500 \
  --format markdown \
  --no-render \
  --depth 5 \
  --exclude-patterns "*/changelog/**" "*/archive/**" \
  -o docs.json
```
Documentation sites often have deep link structures. Increase --depth to follow nested page hierarchies. Use --exclude-patterns to skip changelog pages, archived versions, or other content you do not need.
Performance Benchmarks
These numbers come from real crawls across Shopify and e-commerce stores in March 2026. Site names have been anonymized.
| Site | Pages | Mode | Content Size | Browser Time | Total Time |
|---|---|---|---|---|---|
| Supplements store (bot-protected) | 89/100 | no-render | 5.9 MB | 0s | ~3.5 min |
| Apparel brand (large catalog) | 500/500 | no-render | 77.1 MB | 0s | ~18 min |
| Apparel brand (large catalog) | 4/5 | render-true | 0.6 MB | 0.9s | ~10s |
| DTC outdoor brand | 256/266 | no-render | 11.0 MB | 0s | ~5 min |
| DTC outdoor brand | 256/266 | render-true | 12.5 MB | 1,338s | ~25 min |
| Medical apparel store | 1,200 | no-render | large | 0s | ~55 min |
Key patterns from the data:
- No-render is 5 to 10 times faster than render-true for the same site
- No-render consumes zero browser time (free during beta)
- Wall-clock time scales linearly with page count for no-render crawls
- Sites with `robots.txt` crawl-delay directives will be slow regardless of settings, because the crawler respects them
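That last point is worth quantifying before you launch a large crawl: a crawl-delay directive puts a hard floor on wall-clock time. A rough sketch; it reads the first Crawl-delay line and ignores per-user-agent scoping, which is a simplification of how robots.txt rules actually apply:

```python
def crawl_delay_seconds(robots_txt: str) -> float:
    """Return the first Crawl-delay value found, or 0 if none is set.
    Simplification: ignores which user-agent the directive applies to."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "crawl-delay":
            try:
                return float(value.strip())
            except ValueError:
                return 0.0
    return 0.0

def min_crawl_minutes(robots_txt: str, pages: int) -> float:
    """Lower bound on wall-clock crawl time implied by the delay."""
    return pages * crawl_delay_seconds(robots_txt) / 60

# A 5-second delay across 500 pages puts a ~42-minute floor on the crawl
print(min_crawl_minutes("User-agent: *\nCrawl-delay: 5\n", 500))
```

If the implied floor is longer than you can wait, no setting on the /crawl side will speed it up; you would need to crawl fewer pages instead.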
Cost Optimization
The Workers Paid plan is $5/month. Beyond that, costs come from browser hours consumed by render-true crawls.
Free tier: 10 browser hours/month. A 500-page render-true crawl with resource blocking uses roughly 15-20 minutes of browser time. You can run 30+ optimized crawls per month within the free tier.
Without resource blocking: the same 500-page crawl could use 60+ minutes of browser time, cutting your free capacity to roughly 10 crawls per month.
No-render crawls are free during the beta period. For server-rendered sites, there is no reason to use render-true.
Cost Formula
Browser cost = (pages × seconds_per_page ÷ 3600) × $0.09/hour
At 2 seconds per page (optimized render-true): 500 pages = 0.28 hours = $0.025
At 7 seconds per page (unoptimized): 500 pages = 0.97 hours = $0.087
The difference is small per crawl but compounds when running daily or weekly crawls across multiple sites.
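The formula above can be wrapped in a few lines for budgeting recurring crawls. The $0.09/hour rate, 10 free hours, and per-page timings are the figures cited in this article; verify them against current Cloudflare pricing before relying on the numbers:

```python
HOURLY_RATE = 0.09   # USD per browser hour (rate cited in this article)
FREE_HOURS = 10.0    # free browser hours per month (free tier)

def browser_hours(pages: int, seconds_per_page: float) -> float:
    """Browser time consumed by one render-true crawl, in hours."""
    return pages * seconds_per_page / 3600

def monthly_cost(pages: int, seconds_per_page: float,
                 crawls_per_month: int = 1) -> float:
    """USD cost of browser time beyond the free tier for a month."""
    hours = browser_hours(pages, seconds_per_page) * crawls_per_month
    return max(0.0, hours - FREE_HOURS) * HOURLY_RATE

# Daily optimized crawls of a 500-page site: 30 × ~0.28h ≈ 8.3h, free
print(monthly_cost(500, 2, crawls_per_month=30))   # 0.0
# Same schedule unoptimized: 30 × ~0.97h ≈ 29h, ~19h billable
print(monthly_cost(500, 7, crawls_per_month=30))
```

The second print makes the compounding effect concrete: an unoptimized daily schedule blows through the free tier, while the optimized one stays inside it.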
Limits to Know
| Resource | Limit |
|---|---|
| Pages per crawl | 100,000 |
| Crawl jobs per day | Unlimited (Workers Paid) |
| Browser hours | 10 hrs/month free, then $0.09/hr |
| API requests | 600/minute |
| Concurrent browsers | 30 per account |
| Job lifetime | 7 days max, results available 14 days |
Common Problems and Fixes
403 errors on most pages: the site has bot protection (Cloudflare Bot Management, Akamai, Datadome). This cannot be bypassed through the /crawl endpoint. The crawl will complete but most pages will return errors.
Render-true crawl hangs near the end: one or more pages have slow-loading resources blocking the browser. Add --block-resources image media font stylesheet and --wait-until domcontentloaded.
Missing content in no-render mode: the site loads content via JavaScript. Switch to render-true with the resource blocking and wait optimizations.
Script crashes mid-crawl: the crawl job continues running on Cloudflare’s servers. Check the status and fetch results when it finishes:
```shell
python crawl.py status <job_id>
python crawl.py results <job_id> -o out.json
```
Empty results from sitemaps source: the site’s sitemap may be missing or blocked. Switch to --source links or --source all.
Known Limitations
Before building a workflow around the /crawl endpoint, be aware of these constraints:
- Broken relative URL resolution: Cloudflare's markdown converter incorrectly resolves protocol-relative URLs like `//www.example.com/path` by prepending the page URL. This creates malformed paths in the output, particularly on Shopify sites.
- Boilerplate content in every page: nav menus, mega menus, and footers appear in the markdown for every page. For a typical Shopify site, roughly 90% of the content per page is repeated template content. See our analysis of boilerplate ratios across real Shopify crawls.
- No structured data extraction: JSON-LD, schema.org, and OpenGraph data are not parsed in no-render mode. Render-true captures basic OG tags in metadata but not full schema.
- No 404 detection: the crawl only processes live URLs. Dead links and broken internal links are not reported.
- Single starting URL: the API accepts one URL and spiders outward. It does not accept a URL list. For batch URL fetching, use the `/markdown` or `/scrape` endpoints instead.
Frequently Asked Questions
Should I use render true or render false with Cloudflare /crawl?
Use render false (--no-render) for server-rendered sites like Shopify, WordPress, and static sites. Use render true only for single-page apps built with React, Vue, or Angular where content loads via JavaScript. In testing, Shopify sites returned approximately 90% identical content in both modes.
How much does Cloudflare Browser Rendering /crawl cost?
A Workers Paid plan costs $5/month. Render-false crawls consume zero browser time and are free during beta. Render-true crawls use browser hours: 10 hours/month are free, then $0.09/hour. Blocking images, fonts, and stylesheets during render-true crawls reduces browser time significantly.
What is the best URL discovery source for Cloudflare /crawl?
Use --source sitemaps for sites with complete sitemaps like Shopify and WordPress. This gives predictable, complete coverage. Use --source links or all when the sitemap might be incomplete or you want to discover pages the way a search engine would.
Why does my Cloudflare render-true crawl hang on the last pages?
Pages with slow-loading resources like large images or third-party scripts can block the headless browser for 60+ seconds. Fix this by adding --block-resources image media font stylesheet and --wait-until domcontentloaded to your crawl command.