Here Are the Best Cloudflare /crawl Settings for Any Website
Cloudflare’s Browser Rendering /crawl endpoint is one of the fastest ways to extract website content at scale, but the default settings are not optimal for most use cases. After running crawls across dozens of sites, from Shopify stores to React SPAs to documentation sites, these are the settings that consistently deliver the best results.
This guide covers what each setting does, when to change it, and the specific commands that work best for common site types.
The Most Important Decision: Render Mode
Every crawl starts with one choice: should Cloudflare load the page in a headless browser, or just fetch the raw HTML?
render: false (--no-render) fetches the HTML without executing JavaScript. It is fast, free during the beta period, and produces clean output for any site that serves content in the initial HTML response.
render: true (the default) loads each page in a headless Chromium instance, executes JavaScript, waits for the page to settle, then extracts the content. This is slower, consumes browser hours, and costs money after the free 10 hours/month.
When Each Mode Makes Sense
| Site Type | Recommended Mode | Reason |
|---|---|---|
| Shopify stores | --no-render | Products, collections, and pages are server-rendered |
| WordPress sites | --no-render | Content is in the initial HTML response |
| Static sites and blogs | --no-render | No JavaScript-dependent content |
| Hugo, Jekyll, Astro sites | --no-render | Pre-built HTML at deploy time |
| React or Vue SPAs | render: true | Content loads via JavaScript after initial page load |
| Sites with lazy-loaded data | render: true | Reviews, pricing, and recommendations may require JS |
In our testing, Shopify sites returned roughly 90% identical content between render modes. The extra content from rendering was mostly dynamic UI elements like cart drawers and recommendation widgets, not meaningful product data. We cover the full render mode comparison with head-to-head benchmarks in Pros and Cons of the Cloudflare Crawl Endpoint with Shopify Stores.
The rule of thumb: start with --no-render. If the results are missing content you need, switch to render mode.
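One quick way to apply this rule of thumb is to fetch a representative page yourself and check whether meaningful text is already present in the raw HTML. A minimal stdlib-only sketch; the `looks_server_rendered` helper and the 50-word threshold are our own choices, not part of the /crawl tooling:

```python
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def looks_server_rendered(html: str, min_words: int = 50) -> bool:
    """Heuristic: if the raw HTML already carries a reasonable amount of
    visible text, --no-render will probably capture the content."""
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"\w+", " ".join(parser.parts))
    return len(words) >= min_words
```

Fetch one product or article page with any HTTP client, pass the body to `looks_server_rendered`, and a `False` result (e.g. an empty `<div id="root">` shell) is a strong hint that you need render mode.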
URL Discovery: Sitemaps vs Links
The --source flag controls how Cloudflare finds pages to crawl.
--source sitemaps reads the site’s sitemap.xml and only crawls URLs listed there. This is predictable, covers the pages the site owner considers canonical, and avoids crawling duplicate or low-value pages.
--source links starts at the given URL and follows <a href> links it finds on each page. This discovers pages the way a search engine would, but can miss orphaned pages and may crawl into pagination, filters, or other low-value URL patterns.
--source all (the default) combines both methods.
Which to Use
Use --source sitemaps when the site has a complete, well-maintained sitemap. Most Shopify and WordPress sites do. This is the most reliable option for full-site content extraction.
Use --source links or all when the sitemap is missing, incomplete, or you specifically want to audit the site’s internal link structure.
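Before committing to --source sitemaps, it can be worth counting how many URLs the sitemap actually lists and comparing that against the site's rough page count. A small stdlib-only sketch; note that real sitemaps are often index files pointing at child sitemaps, which this simplified version does not follow:

```python
import xml.etree.ElementTree as ET

# Sitemap protocol namespace (sitemaps.org)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Extract <loc> entries from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

sample = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/shirt</loc></url>
</urlset>"""
print(len(sitemap_urls(sample)))  # 2
```

If the count comes back far below what you expect for the site, that is the signal to fall back to --source links or all.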
Resource Blocking for Render-True Crawls
This is the single most impactful optimization for render-true crawls. By default, the headless browser loads every image, font, stylesheet, and media file on every page. This is wasteful when you only need the text content.
Add --block-resources image media font stylesheet to any render-true crawl. The effect is significant:
- Speed: crawl time drops from roughly 7 seconds per page to roughly 2 seconds per page
- Cost: browser hours consumed are cut by 60-70%
- Reliability: pages that would hang indefinitely waiting for slow CDN assets now complete normally
The browser still executes JavaScript and builds the DOM. It just skips downloading assets that do not affect the text content.
The Wait Condition
The --wait-until flag tells the browser when to stop waiting and extract content. The default waits for all network activity to finish, which is slow and unnecessary for content extraction.
--wait-until domcontentloaded tells the browser to extract content as soon as the DOM is ready. For text extraction, this is almost always sufficient. JavaScript that loads content will have executed, but background analytics pings and ad network calls will not delay the crawl.
Recommended Commands by Site Type
Shopify Store (Full Site)
```shell
python crawl.py run https://example.com \
  --limit 500 \
  --format markdown \
  --no-render \
  --source sitemaps \
  -o results.json
```
Fast, free, and covers the full product catalog. Shopify sitemaps are comprehensive, so --source sitemaps gives complete coverage without crawling into paginated collections or search result pages.
Shopify Store (Products Only)
```shell
python crawl.py run https://example.com \
  --limit 1000 \
  --format markdown \
  --no-render \
  --include-patterns "https://example.com/products/**" \
  -o products.json
```
The --include-patterns flag restricts the crawl to URLs matching the given pattern. Useful when you only need product pages and want to skip collections, blog posts, and policy pages.
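If you want to preview which URLs a pattern will keep before spending a crawl on it, you can approximate the matching locally. This sketch uses Python's `fnmatch`, whose `*` matches any characters including `/`; the /crawl endpoint's exact pattern semantics may differ, so treat this as an approximation:

```python
from fnmatch import fnmatch

def filter_urls(urls, include_pattern):
    """Keep only URLs matching a glob-style pattern (local approximation
    of --include-patterns; fnmatch's * also crosses / boundaries)."""
    return [u for u in urls if fnmatch(u, include_pattern)]

urls = [
    "https://example.com/products/blue-shirt",
    "https://example.com/collections/sale",
    "https://example.com/blogs/news/launch",
]
print(filter_urls(urls, "https://example.com/products/**"))
# ['https://example.com/products/blue-shirt']
```

Running this against a sitemap URL list (see the sitemap discussion above) tells you roughly how many pages the pattern-restricted crawl will cover.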
WordPress or Static Blog
```shell
python crawl.py run https://example.com \
  --limit 500 \
  --format markdown \
  --no-render \
  --source sitemaps \
  -o results.json
```
Same settings as Shopify. WordPress sites are server-rendered and have reliable sitemaps. Static site generators (Hugo, Jekyll, Eleventy, Astro) produce pre-built HTML, so render-false captures everything.
React or Vue SPA
```shell
python crawl.py run https://example.com \
  --limit 500 \
  --format markdown \
  --source sitemaps \
  --block-resources image media font stylesheet \
  --wait-until domcontentloaded \
  -o results.json
```
Render-true is the default, so no flag is needed. The critical additions are --block-resources and --wait-until domcontentloaded. Without these, the crawl will be slow and expensive.
If the SPA does not have a sitemap, switch to --source links.
Documentation Site
```shell
python crawl.py run https://docs.example.com \
  --limit 500 \
  --format markdown \
  --no-render \
  --depth 5 \
  --exclude-patterns "*/changelog/**" "*/archive/**" \
  -o docs.json
```
Documentation sites often have deep link structures. Increase --depth to follow nested page hierarchies. Use --exclude-patterns to skip changelog pages, archived versions, or other content you do not need.
Performance Benchmarks
These numbers come from real crawls across Shopify and e-commerce stores in March 2026. Site names have been anonymized.
| Site | Pages | Mode | Content Size | Browser Time | Total Time |
|---|---|---|---|---|---|
| Supplements store (bot-protected) | 89/100 | no-render | 5.9 MB | 0s | ~3.5 min |
| Apparel brand (large catalog) | 500/500 | no-render | 77.1 MB | 0s | ~18 min |
| Apparel brand (large catalog) | 4/5 | render-true | 0.6 MB | 0.9s | ~10s |
| DTC outdoor brand | 256/266 | no-render | 11.0 MB | 0s | ~5 min |
| DTC outdoor brand | 256/266 | render-true | 12.5 MB | 1,338s | ~25 min |
| Medical apparel store | 1,200 | no-render | large | 0s | ~55 min |
Key patterns from the data:
- No-render is 5 to 10 times faster than render-true for the same site
- No-render consumes zero browser time (free during beta)
- Wall-clock time scales linearly with page count for no-render crawls
- Sites with `robots.txt` crawl-delay directives will be slow regardless of settings, because the crawler respects them
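That last point is worth quantifying before you launch a large crawl: a crawl-delay directive puts a hard floor on wall-clock time. A rough sketch; it reads the first Crawl-delay line and ignores per-user-agent scoping, which is a simplification of how robots.txt rules actually apply:

```python
def crawl_delay_seconds(robots_txt: str) -> float:
    """Return the first Crawl-delay value found, or 0 if none is set.
    Simplification: ignores which user-agent the directive applies to."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "crawl-delay":
            try:
                return float(value.strip())
            except ValueError:
                return 0.0
    return 0.0

def min_crawl_minutes(robots_txt: str, pages: int) -> float:
    """Lower bound on wall-clock crawl time implied by the delay."""
    return pages * crawl_delay_seconds(robots_txt) / 60

# A 5-second delay across 500 pages puts a ~42-minute floor on the crawl
print(min_crawl_minutes("User-agent: *\nCrawl-delay: 5\n", 500))
```

If the implied floor is longer than you can wait, no setting on the /crawl side will speed it up; you would need to crawl fewer pages instead.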
Cost Optimization
The Workers Paid plan is $5/month. Beyond that, costs come from browser hours consumed by render-true crawls.
Free tier: 10 browser hours/month. A 500-page render-true crawl with resource blocking uses roughly 15-20 minutes of browser time. You can run 30+ optimized crawls per month within the free tier.
Without resource blocking: the same 500-page crawl could use 60+ minutes of browser time, cutting your free capacity to roughly 10 crawls per month.
No-render crawls are free during the beta period. For server-rendered sites, there is no reason to use render-true.
Cost Formula
Browser cost = (pages × seconds_per_page ÷ 3600) × $0.09/hour
At 2 seconds per page (optimized render-true): 500 pages = 0.28 hours = $0.025
At 7 seconds per page (unoptimized): 500 pages = 0.97 hours = $0.087
The difference is small per crawl but compounds when running daily or weekly crawls across multiple sites.
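The formula above can be wrapped in a few lines for budgeting recurring crawls. The $0.09/hour rate, 10 free hours, and per-page timings are the figures cited in this article; verify them against current Cloudflare pricing before relying on the numbers:

```python
HOURLY_RATE = 0.09   # USD per browser hour (rate cited in this article)
FREE_HOURS = 10.0    # free browser hours per month (free tier)

def browser_hours(pages: int, seconds_per_page: float) -> float:
    """Browser time consumed by one render-true crawl, in hours."""
    return pages * seconds_per_page / 3600

def monthly_cost(pages: int, seconds_per_page: float,
                 crawls_per_month: int = 1) -> float:
    """USD cost of browser time beyond the free tier for a month."""
    hours = browser_hours(pages, seconds_per_page) * crawls_per_month
    return max(0.0, hours - FREE_HOURS) * HOURLY_RATE

# Daily optimized crawls of a 500-page site: 30 × ~0.28h ≈ 8.3h, free
print(monthly_cost(500, 2, crawls_per_month=30))   # 0.0
# Same schedule unoptimized: 30 × ~0.97h ≈ 29h, ~19h billable
print(monthly_cost(500, 7, crawls_per_month=30))
```

The second print makes the compounding effect concrete: an unoptimized daily schedule blows through the free tier, while the optimized one stays inside it.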
Limits to Know
| Resource | Limit |
|---|---|
| Pages per crawl | 100,000 |
| Crawl jobs per day | Unlimited (Workers Paid) |
| Browser hours | 10 hrs/month free, then $0.09/hr |
| API requests | 600/minute |
| Concurrent browsers | 30 per account |
| Job lifetime | 7 days max, results available 14 days |
Common Problems and Fixes
403 errors on most pages: the site has bot protection (Cloudflare Bot Management, Akamai, Datadome). This cannot be bypassed through the /crawl endpoint. The crawl will complete but most pages will return errors.
Render-true crawl hangs near the end: one or more pages have slow-loading resources blocking the browser. Add --block-resources image media font stylesheet and --wait-until domcontentloaded.
Missing content in no-render mode: the site loads content via JavaScript. Switch to render-true with the resource blocking and wait optimizations.
Script crashes mid-crawl: the crawl job continues running on Cloudflare’s servers. Check the status and fetch results when it finishes:
```shell
python crawl.py status <job_id>
python crawl.py results <job_id> -o out.json
```
Empty results from sitemaps source: the site’s sitemap may be missing or blocked. Switch to --source links or --source all.
Known Limitations
Before building a workflow around the /crawl endpoint, be aware of these constraints:
- Broken relative URL resolution: Cloudflare's markdown converter incorrectly resolves protocol-relative URLs like `//www.example.com/path` by prepending the page URL. This creates malformed paths in the output, particularly on Shopify sites.
- Boilerplate content in every page: nav menus, mega menus, and footers appear in the markdown for every page. For a typical Shopify site, roughly 90% of the content per page is repeated template content. See our analysis of boilerplate ratios across real Shopify crawls.
- No structured data extraction: JSON-LD, schema.org, and OpenGraph data are not parsed in no-render mode. Render-true captures basic OG tags in metadata but not full schema.
- No 404 detection: the crawl only processes live URLs. Dead links and broken internal links are not reported.
- Single starting URL: the API accepts one URL and spiders outward. It does not accept a URL list. For batch URL fetching, use the `/markdown` or `/scrape` endpoints instead.
Frequently Asked Questions
Should I use render true or render false with Cloudflare /crawl?
Use render false (--no-render) for server-rendered sites like Shopify, WordPress, and static sites. Use render true only for single-page apps built with React, Vue, or Angular where content loads via JavaScript. In testing, Shopify sites returned approximately 90% identical content in both modes.
How much does Cloudflare Browser Rendering /crawl cost?
A Workers Paid plan costs $5/month. Render-false crawls consume zero browser time and are free during beta. Render-true crawls use browser hours: 10 hours/month are free, then $0.09/hour. Blocking images, fonts, and stylesheets during render-true crawls reduces browser time significantly.
What is the best URL discovery source for Cloudflare /crawl?
Use --source sitemaps for sites with complete sitemaps like Shopify and WordPress. This gives predictable, complete coverage. Use --source links or all when the sitemap might be incomplete or you want to discover pages the way a search engine would.
Why does my Cloudflare render-true crawl hang on the last pages?
Pages with slow-loading resources like large images or third-party scripts can block the headless browser for 60+ seconds. Fix this by adding --block-resources image media font stylesheet and --wait-until domcontentloaded to your crawl command.