Commit f660110

Add guidance on blocking crawlers with robots.txt and fix broken FAQ anchor links

- Add a new "Blocking crawlers with robots.txt" section to robots-txt.mdx with three examples: block all bots, block only the /crawl endpoint, and block specific paths
- Add a cross-reference from crawl-endpoint.mdx to the new blocking guidance
- Fix a broken anchor link in automatic-request-headers.mdx (#how-do-i-allowlist-browser-rendering → #can-i-allowlist-browser-rendering-on-my-own-website)
- Fix a broken anchor link in the robots-txt.mdx Related resources (#will-browser-rendering-bypass-cloudflares-bot-protection → #will-browser-rendering-be-detected-by-bot-management)
- Add Automatic request headers to the Related resources in robots-txt.mdx

1 parent 9daeab0

3 files changed: +45 −3 lines changed

src/content/docs/browser-rendering/reference/automatic-request-headers.mdx

Lines changed: 1 addition & 1 deletion

@@ -43,4 +43,4 @@ The `Signature` headers use an authentication method called [Web Bot Auth](/bots
 
 ### Bot detection
 
-The bot detection ID for Browser Rendering is `128292352`. If you are attempting to scan your own zone and want Browser Rendering to access your website freely without your bot protection configuration interfering, you can create a WAF skip rule to [allowlist Browser Rendering](/browser-rendering/faq/#how-do-i-allowlist-browser-rendering).
+The bot detection ID for Browser Rendering is `128292352`. If you are attempting to scan your own zone and want Browser Rendering to access your website freely without your bot protection configuration interfering, you can create a WAF skip rule to [allowlist Browser Rendering](/browser-rendering/faq/#can-i-allowlist-browser-rendering-on-my-own-website).

src/content/docs/browser-rendering/reference/robots-txt.mdx

Lines changed: 43 additions & 1 deletion

@@ -55,6 +55,47 @@ Sitemap: https://example.com/sitemap.xml
 
 The value is in seconds. A `crawl-delay` of 2 means the crawler waits two seconds between requests.
 
+## Blocking crawlers with robots.txt
+
+If you want to prevent Browser Rendering (or other crawlers) from accessing your site, you can configure your `robots.txt` to restrict access.
+
+### Block all bots from your entire site
+
+To prevent all crawlers from accessing any page on your site:
+
+```txt title="robots.txt"
+User-agent: *
+Disallow: /
+```
+
+This is the most restrictive configuration and blocks all compliant bots, not just Browser Rendering.
+
+### Block only the /crawl endpoint
+
+The [`/crawl` endpoint](/browser-rendering/rest-api/crawl-endpoint/) identifies itself with the User-Agent `CloudflareBrowserRenderingCrawler/1.0`. To block the `/crawl` endpoint while allowing all other traffic (including other Browser Rendering [REST API](/browser-rendering/rest-api/) endpoints, which use a [different User-Agent](/browser-rendering/reference/automatic-request-headers/#user-agent)):
+
+```txt title="robots.txt"
+User-agent: CloudflareBrowserRenderingCrawler
+Disallow: /
+
+User-agent: *
+Allow: /
+```
+
+### Block specific paths
+
+To allow crawling of your site but block specific sections:
+
+```txt title="robots.txt"
+User-agent: CloudflareBrowserRenderingCrawler
+Disallow: /admin/
+Disallow: /private/
+Allow: /
+
+User-agent: *
+Allow: /
+```
+
 ## Best practices for sitemaps
 
 Structure your sitemap to help crawlers process your site efficiently:

@@ -113,4 +154,5 @@ Browser Rendering periodically refetches sitemaps to keep content fresh. Serve y
 
 ## Related resources
 
-- [FAQ: Will Browser Rendering bypass Cloudflare's Bot Protection?](/browser-rendering/faq/#will-browser-rendering-bypass-cloudflares-bot-protection) — Instructions for creating a WAF skip rule
+- [FAQ: Will Browser Rendering be detected by Bot Management?](/browser-rendering/faq/#will-browser-rendering-be-detected-by-bot-management) — How Browser Rendering interacts with bot protection and how to create a WAF skip rule
+- [Automatic request headers](/browser-rendering/reference/automatic-request-headers/) — User-Agent strings and non-configurable headers used by Browser Rendering
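The "Block specific paths" rules added above can be sanity-checked with Python's standard-library robots.txt parser. This is an illustrative sketch, not part of the commit; the URLs and the `SomeOtherBot` agent are hypothetical.

```python
# Sanity-check of the "Block specific paths" robots.txt rules using the
# standard-library parser. URLs and SomeOtherBot are hypothetical examples.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /admin/
Disallow: /private/
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

crawler = "CloudflareBrowserRenderingCrawler"
# The crawler is blocked from the listed sections...
print(parser.can_fetch(crawler, "https://example.com/admin/settings"))  # False
# ...but allowed everywhere else, and other bots are unaffected.
print(parser.can_fetch(crawler, "https://example.com/blog/post"))       # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/admin/"))   # True
```

Because the parser checks rules top to bottom per matching user-agent group, the `Disallow` lines take effect before the catch-all `Allow: /`.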

src/content/docs/browser-rendering/rest-api/crawl-endpoint.mdx

Lines changed: 1 addition & 1 deletion

@@ -425,7 +425,7 @@ Use the `source` parameter to customize which sources the crawler uses. The avai
 
 ### robots.txt and bot protection
 
-The `/crawl` endpoint respects the directives of `robots.txt` files, including `crawl-delay`. All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`. For guidance on configuring `robots.txt` and sitemaps for sites you plan to crawl, refer to [robots.txt and sitemaps](/browser-rendering/reference/robots-txt/).
+The `/crawl` endpoint respects the directives of `robots.txt` files, including `crawl-delay`. All URLs that `/crawl` is directed not to crawl are listed in the response with `"status": "disallowed"`. For guidance on configuring `robots.txt` and sitemaps for sites you plan to crawl, refer to [robots.txt and sitemaps](/browser-rendering/reference/robots-txt/). If you want to block the `/crawl` endpoint from accessing your site, refer to [Blocking crawlers with robots.txt](/browser-rendering/reference/robots-txt/#blocking-crawlers-with-robotstxt).
 
 :::caution[Bot protection may block crawling]
 Browser Rendering does not bypass CAPTCHAs, Turnstile challenges, or any other bot protection mechanisms. If a target site uses Cloudflare products that control or restrict bot traffic such as [Bot Management](/bots/), [Web Application Firewall (WAF)](/waf/), or [Turnstile](/turnstile/), the same rules will apply to the Browser Rendering crawler.
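As a sketch of the `crawl-delay` behavior described in the changed paragraph, the same standard-library parser exposes the delay a compliant crawler should honor. The two-second value mirrors the `crawl-delay` example in robots-txt.mdx; the robots.txt content here is hypothetical and not part of this commit.

```python
# Minimal sketch: reading crawl-delay and disallowed paths with the
# standard-library parser. The robots.txt content is a hypothetical example.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

agent = "CloudflareBrowserRenderingCrawler/1.0"
# A compliant crawler waits this many seconds between requests.
print(parser.crawl_delay(agent))  # 2
# URLs matching Disallow would be reported by /crawl with "status": "disallowed".
print(parser.can_fetch(agent, "https://example.com/private/page"))  # False
```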
