
feat(config): Add FCrDNS checker #682


Open: Axelen123 wants to merge 23 commits into base: main
Conversation

@Axelen123 commented Jun 17, 2025

Closes #431. This PR implements dynamic verification of bot IPs using DNS records. For details regarding how it works, see the documentation I have added. I am not sure if the way I added a new algorithm is the best way to implement this. Let me know if there is a better way.

  • Added a description of the changes to the [Unreleased] section of docs/docs/CHANGELOG.md
  • Added test cases to the relevant parts of the codebase
  • Ran integration tests with `npm run test:integration` (unsupported on Windows, please use WSL)
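For context on the technique: forward-confirmed reverse DNS (FCrDNS) reverse-resolves the client IP to hostnames, keeps only hostnames under a domain the bot vendor publishes, then forward-resolves those hostnames and accepts the client only if the original IP appears in the answers. A minimal standalone Go sketch of that loop (illustrative only, not the PR's actual code; the function name and domain list are made up here):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// fcrdnsVerify reports whether ip passes forward-confirmed reverse DNS
// against one of the allowed domains (e.g. "googlebot.com."):
//  1. reverse-resolve the IP to hostnames (PTR records),
//  2. keep only hostnames under an allowed domain,
//  3. forward-resolve those hostnames and require the original IP to appear.
func fcrdnsVerify(ctx context.Context, ip net.IP, allowedDomains []string) (bool, error) {
	names, err := net.DefaultResolver.LookupAddr(ctx, ip.String())
	if err != nil {
		return false, err
	}
	for _, name := range names {
		trusted := false
		for _, dom := range allowedDomains {
			// The leading dot keeps lookalikes such as "evilgooglebot.com."
			// from matching "googlebot.com.".
			if strings.HasSuffix(name, "."+dom) {
				trusted = true
				break
			}
		}
		if !trusted {
			continue
		}
		addrs, err := net.DefaultResolver.LookupIPAddr(ctx, name)
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			if addr.IP.Equal(ip) {
				return true, nil // forward lookup confirms the reverse record
			}
		}
	}
	return false, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	ok, err := fcrdnsVerify(ctx, net.ParseIP("66.249.66.1"), []string{"googlebot.com."})
	fmt.Println(ok, err)
}
```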

@Axelen123 changed the title from Fcrdns to feat(lib/challenge): FCrDNS challenge method Jun 17, 2025
@Xe (Contributor) commented Jun 17, 2025

Hey, thanks for the contribution!

The CHALLENGE rule is meant more for client-facing challenges. You probably want something like the checker.Impl interface here. This will let you add an ALLOW rule for AppleBot et al.
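A hypothetical sketch of what Xe is suggesting, reusing the fcrdnsVerify helper from the sketch above. The real checker.Impl interface lives in the Anubis codebase and its exact signature is not quoted in this thread, so the shape below is an assumption:

```go
package fcrdns // hypothetical package name, for illustration only

import (
	"context"
	"net"
	"net/http"
	"time"
)

// Impl mirrors the assumed shape of Anubis's checker.Impl; the real
// interface in the repository may differ in name and signature.
type Impl interface {
	Check(r *http.Request) (bool, error)
}

// fcrdnsChecker passes a request when its remote address survives the
// forward-confirmed reverse DNS test, letting a policy attach an ALLOW
// rule to bots such as Applebot.
type fcrdnsChecker struct {
	domains []string      // e.g. []string{"applebot.apple.com."}
	timeout time.Duration // DNS budget per request
}

func (c *fcrdnsChecker) Check(r *http.Request) (bool, error) {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return false, err
	}
	ctx, cancel := context.WithTimeout(r.Context(), c.timeout)
	defer cancel()
	// fcrdnsVerify is the helper from the sketch in the PR description above.
	return fcrdnsVerify(ctx, net.ParseIP(host), c.domains)
}
```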

@Axelen123 changed the title from feat(lib/challenge): FCrDNS challenge method to feat(config): Add FCrDNS checker Jun 22, 2025
@Axelen123 (Author) commented:

I have finished converting the implementation to a checker.

@Xe (Contributor) left a comment:

Approved modulo the change to checker.List#Check

@Axelen123 marked this pull request as draft June 25, 2025 21:17
@Axelen123 marked this pull request as ready for review June 26, 2025 19:24
@Axelen123 (Author) commented:

I have added CEL bindings and reverted to the previous checker behavior.
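The thread does not show the binding names Anubis actually exposes, so the following cel-go sketch is generic: it registers a made-up fcrdnsMatches(ip, domain) function and compiles an expression against it, which is the general mechanism CEL bindings use.

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
	"github.com/google/cel-go/common/types"
	"github.com/google/cel-go/common/types/ref"
)

func main() {
	// fcrdnsMatches is a made-up name for illustration; the real CEL
	// surface added by this PR is not quoted in the conversation.
	env, err := cel.NewEnv(
		cel.Variable("remoteAddress", cel.StringType),
		cel.Function("fcrdnsMatches",
			cel.Overload("fcrdns_matches_string_string",
				[]*cel.Type{cel.StringType, cel.StringType},
				cel.BoolType,
				cel.BinaryBinding(func(ip, domain ref.Val) ref.Val {
					// An FCrDNS verifier (e.g. the earlier sketch) would
					// be called here with the two string arguments.
					_, _ = ip, domain
					return types.Bool(true) // placeholder result
				}),
			),
		),
	)
	if err != nil {
		panic(err)
	}
	ast, iss := env.Compile(`fcrdnsMatches(remoteAddress, "googlebot.com.")`)
	if iss != nil && iss.Err() != nil {
		panic(iss.Err())
	}
	fmt.Println("expression compiles:", ast != nil)
}
```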

Axelen123 and others added 5 commits June 26, 2025 21:27
If a client claims to be Googlebot but isn't from Google, that's kinda
suspicious and should be treated as such.

Signed-off-by: Xe Iaso <[email protected]>
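In CEL terms, using the made-up fcrdnsMatches binding from the sketch above plus an assumed userAgent string variable, the rule this commit message describes would read roughly:

```
userAgent.contains("Googlebot") && !fcrdnsMatches(remoteAddress, "googlebot.com.")
```

A policy would treat a match as suspicious and deny or more heavily challenge the client, which is exactly the behavior the commit describes.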
@Xe enabled auto-merge (squash) June 27, 2025 18:15
@Xe (Contributor) commented Jun 27, 2025

Thanks much! This is gonna let us do a lot of fun things :)

Signed-off-by: Xe Iaso <[email protected]>
@Xe disabled auto-merge June 27, 2025 18:20
@Xe enabled auto-merge (squash) June 27, 2025 18:23
herrbischoff added a commit to herrbischoff/anubis that referenced this pull request Jul 16, 2025
Ahrefs is a large SEO company used by single bloggers to large
enterprises. It may be beneficial to allow (or deny) them in Anubis. They
do publish rDNS entries, so once an Anubis version with TecharoHQ#682 is released,
this policy would benefit from setting up that check.

Crawler information: https://ahrefs.com/robot

Majestic is a UK-based specialist search engine and commercial SEO
entity. They claim to "spider the Web for the purpose of building a
search engine" with a distributed crawler. Defaults to allow, as it'd be
caught by the generic browser policy definition.

Crawler information: https://mj12bot.com

Screaming Frog is a smaller actor in the SEO space, and their crawler
occasionally attempts to access content despite being explicitly
excluded via robots.txt directives. As far as I could research, they
neither publish their IP ranges nor provide an information page for
their crawler. That is why this defaults to deny.

Company website: https://www.screamingfrog.co.uk

Checkmark Network is a brand and intellectual property protection
company. If you have no direct business with them, it is likely they
shouldn't be crawling your content in the first place. Defaults to deny
for this reason.

Crawler information: https://www.checkmarknetwork.com/spider.html/

Domainsbot collects information on domains and website data for
intellectual property disputes. Unless you have direct business with
them, there's likely no reason for them to be accessing your content.
Defaults to deny.

Crawler information: https://domainsbot.com/pandalytics/

zoominfo is a data mining and sales platform for enterprise use, feeding
the gathered information into a machine learning model. It is unlikely
to be of value to anyone else. Therefore, this defaults to deny.

Company website: https://www.zoominfo.com
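Putting this commit's defaults into a policy file might look like the snippet below. The field names are assumptions about the policy schema rather than lines copied from the Anubis repository, and the FCrDNS check from this PR would be layered on once a release includes it:

```yaml
# Illustrative sketch; key names are assumed, not copied from Anubis.
bots:
  - name: ahrefsbot
    user_agent_regex: AhrefsBot
    action: ALLOW # publishes rDNS (https://ahrefs.com/robot); pair with the FCrDNS check from TecharoHQ#682
  - name: screaming-frog
    user_agent_regex: Screaming Frog SEO Spider
    action: DENY # no published IP ranges or crawler info page
```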
Successfully merging this pull request may close these issues.

Dynamic validation of good bot IP addresses