Does not actually block bot traffic (tested with curl) #809
-
I am currently deploying Anubis in front of our project's test logs. With a real browser I see the Anubis proof-of-work page, and then it goes through to the site. But what worries me is that it does not actually seem to block automated traffic: when I run `curl` against the same URL, I get the content back directly, without ever seeing a challenge.

I could probably have made an error in our configuration, but I use a fairly standard one:

```yaml
- name: anubis
  image: ghcr.io/techarohq/anubis:latest
  ports:
    - containerPort: 8080
      protocol: TCP
      name: anubis-port
  env:
    # https://anubis.techaro.lol/docs/admin/installation/
    - name: BIND
      value: ":8080"
    - name: METRICS_BIND
      value: ":9099"
    - name: SERVE_ROBOTS_TXT
      value: "true"
    - name: TARGET
      # app container listens on port 10080
      value: "http://localhost:10080/"
    - name: DIFFICULTY
      value: "6"
    - name: COOKIE_EXPIRATION_TIME
      value: "24h"
```

But this is equally true for other sites that have Anubis, e.g.:
Opening them with Firefox does the proof-of-work, but running `curl` against them returns the content directly. What am I missing? Can Anubis be configured to not allow this? Thank you!
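For concreteness, a minimal sketch of the check being described (the hostname is a placeholder, not the actual deployment):

```sh
# Placeholder URL -- substitute the real site behind Anubis.
# Firefox shows the Anubis proof-of-work interstitial here, but curl's
# default User-Agent is proxied straight through to the backend:
curl -s https://logs.example.org/ | head -n 20
```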
-
@martinpitt curl isn't actually a misbehaved / deceptive user agent on its own, and it definitely isn't an AI training bot by default. It is, however, a helpful debugging tool/library used by humans and infrastructure software alike. You might still want to do a bucketed rate-limit for curl users, but not really much more aggressively than any other user agent. TL;DR: Curl is friend shaped.
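If you do want such a bucketed rate limit, one place to do it is the reverse proxy in front of Anubis rather than Anubis itself. A rough sketch for nginx, with all zone names, rates, and ports made up (the `proxy_pass` target assumes the `:8080` Anubis bind from the config above); these directives belong in the `http` block:

```nginx
# Limit only curl-like user agents, per client IP; other agents get an
# empty key and are therefore not rate-limited by this zone.
map $http_user_agent $curl_limit_key {
    default     "";
    "~*^curl/"  $binary_remote_addr;
}

limit_req_zone $curl_limit_key zone=curl_bucket:10m rate=5r/s;

server {
    listen 80;
    location / {
        limit_req zone=curl_bucket burst=10 nodelay;
        proxy_pass http://127.0.0.1:8080;   # Anubis listens here
    }
}
```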
-
Hi, welcome to the land of tradeoffs.

Anubis is intended to have two "modes of operation": the default config, which attempts to break as little as possible in the process of being added in, and the non-default config, where administrators have customized it for their needs. Curl and the like are allowed through in that first mode so that package managers, monitoring tools, and as much sysadmin tooling as possible doesn't break.

If you want to show challenges across the board, you need to have a base weight rule like this (a sketch of the accompanying allow rules follows this reply):

```yaml
bots:
  - name: base-weight
    action: WEIGH
    expression: "true"
    weight:
      adjust: 5
  # other rules go here
```

Then you need to account for all the user agents of all the software that should be allowed to use the service without being a browser, such as the git client, curl, wget, etc. This is a pain, and I have yet to complete a set of rules or establish guidance on how to do this. Most of the time administrators don't have a complete list of everything that should be allowed to communicate with a web service without that tool being a browser.

The other reason vanilla curl is allowed out of the box is that Anubis is targeted at the patterns abusive scrapers use. They don't just use curl with its default user agent. I am collecting some data from honeypots to try and get better heuristics, but the main lesson I've learned working on Anubis is that shitty heuristics buy you time. The core of how Anubis works is an exceptionally shitty heuristic. This has backfired a little, but solving that is a much smaller problem space than what Anubis solves as a whole.
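To make that second step concrete, here is a rough sketch of what per-tool allow rules might look like, placed before the base weight rule. The `user_agent_regex` field and the specific regexes are assumptions to verify against the current Anubis policy docs, and the list is deliberately incomplete; whether to allow curl at all is a per-site policy choice:

```yaml
bots:
  # Hypothetical allow-list entries for non-browser clients; extend this
  # for every tool that legitimately needs to reach the service.
  - name: allow-git-client
    user_agent_regex: "^git/"
    action: ALLOW
  - name: allow-curl
    user_agent_regex: "^curl/"
    action: ALLOW
  - name: allow-wget
    user_agent_regex: "^Wget/"
    action: ALLOW
  # Everything else picks up the base weight and gets challenged.
  - name: base-weight
    action: WEIGH
    expression: "true"
    weight:
      adjust: 5
```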