Handling rate limiting for external services #7189
Replies: 3 comments
cc @emi420 @dakotabenjamin @prabinoid @suzit-10 @kshitijrajsharma @omranlm @petya-kangalova
Related task: #7196
Proposals

As I understand the stats, there are two sources: (1) the TM db and (2) the external OSM/OhSome db. Was it the external db response that was problematic?

So, how important are the stats that they need to be updated every time a task is submitted and a user looks at their contribution stats? Would getting the external stats every 15 minutes cover most users, as that is around the suggested ideal time to map a task? Approximately how many times is the external service called in 24 hours?

1. So proposal 1 would be to get external stats every 15 minutes, but TM stats could be more frequent. Although I think every 15 minutes for both sources would seem adequate to me.
2. TTL - Time To Live? If collating stats every 15 minutes, would this cover TTL?
3. No comment.
4. 15 minute intervals overall might help here? Try several (5?) times every 15 minutes, but after 5 failures, wait until the next interval.

Open questions

1. New user stats message: something more like 'Statistics will start to appear after you have saved your work to OSM and completed tasks.'
2. I can think of no reason why coordinating with OSM Ops would not be a good idea. I would hope that any API could respond with the throttle or ban response rather than having to involve a person. I guess it's a person for now. If we call every 15 minutes or when authenticating a user, I think we'd not be much load on OSM.
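To make the 15-minute idea concrete, here is a minimal sketch of a cache that calls the external service at most once per interval and gives up after 5 failed attempts until the next interval. This is not TM code: `CachedStats`, `fetch`, and the constants are hypothetical names, and the 15 minutes / 5 attempts are simply the numbers floated in this comment.

```python
import time
from typing import Callable, Optional

STATS_TTL_SECONDS = 15 * 60  # assumed refresh interval (15 minutes)
MAX_ATTEMPTS = 5             # assumed attempts per interval before giving up


class CachedStats:
    """Caches an external stats lookup and bounds retries per interval."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical callable that hits the external service
        self._clock = clock
        self._value: Optional[dict] = None
        # Earliest time the external service may be called again.
        self._next_refresh = float("-inf")

    def get(self) -> Optional[dict]:
        now = self._clock()
        if now < self._next_refresh:
            # Either the cached value is still fresh, or we are waiting
            # out the interval after repeated failures.
            return self._value
        for _ in range(MAX_ATTEMPTS):
            try:
                self._value = self._fetch()
                break
            except Exception:
                # A real client would likely pause between attempts.
                continue
        # Success or not, leave the external service alone until the next interval.
        self._next_refresh = now + STATS_TTL_SECONDS
        return self._value
```

If every attempt fails, `get()` keeps returning the last known value (or `None` for a user with no cached stats yet), which is where the friendlier new-user message could be shown.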
Recent 429 Too Many Requests errors are causing login blockages across the ecosystem. This has been observed not only in Tasking Manager but also in uMap, HOT Export Tool, and fAIr.
Symptom: The OSM stats endpoint/auth redirect is failing or timing out.
Impact: Because the stats fetch is a blocking part of the login flow in TM, our users cannot work even if the rest of our infrastructure is healthy.
Our login flow treats non-essential metadata (user stats) as a hard dependency for authentication. When OSM-wide rate limits are triggered, this blocks our workflow.
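As a rough illustration of treating stats as optional, the sketch below lets authentication failures propagate but swallows failures from the stats lookup. It does not reflect the actual TM login code: `login`, `authenticate`, and `fetch_stats` are hypothetical names used only for this example.

```python
from typing import Callable, Optional


def login(
    username: str,
    authenticate: Callable[[str], dict],
    fetch_stats: Callable[[str], dict],
) -> dict:
    """Log a user in; stats are attached only if they can be fetched."""
    # Authentication is essential: any error here should still block login.
    session = authenticate(username)

    # Stats are non-essential metadata: a 429 or timeout must not block login.
    stats: Optional[dict]
    try:
        stats = fetch_stats(username)
    except Exception:
        stats = None  # the UI can show a "stats unavailable" message instead

    session["stats"] = stats
    return session
```

With this shape, an OSM-wide rate limit degrades the stats panel rather than the whole login.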
Proposal:
`retry-after` information in 429 response headers and then make a call after the timeout expires?
Other open questions:
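On the `Retry-After` question in the proposal above: the header can carry either a number of seconds or an HTTP date, so a client has to handle both. A minimal sketch follows, assuming the external service actually sends this header on 429 responses (not confirmed in this thread); `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return how long to wait, in seconds, or None if the header is unusable."""
    if not header_value:
        return None
    value = header_value.strip()

    # Form 1: delay in seconds, e.g. "Retry-After: 120".
    if value.isdigit():
        return float(value)

    # Form 2: HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    # A date in the past means we can retry immediately.
    return max(0.0, (when - now).total_seconds())
```

When the helper returns `None`, the caller would need its own fallback delay, for example a fixed interval.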