Implement specified timeout for slow doctests #39746

Open
wants to merge 15 commits into
base: develop

Conversation

user202729
Contributor

@user202729 user202729 commented Mar 20, 2025

Fixes #39569. A single doctest may now add # long time (limit 100s) to set its time limit; if the actual time taken is below that limit, no warning is raised.

I have added a few such comments as a demonstration, but not to every slow test.

Also add some doctests and show the time taken in the GitHub annotation, for convenience. (Hopefully someone will look at it once the false positives/noise are dealt with…)
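As a rough illustration of the marker described above, the sketch below parses an optional limit out of a # long time tag and decides whether a warning is due. The regex, the function names, and the 30-second fallback for a bare tag are assumptions made for illustration; this is not the code in this pull request.

```python
import re

# Hypothetical pattern: "# long time", optionally followed by "(limit <N>s ...)".
LONG_TIME_RE = re.compile(
    r"#\s*long time(?:\s*\(\s*limit\s+(?P<limit>\d+(?:\.\d+)?)s\b[^)]*\))?"
)

DEFAULT_LONG_LIMIT = 30.0  # assumed default for a bare "# long time"


def parse_limit(source_line):
    """Return the time limit in seconds, or None if the line is not tagged."""
    match = LONG_TIME_RE.search(source_line)
    if match is None:
        return None
    limit = match.group("limit")
    return float(limit) if limit is not None else DEFAULT_LONG_LIMIT


def should_warn(source_line, seconds_taken):
    """True if a tagged doctest ran longer than its explicit or default limit."""
    limit = parse_limit(source_line)
    return limit is not None and seconds_taken > limit
```

With this sketch, a line tagged (limit 100s) that runs for 80 seconds produces no warning, while the same line running for 120 seconds does.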

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@user202729 user202729 force-pushed the long-time-extra-marker branch from c10ad41 to f7e0cc0 Compare March 20, 2025 12:18

github-actions bot commented Mar 20, 2025

Documentation preview for this PR (built with commit ae9f5a7; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@user202729
Contributor Author

Note that https://doc.sagemath.org/html/en/developer/coding_basics.html#special-markup-to-influence-doctests requires # long time for tests that take more than 1 second to run, so someone would need to run with --warn-long 1 and add the tag in a future pull request. Note also that if a test takes less than around 20 seconds, you should probably not use this feature, since the default time limit of # long time (without anything else in parentheses) is 30 seconds.

@user202729 user202729 force-pushed the long-time-extra-marker branch from ee7707a to ea574a8 Compare March 21, 2025 05:03
@user202729 user202729 marked this pull request as draft March 27, 2025 17:12
@user202729 user202729 marked this pull request as ready for review July 5, 2025 11:21
@user202729 user202729 requested a review from tobiasdiez July 5, 2025 12:06
@tobiasdiez tobiasdiez requested a review from orlitzky July 6, 2025 12:33
@orlitzky
Contributor

orlitzky commented Jul 6, 2025

I just left a comment on #39569, but will summarize it here by saying that I think we should prefer to fix these tests rather than hide the (accurate) warnings.

According to our documentation, even long tests should complete in about 5s. If we have a test that takes 100s, it's a problem and should be dealt with. In #36226 I switched the runtime calculation to use CPU time for a more objective measure, and we lowered the warning threshold, though it is still far more lenient than the 1s and 5s recommendations in the developer guide. The whole point was to shine light upon the tests that are too slow, rather than me having to find them one at a time on my laptop when the doctests time out and create false positives. (The PR was started before we moved to Github, but now the CI has the same problem with random timeouts.)
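To make the CPU-time point concrete, here is a minimal sketch in plain Python that measures the same call with both a wall clock and a CPU clock. The helper names are illustrative and this is not the doctester's code; it only shows why CPU time is less sensitive to a loaded machine.

```python
import time


def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def mostly_waiting():
    # Sleeping stands in for time lost to a busy machine or to I/O:
    # it adds wall-clock time but almost no CPU time.
    time.sleep(0.2)


wall, cpu = measure(mostly_waiting)
print(f"wall: {wall:.3f}s, cpu: {cpu:.3f}s")
```

The wall-clock figure includes the waiting, while the CPU figure counts only the time this process actually spent computing.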

Adding the framework for and exceptions to all of these tests will just revert us back to where we were -- with no warnings for tests that are in gross violation of our policy -- while requiring more code to do it. If there really are tests that require something like 100s to complete and there's no faster way to exercise the same code paths, then some other solution is called for (pre-release tests)?

@user202729
Contributor Author

user202729 commented Jul 6, 2025

In theory, that's obviously the right thing to do.

In practice, suppose that it takes one year to clean up all these pull requests. Within that time, let's say we have to collectively review 1000 pull requests. The overhead of repeatedly looking at 1000 × 40 warnings can easily lead to people not looking at the warnings newly introduced by the pull request, making the problem worse.

Adding # long time (limit Xs, issue 12345) doesn't mean it isn't an issue. It just means that it is a known bug and we don't want to see the same thing for the upcoming pull requests.

(The alternative is you volunteer to quickly fix them, then yes, problem solved)

For comparison, there are 51 # known bug in the code base at the moment (and 5000+ issues in the repository). Obviously someone ought to fix them, but we don't want to see 51 failures × 6 platforms every pull request either.

@orlitzky
Contributor

orlitzky commented Jul 6, 2025

For comparison, there are 51 # known bug in the code base at the moment (and 5000+ issues in the repository). Obviously someone ought to fix them, but we don't want to see 51 failures × 6 platforms every pull request either.

The problem with this analogy is that # known bug makes the test failures go away but # long time (limit Xs) does not. The tests still get run, still take forever, still cause timeouts, and still cause the test suite to fail -- a bigger problem than having to look at warnings.

40 is not a huge number. I was regularly fixing them, but it was futile before the CPU time branch was merged because if you make it possible to ignore the warnings, people ignore the warnings. Most slow tests were added because the author had a fast CPU, was testing on an unloaded system, and simply didn't realize it was slow. In those cases a smaller n, or a simpler field, or... can be used to speed up the test.

Many random tests are slow -- I was just looking at one of these in sage/geometry/cone.py that I am guilty of adding myself. The test is needed to exercise a particular code path, but the code doesn't actually need to be random. I can find a seed that happens to trigger the desired code path while terminating quickly, and then set_random_seed() before the test. I can't promise that every test will be easy to fix or that I'll understand the maths necessary to do it, but in my experience it is much much harder to find reviewers for the PRs than it is to fix the tests.
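The seed-pinning idea can be sketched in plain Python, with the standard random module standing in for Sage's set_random_seed(). The instance generator and the "desired code path" predicate below are placeholders, not the actual test in sage/geometry/cone.py; a real search would also check that the chosen input runs quickly.

```python
import random


def random_instance(rng):
    """Stand-in for a randomly generated test input (a list of small ints)."""
    return [rng.randrange(10) for _ in range(5)]


def hits_code_path(instance):
    """Stand-in for 'this input exercises the branch we care about'."""
    return sum(instance) % 7 == 0


def find_seed(max_seed=10_000):
    """Return the first seed whose instance hits the desired code path."""
    for seed in range(max_seed):
        if hits_code_path(random_instance(random.Random(seed))):
            return seed
    raise ValueError("no suitable seed found")


seed = find_seed()
# With the seed fixed, the test input is the same on every run.
fixed_input = random_instance(random.Random(seed))
```

The search is done once by the test author; only the resulting seed would be written into the test.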

@user202729
Contributor Author

40 is not a huge number

sure, nice.

I can't promise that every test will be easy to fix or that I'll understand the maths necessary to do it, but in my experience it is much much harder to find reviewers for the PRs than it is to fix the tests.

sounds like we have a problem... still, I suppose you can fix the test first and we see what to do later. Worst case CI fix is an option.

Most slow tests were added because the author had a fast CPU, was testing on an unloaded system, and simply didn't realize it was slow.

isn't this quite trivial, i.e. the ratio of the speeds of any two CPUs in existence at a given time is Θ(1)?

before the CPU time branch was merged

what's the relation here? or you mean the pull request also has the extra feature of raising the warning (?)

if you make it possible to ignore the warnings, people ignore the warnings

we're agreeing on this (the current situation is that there are too many warnings, which leads to people ignoring them)

@tobiasdiez
Contributor

What about simply not adding the GitHub annotation for the 40 known too-long tests and tracking those in a new issue instead? That would prevent people from adding new too-long tests without annoying anyone with too many unrelated warnings.

@user202729
Contributor Author

What about simply not adding the GitHub annotation for the 40 known too-long tests and tracking those in a new issue instead? That would prevent people from adding new too-long tests without annoying anyone with too many unrelated warnings.

this is exactly the same as # long time (limit Xs), no?

@tobiasdiez
Contributor

Not quite, since this long time with specified time is quite easy to add - and it's not clear to an average dev that it should not be used. On the other hand, a hard coded list somewhere in the doctester with a big warning header is less obvious and provides a better education.

Alternatively, we could also just tag these tests as "known bug".

@user202729
Contributor Author

user202729 commented Jul 7, 2025

temporarily replace all tests that take too long with # known bug (see :issue:39569) (disadvantage: their correctness is then no longer tested)

@DaveWitteMorris complains about this option in #39569 .

a hard coded list somewhere in the doctester

like the baseline failure json? The disadvantage of having the list far from code is that when the code is fixed, nobody remembers to remove the entry from the list.

If it's just for educational purposes, we can either

  • add a checkbox "I have not introduced any more # long time (limit Xs) in this pull request" to the pull request template, or

  • change the wording ("limit Xs") to something more dangerous. (it's just a regex)

@tobiasdiez
Contributor

a hard coded list somewhere in the doctester

like the baseline failure json? The disadvantage of having the list far from code is that when the code is fixed, nobody remembers to remove the entry from the list.

Yes, but perhaps just a hard-coded array in the doctester would be sufficient in this case. I don't think it would be that bad in this case if the entry is not immediately removed from the list.

I share the sentiment that these warnings are a bit annoying and thus people just ignore them. But I don't have a strong opinion about the best way forward.
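For comparison with the inline marker, a hard-coded list in the doctester might look roughly like the sketch below. The names, the placeholder entry, and the (file, line) key are all hypothetical; nothing like this exists in the pull request.

```python
# Hypothetical list of known-slow doctests, tracked in a separate issue.
# WARNING: do not add entries here; speed the test up instead.
KNOWN_SLOW_DOCTESTS = frozenset({
    ("src/sage/example_module.py", 123),  # placeholder entry, not a real test
})


def should_annotate(filename, lineno, seconds_taken, warn_threshold):
    """Decide whether a slow-doctest annotation should be emitted."""
    if seconds_taken <= warn_threshold:
        return False
    return (filename, lineno) not in KNOWN_SLOW_DOCTESTS
```

Keying on line numbers is fragile, since they shift whenever the file is edited, so a real list would need a more stable identifier.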

@orlitzky
Contributor

orlitzky commented Jul 7, 2025

Can we at least eliminate the low-hanging fruit first? If a random test is slow, # long time (limit Xs) is going to be hard to get right, because the Xs is random. We have a few examples where foo.TestSuite() is slow, but that's an obvious candidate for pytest rather than a doctest because it doesn't demonstrate anything useful.

In fact, now that I type it, moving any tests that are hard to fix into pytest would be a nice interim solution. It eliminates the warnings, sidesteps the timeout issues, and we could add comments like "if you want this back in the documentation you have to speed it up first." But it's not even clear yet which ones would be hard to fix.

@user202729
Contributor Author

user202729 commented Aug 2, 2025

easier said than done though (looks like most of the low-hanging fruit has been addressed now). Sometimes there's just a single function taking no parameters whatsoever (so you cannot reduce it), and it is still slow

for example this one:

G = graphs.shortened_000_111_extended_binary_Golay_code_graph()   # 25 s

(in reality it sometimes takes a little more than 30s)

it's an example, not a test, so you can't just "move to pytest" either. cf. #40443

@orlitzky
Contributor

orlitzky commented Aug 3, 2025

easier said than done though (looks like most of the low-hanging fruit has been addressed now). Sometimes there's just a single function taking no parameters whatsoever (so you cannot reduce it), and it is still slow

for example this one:

G = graphs.shortened_000_111_extended_binary_Golay_code_graph()   # 25 s

(in reality it sometimes takes a little more than 30s)

it's an example, not a test, so you can't just "move to pytest" either. cf. #40443

This function is a constant. I can construct the same graph in much less time:

sage: %timeit -c H = Graph([G.vertices(), G.edges()], format="vertices_and_edges")
1.02 s ± 1.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

(and can similarly speed up most functions that take no arguments). We can pickle the vertices/edges as python ints, lists, and tuples -- and then load them if the user wants to construct this graph. For a test (suitable for pytest) we could then verify that the algorithm produces a graph isomorphic to the pickled one. This leaves the doctest example untouched, but running 25x faster.
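The pickling approach can be sketched without Sage, using plain Python data. Here a small cycle graph stands in for the expensive constant, and the comparison checks equality with fixed labels, which is stronger than the isomorphism check suggested above; all function names are illustrative.

```python
import pickle


def slow_construction():
    """Stand-in for an expensive, argument-free constructor.

    Returns (vertices, edges) using only ints, lists and tuples.
    Here: the cycle graph on 6 vertices.
    """
    vertices = list(range(6))
    edges = [(i, (i + 1) % 6) for i in vertices]
    return vertices, edges


# Done once, ahead of time: serialize the plain-Python data.
blob = pickle.dumps(slow_construction())


def fast_construction(data=blob):
    """Load the precomputed vertices and edges instead of recomputing them."""
    vertices, edges = pickle.loads(data)
    return vertices, edges


def same_graph(a, b):
    """Compare two (vertices, edges) pairs as undirected, labelled graphs."""
    def normalize(edges):
        return {frozenset(edge) for edge in edges}

    return set(a[0]) == set(b[0]) and normalize(a[1]) == normalize(b[1])
```

In this arrangement the user-facing function would call the fast loader, and a separate test would run the slow algorithm and compare the two results.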

vbraun pushed a commit to vbraun/sage that referenced this pull request Aug 12, 2025
sagemathgh-40558: Add long time marker to several slow tests
    
This gets rid of about half of the warnings, until someone figures out
whether they're intended to be slow, or how they can be sped up.

Reference: sagemath#39569,
sagemath#39746

URL: sagemath#40558
Reported by: user202729
Reviewer(s): Michael Orlitzky, user202729
vbraun pushed a commit to vbraun/sage that referenced this pull request Aug 13, 2025
sagemathgh-40558: Add long time marker to several slow tests
    
vbraun pushed a commit to vbraun/sage that referenced this pull request Aug 14, 2025
sagemathgh-40558: Add long time marker to several slow tests
    
vbraun pushed a commit to vbraun/sage that referenced this pull request Aug 16, 2025
sagemathgh-40558: Add long time marker to several slow tests
    
Successfully merging this pull request may close these issues.

What to do with warning: slow doctest … Test ran for [very long time]?
3 participants