- 
                Notifications
    
You must be signed in to change notification settings  - Fork 14
 
Thread-safety improvements. Add tests with multiple threads. CI improvements. #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| 
           The test failures are coming from replacing  I think it's best to be conservative with this update so I'm going to revert that commit.  | 
    
          Codecov ReportAttention: Patch coverage is  
 
 Additional details and impacted files@@            Coverage Diff             @@
##           master     #122      +/-   ##
==========================================
- Coverage   79.29%   79.19%   -0.10%     
==========================================
  Files          10       10              
  Lines        1922     1923       +1     
==========================================
- Hits         1524     1523       -1     
- Misses        398      400       +2     ☔ View full report in Codecov by Sentry.  | 
    
2be833c    to
    877be4f      
    Compare
  
    | 
           As discussed offline, added back the commits replacing   | 
    
| 
           Again discussed on slack. We decided to revert to   | 
    
This should fix an exception seen in CI from the lingering timeout task:
```
 Test Summary:                                | Pass  Total  Time
Deserialization error recovery and include() |   11     11  3.9s
      From worker 4:	Unhandled Task ERROR: EOFError: read end of file
      From worker 4:	Stacktrace:
      From worker 4:	 [1] wait
      From worker 4:	   @ .\asyncevent.jl:159 [inlined]
      From worker 4:	 [2] sleep(sec::Float64)
      From worker 4:	   @ Base .\asyncevent.jl:265
      From worker 4:	 [3] (::DistributedNext.var"#34#37"{DistributedNext.Worker, Float64})()
      From worker 4:	   @ DistributedNext D:\a\DistributedNext.jl\DistributedNext.jl\src\cluster.jl:213
```
    a866f6b    to
    c09b034      
    Compare
  
    f6f1d38    to
    190a6b6      
    Compare
  
    e7212a9    to
    dcd89eb      
    Compare
  
    ddb8ad4    to
    836bed1      
    Compare
  
    3d6f794    to
    54575e2      
    Compare
  
    54575e2    to
    e28698f      
    Compare
  
    | function start_gc_msgs_task() | ||
| errormonitor( | ||
| Threads.@spawn begin | ||
| @async begin | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change fixed the race (#124). It seems to have always been a @spawn hence why I didn't initially convert it to @async.
e28698f    to
    ca2bd3c      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to help the process along given review is required for merge
This cuts the running time down from ~1m to ~10s. It shouldn't be necessary to re-create the workers for every iteration of the inner loop.
f189b1c    to
    8494bbd      
    Compare
  
    
Co-authored by @IanButterworth
@spawnto@asyncto get tests passing. Moving back to@spawnwill take more work, but it should be low priority because the main thread, where Distributed usually runs, will already bestickyanyway, so the gain is minor.fbf8c12(#122) is necessary to avoid each worker in those tests re-precompiling any stdlibs they use. Speeds up macos runs by ~1 minute97fc237(#122) adds a sigusr1/siginfo profile before killing a worker that won't shut down, to aid debuggingFixes #124
TODO
@async)CC @jpsamaroo