@YaSuenag
Contributor

@YaSuenag YaSuenag commented Feb 6, 2025

I ran the following ApplicationRunner Spring Boot app and took a checkpoint with CRIU. The app did not finish after being restored.

  @Override
  public void run(ApplicationArguments args) throws Exception {
    if(args.containsOption("checkpoint")){
      System.out.println("Ready to obtain checkpoint...");
      // Wait restoring...
      cpCoordinator.await();
    }
    System.out.println("from Spring Boot App");
  }

I took a thread dump and got the following stack trace. It shows that the prevent-shutdown thread started by the beforeCheckpoint CRaC handler is waiting on a CyclicBarrier.

"prevent-shutdown" #29 [1504] prio=5 os_prio=0 cpu=0.17ms elapsed=25.76s tid=0x00007feb1017db00 nid=1504 waiting on condition  [0x00007feb4e22b000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000008a9279b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:371)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block([email protected]/AbstractQueuedSynchronizer.java:519)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3780)
        at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3725)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:1707)
        at java.util.concurrent.CyclicBarrier.dowait([email protected]/CyclicBarrier.java:236)
        at java.util.concurrent.CyclicBarrier.await([email protected]/CyclicBarrier.java:364)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter.awaitPreventShutdownBarrier(DefaultLifecycleProcessor.java:634)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter.lambda$beforeCheckpoint$0(DefaultLifecycleProcessor.java:606)
        at org.springframework.context.support.DefaultLifecycleProcessor$CracResourceAdapter$$Lambda/0x00007feb501c37c0.run(Unknown Source)
        at java.lang.Thread.runWith([email protected]/Thread.java:1596)
        at java.lang.Thread.run([email protected]/Thread.java:1583)

I investigated CracResourceAdapter: the prevent-shutdown thread might go through the second awaitPreventShutdownBarrier() call if that thread runs before the awaitPreventShutdownBarrier() call in beforeCheckpoint().

We need separate barriers for beforeCheckpoint and afterRestore for this to work as expected.
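
As a rough illustration of the "one barrier per phase" idea, here is a minimal, self-contained sketch. It is not the Spring Framework code and not the diff in this PR: the class name PreventShutdownGuard, the field names, and the guardExitedWithin helper are all hypothetical, and only java.util.concurrent.CyclicBarrier from the JDK is used.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch, NOT the actual CracResourceAdapter implementation.
class PreventShutdownGuard {

    // One two-party barrier per phase, so an await that belongs to one phase
    // cannot pair up with an await that belongs to the other phase.
    private final CyclicBarrier checkpointBarrier = new CyclicBarrier(2);
    private final CyclicBarrier restoreBarrier = new CyclicBarrier(2);

    private Thread guard;

    // Starts a non-daemon thread that keeps the JVM alive until afterRestore().
    void beforeCheckpoint() {
        guard = new Thread(() -> {
            await(checkpointBarrier); // phase 1: rendezvous with beforeCheckpoint()
            // The checkpoint would be taken while this thread is parked below.
            await(restoreBarrier);    // phase 2: released only by afterRestore()
        }, "prevent-shutdown");
        guard.setDaemon(false);
        guard.start();
        await(checkpointBarrier);     // returns once the guard thread is running
    }

    // Releases the guard thread so that it can terminate.
    void afterRestore() {
        await(restoreBarrier);
    }

    // Test helper (hypothetical): true if the guard thread ended within the timeout.
    boolean guardExitedWithin(long millis) {
        try {
            guard.join(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return !guard.isAlive();
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting on barrier", ex);
        } catch (BrokenBarrierException ex) {
            throw new IllegalStateException("Barrier was broken", ex);
        }
    }
}
```

With this layout the guard thread stays parked on restoreBarrier after beforeCheckpoint() returns, and it can only be released by the matching await in afterRestore().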

Signed-off-by: Yasumasa Suenaga <[email protected]>
@YaSuenag YaSuenag force-pushed the pr/crac-restore-hang branch from 5136e9e to 13fbbd1 Compare February 6, 2025 03:19
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Feb 6, 2025
@sdeleuze sdeleuze self-assigned this Feb 6, 2025
@sdeleuze sdeleuze added in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement labels Feb 6, 2025
@snicoll snicoll removed the status: waiting-for-triage An issue we've not yet triaged or decided on label Feb 7, 2025
@snicoll snicoll added this to the 6.2.x milestone Feb 7, 2025
@sdeleuze sdeleuze modified the milestones: 6.2.x, 6.2.6 Mar 21, 2025
@sdeleuze
Contributor

Could you please attach or share a link to the repository of your reproducer?

@sdeleuze sdeleuze added the status: waiting-for-feedback We need additional information before we can continue label Mar 21, 2025
@YaSuenag
Contributor Author

YaSuenag commented Mar 22, 2025

Could you please attach or share a link to the repository of your reproducer?

Reproducer is here: https://github.com/YaSuenag/checkpointer/tree/main/example/springboot-cli

This is an example for checkpointer, a library that implements CRaC event hooks and coordinates with CRIU events. It is not CRaC itself, so event handling might be faster than in CRaC, and that might surface race conditions that could not be seen with CRaC.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 22, 2025
@sdeleuze sdeleuze removed the status: feedback-provided Feedback has been provided label Apr 4, 2025
@sdeleuze sdeleuze modified the milestones: 6.2.6, 7.0.x Apr 4, 2025
@YaSuenag
Contributor Author

YaSuenag commented Jun 5, 2025

PING: do you have any comments? I can fix things if needed, and I'd be happy to contribute this.

@sdeleuze sdeleuze modified the milestones: 7.0.x, 7.0.0-RC3 Oct 29, 2025
@sdeleuze
Contributor

Sorry for the delay, I missed the related notification and was deep in other higher-priority topics.
The proposed change looks OK and I have been able to successfully test the regular CRaC support with it, so let's use the opportunity of the upcoming 7.0.0-RC3 to ship it. I prefer not to target 6.x for such a refinement, which is difficult to test extensively, but you won't have long to wait since the GA is expected in November.

sdeleuze pushed a commit to sdeleuze/spring-framework that referenced this pull request Oct 29, 2025
sdeleuze added a commit to sdeleuze/spring-framework that referenced this pull request Oct 29, 2025
sdeleuze pushed a commit to sdeleuze/spring-framework that referenced this pull request Oct 29, 2025
@sdeleuze sdeleuze closed this in 0bcff38 Oct 29, 2025
@sdeleuze
Contributor

@YaSuenag Merged, thanks for your contribution.

@sdeleuze sdeleuze changed the title from "[CRaC] Fix hangup after restoring" to "Fix CRaC potential hangup after restoring" Oct 29, 2025
@YaSuenag YaSuenag deleted the pr/crac-restore-hang branch October 29, 2025 23:27