
Fix race condition in FontFile data access during threaded imports#107224

Open
d6e wants to merge 3 commits into godotengine:master from d6e:fix-font-import-race-condition

Conversation


@d6e d6e commented Jun 6, 2025

Problems

  1. FontFile Data Race

    • When importing fonts on multiple threads, two threads could grab the same FontFile from ResourceCache.
    • One thread calls set_data(), the other calls get_data() at the same time.
    • Copy-on-write kicks in and invalidates the pointer the second thread was using, leading to a crash.
  2. Resource Change Callbacks on Background Threads

    • Import threads were firing resource-change callbacks directly.

    • That caused node operations off the main thread, triggering errors like:

      ERROR: Caller thread can't call this function in this node (/root). Use call_deferred() or call_thread_group() instead.
        at: propagate_notification (scene/main/node.cpp:2523)
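
The first problem can be illustrated without any threads. What follows is a minimal sketch, assuming a toy copy-on-write buffer built on `std::shared_ptr`. `CowBytes`, `ToyFont`, and both helper functions are hypothetical names for illustration, not Godot's `PackedByteArray` or `FontFile`. It shows two ways a cached raw pointer can stop matching the buffer it was taken from: a write that detaches shared storage, and a reassignment that frees the old storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy copy-on-write byte buffer. Hypothetical stand-in for illustration only.
class CowBytes {
    std::shared_ptr<std::vector<uint8_t>> storage;

public:
    CowBytes() : storage(std::make_shared<std::vector<uint8_t>>()) {}
    explicit CowBytes(std::vector<uint8_t> p_bytes) :
            storage(std::make_shared<std::vector<uint8_t>>(std::move(p_bytes))) {}

    const uint8_t *ptr() const { return storage->data(); }

    // Write access detaches from any other owner first (the "copy" in COW).
    uint8_t *ptrw() {
        if (storage.use_count() > 1) {
            storage = std::make_shared<std::vector<uint8_t>>(*storage);
        }
        return storage->data();
    }

    // Lets a caller observe whether this storage is still alive later on.
    std::weak_ptr<std::vector<uint8_t>> watch() const { return storage; }
};

// Mimics the pattern from the description: a buffer plus a raw pointer into it.
struct ToyFont {
    CowBytes data;
    const uint8_t *data_ptr = nullptr;

    void set_data(const CowBytes &p_data) {
        data = p_data; // Drops this object's reference to the previous storage.
        data_ptr = data.ptr();
    }
};

// A write on a shared buffer moves it to new storage, so a pointer taken
// before the write no longer points into that buffer.
bool write_detaches_shared_storage() {
    CowBytes a(std::vector<uint8_t>{ 1, 2, 3 });
    CowBytes b = a; // Shares storage with a.
    const uint8_t *before = b.ptr();
    b.ptrw()[0] = 9; // Triggers the copy.
    return b.ptr() != before && a.ptr() == before && a.ptr()[0] == 1;
}

// Reassigning the font's data frees the old storage; any pointer another
// thread cached from it would now dangle.
bool reassign_frees_old_storage() {
    ToyFont font;
    font.set_data(CowBytes(std::vector<uint8_t>{ 1, 2, 3 }));
    std::weak_ptr<std::vector<uint8_t>> old_storage = font.data.watch();
    font.set_data(CowBytes(std::vector<uint8_t>{ 4, 5, 6 }));
    return old_storage.expired();
}
```

Both helpers return `true` in this toy model. In a threaded import, the reader holding the old pointer would dereference it, which is where a crash could come from.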
      

Fixes

1. FontFile Thread Safety

  • Files updated: scene/resources/font.h and scene/resources/font.cpp

  • What changed:

    • Added a mutable Mutex data_mutex inside FontFile.
    • Wrapped set_data_ptr(), set_data(), and get_data() in data_mutex locks.
    • In set_data(), forced a unique COW copy (data.write[0] = data[0]) so no other thread is still holding an old pointer.
    • Added null checks in get_data() so it won’t try to read invalid data.
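
The locking pattern listed above can be sketched in standard C++. This is a minimal sketch under stated assumptions: `GuardedFontData` is a hypothetical class, and `std::mutex`, `std::lock_guard`, and `std::vector` stand in for Godot's `Mutex`, `MutexLock`, and `PackedByteArray`. The mutex is declared `mutable` so the `const` getter can lock it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for FontFile's data members.
class GuardedFontData {
    mutable std::mutex data_mutex; // mutable so const getters can lock it.
    std::vector<uint8_t> data;
    const uint8_t *data_ptr = nullptr;
    size_t data_size = 0;

public:
    void set_data(const std::vector<uint8_t> &p_data) {
        std::lock_guard<std::mutex> lock(data_mutex);
        data = p_data;
        data_ptr = data.empty() ? nullptr : data.data();
        data_size = data.size();
    }

    // Returns a copy made while the lock is held, so the caller never reads
    // a buffer that a concurrent set_data() is replacing.
    std::vector<uint8_t> get_data() const {
        std::lock_guard<std::mutex> lock(data_mutex);
        if (data_ptr == nullptr || data_size == 0) {
            return {}; // Null check: nothing valid to read.
        }
        return std::vector<uint8_t>(data_ptr, data_ptr + data_size);
    }
};
```

Because the setter and getter take the same lock, the pointer and size are always read as a consistent pair.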

2. Resource Callback Thread Safety

  • File updated: core/io/resource_loader.cpp

  • What changed:

    • Swapped out rcc.callable.call() for rcc.callable.call_deferred() in resource_changed_emit().
    • Now callbacks run on the main thread, avoiding those “can’t call this in a background thread” errors.
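
The idea behind deferring a callback can be sketched as a small queue that any thread may push to and only the main thread drains. This is a minimal sketch, assuming a hypothetical `DeferredQueue` class. Godot's real message queue is more involved and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical minimal deferred-call queue, for illustration only.
class DeferredQueue {
    std::mutex queue_mutex;
    std::vector<std::function<void()>> pending;

public:
    // Safe to call from any thread: it only records the callback.
    void call_deferred(std::function<void()> p_callback) {
        std::lock_guard<std::mutex> lock(queue_mutex);
        pending.push_back(std::move(p_callback));
    }

    // Meant to be called by the main thread: runs everything queued so far
    // and returns the number of callbacks executed.
    size_t flush() {
        std::vector<std::function<void()>> to_run;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            to_run.swap(pending);
        }
        for (const std::function<void()> &callback : to_run) {
            callback(); // Runs outside the lock, so callbacks may queue more work.
        }
        return to_run.size();
    }
};
```

A callback queued this way does not run until `flush()` is called, so node operations inside it happen on whichever thread drains the queue.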

Use call_deferred() instead of direct call() in ResourceLoader::resource_changed_emit() to ensure resource change callbacks execute on the main thread. This prevents segfaults when font imports trigger node operations from background threads.
@d6e d6e requested review from a team as code owners June 6, 2025 17:18
@d6e d6e force-pushed the fix-font-import-race-condition branch from 047e113 to 50211c8 Compare June 6, 2025 17:23
Author

d6e commented Jun 6, 2025

Had to force push an update because I forgot to run the formatter

@d6e d6e requested a review from a team as a code owner June 6, 2025 18:13
@d6e d6e force-pushed the fix-font-import-race-condition branch from 6bb7cc3 to 50211c8 Compare June 6, 2025 18:23
Author

d6e commented Jun 6, 2025

Added bounds checking but then decided against it because if the array sizes mismatch that would indicate a deeper issue

Member

clayjohn commented Jun 6, 2025

Do you happen to have an MRP so that PR reviewers can easily test your code and confirm that it works?

Member

clayjohn commented Jun 6, 2025

Also should we close #107218 now since this PR fully supersedes it?

Author

d6e commented Jun 6, 2025

> Do you happen to have an MRP so that PR reviewers can easily test your code and confirm that it works?

I don't have one. The issue occurs in our very large project, and only about 1 in 20 times. Do I need one to continue with this?

Member

clayjohn commented Jun 6, 2025

> Do I need one to continue with this?

Not necessarily, but it significantly speeds up merging. Without an MRP, it might be challenging for a reviewer to verify that the code is correct and actually works.

That being said, I am unfamiliar with this code; another reviewer may be able to read it and determine that it is obviously correct.

@Calinou Calinou added the bug, topic:gui, cherrypick:4.4 (considered for cherry-picking into a future 4.4.x release), and topic:import labels Jun 6, 2025
@Calinou Calinou added this to the 4.5 milestone Jun 6, 2025
Author

d6e commented Jun 6, 2025

Okay, I think this MRP should do it. I can confirm it crashes without the fix and does not crash with the fix

FontRaceConditionMRP.zip

@bruvzg bruvzg self-requested a review June 7, 2025 12:56
Comment on lines +2093 to +2094
// Force a unique copy to prevent COW issues
data.write[0] = data[0];
Member

@bruvzg bruvzg Jun 16, 2025

This is the exact opposite of what is wanted in the vast majority of cases. The whole point of having a PackedByteArray is to ensure the same font data is never copied.

What should probably be done instead is getting rid of the raw pointer entirely and keeping only the PackedByteArray in both Font and TextServer (this will require adding a way to make a PackedByteArray from a static memory buffer without making a copy).

Member

diff --git a/scene/resources/font.cpp b/scene/resources/font.cpp
index 357d22c3e4..63ecd42418 100644
--- a/scene/resources/font.cpp
+++ b/scene/resources/font.cpp
@@ -592,6 +592,8 @@ _FORCE_INLINE_ void FontFile::_ensure_rid(int p_cache_index, int p_make_linked_f
                if (p_make_linked_from >= 0 && p_make_linked_from != p_cache_index && p_make_linked_from < cache.size()) {
                        cache.write[p_cache_index] = TS->create_font_linked_variation(cache[p_make_linked_from]);
                } else {
+                       MutexLock lock(data_mutex);
+
                        cache.write[p_cache_index] = TS->create_font();
                        TS->font_set_data_ptr(cache[p_cache_index], data_ptr, data_size);
                        TS->font_set_antialiasing(cache[p_cache_index], antialiasing);
@@ -1410,9 +1412,10 @@ void FontFile::_get_property_list(List<PropertyInfo> *p_list) const {

 void FontFile::reset_state() {
        _clear_cache();
-       data.clear();
+       data = PackedByteArray();
        data_ptr = nullptr;
        data_size = 0;
+       data_external = false;
        cache.clear();

        antialiasing = TextServer::FONT_ANTIALIASING_GRAY;
@@ -2075,9 +2078,10 @@ Error FontFile::load_dynamic_font(const String &p_path) {
 void FontFile::set_data_ptr(const uint8_t *p_data, size_t p_size) {
        MutexLock lock(data_mutex);

-       data.clear();
+       data = PackedByteArray();
        data_ptr = p_data;
        data_size = p_size;
+       data_external = true;

        for (int i = 0; i < cache.size(); i++) {
                if (cache[i].is_valid()) {
@@ -2091,16 +2095,9 @@ void FontFile::set_data(const PackedByteArray &p_data) {

        // Make a copy to ensure data stability before getting the pointer
        data = p_data;
-       // Ensure data is not empty before getting pointer
-       if (data.size() > 0) {
-               // Force a unique copy to prevent COW issues
-               data.write[0] = data[0];
-               data_ptr = data.ptr();
-               data_size = data.size();
-       } else {
-               data_ptr = nullptr;
-               data_size = 0;
-       }
+       data_ptr = data.ptr();
+       data_size = data.size();
+       data_external = false;

        for (int i = 0; i < cache.size(); i++) {
                if (cache[i].is_valid()) {
@@ -2112,11 +2109,9 @@ void FontFile::set_data(const PackedByteArray &p_data) {
 PackedByteArray FontFile::get_data() const {
        MutexLock lock(data_mutex);

-       if (unlikely((size_t)data.size() != data_size)) {
+       if (unlikely(data_external && data.is_empty() && data_ptr && data_size > 0)) {
                data.resize(data_size);
-               if (data_ptr && data_size > 0) {
-                       memcpy(data.ptrw(), data_ptr, data_size);
-               }
+               memcpy(data.ptrw(), data_ptr, data_size);
        }
        return data;
 }
diff --git a/scene/resources/font.h b/scene/resources/font.h
index 3dd6b77aa4..95ecf2b78d 100644
--- a/scene/resources/font.h
+++ b/scene/resources/font.h
@@ -189,6 +189,7 @@ class FontFile : public Font {
        mutable Mutex data_mutex;
        const uint8_t *data_ptr = nullptr;
        size_t data_size = 0;
+       bool data_external = false;
        mutable PackedByteArray data;

        TextServer::FontAntialiasing antialiasing = TextServer::FONT_ANTIALIASING_GRAY;

This seems to work as well, and should avoid unneeded COW copies. Changing data in get_data should never happen for loaded fonts; it's only there to support get_data on built-in fonts.

It's also reasonable to use the same mutex when TS font objects are created in _ensure_rid, since the same pointers are accessed.
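
The `data_external` idea in the diff above can be sketched in standard C++. This is a minimal sketch under stated assumptions: `ExternalAwareFontData` and `has_owned_copy()` are hypothetical, and `std::vector` stands in for `PackedByteArray`. Unlike `PackedByteArray`, assigning a `std::vector` always deep-copies, so the sketch only illustrates the flag logic, not the copy avoidance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in mirroring the fields touched by the diff.
class ExternalAwareFontData {
    mutable std::mutex data_mutex;
    const uint8_t *data_ptr = nullptr;
    size_t data_size = 0;
    bool data_external = false;
    mutable std::vector<uint8_t> data;

public:
    // Borrows a caller-owned buffer, e.g. a built-in font in static memory.
    void set_data_ptr(const uint8_t *p_data, size_t p_size) {
        std::lock_guard<std::mutex> lock(data_mutex);
        data.clear();
        data_ptr = p_data;
        data_size = p_size;
        data_external = true;
    }

    // Takes ownership of a copy; the pointer then refers to owned storage.
    void set_data(const std::vector<uint8_t> &p_data) {
        std::lock_guard<std::mutex> lock(data_mutex);
        data = p_data;
        data_ptr = data.data();
        data_size = data.size();
        data_external = false;
    }

    // Materializes an owned copy only for external buffers, and only once.
    std::vector<uint8_t> get_data() const {
        std::lock_guard<std::mutex> lock(data_mutex);
        if (data_external && data.empty() && data_ptr && data_size > 0) {
            data.assign(data_ptr, data_ptr + data_size);
        }
        return data;
    }

    bool has_owned_copy() const {
        std::lock_guard<std::mutex> lock(data_mutex);
        return !data.empty();
    }
};
```

With this shape, the buffer is only filled lazily for data that arrived through `set_data_ptr()`. Data set through `set_data()` is returned as stored.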

@Repiteo Repiteo added the cherrypick:4.5 (considered for cherry-picking into a future 4.5.x release) label Sep 18, 2025
@Repiteo Repiteo modified the milestones: 4.5, 4.x Sep 18, 2025
Member

@Calinou Calinou left a comment

Tested locally on Windows 11 24H2 with the MRP linked above, it works as expected.

Note that I had to increase the memory/limits/message_queue/max_size_mb project setting in the MRP to get it to run correctly (I set it to 2048 just to be sure).

=== FontFile Race Condition Reproduction ===
This MRP uses real threading to reproduce the FontFile race condition
that causes crashes during concurrent font data access.

Created FontFile with test data, size: 2048
Starting threaded race condition test...
(This should crash in versions without the FontFile fix)

Stopping threads...

=== Results ===
UNEXPECTED: No crashes detected after 2275500 iterations
This suggests the FontFile race condition fix is already present,
or the race condition wasn't triggered this time.

Expected behavior:
- WITHOUT fix: Segfaults/crashes during execution
- WITH fix: Completes without crashes

However, in master, running the MRP didn't crash either, both with unoptimized and optimized editor builds compiled using MSVC 2022. I'm on a 9950X3D, so maybe I would need to adjust OS.delay_msec() calls to get the race condition to occur (assuming it was tested on a slower CPU).

Member

Calinou commented Dec 10, 2025

@d6e Which OS and compiler are you using to reproduce the issue (and test the fix)?

Member

bruvzg commented Dec 11, 2025

> Note that I had to increase the memory/limits/message_queue/max_size_mb project setting in the MRP to get it to run correctly (I set it to 2048 just to be sure).

I can't get it working with or without the fix. It just gets stuck and endlessly consumes memory (I killed it after it went over 40 GB). It's not related to FontFile, since the behavior is the same if the _safe_set_data/_safe_get_data bodies are replaced with pass.

Tested on macOS 26, for reference.

@d6e d6e closed this Jan 12, 2026
@d6e d6e reopened this Jan 12, 2026
Author

d6e commented Jan 12, 2026

Thanks for the suggested implementation! I've applied your data_external approach which avoids the unnecessary COW copies.

I also added mutex protection to reset_state() since it modifies the same fields (data_ptr, data_size, data_external) and ResourceCache::get_ref() releases its lock before returning.

I don't really have the time to get into the MRP right now. Do you want to go with your suggested implementation?

Author

d6e commented Jan 12, 2026

> @d6e Which OS and compiler are you using to reproduce the issue (and test the fix)?

Linux and gcc 13.3.0

Add mutex protection to FontFile data access methods to prevent
race conditions when multiple import threads access the same
cached FontFile simultaneously.

Use data_external flag to track whether data came from set_data_ptr()
vs set_data() to avoid unnecessary PackedByteArray COW copies.
@d6e d6e force-pushed the fix-font-import-race-condition branch from b351451 to adba698 Compare January 12, 2026 07:11
Co-authored-by: A Thousand Ships <96648715+AThousandShips@users.noreply.github.com>

Labels

bug, cherrypick:4.4 (considered for cherry-picking into a future 4.4.x release), cherrypick:4.5 (considered for cherry-picking into a future 4.5.x release), topic:gui, topic:import


6 participants