Closed
31 changes: 30 additions & 1 deletion doc/classes/Performance.xml
@@ -299,7 +299,36 @@
<constant name="NAVIGATION_3D_OBSTACLE_COUNT" value="58" enum="Monitor">
Number of active navigation obstacles in the [NavigationServer3D].
</constant>
<constant name="MONITOR_MAX" value="59" enum="Monitor">
<constant name="FRAME_PACING_TOTAL_TIME" value="59" enum="Monitor">
Value used by [constant RenderingServer.CPU_GPU_SYNC_AUTO] to decide between PARALLEL and SEQUENTIAL mode. It is the sum of CPU Time and GPU Time. If the value is consistently high enough, CPU_GPU_SYNC_AUTO selects PARALLEL; otherwise it prefers SEQUENTIAL.
[b]Note:[/b] This value attempts to exclude any time spent waiting for V-Sync, so it will not match other timing values (e.g. actual FPS, time taken by physics, etc.). It estimates how long the system would take if the CPU and GPU processed a frame serially, without the added delay of waiting for V-Sync.
[b]Note:[/b] When using these monitors, it's best to switch the editor to a simple view such as the Script tab, so that the 2D/3D viewport does not consume system resources and interfere with the readings. Better yet, run the editor profiler on another machine.
[b]Note:[/b] The sync mode must be set to [constant RenderingServer.CPU_GPU_SYNC_AUTO] for this monitor to work.
</constant>
<constant name="FRAME_PACING_CPU_TIME" value="60" enum="Monitor">
How long the CPU took to process the frame, excluding waiting delays caused by V-Sync. This value is an approximation and may not match other timing values. Adding it to GPU Time gives Total Time. Useful for deciding where to focus optimization efforts.
[b]Note:[/b] The sync mode must be set to [constant RenderingServer.CPU_GPU_SYNC_AUTO] for this monitor to work.
</constant>
<constant name="FRAME_PACING_GPU_TIME" value="61" enum="Monitor">
How long the GPU took to process the frame, excluding waiting delays caused by V-Sync. This value is an approximation and may not match other timing values. Adding it to CPU Time gives Total Time. Useful for deciding where to focus optimization efforts.
[b]Note:[/b] The sync mode must be set to [constant RenderingServer.CPU_GPU_SYNC_AUTO] for this monitor to work.
</constant>
<constant name="FRAME_PACING_EVALUATED_SYNC_MODE" value="62" enum="Monitor">
The mode that [constant RenderingServer.CPU_GPU_SYNC_AUTO] determines each frame should use, based on Total Time. "1" means [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], "2" means [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL].
[b]Note:[/b] This value is not the actual mode Godot is in, because the decision is averaged over time to prevent Godot from constantly switching back and forth between PARALLEL and SEQUENTIAL (which would cause visible stutters). Ideally this should be a perfectly flat line of either 1s or 2s. If the value keeps alternating between 1 and 2, the system is not fast enough for a smooth low-latency experience, or the game needs further optimization until it is.
</constant>
<constant name="FRAME_PACING_ACTUAL_SYNC_MODE" value="63" enum="Monitor">
The [b]actual[/b] mode the game is currently in. "1" means [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], "2" means [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL].
[b]Note:[/b] This value should be as flat as possible. Every time it switches between "1" and "2", the game may suffer a small stutter.
</constant>
<constant name="FRAME_PACING_MISSED_HARD_TARGET" value="64" enum="Monitor">
The number of frames in which Total Time exceeded the frame budget implied by the monitor's refresh rate or the max FPS (whichever is lower). This does not necessarily mean the game missed a V-Blank: when running in [constant RenderingServer.CPU_GPU_SYNC_PARALLEL], the total frame time should be lower than the sum of CPU Time and GPU Time, so in practice the app may not have missed any V-Blank. It does indicate that V-Blanks would have been missed when executing in [constant RenderingServer.CPU_GPU_SYNC_SEQUENTIAL]. The value is expressed in thousands.
For example, one missed hard target is shown as 1000 and two missed hard targets as 2000. This value decreases quickly over time. Missed hard targets weigh heavily on [constant RenderingServer.CPU_GPU_SYNC_AUTO]'s decision to switch to PARALLEL to avoid degrading the experience further.
[b]Note:[/b] In PARALLEL mode, this counter is reset to 0 at the start of each frame. While [constant FRAME_PACING_ACTUAL_SYNC_MODE] is 1, this value is therefore either 0 or 1000, and a flat line at 1000 means the game is always failing to reach the target framerate.
[b]Note:[/b] Spikes in missed hard targets almost always mean very visible stutter and should be avoided during gameplay. Ideally this value stays at 0 at all times. If the system isn't fast enough to hold the target framerate, a constant 1000 is preferable because it keeps pacing consistent.
[b]Note:[/b] The sync mode must be set to [constant RenderingServer.CPU_GPU_SYNC_AUTO] for this monitor to work.
</constant>
<constant name="MONITOR_MAX" value="65" enum="Monitor">
Represents the size of the [enum Monitor] enum.
</constant>
</constants>
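The monitor descriptions above distinguish a per-frame "evaluated" mode from a smoothed "actual" mode. The following is a minimal sketch of how such a split could behave, not the implementation in this PR: the `AutoSyncSketch` type, the exponential moving average, and the 0.25/0.75 hysteresis thresholds are all assumptions made for illustration. Only the numeric values 1 (PARALLEL) and 2 (SEQUENTIAL) and the use of CPU Time + GPU Time come from the documentation text.

```cpp
#include <cassert>

// Values mirror the documented monitor readings: 1 = PARALLEL, 2 = SEQUENTIAL.
enum SyncMode { PARALLEL = 1, SEQUENTIAL = 2 };

// Hypothetical helper for illustration only; none of these names exist in Godot.
struct AutoSyncSketch {
	double budget_ms; // Frame budget, e.g. 1000 / min(refresh rate, max FPS).
	double smoothing; // Assumed weight of the newest sample in the moving average.
	double score = 0.0; // Smoothed share of recent frames that wanted PARALLEL.
	SyncMode actual = SEQUENTIAL;

	AutoSyncSketch(double p_budget_ms, double p_smoothing) :
			budget_ms(p_budget_ms), smoothing(p_smoothing) {
		assert(budget_ms > 0.0);
		assert(smoothing > 0.0 && smoothing <= 1.0);
	}

	// Per-frame decision from Total Time alone (the "evaluated" mode).
	SyncMode evaluate(double cpu_ms, double gpu_ms) const {
		return (cpu_ms + gpu_ms) > budget_ms ? PARALLEL : SEQUENTIAL;
	}

	// Smoothed decision (the "actual" mode). The mode only changes once the
	// moving average leaves the assumed 0.25..0.75 band, so isolated spikes
	// do not cause a switch.
	SyncMode update(double cpu_ms, double gpu_ms) {
		const double wants_parallel = evaluate(cpu_ms, gpu_ms) == PARALLEL ? 1.0 : 0.0;
		score += smoothing * (wants_parallel - score);
		if (score > 0.75) {
			actual = PARALLEL;
		} else if (score < 0.25) {
			actual = SEQUENTIAL;
		}
		return actual;
	}
};
```

With a 16 ms budget and a smoothing weight of 0.5, one 20 ms frame changes the evaluated mode immediately but leaves the actual mode untouched; in this sketch it takes several consecutive slow frames before the actual mode follows. That matches the intent described for [constant FRAME_PACING_ACTUAL_SYNC_MODE], where each switch may cost a small stutter.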
8 changes: 8 additions & 0 deletions doc/classes/ProjectSettings.xml
@@ -3191,6 +3191,14 @@
[b]Note:[/b] This property's upper limit is controlled by [member rendering/rendering_device/staging_buffer/block_size_kb] and whether it's possible to allocate a single block of texture data with this region size in the format that is requested.
[b]Note:[/b] This property is only read when the project starts. There is currently no way to change this value at run-time.
</member>
<member name="rendering/rendering_device/vsync/cpu_gpu_sync" type="int" setter="" getter="" default="0">
Sets the CPU/GPU synchronization mode used by the renderer:
- [b]Parallel[/b] tells the renderer to prioritize higher framerate by allowing the CPU to queue up additional frames before they're rendered by the GPU. This allows the CPU and GPU to work in tandem, improving the framerate and frame pacing in complex scenes at the expense of input latency. This default setting is suitable for most 3D applications, especially on mobile and lower-performance desktop hardware.
- [b]Auto[/b] tells the renderer to choose between Parallel and Sequential at runtime. See [constant RenderingServer.CPU_GPU_SYNC_AUTO].
[b]Note:[/b] This property may be overridden with the [code]--cpu-gpu-sync[/code] command-line argument. When this argument is used, this project setting is ignored.
[b]Note:[/b] This property is only read when the project starts. To change the CPU/GPU synchronization mode at runtime, call [method RenderingServer.set_cpu_gpu_sync_mode] instead.
See also [member rendering/rendering_device/vsync/frame_queue_size] and [member rendering/rendering_device/vsync/swapchain_image_count].
</member>
<member name="rendering/rendering_device/vsync/frame_queue_size" type="int" setter="" getter="" default="2">
The number of frames to track on the CPU side before stalling to wait for the GPU.
Try the [url=https://darksylinc.github.io/vsync_simulator/]V-Sync Simulator[/url], an interactive interface that simulates presentation to better understand how it is affected by different variables under various conditions.
37 changes: 37 additions & 0 deletions doc/classes/RenderingServer.xml
@@ -1592,6 +1592,19 @@
Tries to free an object in the RenderingServer. To avoid memory leaks, this should be called after using an object as memory management does not occur automatically when using RenderingServer directly.
</description>
</method>
<method name="get_actual_cpu_gpu_sync_mode" qualifiers="const">
<return type="int" enum="RenderingServer.CPUGPUSyncMode" />
<description>
See [constant Performance.FRAME_PACING_ACTUAL_SYNC_MODE].
</description>
</method>
<method name="get_cpu_gpu_sync_mode" qualifiers="const">
<return type="int" enum="RenderingServer.CPUGPUSyncMode" />
<description>
Gets the CPU/GPU synchronization mode for the renderer. See [enum CPUGPUSyncMode] for options.
[b]Note:[/b] This setting should be used with care, as the sequential low-latency mode prevents the CPU and GPU from running in parallel, which may reduce performance by as much as 30-50%. For simple scenes with capped FPS, this may not be noticeable. In other scenarios, the performance impact is more severe. An FPS drop, e.g. from 1000 to 500 FPS, or from 90 to 45 FPS, is an expected outcome of using sequential mode. Therefore, it is recommended that this option be exposed in your game's settings menu, so that users can select the best mode based on their preferences and hardware.
</description>
</method>
<method name="get_current_rendering_driver_name" qualifiers="const">
<return type="String" />
<description>
@@ -3417,6 +3430,14 @@
Sets a boot image. The color defines the background color. If [param scale] is [code]true[/code], the image will be scaled to fit the screen size. If [param use_filter] is [code]true[/code], the image will be scaled with linear interpolation. If [param use_filter] is [code]false[/code], the image will be scaled with nearest-neighbor interpolation.
</description>
</method>
<method name="set_cpu_gpu_sync_mode">
<return type="void" />
<param index="0" name="sync_mode" type="int" enum="RenderingServer.CPUGPUSyncMode" />
<description>
Sets the CPU/GPU synchronization mode for the renderer. See [enum CPUGPUSyncMode] for options. Equivalent to [member ProjectSettings.rendering/rendering_device/vsync/cpu_gpu_sync].
[b]Note:[/b] This setting should be used with care, as the sequential low-latency mode prevents the CPU and GPU from running in parallel, which may reduce performance by as much as 30-50%. For simple scenes with capped FPS, this may not be noticeable. In other scenarios, the performance impact is more severe. An FPS drop, e.g. from 1000 to 500 FPS, or from 90 to 45 FPS, is an expected outcome of using sequential mode. Therefore, it is recommended that this option be exposed in your game's settings menu, so that users can select the best mode based on their preferences and hardware.
</description>
</method>
<method name="set_debug_generate_wireframes">
<return type="void" />
<param index="0" name="generate" type="bool" />
@@ -5879,6 +5900,22 @@
<constant name="GLOBAL_VAR_TYPE_MAX" value="29" enum="GlobalShaderParameterType">
Represents the size of the [enum GlobalShaderParameterType] enum.
</constant>
<constant name="CPU_GPU_SYNC_AUTO" value="0" enum="CPUGPUSyncMode">
Monitors CPU &amp; GPU timings and dynamically decides whether to use [constant CPU_GPU_SYNC_PARALLEL] or [constant CPU_GPU_SYNC_SEQUENTIAL]. Recommended for applications that want to take advantage of the lower latency provided by SEQUENTIAL when the system is fast enough to sustain it.
See [constant Performance.FRAME_PACING_TOTAL_TIME] and the related constants to monitor the measurements used to decide which mode to select.
[b]Note:[/b] When using these monitors, it's best to switch the editor to a simple view such as the Script tab, so that the 2D/3D viewport does not consume system resources and interfere with the readings. Better yet, run the editor profiler on another machine.
[b]Note:[/b] Stutter can be reduced by using [constant DisplayServer.VSYNC_ADAPTIVE], but this risks always degenerating to [constant DisplayServer.VSYNC_DISABLED] if the system is too slow.
[b]Note:[/b] This mode always decides to use PARALLEL if V-Sync is set to either [constant DisplayServer.VSYNC_DISABLED] or [constant DisplayServer.VSYNC_MAILBOX].
</constant>
<constant name="CPU_GPU_SYNC_PARALLEL" value="1" enum="CPUGPUSyncMode">
Tells the renderer to prioritize higher framerate by allowing the CPU to queue up additional frames before they're rendered by the GPU. This allows the CPU and GPU to work in tandem, improving the framerate and frame pacing in complex scenes at the expense of input latency. This default setting is suitable for most 3D applications, especially on mobile and lower-performance desktop hardware.
</constant>
<constant name="CPU_GPU_SYNC_SEQUENTIAL" value="2" enum="CPUGPUSyncMode">
Tells the renderer to prioritize lower display latency by severely limiting how far the CPU is allowed to get ahead of the GPU when queuing frames. This can greatly help with input lag, at the cost of significantly reduced framerate in most scenes. This setting is useful for games and applications with simple graphics where responsive input is important. Your results may vary based on platform, drivers, and scene contents.
[b]Note:[/b] Significant FPS drops are expected while in this mode. It prioritizes low latency over framerate.
[b]Note:[/b] Explicitly setting this value is not possible in production. Godot only enters this mode through [constant CPU_GPU_SYNC_AUTO]. Lower latency is not guaranteed, because the system may not be fast enough and may end up missing V-Blank intervals, which increases latency and loses framerate for no improvement (or even makes things worse). Hence CPU_GPU_SYNC_AUTO only enters this mode when it deems it safe to do so. Testing environments (e.g. the editor) allow setting this mode explicitly in order to debug issues in SEQUENTIAL or PARALLEL.
[b]Note:[/b] Stutter can be reduced by using [constant DisplayServer.VSYNC_ADAPTIVE], but this risks always degenerating to [constant DisplayServer.VSYNC_DISABLED] if the system is too slow.
</constant>
<constant name="RENDERING_INFO_TOTAL_OBJECTS_IN_FRAME" value="0" enum="RenderingInfo">
Number of objects rendered in the current 3D scene. This varies depending on camera position and rotation.
</constant>
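The FPS figures quoted in the notes above (1000 to 500, 90 to 45) can be reproduced with a small arithmetic model. This is a sketch under stated assumptions, not Godot's measurement code: it assumes an idealized steady state in which SEQUENTIAL costs CPU time plus GPU time per frame, while PARALLEL is bounded by the slower of the two. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// SEQUENTIAL (idealized): GPU work starts only after CPU work for the same
// frame has finished, so one frame costs the sum of both.
double sequential_frame_ms(double cpu_ms, double gpu_ms) {
	return cpu_ms + gpu_ms;
}

// PARALLEL (idealized): the CPU prepares the next frame while the GPU renders
// the current one, so throughput is limited by the slower of the two.
double parallel_frame_ms(double cpu_ms, double gpu_ms) {
	return std::max(cpu_ms, gpu_ms);
}

// Converts a frame time to frames per second, limited by an FPS cap
// (for example the V-Sync refresh rate or a max FPS setting).
double capped_fps(double frame_ms, double cap_fps) {
	assert(frame_ms > 0.0);
	return std::min(1000.0 / frame_ms, cap_fps);
}
```

With 1 ms of CPU work and 1 ms of GPU work and a very high cap, the model gives 1000 FPS in PARALLEL and 500 FPS in SEQUENTIAL, which is the halving the notes describe. With 4 ms each and a 60 FPS cap, both modes reach the cap, which is why the cost may go unnoticed in simple scenes with capped FPS. When the workload is unbalanced (say 8 ms CPU and 2 ms GPU), the model predicts a smaller drop, from 125 to 100 FPS.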
14 changes: 11 additions & 3 deletions drivers/gles3/rasterizer_gles3.cpp
@@ -112,14 +112,22 @@ void RasterizerGLES3::begin_frame(double frame_step) {
//scene->iteration();
}

void RasterizerGLES3::end_frame(bool p_swap_buffers) {
void RasterizerGLES3::end_frame(bool p_swap_buffers, bool p_sequential_sync) {
GLES3::Utilities *utils = GLES3::Utilities::get_singleton();
utils->capture_timestamps_end();
}

void RasterizerGLES3::gl_end_frame(bool p_swap_buffers) {
void RasterizerGLES3::gl_end_frame(bool p_swap_buffers, bool p_sequential_sync) {
static const GLuint64 CLIENT_WAIT_SYNC_TIMEOUT = 1'000'000'000; // One second, in nanoseconds.

if (p_swap_buffers) {
DisplayServer::get_singleton()->swap_buffers();

if (p_sequential_sync) {
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, CLIENT_WAIT_SYNC_TIMEOUT);
glDeleteSync(fence);
}
} else {
glFinish();
}
@@ -519,7 +527,7 @@ void RasterizerGLES3::set_boot_image(const Ref<Image> &p_image, const Color &p_c
copy_effects->copy_to_rect(screenrect);
glBindTexture(GL_TEXTURE_2D, 0);

gl_end_frame(true);
gl_end_frame(true, false);

texture_storage->texture_free(texture);
}
4 changes: 2 additions & 2 deletions drivers/gles3/rasterizer_gles3.h
@@ -105,8 +105,8 @@ class RasterizerGLES3 : public RendererCompositor {
void blit_render_targets_to_screen(DisplayServer::WindowID p_screen, const BlitToScreen *p_render_targets, int p_amount);

bool is_opengl() { return true; }
void gl_end_frame(bool p_swap_buffers);
void end_frame(bool p_swap_buffers);
void gl_end_frame(bool p_swap_buffers, bool p_sequential_sync);
void end_frame(bool p_swap_buffers, bool p_sequential_sync);

void finalize();

6 changes: 6 additions & 0 deletions drivers/gles3/storage/utilities.cpp
@@ -335,6 +335,12 @@ void Utilities::capture_timestamp(const String &p_name) {
frames[frame].timestamp_count++;
}

void Utilities::capture_timestamps_sync_mode_auto_end() {
if (RenderingServer::get_singleton()->get_cpu_gpu_sync_mode() == RenderingServer::CPU_GPU_SYNC_AUTO) {
capture_timestamp("_Sync Mode Auto");
}
}

void Utilities::_capture_timestamps_begin() {
// frame is incremented at the end of the frame so this gives us the queries for frame - 2. By then they should be ready.
if (frames[frame].timestamp_count) {
1 change: 1 addition & 0 deletions drivers/gles3/storage/utilities.h
@@ -201,6 +201,7 @@ class Utilities : public RendererUtilities {

virtual void capture_timestamps_begin() override;
virtual void capture_timestamp(const String &p_name) override;
virtual void capture_timestamps_sync_mode_auto_end() override;
virtual uint32_t get_captured_timestamps_count() const override;
virtual uint64_t get_captured_timestamps_frame() const override;
virtual uint64_t get_captured_timestamp_gpu_time(uint32_t p_index) const override;
6 changes: 2 additions & 4 deletions editor/debugger/editor_visual_profiler.cpp
@@ -351,10 +351,8 @@ void EditorVisualProfiler::_update_frame(bool p_focus_selected) {

float cpu_time = m.areas[i].cpu_time;
float gpu_time = m.areas[i].gpu_time;
if (i < m.areas.size() - 1) {
cpu_time = m.areas[i + 1].cpu_time - cpu_time;
gpu_time = m.areas[i + 1].gpu_time - gpu_time;
}
cpu_time = m.areas[i + 1].cpu_time - cpu_time;
gpu_time = m.areas[i + 1].gpu_time - gpu_time;

if (name.begins_with(">")) {
TreeItem *category = variables->create_item(parent);
Expand Down