Skip to content

C++: Fix missing global variable flow #20126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 11, 2025

Conversation

MathiasVP
Copy link
Contributor

This PR fixes a family of missing flow cases involving global variables.

First some background: We implement global variable flow by adding a "final use" at the exit of each function that writes to the global variable, and an "initial definition" at the entry of each function that reads the global variable. For example, for something like:

int global;

void set() {
  global = source();
}

void get() {
  sink(global);
}

we will generate a "final use" at the end of the set function (which represents the value of global as we're exiting the function), and an "initial definition" at the beginning of the get function (which represents the value of global as we're starting to execute the function).

This PR also adds a "final use" of a global variable when there is a post-update node for the variable. This covers situations like:

  • The global variable is a struct and a function writes to one of its fields, or
  • The global variable is passed to a function which writes to the parameter.

For example, this now works:

int global;

void set_param(int* p) {
  *p = source();
}

void set_global() {
  set_param(&global); // this now generates a "final use of global" even though there's no `StoreInstruction` that writes to the address of `global`
}

void read_global() {
  sink(global);
}

DCA does show a slowdown on two projects: vim (which we've come to learn is infamous for their use of global variables), and php. However, on average this is less than a 2.5% performance slowdown across all the projects.

I also ran QA which also showed that vim and php are clear outliers. QA showed no new timeouts.

@MathiasVP MathiasVP force-pushed the fix-missing-global-flow branch from 849454c to 163d4af Compare August 2, 2025 15:41
@MathiasVP MathiasVP force-pushed the fix-missing-global-flow branch from 163d4af to fca49dd Compare August 2, 2025 15:43
@jketema
Copy link
Contributor

jketema commented Aug 11, 2025

What about the new alerts that DCA shows?

@MathiasVP
Copy link
Contributor Author

I looked at about ~20 results (~10 cpp/path-injections on git, and ~10 cpp/unbounded-write ones on vim and php). In all the cases it was caused by a global variable with a struct type with a field being passed into functions and then written to in the caller like this:

struct S { int x, y; };

S s;

void set(int* x) {
  *x = tainted_data();
}

void foo() {
  set(&s.x);
}

void bar() {
  int x = s.x;
  // use x
}

(of course behind numerous macro expansions because vim 😭) It all looked genuine to me.

jketema
jketema previously approved these changes Aug 11, 2025
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MathiasVP MathiasVP marked this pull request as ready for review August 11, 2025 09:21
@MathiasVP MathiasVP requested a review from a team as a code owner August 11, 2025 09:21
@Copilot Copilot AI review requested due to automatic review settings August 11, 2025 09:21
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes missing global variable flow cases by improving how the C++ data flow analysis tracks global variables. The implementation adds "final use" nodes when there are post-update nodes for global variables, covering scenarios where global variables are modified indirectly through function parameters or field assignments.

  • Adds post-update node tracking for global variables modified through function calls or field updates
  • Creates additional flow edges for global variables when they are modified indirectly
  • Introduces new test cases to validate the improved global variable flow detection

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

Show a summary per file
File Description
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImpl.qll Extends global use detection to include post-update nodes
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowUtil.qll Refactors post-update node creation logic
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowPrivate.qll Extracts post-update node predicate
cpp/ql/test/library-tests/dataflow/fields/aliasing.cpp Adds test cases for global field flow scenarios
cpp/ql/test/library-tests/dataflow/dataflow-tests/test.cpp Adds test cases for globals without explicit definitions
*.expected files Updates expected test results with new flow edges

@jketema jketema merged commit f9f99a0 into github:main Aug 11, 2025
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants