Skip to content

Improve rendering of DAG preview #131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ErikDanielsson
Copy link
Contributor

@ErikDanielsson ErikDanielsson commented Jul 15, 2025

This PR makes quite substantial changes to the DAG preview feature of the language server. First and foremost it adds support for rendering control flow correctly (issues #93 #88).

The solution to these issues is to use scoped symbol tables where each name in the program can correspond to several nodes in the control flow DAG. As we enter a new conditional statement each of the two branches inherits the symbols from the outer scope. Once we leave the conditional statement the symbol table for the outer scope is the union of the symbol table for each branch, minus any local variables declared.

To have separate plates for separate branches I have also added subgraphs for each conditional branch. The conditional node (corresponding to the boolean in the if) is always a predecessor. Since an a conditional branch can contain a conditional branch, the subgraphs are represented as a tree structure and rendered accordingly.

Example: if we call the symbol table s then we have the following

workflow {
    test  = params.echo // s = {'test': {<Node>("test", ...) , 0)>}
    if (test) {
        echo = TOUCH(params.echo) // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 1)>}
    } else {
        echo = DEFAULT() // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 2)>}
    }
    // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 1)>, <Node('echo', 2)>}
   // Overwrite test variable
    test  = APPEND(echo) // s = {'test': {<Node(params.echo, 1)>, 'echo': {<Node('echo', 1)>, <Node('echo', 2)>}
}

which will be rendered as the DAG

flowchart TB
subgraph " "
  subgraph params
    v0["echo"]
  end
  v0["echo"]
  v2([conditional])
  v7([APPEND])
  subgraph s1[" "]
    v3([TOUCH])
  end
  subgraph s2[" "]
    v5([DEFAULT])
  end
    v0 --> v2
    v0 --> v3
    v3 --> v7
    v5 --> v7
    v2 --> s2
    v2 --> s1
end
Loading

If a global variable is assigned in one branch but not the other then it is implicitly initialized with null. To address this, we check before entering the outside scope whether this was the case and add a null node to the scope where the variable is not declared.
This means that the code:

workflow {
    if (params.echo) {
        echo = TOUCH(params.echo)
    } 
    APPEND(echo) 
}

which will be rendered as

flowchart TB
subgraph " "
  subgraph params
    v0["echo"]
  end
  v0["echo"]
  v1([conditional])
  v4{null}
  v6([APPEND])
  subgraph s1[" "]
    v2([TOUCH])
  end
    v0 --> v1
    v0 --> v2
    v2 --> v6
    v4 --> v6
    v1 --> s1
end
Loading

Removing nodes that shouldn't be rendered

By default we only want to render inputs, processes/(sub)workflows and outputs. This means that we need to collapse the rest of the graph. There are two types of nodes we remove:

  • Variable nodes
  • Disconnected null nodes

The second case is new and correspond to the situation where a variable is is initalized in one branch but never used outside the branch, which will create a disconnected null in the other branch. We remove nodes in place via a DFS in the Graph.collapseGraph function to which we pass predicates for nodes that should always be hidden (variable nodes) and nodes that should be hidden if they become disconnected (null nodes).

Configuration support for rendering variable nodes

I have also added the configuration option nextflow.dag.showVariables which toggles whether variables should be shown (or if the graph should be collapsed). It is false by default. I added it since it could be (was for me) useful for debugging purposes.

If this is not desired we can revert the changes in 287324c. Here is the corresponding PR on the vscode-code-language side.

@bentsherman
Copy link
Member

Updates:

  • revert unrelated formatting changes
  • remove config setting for variables (kept as internal constant for testing)
  • move conditional scope logic into separate class VariableContext
  • move Graph, Subgraph, Node to separate file
  • remove null constant (will be handled by upstream error checking later)
  • refactor MermaidRenderer with StringBuilder
  • reverted collapseGraph in favor of original algorithm for skipping hidden nodes
  • add unit tests

Copy link
Member

@bentsherman bentsherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've gone through the code and tailored it to my liking. I think it's basically ready to merge, but wanted to give you a chance to look over everything and make sure I didn't break something.

If you have any test cases lying around that you wrote during development, it would be good to copy them here so that I can add them to the test suite. I set up a test harness so that we only need to specify input -> output pairs for each test case

Comment on lines +104 to +110
visited.addAll(preds);
preds = preds.stream()
.flatMap(p ->
isHidden(p, inputs, outputs)
? p.preds.stream()
: Stream.of(p)
)
.flatMap(pred -> (
isHidden(pred, inputs, outputs)
? pred.preds.stream().filter(p -> !visited.contains(p))
: Stream.of(pred)
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep the original algorithm for skipping nodes instead of your collapse algorithm, since this one is simpler and seems to get the job done, especially since I removed the null node (for now).

I was testing fetchngs and found that the SRA workflow causes an infinite loop here because of some circular dependency in the graph. It has something to do with ch_versions but I never found the exact cause. So I just added the visited set (similar to your approach) to handle cycles and now fetchngs works.

Copy link
Contributor Author

@ErikDanielsson ErikDanielsson Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main point of using a DFS is the difference in complexity: the DFS approach has complexity O(V + E) while the approach you changed to now should have complexity O(V * (V + E)) since we are potentially reprocessing much of the graph for each node. However, since the graphs are rather small this should be completly fine, and I agree that it is easier to read the code.

If you don't have a visited set then the algorithm has worst case exponential complexity since your are enumerating all paths to the predecessors. I think that this might explain the apparent "infinite loop" behaviour for fetchngs. Truly getting stuck in a loop should be impossible since the graph we are generating is by definition acyclic, so if there is a bug causing a cycle then this is more troubling. However, looking at the "debug graph" with variables shown this does not seem to be the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some debug logging and observed that the predecessor set was cycling between two nodes when rendering fetchngs. I thought maybe I introduced a bug in the if-else logic, but when I rendered the graph with no collapsing, there was no cycle 🤷

I forgot about the trade-off in complexity, that's a fair point. But I agree that these graphs are small enough that it shouldn't matter, especially now with the visited set

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found the cycle in fetchngs:

image

It has something to do with ch_versions being reassigned multiple times. Not sure if it was introduced by your changes or mine, but I suspect something is wrong with the if-else scope merging

* @param ifScope
* @param elseScope
*/
public void mergeConditionalScopes(Map<String,Variable> ifScope, Map<String,Variable> elseScope) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to move the conditional scope logic into a separate class to keep things modular. I hope I'll be able to adapt or even re-use this class to implement definite assignment analysis in the name checker, but we'll see.

Anyway, you might want to look at this method in particular to make sure I'm still merging the conditional scopes correctly. We should probably add some unit tests for the most common cases that we care about

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, but I'll have a careful read through it to ensure that it works as expected.

With regards to the indefinite assignment code, this is exactly what I introduced with the null nodes, so you should hopefully be able to reuse that code too!

@ErikDanielsson
Copy link
Contributor Author

Thanks for looking through and refactoring the code! Splitting things up makes a lot of sense, the DataflowVisitor file was way too crowded.

I do not have any well thought through test cases, mainly just minor example, but I can write some if you want to.

I've had a quick look over your changes now, here are some things that I think could be improved, or at least further discussed:

  • Rendering of ternary expression
    I think there is some error with the rendering of ternary expressions, since for example the workflow
include { M } from "./main.nf";
workflow {
    echo = params.echo ? 1 : 0
    M(echo)
}

is rendered as

flowchart TB
  subgraph " "
    subgraph params
      v0["echo"]
    end
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v0 --> v1
    v1 --> s1
    v1 --> s2
  end

Loading

but both conditional branches should clearly feed into M.

Looking in "debug mode" where variables are shown, the error seems to be that the variable echo is not assigned by the return value of the ternary expression.

flowchart TB
  subgraph " "
    subgraph params
      v0["echo"]
    end
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v0 --> v1
    v2 --> v3
    v1 --> s1
    v1 --> s2
  end

Loading

I think that the cleanest approach to solve this is to add a node type corresponding to the full ternary expression (or in more generality a "return value" but I don't think Nextflow has this in any other places) node type, which the return value of both branches feed into. Nodes of this type should then likely be hidden during a normal render. I'll look into fixing this if you want to.

  • Removing disconnect components
    Since ternary expressions and if-else-statements can manipulate code that does not affect the actual outputs of the workflow I think it is reasonable to remove subgraphs that are disconnected from any process, input or output.

The following code for example

include { M } from "./main.nf";
workflow {
    t = 1
    echo = t ? 1 : 0
    M(1)
}

is rendered as

flowchart TB
  subgraph " "
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v1 --> s1
    v1 --> s2
  end
Loading

which is correct, but hard to understand when all variable nodes are collapsed. If you agree, I could add some code that removes the disconnected components.

@bentsherman
Copy link
Member

Thanks for the suggestions. I think they make sense, I will try to implement them so that you can stay focused on the formatter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing connection in DAG for conditional execution Improve rendering of conditional blocks in DAG preview
2 participants