Improve rendering of DAG preview #131

ErikDanielsson · 2025-07-15T14:35:01Z

This PR makes quite substantial changes to the DAG preview feature of the language server. First and foremost it adds support for rendering control flow correctly (issues #93 #88).

The solution to these issues is to use scoped symbol tables where each name in the program can correspond to several nodes in the control flow DAG. As we enter a new conditional statement each of the two branches inherits the symbols from the outer scope. Once we leave the conditional statement the symbol table for the outer scope is the union of the symbol table for each branch, minus any local variables declared.

To have separate plates for separate branches I have also added subgraphs for each conditional branch. The conditional node (corresponding to the boolean in the if) is always a predecessor. Since an a conditional branch can contain a conditional branch, the subgraphs are represented as a tree structure and rendered accordingly.

Example: if we call the symbol table s then we have the following

workflow {
    test  = params.echo // s = {'test': {<Node>("test", ...) , 0)>}
    if (test) {
        echo = TOUCH(params.echo) // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 1)>}
    } else {
        echo = DEFAULT() // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 2)>}
    }
    // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 1)>, <Node('echo', 2)>}
   // Overwrite test variable
    test  = APPEND(echo) // s = {'test': {<Node(params.echo, 1)>, 'echo': {<Node('echo', 1)>, <Node('echo', 2)>}
}

which will be rendered as the DAG

flowchart TB
subgraph " "
  subgraph params
    v0["echo"]
  end
  v0["echo"]
  v2([conditional])
  v7([APPEND])
  subgraph s1[" "]
    v3([TOUCH])
  end
  subgraph s2[" "]
    v5([DEFAULT])
  end
    v0 --> v2
    v0 --> v3
    v3 --> v7
    v5 --> v7
    v2 --> s2
    v2 --> s1
end

If a global variable is assigned in one branch but not the other then it is implicitly initialized with null. To address this, we check before entering the outside scope whether this was the case and add a null node to the scope where the variable is not declared.
This means that the code:

workflow {
    if (params.echo) {
        echo = TOUCH(params.echo)
    } 
    APPEND(echo) 
}

which will be rendered as

flowchart TB
subgraph " "
  subgraph params
    v0["echo"]
  end
  v0["echo"]
  v1([conditional])
  v4{null}
  v6([APPEND])
  subgraph s1[" "]
    v2([TOUCH])
  end
    v0 --> v1
    v0 --> v2
    v2 --> v6
    v4 --> v6
    v1 --> s1
end

Removing nodes that shouldn't be rendered

By default we only want to render inputs, processes/(sub)workflows and outputs. This means that we need to collapse the rest of the graph. There are two types of nodes we remove:

Variable nodes
Disconnected null nodes

The second case is new and correspond to the situation where a variable is is initalized in one branch but never used outside the branch, which will create a disconnected null in the other branch. We remove nodes in place via a DFS in the Graph.collapseGraph function to which we pass predicates for nodes that should always be hidden (variable nodes) and nodes that should be hidden if they become disconnected (null nodes).

Configuration support for rendering variable nodes

I have also added the configuration option nextflow.dag.showVariables which toggles whether variables should be shown (or if the graph should be collapsed). It is false by default. I added it since it could be (was for me) useful for debugging purposes.

If this is not desired we can revert the changes in 287324c. Here is the corresponding PR on the vscode-code-language side.

Signed-off-by: ErikDanielsson <[email protected]>

bentsherman · 2025-07-30T18:00:14Z

Updates:

revert unrelated formatting changes
remove config setting for variables (kept as internal constant for testing)
move conditional scope logic into separate class VariableContext
move Graph, Subgraph, Node to separate file
remove null constant (will be handled by upstream error checking later)
refactor MermaidRenderer with StringBuilder
reverted collapseGraph in favor of original algorithm for skipping hidden nodes
add unit tests

bentsherman

Okay, I've gone through the code and tailored it to my liking. I think it's basically ready to merge, but wanted to give you a chance to look over everything and make sure I didn't break something.

If you have any test cases lying around that you wrote during development, it would be good to copy them here so that I can add them to the test suite. I set up a test harness so that we only need to specify input -> output pairs for each test case

src/main/java/nextflow/lsp/services/script/dag/DataflowVisitor.java

bentsherman · 2025-07-30T18:13:59Z

src/main/java/nextflow/lsp/services/script/dag/MermaidRenderer.java

+                visited.addAll(preds);
                preds = preds.stream()
-                    .flatMap(p ->
-                        isHidden(p, inputs, outputs)
-                            ? p.preds.stream()
-                            : Stream.of(p)
-                    )
+                    .flatMap(pred -> (
+                        isHidden(pred, inputs, outputs)
+                            ? pred.preds.stream().filter(p -> !visited.contains(p))
+                            : Stream.of(pred)
+                    ))


I decided to keep the original algorithm for skipping nodes instead of your collapse algorithm, since this one is simpler and seems to get the job done, especially since I removed the null node (for now).

I was testing fetchngs and found that the SRA workflow causes an infinite loop here because of some circular dependency in the graph. It has something to do with ch_versions but I never found the exact cause. So I just added the visited set (similar to your approach) to handle cycles and now fetchngs works.

The main point of using a DFS is the difference in complexity: the DFS approach has complexity O(V + E) while the approach you changed to now should have complexity O(V * (V + E)) since we are potentially reprocessing much of the graph for each node. However, since the graphs are rather small this should be completly fine, and I agree that it is easier to read the code.

If you don't have a visited set then the algorithm has worst case exponential complexity since your are enumerating all paths to the predecessors. I think that this might explain the apparent "infinite loop" behaviour for fetchngs. Truly getting stuck in a loop should be impossible since the graph we are generating is by definition acyclic, so if there is a bug causing a cycle then this is more troubling. However, looking at the "debug graph" with variables shown this does not seem to be the case.

I did some debug logging and observed that the predecessor set was cycling between two nodes when rendering fetchngs. I thought maybe I introduced a bug in the if-else logic, but when I rendered the graph with no collapsing, there was no cycle 🤷

I forgot about the trade-off in complexity, that's a fair point. But I agree that these graphs are small enough that it shouldn't matter, especially now with the visited set

I just found the cycle in fetchngs:

It has something to do with ch_versions being reassigned multiple times. Not sure if it was introduced by your changes or mine, but I suspect something is wrong with the if-else scope merging

bentsherman · 2025-07-30T18:17:47Z

src/main/java/nextflow/lsp/services/script/dag/VariableContext.java

+     * @param ifScope
+     * @param elseScope
+     */
+    public void mergeConditionalScopes(Map<String,Variable> ifScope, Map<String,Variable> elseScope) {


I was able to move the conditional scope logic into a separate class to keep things modular. I hope I'll be able to adapt or even re-use this class to implement definite assignment analysis in the name checker, but we'll see.

Anyway, you might want to look at this method in particular to make sure I'm still merging the conditional scopes correctly. We should probably add some unit tests for the most common cases that we care about

I think it looks good, but I'll have a careful read through it to ensure that it works as expected.

With regards to the indefinite assignment code, this is exactly what I introduced with the null nodes, so you should hopefully be able to reuse that code too!

ErikDanielsson · 2025-07-31T10:26:28Z

Thanks for looking through and refactoring the code! Splitting things up makes a lot of sense, the DataflowVisitor file was way too crowded.

I do not have any well thought through test cases, mainly just minor example, but I can write some if you want to.

I've had a quick look over your changes now, here are some things that I think could be improved, or at least further discussed:

Rendering of ternary expression
I think there is some error with the rendering of ternary expressions, since for example the workflow

include { M } from "./main.nf";
workflow {
    echo = params.echo ? 1 : 0
    M(echo)
}

is rendered as

flowchart TB
  subgraph " "
    subgraph params
      v0["echo"]
    end
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v0 --> v1
    v1 --> s1
    v1 --> s2
  end

but both conditional branches should clearly feed into M.

Looking in "debug mode" where variables are shown, the error seems to be that the variable echo is not assigned by the return value of the ternary expression.

flowchart TB
  subgraph " "
    subgraph params
      v0["echo"]
    end
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v0 --> v1
    v2 --> v3
    v1 --> s1
    v1 --> s2
  end

I think that the cleanest approach to solve this is to add a node type corresponding to the full ternary expression (or in more generality a "return value" but I don't think Nextflow has this in any other places) node type, which the return value of both branches feed into. Nodes of this type should then likely be hidden during a normal render. I'll look into fixing this if you want to.

Removing disconnect components
Since ternary expressions and if-else-statements can manipulate code that does not affect the actual outputs of the workflow I think it is reasonable to remove subgraphs that are disconnected from any process, input or output.

The following code for example

include { M } from "./main.nf";
workflow {
    t = 1
    echo = t ? 1 : 0
    M(1)
}

is rendered as

flowchart TB
  subgraph " "
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v1 --> s1
    v1 --> s2
  end

which is correct, but hard to understand when all variable nodes are collapsed. If you agree, I could add some code that removes the disconnected components.

src/main/java/nextflow/lsp/services/script/dag/MermaidRenderer.java

bentsherman · 2025-07-31T15:17:43Z

Thanks for the suggestions. I think they make sense, I will try to implement them so that you can stay focused on the formatter.

ErikDanielsson added 6 commits July 10, 2025 10:57

Bump Nextflow version

8dc77dd

Signed-off-by: ErikDanielsson <[email protected]>

First working version

f0e7863

Fix local variables

14b67f3

Add configuration for showing variables or not

287324c

Add better support for collapsing graph

737082b

Add myself as @author (?) and move some things around

ffec1f5

ErikDanielsson mentioned this pull request Jul 15, 2025

Add configuration option nextflow.dag.showVariables to toggle rendering of variable nodes in DAG preview nextflow-io/vscode-language-nextflow#148

Closed

This was linked to issues Jul 16, 2025

Improve rendering of conditional blocks in DAG preview #88

Open

Missing connection in DAG for conditional execution #93

Open

cleanup

fe057cd

bentsherman reviewed Jul 30, 2025

View reviewed changes

bentsherman added 4 commits July 30, 2025 13:23

minor edits

c48a91e

Merge branch 'main' into pr/ErikDanielsson/131

c2faa30

Support ternary expression

8985078

fix flaky test, cleanup

c790455

ErikDanielsson commented Jul 31, 2025

View reviewed changes

src/main/java/nextflow/lsp/services/script/dag/MermaidRenderer.java Outdated Show resolved Hide resolved

bentsherman added 2 commits July 31, 2025 10:43

Fix rendering of variable nodes

75f6c77

Add configuration settings for direction and verbose

4559d4b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve rendering of DAG preview #131

Improve rendering of DAG preview #131

Uh oh!

ErikDanielsson commented Jul 15, 2025 •

edited

Loading

Uh oh!

bentsherman commented Jul 30, 2025

Uh oh!

bentsherman left a comment

Uh oh!

Uh oh!

bentsherman Jul 30, 2025

Uh oh!

ErikDanielsson Jul 31, 2025 •

edited

Loading

Uh oh!

bentsherman Jul 31, 2025

Uh oh!

bentsherman Jul 31, 2025

Uh oh!

bentsherman Jul 30, 2025

Uh oh!

ErikDanielsson Jul 31, 2025

Uh oh!

ErikDanielsson commented Jul 31, 2025

Uh oh!

Uh oh!

bentsherman commented Jul 31, 2025

Uh oh!

Uh oh!

Improve rendering of DAG preview #131

Are you sure you want to change the base?

Improve rendering of DAG preview #131

Uh oh!

Conversation

ErikDanielsson commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Removing nodes that shouldn't be rendered

Configuration support for rendering variable nodes

Uh oh!

bentsherman commented Jul 30, 2025

Uh oh!

bentsherman left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bentsherman Jul 30, 2025

Choose a reason for hiding this comment

Uh oh!

ErikDanielsson Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bentsherman Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

bentsherman Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

bentsherman Jul 30, 2025

Choose a reason for hiding this comment

Uh oh!

ErikDanielsson Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

ErikDanielsson commented Jul 31, 2025

Uh oh!

Uh oh!

bentsherman commented Jul 31, 2025

Uh oh!

Uh oh!

ErikDanielsson commented Jul 15, 2025 •

edited

Loading

ErikDanielsson Jul 31, 2025 •

edited

Loading