Skip to content

Improve rendering of DAG preview #131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Aug 5, 2025

Conversation

ErikDanielsson
Copy link
Contributor

@ErikDanielsson ErikDanielsson commented Jul 15, 2025

This PR makes quite substantial changes to the DAG preview feature of the language server. First and foremost it adds support for rendering control flow correctly (issues #93 #88).

The solution to these issues is to use scoped symbol tables where each name in the program can correspond to several nodes in the control flow DAG. As we enter a new conditional statement each of the two branches inherits the symbols from the outer scope. Once we leave the conditional statement the symbol table for the outer scope is the union of the symbol table for each branch, minus any local variables declared.

To have separate plates for separate branches I have also added subgraphs for each conditional branch. The conditional node (corresponding to the boolean in the if) is always a predecessor. Since an a conditional branch can contain a conditional branch, the subgraphs are represented as a tree structure and rendered accordingly.

Example: if we call the symbol table s then we have the following

workflow {
    test  = params.echo // s = {'test': {<Node>("test", ...) , 0)>}
    if (test) {
        echo = TOUCH(params.echo) // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 1)>}
    } else {
        echo = DEFAULT() // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 2)>}
    }
    // s = {'test': {<Node(params.echo, 0)>, 'echo': {<Node('echo', 1)>, <Node('echo', 2)>}
   // Overwrite test variable
    test  = APPEND(echo) // s = {'test': {<Node(params.echo, 1)>, 'echo': {<Node('echo', 1)>, <Node('echo', 2)>}
}

which will be rendered as the DAG

flowchart TB
subgraph " "
  subgraph params
    v0["echo"]
  end
  v0["echo"]
  v2([conditional])
  v7([APPEND])
  subgraph s1[" "]
    v3([TOUCH])
  end
  subgraph s2[" "]
    v5([DEFAULT])
  end
    v0 --> v2
    v0 --> v3
    v3 --> v7
    v5 --> v7
    v2 --> s2
    v2 --> s1
end
Loading

If a global variable is assigned in one branch but not the other then it is implicitly initialized with null. To address this, we check before entering the outside scope whether this was the case and add a null node to the scope where the variable is not declared.
This means that the code:

workflow {
    if (params.echo) {
        echo = TOUCH(params.echo)
    } 
    APPEND(echo) 
}

which will be rendered as

flowchart TB
subgraph " "
  subgraph params
    v0["echo"]
  end
  v0["echo"]
  v1([conditional])
  v4{null}
  v6([APPEND])
  subgraph s1[" "]
    v2([TOUCH])
  end
    v0 --> v1
    v0 --> v2
    v2 --> v6
    v4 --> v6
    v1 --> s1
end
Loading

Removing nodes that shouldn't be rendered

By default we only want to render inputs, processes/(sub)workflows and outputs. This means that we need to collapse the rest of the graph. There are two types of nodes we remove:

  • Variable nodes
  • Disconnected null nodes

The second case is new and correspond to the situation where a variable is is initalized in one branch but never used outside the branch, which will create a disconnected null in the other branch. We remove nodes in place via a DFS in the Graph.collapseGraph function to which we pass predicates for nodes that should always be hidden (variable nodes) and nodes that should be hidden if they become disconnected (null nodes).

Configuration support for rendering variable nodes

I have also added the configuration option nextflow.dag.showVariables which toggles whether variables should be shown (or if the graph should be collapsed). It is false by default. I added it since it could be (was for me) useful for debugging purposes.

If this is not desired we can revert the changes in 287324c. Here is the corresponding PR on the vscode-code-language side.

@bentsherman
Copy link
Member

Updates:

  • revert unrelated formatting changes
  • remove config setting for variables (kept as internal constant for testing)
  • move conditional scope logic into separate class VariableContext
  • move Graph, Subgraph, Node to separate file
  • remove null constant (will be handled by upstream error checking later)
  • refactor MermaidRenderer with StringBuilder
  • reverted collapseGraph in favor of original algorithm for skipping hidden nodes
  • add unit tests

Copy link
Member

@bentsherman bentsherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've gone through the code and tailored it to my liking. I think it's basically ready to merge, but wanted to give you a chance to look over everything and make sure I didn't break something.

If you have any test cases lying around that you wrote during development, it would be good to copy them here so that I can add them to the test suite. I set up a test harness so that we only need to specify input -> output pairs for each test case

@ErikDanielsson
Copy link
Contributor Author

Thanks for looking through and refactoring the code! Splitting things up makes a lot of sense, the DataflowVisitor file was way too crowded.

I do not have any well thought through test cases, mainly just minor example, but I can write some if you want to.

I've had a quick look over your changes now, here are some things that I think could be improved, or at least further discussed:

  • Rendering of ternary expression
    I think there is some error with the rendering of ternary expressions, since for example the workflow
include { M } from "./main.nf";
workflow {
    echo = params.echo ? 1 : 0
    M(echo)
}

is rendered as

flowchart TB
  subgraph " "
    subgraph params
      v0["echo"]
    end
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v0 --> v1
    v1 --> s1
    v1 --> s2
  end

Loading

but both conditional branches should clearly feed into M.

Looking in "debug mode" where variables are shown, the error seems to be that the variable echo is not assigned by the return value of the ternary expression.

flowchart TB
  subgraph " "
    subgraph params
      v0["echo"]
    end
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v0 --> v1
    v2 --> v3
    v1 --> s1
    v1 --> s2
  end

Loading

I think that the cleanest approach to solve this is to add a node type corresponding to the full ternary expression (or in more generality a "return value" but I don't think Nextflow has this in any other places) node type, which the return value of both branches feed into. Nodes of this type should then likely be hidden during a normal render. I'll look into fixing this if you want to.

  • Removing disconnect components
    Since ternary expressions and if-else-statements can manipulate code that does not affect the actual outputs of the workflow I think it is reasonable to remove subgraphs that are disconnected from any process, input or output.

The following code for example

include { M } from "./main.nf";
workflow {
    t = 1
    echo = t ? 1 : 0
    M(1)
}

is rendered as

flowchart TB
  subgraph " "
    v1{ }
    v3([M])
    subgraph s1[" "]
    end
    subgraph s2[" "]
    end
    v1 --> s1
    v1 --> s2
  end
Loading

which is correct, but hard to understand when all variable nodes are collapsed. If you agree, I could add some code that removes the disconnected components.

@bentsherman
Copy link
Member

Thanks for the suggestions. I think they make sense, I will try to implement them so that you can stay focused on the formatter.

@ErikDanielsson
Copy link
Contributor Author

I updated the mergeConditionalScopes function now, since I saw that you removed the else branch logic, but you'll need this logic when doing the static checking for whether a variable is declared. I added some comments so it should hopefully be clear now.

@ErikDanielsson
Copy link
Contributor Author

I also checked your logic for hiding the conditional branch, and it seems to not work fully with the verbose mode. For example, the code

include { M } from "./main.nf"
workflow test {
    take:
    t

    main:
    if (M(t)) {
        t = 1
    }
    else {
        t = 2
    }
    M(t)
}

produces the output

flowchart TB
  subgraph test
    subgraph take
      v0["t"]
    end
    v1([M])
    v5["t"]
    v6([M])
    subgraph s1[" "]
      v3["t"]
    end
    subgraph s2[" "]
      v4["t"]
    end
    v0 --> v1
    v1 --> v5
    v3 --> v5
    v4 --> v5
    v5 --> v6
    v2 --> s1
    v2 --> s2
  end
Loading

@bentsherman
Copy link
Member

I actually need to add a check for that because it's an error to use the same process more than once in a workflow without an alias. Not sure if that's related to this issue though

@ErikDanielsson
Copy link
Contributor Author

True, and adding this should be relatively simple with the conditonal symbol tables. The above issue is unrelated though since at the moment each instantiation of the process creates a new node.

@bentsherman
Copy link
Member

It was an issue with the collapsing of "empty" if-else subgraphs which only occurred in verbose mode. It's a bit tricky because subgraph edges are handled differently from regular edges, also I haven't tested much with nested if-else

But I think this is good enough for now, and I can iterate as new cases come up. At the very least, fetchngs and rnaseq look much better now. Thanks for all your hard work!

@bentsherman bentsherman merged commit 9e35a28 into nextflow-io:main Aug 5, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing connection in DAG for conditional execution Improve rendering of conditional blocks in DAG preview
2 participants