-
Notifications
You must be signed in to change notification settings - Fork 2
Improve rendering of DAG preview #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve rendering of DAG preview #131
Conversation
Signed-off-by: ErikDanielsson <[email protected]>
Updates:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've gone through the code and tailored it to my liking. I think it's basically ready to merge, but wanted to give you a chance to look over everything and make sure I didn't break something.
If you have any test cases lying around that you wrote during development, it would be good to copy them here so that I can add them to the test suite. I set up a test harness so that we only need to specify input -> output pairs for each test case
visited.addAll(preds); | ||
preds = preds.stream() | ||
.flatMap(p -> | ||
isHidden(p, inputs, outputs) | ||
? p.preds.stream() | ||
: Stream.of(p) | ||
) | ||
.flatMap(pred -> ( | ||
isHidden(pred, inputs, outputs) | ||
? pred.preds.stream().filter(p -> !visited.contains(p)) | ||
: Stream.of(pred) | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to keep the original algorithm for skipping nodes instead of your collapse algorithm, since this one is simpler and seems to get the job done, especially since I removed the null node (for now).
I was testing fetchngs and found that the SRA
workflow causes an infinite loop here because of some circular dependency in the graph. It has something to do with ch_versions
but I never found the exact cause. So I just added the visited set (similar to your approach) to handle cycles and now fetchngs works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main point of using a DFS is the difference in complexity: the DFS approach has complexity O(V + E)
while the approach you changed to now should have complexity O(V * (V + E))
since we are potentially reprocessing much of the graph for each node. However, since the graphs are rather small this should be completly fine, and I agree that it is easier to read the code.
If you don't have a visited set then the algorithm has worst case exponential complexity since your are enumerating all paths to the predecessors. I think that this might explain the apparent "infinite loop" behaviour for fetchngs
. Truly getting stuck in a loop should be impossible since the graph we are generating is by definition acyclic, so if there is a bug causing a cycle then this is more troubling. However, looking at the "debug graph" with variables shown this does not seem to be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some debug logging and observed that the predecessor set was cycling between two nodes when rendering fetchngs. I thought maybe I introduced a bug in the if-else logic, but when I rendered the graph with no collapsing, there was no cycle 🤷
I forgot about the trade-off in complexity, that's a fair point. But I agree that these graphs are small enough that it shouldn't matter, especially now with the visited set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @param ifScope | ||
* @param elseScope | ||
*/ | ||
public void mergeConditionalScopes(Map<String,Variable> ifScope, Map<String,Variable> elseScope) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to move the conditional scope logic into a separate class to keep things modular. I hope I'll be able to adapt or even re-use this class to implement definite assignment analysis in the name checker, but we'll see.
Anyway, you might want to look at this method in particular to make sure I'm still merging the conditional scopes correctly. We should probably add some unit tests for the most common cases that we care about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good, but I'll have a careful read through it to ensure that it works as expected.
With regards to the indefinite assignment code, this is exactly what I introduced with the null nodes, so you should hopefully be able to reuse that code too!
Thanks for looking through and refactoring the code! Splitting things up makes a lot of sense, the I do not have any well thought through test cases, mainly just minor example, but I can write some if you want to. I've had a quick look over your changes now, here are some things that I think could be improved, or at least further discussed:
include { M } from "./main.nf";
workflow {
echo = params.echo ? 1 : 0
M(echo)
} is rendered as flowchart TB
subgraph " "
subgraph params
v0["echo"]
end
v1{ }
v3([M])
subgraph s1[" "]
end
subgraph s2[" "]
end
v0 --> v1
v1 --> s1
v1 --> s2
end
but both conditional branches should clearly feed into Looking in "debug mode" where variables are shown, the error seems to be that the variable flowchart TB
subgraph " "
subgraph params
v0["echo"]
end
v1{ }
v3([M])
subgraph s1[" "]
end
subgraph s2[" "]
end
v0 --> v1
v2 --> v3
v1 --> s1
v1 --> s2
end
I think that the cleanest approach to solve this is to add a node type corresponding to the full ternary expression (or in more generality a "return value" but I don't think Nextflow has this in any other places) node type, which the return value of both branches feed into. Nodes of this type should then likely be hidden during a normal render. I'll look into fixing this if you want to.
The following code for example include { M } from "./main.nf";
workflow {
t = 1
echo = t ? 1 : 0
M(1)
} is rendered as flowchart TB
subgraph " "
v1{ }
v3([M])
subgraph s1[" "]
end
subgraph s2[" "]
end
v1 --> s1
v1 --> s2
end
which is correct, but hard to understand when all variable nodes are collapsed. If you agree, I could add some code that removes the disconnected components. |
src/main/java/nextflow/lsp/services/script/dag/MermaidRenderer.java
Outdated
Show resolved
Hide resolved
Thanks for the suggestions. I think they make sense, I will try to implement them so that you can stay focused on the formatter. |
This PR makes quite substantial changes to the DAG preview feature of the language server. First and foremost it adds support for rendering control flow correctly (issues #93 #88).
The solution to these issues is to use scoped symbol tables where each name in the program can correspond to several nodes in the control flow DAG. As we enter a new conditional statement each of the two branches inherits the symbols from the outer scope. Once we leave the conditional statement the symbol table for the outer scope is the union of the symbol table for each branch, minus any local variables declared.
To have separate plates for separate branches I have also added subgraphs for each conditional branch. The conditional node (corresponding to the boolean in the if) is always a predecessor. Since an a conditional branch can contain a conditional branch, the subgraphs are represented as a tree structure and rendered accordingly.
Example: if we call the symbol table
s
then we have the followingwhich will be rendered as the DAG
If a global variable is assigned in one branch but not the other then it is implicitly initialized with
null
. To address this, we check before entering the outside scope whether this was the case and add anull
node to the scope where the variable is not declared.This means that the code:
which will be rendered as
Removing nodes that shouldn't be rendered
By default we only want to render inputs, processes/(sub)workflows and outputs. This means that we need to collapse the rest of the graph. There are two types of nodes we remove:
The second case is new and correspond to the situation where a variable is is initalized in one branch but never used outside the branch, which will create a disconnected null in the other branch. We remove nodes in place via a DFS in the
Graph.collapseGraph
function to which we pass predicates for nodes that should always be hidden (variable nodes) and nodes that should be hidden if they become disconnected (null nodes).Configuration support for rendering variable nodes
I have also added the configuration option
nextflow.dag.showVariables
which toggles whether variables should be shown (or if the graph should be collapsed). It isfalse
by default. I added it since it could be (was for me) useful for debugging purposes.If this is not desired we can revert the changes in
287324c
. Here is the corresponding PR on thevscode-code-language
side.