Skip to content

Clarification on Directed vs Undirected Graph Construction in _read_data of GraphPropertyReader #6

@aditya0by0

Description

@aditya0by0

While reviewing the GraphPropertyReader class, I noticed that the edge_index in the _read_data method is built like this:

edge_index = torch.tensor(
    [
        [bond.GetBeginAtomIdx() for bond in mol.GetBonds()],
        [bond.GetEndAtomIdx() for bond in mol.GetBonds()],
    ]
)

This seems to construct the molecular graph as directed graph, where each bond is added in one direction only [begin_atom_idx] -> [end_atom_idx].

❓ Clarification

  • Is this directed representation intentional?
  • Or is the graph converted to undirected later in the pipeline (perhaps via symmetrization)?

Since molecular graphs typically model bonds as undirected, I wanted to confirm the intended behavior.

💡 Suggestion (if not intentional)

If the goal is an undirected graph, this can be handled cleanly using PyTorch Geometric's to_undirected utility:

from torch_geometric.utils import to_undirected

edge_index = torch.tensor(
    [
        [bond.GetBeginAtomIdx() for bond in mol.GetBonds()],
        [bond.GetEndAtomIdx() for bond in mol.GetBonds()],
    ]
)
edge_index = to_undirected(edge_index)

OR

by explicitly creating undirected graphs:

src = [bond.GetBeginAtomIdx() for bond in mol.GetBonds()]
tgt = [bond.GetEndAtomIdx() for bond in mol.GetBonds()]
edge_index = torch.tensor([src + tgt, tgt + src])

Let me know if this was an intentional design or if we should update it. Thanks!

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions