
Conversation

aditya0by0
Member

@aditya0by0 aditya0by0 self-assigned this May 7, 2025
@aditya0by0 aditya0by0 requested a review from sfluegel05 May 7, 2025 15:06
@aditya0by0
Member Author

I tried running the regular GNN model with an undirected graph. It looks like, for undirected graphs, the number of edge attributes needs to be doubled so that each undirected edge (stored as two directed edges in edge_index) has a matching attribute.
We also need to check that the right edge_attr row is mapped to the right edge.

https://wandb.ai/chebai/chebai/runs/fu1tcxf4/logs

[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch_geometric/nn/conv/res_gated_graph_conv.py", line 128, in forward
[rank0]:     out = self.propagate(edge_index, k=k, q=q, v=v, edge_attr=edge_attr)
[rank0]:   File "/home/staff/a/akhedekar/atmp_dir/torch_geometric.nn.conv.res_gated_graph_conv_ResGatedGraphConv_propagate_x5frmxhf.py", line 231, in propagate
[rank0]:     out = self.message(
[rank0]:   File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch_geometric/nn/conv/res_gated_graph_conv.py", line 144, in message
[rank0]:     k_i = self.lin_key(torch.cat([k_i, edge_attr], dim=-1))
[rank0]: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2438 but got size 1219 for tensor number 1 in the list.
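
The size mismatch (2438 = 2 × 1219) matches that diagnosis: edge_index contains both directions of each bond, while edge_attr still has one row per bond. A minimal sketch of how the attributes can be duplicated alongside the edges, using PyTorch Geometric's `to_undirected` utility (not necessarily the fix applied in this PR):

```python
import torch
from torch_geometric.utils import to_undirected

# One directed edge per bond and one attribute row per bond.
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 3]])
edge_attr = torch.randn(3, 8)  # hypothetical 8-dimensional bond features

# to_undirected inserts the reverse edges and duplicates the attributes,
# so edge_attr.size(0) matches edge_index.size(1) again.
edge_index, edge_attr = to_undirected(edge_index, edge_attr)
print(edge_index.size(1), edge_attr.size(0))  # 6 6
```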

@aditya0by0
Member Author

Training has been started for this change: https://wandb.ai/chebai/chebai/runs/9xjpb6wi?nw=nwuseraditya0by0

Another job has been started with 2 GPUs: https://wandb.ai/chebai/chebai/runs/ejg3ksex?nw=nwuseraditya0by0

@aditya0by0
Member Author

@sfluegel05,

The training seems to be quite slow. I'm wondering if all of the following properties were actually used in the original training setup. Could you please share the corresponding Weights & Biases (wandb) link for the original run?

Encoding lengths are: 
[('AtomAromaticity', 1), 
 ('AtomCharge', 13), 
 ('AtomHybridization', 7), 
 ('AtomNumHs', 7), 
 ('AtomType', 119), 
 ('BondAromaticity', 1), 
 ('BondInRing', 1), 
 ('BondType', 5), 
 ('NumAtomBonds', 11), 
 ('RDKit2DNormalized', 200)]

@sfluegel05
Contributor

Hi, I can confirm that the properties were used in actual runs, e.g. this one: https://wandb.ai/chebai/chebai/runs/cxmgl4eb (the technical setup is not the same one we use now, making it hard to compare, but I would expect it to get better with our current setup, not worse).

The bottleneck for this model is the creation of the dataset (especially RDKit2DNormalized). But once that is done, I would expect normal-ish behaviour during training.

instead of the base data module, as the `load_processed_data_from_file` method used in this class is available in the dynamic dataset class
@aditya0by0
Member Author

@sfluegel05, please find the training run for this fix below.
https://wandb.ai/chebai/chebai/runs/7h1icve9?nw=nwuseraditya0by0

Please review and merge.

@aditya0by0 aditya0by0 added the bug Something isn't working label May 25, 2025
@aditya0by0 aditya0by0 mentioned this pull request May 28, 2025
@aditya0by0 aditya0by0 added bug:fix fix for bug and removed bug Something isn't working labels May 28, 2025
@sfluegel05
Contributor

I'm not sure what the run is telling me. Training seems to be doing fine, but the macro-F1 is a few percent lower compared to https://wandb.ai/chebai/chebai/runs/0oksfx9u

Does that mean that undirected is worse than directed? Or am I missing something here?

@aditya0by0
Member Author

(screenshot)
I noticed that you added a comment on this run. Is there something different you did with the token limit?

@aditya0by0
Member Author

aditya0by0 commented May 30, 2025

Also, for your run, have you changed the matmul precision? For my runs, I haven't changed anything related to precision, so the default precision for my runs is "highest".

You are using a CUDA device ('NVIDIA H100 80GB HBM3') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
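
For reference, the setting mentioned in the warning is a one-liner applied before training starts (shown here only as a sketch; it was not changed for my runs):

```python
import torch

# Trade float32 matmul precision for speed on Tensor Core GPUs.
# The default is "highest" (full float32); "high"/"medium" enable TF32.
torch.set_float32_matmul_precision("high")
```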

@sfluegel05
Contributor

> I noticed that you added a comment on this run. Is there something different you did with the token limit?

The token_limit was only for the Electra model, not for the GNN. So no token limit was applied.

> Also, for your run, have you changed the matmul precision? For my runs, I haven't changed anything related to precision, so the default precision for my runs is "highest".

I have been using the default precision for my run as well (32-true).

@aditya0by0
Member Author

Directed: https://wandb.ai/chebai/chebai/runs/5yhpkxci/overview
Undirected: https://wandb.ai/chebai/chebai/runs/dlt1iug5/overview

| Metric | Directed | Undirected |
| --- | --- | --- |
| Train Loss (epoch) | 0.00087 | 0.00140 |
| Train Loss (step) | 0.00077 | 0.00218 |
| Train Macro-F1 | 0.9603 | 0.9274 |
| Train Micro-F1 | 0.9903 | 0.9830 |
| Global Step | 62,799 | 62,799 |
| Val Loss (epoch) | 0.02067 | 0.01783 |
| Val Loss (step) | 0.01475 | 0.00697 |
| Val Macro-F1 | 0.6810 | 0.6635 |
| Val Micro-F1 | 0.9094 | 0.9067 |

@aditya0by0
Member Author

I repeated the training on the same GPU type after making the training deterministic, and the results are consistent with the earlier observation:
➡️ Directed graphs outperform undirected graphs.

Undirected Graph (Deterministic runs):

Directed Graph (Deterministic runs):
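
For context, a minimal sketch of how training can be made deterministic, assuming the usual PyTorch Lightning flags (the exact configuration used for these runs may differ):

```python
import pytorch_lightning as pl

# Seed Python, NumPy and torch (including dataloader workers) and
# request deterministic algorithms from PyTorch.
pl.seed_everything(42, workers=True)
trainer = pl.Trainer(deterministic=True)
```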

@aditya0by0
Member Author

aditya0by0 commented Jul 29, 2025

I have a few hypotheses for why directed graphs perform better than undirected graphs for a GNN end-to-end classification task.

  • Two undirected graphs with the same molecular structure (but with different atom types and node features) are more likely to produce similar graph-level representations after 5 GNN convolution layers than their directed counterparts.
  • This is because undirected graphs inherently preserve more structural symmetry, making distinct molecules appear isomorphic to the model, especially in the absence of rich node features. Consequently, undirected graphs increase the likelihood of representation collapse, where different molecules are mapped to similar embeddings.

  • Additionally, atoms with the same number and type of neighbors (e.g., carbons in aromatic rings) are more prone to receiving identical embeddings in undirected graphs due to symmetric message passing, particularly over deeper GNN stacks (e.g., 5 layers).
  • In contrast, directed graphs break this symmetry and allow more diverse and discriminative representations, even when directionality is assigned arbitrarily: under a directed graph, the same atom has fewer (and different) incoming neighbors even if they are of the same type, so the amount of aggregation is reduced for the same number of convolution layers.

E.g., the aspirin molecule (information flows from left to right due to RDKit's internal atom index numbering):
(figure: aspirin molecule with RDKit atom indices)

Bonds:
	Bond index: 0, Atoms: (0, 1), Type: SINGLE
	Bond index: 1, Atoms: (1, 2), Type: DOUBLE
	Bond index: 2, Atoms: (1, 3), Type: SINGLE
	Bond index: 3, Atoms: (3, 4), Type: SINGLE
	Bond index: 4, Atoms: (4, 5), Type: AROMATIC
	Bond index: 5, Atoms: (5, 6), Type: AROMATIC
...
	Bond index: 9, Atoms: (9, 10), Type: SINGLE
	Bond index: 10, Atoms: (10, 11), Type: DOUBLE
	Bond index: 11, Atoms: (10, 12), Type: SINGLE
	Bond index: 12, Atoms: (9, 4), Type: AROMATIC
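
A minimal sketch (assuming RDKit and PyTorch) of how such a bond listing can be reproduced and how the directed vs. undirected edge_index would differ; this is illustrative, not the repository's actual `_read_data` implementation:

```python
import torch
from rdkit import Chem

# Aspirin (acetylsalicylic acid); atom/bond indices follow RDKit's internal
# numbering, so the exact order may differ from the listing above.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

print("Bonds:")
for bond in mol.GetBonds():
    print(f"\tBond index: {bond.GetIdx()}, "
          f"Atoms: ({bond.GetBeginAtomIdx()}, {bond.GetEndAtomIdx()}), "
          f"Type: {bond.GetBondType()}")

# Directed variant: one edge per bond, begin -> end only.
src = [b.GetBeginAtomIdx() for b in mol.GetBonds()]
dst = [b.GetEndAtomIdx() for b in mol.GetBonds()]
directed_edge_index = torch.tensor([src, dst])

# Undirected variant: both directions, so the number of edges (and the
# number of edge_attr rows) doubles.
undirected_edge_index = torch.tensor([src + dst, dst + src])

print(directed_edge_index.size(1), undirected_edge_index.size(1))  # 13 26
```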

Successfully merging this pull request may close these issues.

Clarification on Directed vs Undirected Graph Construction in _read_data of GraphPropertyReader