✨ Zero-Copy Tensors via DLPack #307

@bellavg

Description

What's the problem this feature will solve?

As discussed, loading massive AIGs into PyTorch Geometric (PyG) for deep learning requires an extremely efficient data pipeline. Copying flat arrays from C++ to Python and then again to PyTorch creates unnecessary memory overhead and heavily bottlenecks training loops when processing large datasets of circuits.

Describe the solution you'd like

Exposing the arrays through DLPack would let torch.from_dlpack() perform a zero-copy transfer directly into PyTorch, keeping the conversion $O(1)$ regardless of graph size.
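To make the zero-copy semantics concrete, here is a minimal sketch using NumPy's own DLPack support (np.from_dlpack, available since NumPy 1.22) as a stand-in for torch.from_dlpack: the consumer sees the producer's buffer, not a copy, so writes through one handle are visible through the other.

```python
import numpy as np

# Any object implementing __dlpack__ can be ingested without copying.
src = np.arange(6, dtype=np.int64)
view = np.from_dlpack(src)  # zero-copy: shares src's underlying buffer

src[0] = 42
print(view[0])  # 42 -> the mutation is visible through the DLPack view
```

The same property is what makes torch.from_dlpack() O(1): no per-element work happens at conversion time.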

Following up on the machine learning integration thoughts in #283, here is a breakdown of the specific DLPack capsules I'd need shaped and returned from the C++ side:

1. Graph Topology

Just need the non-zero entries for the edges:

  • edge_index: Tensor of shape (2, E) representing the source and target nodes. (Note: Must be int64_t so PyG doesn't force a memory copy to torch.long).

  • edge_attr: Tensor of shape (E, D_edge). (Note: Allocate as float32).

    • Needs a flag to choose the type of encoding for the edge inversion (regular vs. inverted). Specifically:

      • Integer (0, 1): Useful if passing into a categorical embedding layer in PyG.

      • Integer (1, -1): Crucial when the values feed sparse matrix multiplication (a stored 0 would make a regular edge numerically indistinguishable from a missing edge).

      • One-hot (E x 2).

    • It would also be super helpful to explicitly document the mapping for these encodings (e.g., 0 = regular, 1 = inverted, or which index corresponds to which type in the one-hot vector) so there's no ambiguity on the Python side.
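To pin down the ambiguity mentioned above, here is a small NumPy sketch of the three proposed encodings, assuming the mapping 0 = regular, 1 = inverted (this mapping is my suggestion, not something the library defines yet):

```python
import numpy as np

# Hypothetical per-edge inversion flags for E = 4 edges
# (assumed mapping: 0 = regular, 1 = inverted).
inverted = np.array([0, 1, 1, 0], dtype=np.int64)

enc_0_1 = inverted.astype(np.float32)      # (0, 1): class indices for embeddings
enc_1_m1 = 1.0 - 2.0 * enc_0_1             # (1, -1): regular -> 1, inverted -> -1
one_hot = np.eye(2, dtype=np.float32)[inverted]  # (E, 2): index 0 = regular

print(enc_1_m1)       # [ 1. -1. -1.  1.]
print(one_hot.shape)  # (4, 2)
```

Whatever convention the C++ side settles on, documenting it like this (one line per encoding) would remove all guesswork in Python.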

2. Node Attributes

  • node_attr: Tensor of shape (N, D_node). (Note: Allocate as float32).

    • Needs a flag to choose the base encoding for node types (constant, PI, gate, PO): integer (base dim = 1) or one-hot encoded (base dim = num_types = 4).

    • Optional Add-ons: Boolean flags to concatenate additional features. Ideally, these should be stacked/concatenated along the feature dimension (axis 1) into a single contiguous block of memory on the C++ side. D_node will be the base dimension plus any included features:

      • include_level (default: False): appends the logic level (+1 dimension).

      • include_fanout (default: False): appends the number of fanouts (+1 dimension).

      • include_truth_table (default: False): appends the local truth table per node (+TT_dim dimensions).

    • (Example: If using one-hot, level, and fanout, they are stacked horizontally so D_node = 4 + 1 + 1 = 6).
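The stacking described above can be sketched in NumPy for a hypothetical 3-node graph (the values are made up; the point is the layout: one contiguous (N, D_node) block with features concatenated along axis 1):

```python
import numpy as np

N, NUM_TYPES = 3, 4
# One-hot node types for types [constant, PI, gate] (hypothetical assignment).
type_one_hot = np.eye(NUM_TYPES, dtype=np.float32)[[0, 1, 2]]   # (N, 4)
level = np.array([[0.0], [1.0], [2.0]], dtype=np.float32)       # (N, 1)
fanout = np.array([[1.0], [2.0], [0.0]], dtype=np.float32)      # (N, 1)

# Concatenate along the feature axis into one contiguous block,
# mirroring what the C++ side would allocate.
node_attr = np.hstack([type_one_hot, level, fanout])
print(node_attr.shape)  # (3, 6) -> D_node = 4 + 1 + 1 = 6
```

Doing this concatenation once in C++ (rather than returning separate capsules per feature) keeps the result a single cache-friendly tensor on the Python side.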

Passing the Data via Dictionary

To pass these flat arrays back to Python, returning a dictionary of DLPack capsules seems like the gold standard for efficiency: the dictionary overhead is negligible, and the Python API stays clean and native to tools like PyTorch Geometric.

Suggested Python Interaction

If the C++ binding (e.g., to_graph_tensor) returns a dictionary of these DLPack capsules, I can instantly build either a PyTorch Geometric graph or a native PyTorch sparse matrix on my end without any C++ overhead:

import torch

# Call the C++ binding to get a dict of DLPack capsules
dlpack_data = aig.to_graph_tensor(
    node_encoding="one_hot", 
    edge_encoding="int_1_minus_1", # Specifying the safe integer format
    include_level=False,
    include_fanout=False
)

# Zero-copy conversion to PyTorch tensors!
edge_index = torch.from_dlpack(dlpack_data["edge_index"])   # Shape: (2, E)
edge_attr = torch.from_dlpack(dlpack_data["edge_attr"])     # Shape: (E, D_edge)
node_attr = torch.from_dlpack(dlpack_data["node_attr"])     # Shape: (N, D_node)

# If I need a mathematical Sparse Adjacency Matrix, I construct it instantly:
num_nodes = node_attr.shape[0]
sparse_adj = torch.sparse_coo_tensor(
    indices=edge_index, 
    values=edge_attr, 
    size=(num_nodes, num_nodes, edge_attr.shape[1])
)

As a quick thought, since building the sparse adjacency matrix this way is super handy but maybe not obvious to everyone, it might be worth adding a quick note or docstring to to_graph_tensor explaining this torch.sparse_coo_tensor trick. That way, anyone else who needs an adjacency matrix knows exactly how to build it efficiently from the sparse outputs!
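To back up the earlier claim about why (1, -1) matters here, a tiny dense NumPy sketch (the math is identical for the sparse case: a stored 0 contributes exactly as much as a missing entry):

```python
import numpy as np

# Hypothetical 3-node graph with edges 0->1 (regular) and 1->2 (inverted).
A_0_1 = np.zeros((3, 3), dtype=np.float32)
A_0_1[0, 1], A_0_1[1, 2] = 0.0, 1.0    # (0, 1) encoding: regular edge stored as 0

A_1_m1 = np.zeros((3, 3), dtype=np.float32)
A_1_m1[0, 1], A_1_m1[1, 2] = 1.0, -1.0  # (1, -1) encoding: both edges nonzero

x = np.ones(3, dtype=np.float32)
print(A_0_1 @ x)   # [0. 1. 0.] -> the regular edge 0->1 vanished
print(A_1_m1 @ x)  # [ 1. -1.  0.] -> both edges participate, signs preserved
```

This is exactly the pitfall a docstring note on to_graph_tensor could warn about.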

Example: PyTorch Geometric Dataset Integration

To show exactly why this format is so powerful, here is a quick mock-up of how this function lets us process a massive folder of AIGER files directly into PyTorch Geometric Data objects on the fly, keeping our RAM overhead tiny:

from pathlib import Path
import torch
from torch_geometric.data import Dataset, Data
import aigverse

class AIGERDataset(Dataset):
    def __init__(self, root_dir):
        super().__init__(root_dir)
        # Find all .aig files in the target directory using pathlib
        self.file_paths = list(Path(root_dir).glob('*.aig'))

    def len(self):
        return len(self.file_paths)

    def get(self, idx):
        file_path = str(self.file_paths[idx])
        
        # 1. Parse AIG via aigverse
        aig = aigverse.read_aiger(file_path)
        
        # 2. Extract DLPack capsules using the proposed C++ function
        dlpack_data = aig.to_graph_tensor(
            node_encoding="one_hot", 
            edge_encoding="int_0_1", # Using 0/1 here since PyG Embeddings like standard class indices
            include_level=True,
            include_fanout=True
        )
        
        # 3. Zero-copy ingest into PyTorch
        x = torch.from_dlpack(dlpack_data["node_attr"])
        edge_index = torch.from_dlpack(dlpack_data["edge_index"])
        edge_attr = torch.from_dlpack(dlpack_data["edge_attr"])
        
        # 4. Return standard PyTorch Geometric Data object
        return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

# Usage:
# from torch_geometric.loader import DataLoader
# dataset = AIGERDataset(root_dir="/path/to/massive/aig/folder")
# loader = DataLoader(dataset, batch_size=32, shuffle=True)

Would love to get your input on this!

Labels

C++ (relating to the C++ part of the project), enhancement (new feature or request), machine learning (anything related to ML integration), performance (performance-related changes)
