53 changes: 53 additions & 0 deletions README.md
@@ -106,6 +106,59 @@ print(refolded_embedding.shape)
# torch.Size([2, 5, 16]) # 2 samples, 5 words max, 16 dims
```

### Pooling spans

You can pool variable-length spans directly on a refolded view, without any padding, by building flat indices and offsets and then calling `embedding_bag`.

The helper `lengths.make_indices_ranges` expands ranges defined over one or more variable dimensions and returns three tensors:

- `indices` are the flat positions in the refolded tensor viewed as a single dimension
- `offsets` are the start positions of each span within `indices`
- `spans` gives the span id of every expanded position, which can be useful with functions like `torch.index_add` or `torch.index_reduce` (see the sketch after the example below)

The example below mean-pools token embeddings over word spans to produce one vector per span:

```python
import torch
import foldedtensor as ft

# Build a 4-level tensor with named dims: the first word of the first context is split into three tokens, etc.
input_ids = ft.as_folded_tensor(
[
[
[[0, 2, 3], [10], [4]],
[[0, 1, 2], [2, 3], [10, 11], [100, 101]],
],
],
full_names=("sample", "context", "word", "token"),
).refold(
"token"
) # any refolding is fine

# Create embeddings from the input ids
embedding = torch.nn.Embedding(2048, 16)
weight = embedding(input_ids)

# Pool two word spans (ends are exclusive):
# span 1 covers words 0 to 2 -> mean pool over 4 tokens [0, 2, 3, 10]
# span 2 covers words 5 to 7 -> mean pool over 4 tokens [10, 11, 100, 101]
indices, offsets, spans = input_ids.lengths.make_indices_ranges(
begins=(torch.tensor([0, 5]),),
ends=(torch.tensor([2, 7]),),
indice_dims=("word",),
)

# Average the token embeddings over each span
pooled = torch.nn.functional.embedding_bag(
input=indices,
# Flatten embeddings so rows align with flattened token positions
weight=weight.view(-1, weight.size(-1)),
offsets=offsets,
mode="mean",
)
print(pooled)
```
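
The `spans` output is not needed by `embedding_bag`, but it lets you run the same reduction with index-based ops instead. The snippet below is a minimal sketch, not an official recipe: it reproduces the mean pooling above with `Tensor.index_add_`, assuming `spans` holds a 0-based span id for every expanded position and `offsets` has one entry per span.

```python
# Reuses `weight`, `indices`, `offsets`, `spans` and `pooled` from the example above.
flat = weight.view(-1, weight.size(-1))  # rows align with flattened token positions
n_spans = len(offsets)                   # one offset per span (assumption)

# Sum the selected token embeddings into their span slot, then divide by span sizes
summed = torch.zeros(n_spans, flat.size(-1)).index_add_(0, spans, flat[indices])
counts = torch.zeros(n_spans).index_add_(0, spans, torch.ones(len(indices)))
mean_pooled = summed / counts.clamp(min=1).unsqueeze(-1)

print(torch.allclose(mean_pooled, pooled))  # expected to match the embedding_bag result
```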

## Benchmarks

View the comparisons of `foldedtensor` against various alternatives here: [docs/benchmarks](https://github.com/aphp/foldedtensor/blob/main/docs/benchmark.md).
7 changes: 7 additions & 0 deletions changelog.md
@@ -1,5 +1,12 @@
# Changelog

## Unreleased

- Add `map_indices` and `make_indices_ranges` (with C++ backends) and expose them as `lengths.map_indices` and `lengths.make_indices_ranges`, with boundary handling; `make_indices_ranges` returns flat indices, offsets and span ids for pooling with `embedding_bag`
- Introduce `FoldedTensorLayout`, which stores `full_names` and `data_dims`, resolves named dimensions, provides helper methods, and serves as the `lengths` container of `FoldedTensor`
- Improve `as_folded_tensor`: better inference of dims and dtype from nested data, support for named `data_dims`, and better handling of names and empty structures
- Benchmark script: add a `--cases` option to run selected cases, add a new case for range-based pooling, and adjust the outputs

## v0.4.0

- Fix `storage` torch warning
97 changes: 64 additions & 33 deletions docs/benchmark.md
@@ -8,9 +8,9 @@ It compares the performance of `foldedtensor` with various alternatives for padding
and working with nested lists and tensors.

Environment:
- `torch.__version__ == '2.8.0'`
- `foldedtensor.__version__ == '0.4.0'`
- `python == 3.11.3`
- `sys.platform == 'darwin'`


@@ -22,79 +22,79 @@ nested_list = make_nested_list(32, (50, 100), (25, 30), value=1)

Comparisons:
%timeit python_padding(nested_list)
# 100 loops, best of 5: 19.02 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list)
# 100 loops, best of 5: 0.82 ms per loop

```
Speedup against best alternative: **23.24x** :rocket:

## Case 2 (same lengths nested lists)

```python
nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list)
# 100 loops, best of 5: 7.86 ms per loop

%timeit torch.LongTensor(nested_list)
# 100 loops, best of 5: 3.69 ms per loop

%timeit python_padding(nested_list)
# 100 loops, best of 5: 23.35 ms per loop

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
# 100 loops, best of 5: 3.94 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list)
# 100 loops, best of 5: 1.18 ms per loop

```
Speedup against best alternative: **3.12x** :rocket:

## Case 3 (simple list)

```python
simple_list = make_nested_list(10000, value=1)

%timeit torch.tensor(simple_list)
# 100 loops, best of 5: 0.77 ms per loop

%timeit torch.LongTensor(simple_list)
# 100 loops, best of 5: 0.37 ms per loop

%timeit python_padding(simple_list)
# 100 loops, best of 5: 0.37 ms per loop

%timeit foldedtensor.as_folded_tensor(simple_list)
# 100 loops, best of 5: 0.10 ms per loop

```
Speedup against best alternative: **3.59x** :rocket:

## Case 4 (same lengths nested lists to flat tensor)

```python
nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list).view(-1)
# 100 loops, best of 5: 7.83 ms per loop

%timeit torch.LongTensor(nested_list).view(-1)
# 100 loops, best of 5: 3.68 ms per loop

%timeit python_padding(nested_list).view(-1)
# 100 loops, best of 5: 23.17 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list).view(-1)
# 100 loops, best of 5: 1.19 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list, data_dims=(2,))
# 100 loops, best of 5: 1.16 ms per loop

```
Speedup against best alternative: **3.10x** :rocket:

## Case 5 (variable lengths nested lists to padded embeddings)

Nested lists with different lengths (second level lists have lengths between 50 and 150). We compare `foldedtensor` with `torch.nested`.
@@ -104,41 +104,72 @@ nested_list = make_nested_list(32, (50, 150), 30, value=1)
# Padding with 0

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
# 100 loops, best of 5: 4.40 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list).as_tensor()
# 100 loops, best of 5: 1.29 ms per loop

```
Speedup against best alternative: **3.41x** :rocket:
```python
# Padding with 1

%timeit torch.nested.nested_tensor([torch.FloatTensor(sub) for sub in nested_list]).to_padded_tensor(1)
# 100 loops, best of 5: 4.77 ms per loop

%timeit x = foldedtensor.as_folded_tensor(nested_list); x.masked_fill_(x.mask, 1)
# 100 loops, best of 5: 1.65 ms per loop

```
Speedup against best alternative: **2.89x** :rocket:

## Case 6 (2d padding)

```python
nested_list = make_nested_list(160, (50, 150), value=1)

%timeit python_padding(nested_list)
# 100 loops, best of 5: 1.73 ms per loop

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
# 100 loops, best of 5: 1.48 ms per loop

%timeit torch.nn.utils.rnn.pad_sequence([torch.LongTensor(sub) for sub in nested_list], batch_first=True, padding_value=0)
# 100 loops, best of 5: 1.22 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list)
# 100 loops, best of 5: 0.18 ms per loop

```
Speedup against best alternative: **6.68x** :rocket:

## Case 7 (summing vectors inside each differently-sized sequence, all concatenated)

```python
def sum_all_words_per_sample(t):
    # One span per sample: begins/ends select the range [i, i + 1) along dim 0,
    # i.e. every word position belonging to sample i in the flattened view
    begins = torch.arange(len(t.lengths[1]))
    ends = begins + 1
    indices, offsets, spans = t.lengths.make_indices_ranges(
        begins=(begins,), ends=(ends,), indice_dims=(0,)
    )
    # Sum the word embeddings of each span directly on the flattened tensor
    return torch.nn.functional.embedding_bag(
        input=indices,
        weight=t.view(-1, t.size(-1)),
        offsets=offsets,
        mode="sum",
    )

embedder = torch.nn.Embedding(500, 128)
nested_list = make_nested_list(320, (150, 250), value=1)
ft = foldedtensor.as_folded_tensor(nested_list).refold(1)
ft = embedder(ft)


%timeit ft.refold(0, 1).sum(-2)
# 100 loops, best of 5: 3.54 ms per loop

%timeit sum_all_words_per_sample(ft)
# 100 loops, best of 5: 1.01 ms per loop

```
Speedup against pad-then-sum: **3.52x** :rocket: