@@ -8,9 +8,9 @@ It compares the performance of `foldedtensor` with various alternatives for padding
and working with nested lists and tensors.

Environment:

- - `torch.__version__ == '2.6.0'`
+ - `torch.__version__ == '2.7.1'`
- `foldedtensor.__version__ == '0.4.0'`
- - `python == 3.9.20`
+ - `python == 3.11.3`
- `sys.platform == 'darwin'`

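The helpers used throughout (`make_nested_list`, `python_padding`) are defined in the benchmark script and are not part of this diff. Below is a minimal sketch of what they plausibly look like, inferred only from how they are called in the cases that follow; the real implementations may differ.

```python
import random

import torch


def make_nested_list(*sizes, value):
    """Build a nested list of `value`s; an int size is exact, a (low, high)
    tuple draws a random length at that nesting level."""
    size, *rest = sizes
    if isinstance(size, tuple):
        size = random.randint(*size)
    if not rest:
        return [value] * size
    return [make_nested_list(*rest, value=value) for _ in range(size)]


def python_padding(nested, pad=0):
    """Pure-Python baseline: recursively pad every level to its max length,
    then copy the sub-tensors into one dense, zero-padded tensor."""
    if not isinstance(nested[0], list):
        return torch.as_tensor(nested)
    tensors = [python_padding(sub, pad) for sub in nested]
    max_shape = [max(t.size(d) for t in tensors) for d in range(tensors[0].dim())]
    out = torch.full([len(tensors), *max_shape], pad, dtype=tensors[0].dtype)
    for i, t in enumerate(tensors):
        out[(i, *map(slice, t.shape))] = t
    return out
```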
@@ -22,79 +22,79 @@ nested_list = make_nested_list(32, (50, 100), (25, 30), value=1)

Comparisons:
%timeit python_padding(nested_list)
- # 100 loops, best of 5: 15.09 ms per loop
+ # 100 loops, best of 5: 16.96 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list)
- # 100 loops, best of 5: 0.73 ms per loop
+ # 100 loops, best of 5: 0.88 ms per loop

```
- Speedup against best alternative: **20.67x** :rocket:
+ Speedup against best alternative: **19.36x** :rocket:

## Case 2 (same lengths nested lists)

```python
nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list)
- # 100 loops, best of 5: 6.51 ms per loop
+ # 100 loops, best of 5: 7.67 ms per loop

%timeit torch.LongTensor(nested_list)
- # 100 loops, best of 5: 2.78 ms per loop
+ # 100 loops, best of 5: 3.59 ms per loop

%timeit python_padding(nested_list)
- # 100 loops, best of 5: 18.38 ms per loop
+ # 100 loops, best of 5: 20.02 ms per loop

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
- # 100 loops, best of 5: 3.00 ms per loop
+ # 100 loops, best of 5: 3.83 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list)
- # 100 loops, best of 5: 1.08 ms per loop
+ # 100 loops, best of 5: 1.22 ms per loop

```
- Speedup against best alternative: **2.58x** :rocket:
+ Speedup against best alternative: **2.94x** :rocket:

## Case 3 (simple list)

```python
simple_list = make_nested_list(10000, value=1)

%timeit torch.tensor(simple_list)
- # 100 loops, best of 5: 0.63 ms per loop
+ # 100 loops, best of 5: 0.75 ms per loop

%timeit torch.LongTensor(simple_list)
- # 100 loops, best of 5: 0.27 ms per loop
+ # 100 loops, best of 5: 0.36 ms per loop

%timeit python_padding(simple_list)
- # 100 loops, best of 5: 0.28 ms per loop
+ # 100 loops, best of 5: 0.36 ms per loop

%timeit foldedtensor.as_folded_tensor(simple_list)
- # 100 loops, best of 5: 0.08 ms per loop
+ # 100 loops, best of 5: 0.10 ms per loop

```
- Speedup against best alternative: **3.32x** :rocket:
+ Speedup against best alternative: **3.57x** :rocket:

## Case 4 (same lengths nested lists to flat tensor)

```python
nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list).view(-1)
- # 100 loops, best of 5: 6.52 ms per loop
+ # 100 loops, best of 5: 7.65 ms per loop

%timeit torch.LongTensor(nested_list).view(-1)
- # 100 loops, best of 5: 2.76 ms per loop
+ # 100 loops, best of 5: 3.63 ms per loop

%timeit python_padding(nested_list).view(-1)
- # 100 loops, best of 5: 18.62 ms per loop
+ # 100 loops, best of 5: 20.36 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list).view(-1)
- # 100 loops, best of 5: 1.12 ms per loop
+ # 100 loops, best of 5: 1.22 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list, data_dims=(2,))
- # 100 loops, best of 5: 1.08 ms per loop
+ # 100 loops, best of 5: 1.20 ms per loop

```
- Speedup against best alternative: **2.47x** :rocket:
+ Speedup against best alternative: **2.96x** :rocket:
## Case 5 (variable lengths nested lists to padded embeddings)

The nested lists have different lengths (second-level lists have between 50 and 150 elements). We compare `foldedtensor` with `torch.nested`.
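As a side note (not part of the benchmark), here is a hypothetical sanity check that the two routes timed in the next hunk produce the same 0-padded result, so the comparison is like for like. It reuses the `make_nested_list` sketch above and only APIs that already appear in the benchmark.

```python
import torch
import foldedtensor

# Same construction as in the benchmark below.
nested_list = make_nested_list(32, (50, 150), 30, value=1)

padded_nested = torch.nested.nested_tensor(
    [torch.LongTensor(sub) for sub in nested_list]
).to_padded_tensor(0)
padded_folded = foldedtensor.as_folded_tensor(nested_list).as_tensor()

assert padded_nested.shape == padded_folded.shape
assert (padded_nested == padded_folded).all()
```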
@@ -104,41 +104,72 @@ nested_list = make_nested_list(32, (50, 150), 30, value=1)
# Padding with 0

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
- # 100 loops, best of 5: 3.02 ms per loop
+ # 100 loops, best of 5: 4.10 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list).as_tensor()
- # 100 loops, best of 5: 1.03 ms per loop
+ # 100 loops, best of 5: 1.23 ms per loop

```
- Speedup against best alternative: **2.95x** :rocket:
+ Speedup against best alternative: **3.33x** :rocket:
```python
# Padding with 1

%timeit torch.nested.nested_tensor([torch.FloatTensor(sub) for sub in nested_list]).to_padded_tensor(1)
- # 100 loops, best of 5: 3.72 ms per loop
+ # 100 loops, best of 5: 4.42 ms per loop

%timeit x = foldedtensor.as_folded_tensor(nested_list); x.masked_fill_(x.mask, 1)
- # 100 loops, best of 5: 1.62 ms per loop
+ # 100 loops, best of 5: 1.58 ms per loop

```
- Speedup against best alternative: **2.30x** :rocket:
+ Speedup against best alternative: **2.80x** :rocket:

## Case 6 (2d padding)

```python
nested_list = make_nested_list(160, (50, 150), value=1)

%timeit python_padding(nested_list)
- # 100 loops, best of 5: 1.33 ms per loop
+ # 100 loops, best of 5: 1.48 ms per loop

%timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
- # 100 loops, best of 5: 1.14 ms per loop
+ # 100 loops, best of 5: 1.28 ms per loop

%timeit torch.nn.utils.rnn.pad_sequence([torch.LongTensor(sub) for sub in nested_list], batch_first=True, padding_value=0)
- # 100 loops, best of 5: 0.86 ms per loop
+ # 100 loops, best of 5: 1.02 ms per loop

%timeit foldedtensor.as_folded_tensor(nested_list)
- # 100 loops, best of 5: 0.15 ms per loop
+ # 100 loops, best of 5: 0.17 ms per loop

```
- Speedup against best alternative: **5.88x** :rocket:
+ Speedup against best alternative: **6.03x** :rocket:
+
+ ## Case 7 (summing vectors inside each differently-sized sequence, all concatenated)
+
+ ```python
+ def sum_all_words_per_sample(t):
+     begins = torch.arange(len(t.lengths[1]))
+     ends = begins + 1
+     indices, offsets, spans = t.lengths.make_indices_ranges(
+         begins=(begins,), ends=(ends,), indice_dims=(0,)
+     )
+     return torch.nn.functional.embedding_bag(
+         input=indices,
+         weight=t.view(-1, t.size(-1)),
+         offsets=offsets,
+         mode="sum",
+     )
+
+ embedder = torch.nn.Embedding(500, 128)
+ nested_list = make_nested_list(320, (150, 250), value=1)
+ ft = foldedtensor.as_folded_tensor(nested_list).refold(1)
+ ft = embedder(ft)
+
+
+ %timeit ft.refold(0, 1).sum(-2)
+ # 100 loops, best of 5: 3.56 ms per loop
+
+ %timeit sum_all_words_per_sample(ft)
+ # 100 loops, best of 5: 1.00 ms per loop
+
+ ```
+ Speedup against pad-then-sum: **3.56x** :rocket:
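For context on the trick in the added Case 7: the baseline `ft.refold(0, 1).sum(-2)` re-pads the embeddings to a `(samples, max_words, dim)` tensor and sums over the word axis, while the `embedding_bag` route sums rows of the flat `(total_words, dim)` tensor directly. With a flat list of row indices and `offsets` marking where each bag starts, `torch.nn.functional.embedding_bag(..., mode="sum")` sums the selected rows per bag, so no padding is ever materialized. A tiny, self-contained illustration in plain PyTorch (independent of `foldedtensor`):

```python
import torch

# Six "word vectors" of size 2: row i is [2*i, 2*i + 1].
weight = torch.arange(12.0).view(6, 2)
# Take the rows in order and bag them into three segments:
# rows 0-1, rows 2-4, row 5 (offsets mark where each bag starts).
indices = torch.arange(6)
offsets = torch.tensor([0, 2, 5])
summed = torch.nn.functional.embedding_bag(indices, weight, offsets=offsets, mode="sum")
print(summed)
# tensor([[ 2.,  4.],
#         [18., 21.],
#         [10., 11.]])
```

In the benchmark above, `indices` and `offsets` are derived from the folded structure, and `weight` is the flat `(total_words, dim)` embedding tensor.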