README.md: 60 additions & 37 deletions
@@ -6,10 +6,10 @@ Most of examples from [SDSL cheat sheet][SDSL-CHEAT-SHEET] and [SDSL tutorial][S
 
 ## Mutable bit-compressed vectors
 
-Core classes:
+Core classes (see `pysdsl.int_vector` for a dict of all of them):
 
 * `pysdsl.IntVector(size, default_value, bit_width=64)` — dynamic bit width
-* `pysdsl.BitVector(size, default_value)` — static bit width (1)
+* `pysdsl.BitVector(size, default_value)` — static (fixed) bit width (1)
 * `pysdsl.Int4Vector(size, default_value)` — static bit width (4)
 * `pysdsl.Int8Vector(size, default_value)` — static bit width (8)
 * `pysdsl.Int16Vector(size, default_value)` — static bit width (16)
@@ -49,8 +49,21 @@ Out[8]: 896.0000085830688
 
 ```
 
+Buffer interface:
+
+```python
+In [9]: import array
+
+In [10]: v = pysdsl.Int64Vector([1, 2, 3])
+
+In [11]: array.array('Q', v)
+Out[11]: array('Q', [1, 2, 3])
+```
+
 ## Immutable compressed integer vectors
 
+(See `pysdsl.enc_vector`):
+
 * `EncVectorEliasDelta(IntVector)`
 * `EncVectorEliasGamma(IntVector)`
 * `EncVectorFibonacci(IntVector)`
@@ -66,41 +79,51 @@ In [10]: ev.size_in_mega_bytes
 Out[10]: 45.75003242492676
 ```
 
-Encoding values with variable length codes:
+Encoding values with variable length codes (see `pysdsl.variable_length_codes_vector`):
 
-* `VlcVectorEliasDelta(IntVector)`
-* `VlcVectorEliasGamma(IntVector)`
-* `VlcVectorFibonacci(IntVector)`
-* `VlcVectorComma2(IntVector)`
-* `VlcVectorComma4(IntVector)`
+* `VariableLengthCodesVectorEliasDelta(IntVector)`
+* `VariableLengthCodesVectorEliasGamma(IntVector)`
+* `VariableLengthCodesVectorFibonacci(IntVector)`
+* `VariableLengthCodesVectorComma2(IntVector)`
+* `VariableLengthCodesVectorComma4(IntVector)`
 
-Encoding values with "escaping" technique:
+Encoding values with "escaping" technique (see `pysdsl.direct_accessible_codes_vector`):
 
-* `DacVector(IntVector)`
-* `DacVectorDP(IntVector)` — number of layers is chosen
-  with dynamic programming
+* `DirectAccessibleCodesVector(IntVector)`
+* `DirectAccessibleCodesVector8(IntVector)`,
+* `DirectAccessibleCodesVector16(IntVector)`,
+* `DirectAccessibleCodesVector63(IntVector)`,
+* `DirectAccessibleCodesVectorDP(IntVector)` — number of layers is chosen
+  with dynamic programming
+* `DirectAccessibleCodesVectorDPRRR(IntVector)` — same but built on top of
+  RamanRamanRaoVector (see later)
 
 Construction from Python sequences is also supported.
 
 ## Immutable compressed bit (boolean) vectors
 
-* `BitVectorIL64(BitVector)`
-* `BitVectorIL128(BitVector)`
-* `BitVectorIL256(BitVector)`
-* `BitVectorIL512(BitVector)` — A bit vector which interleaves the
-  original `BitVector` with rank information.
+(See `pysdsl.all_immutable_bitvectors`)
+
+* `BitVectorInterLeaved64(BitVector)`
+* `BitVectorInterLeaved128(BitVector)`
+* `BitVectorInterLeaved256(BitVector)`
+* `BitVectorInterLeaved512(BitVector)` — A bit vector which interleaves the
+  original `BitVector` with rank information
+  (see later)
 * `SDVector(BitVector)` — A bit vector which compresses very sparsely populated
   bit vectors by representing the positions of 1 by the
   Elias-Fano representation for
   non-decreasing sequences
-* `RRRVector3(BitVector)`
-* `RRRVector15(BitVector)`
-* `RRRVector63(BitVector)`
-* `RRRVector256(BitVector)` — An H₀-compressed bitvector representation.
+* `RamanRamanRaoVector15(BitVector)`
+* `RamanRamanRaoVector63(BitVector)`
+* `RamanRamanRaoVector256(BitVector)` — An H₀-compressed bitvector representation.
 * `HybVector8(BitVector)`
 * `HybVector16(BitVector)` — A hybrid-encoded compressed bitvector
   representation
 
+See also: `pysdsl.raman_raman_rao_vectors`, `pysdsl.sparse_bit_vectors`,
+`pysdsl.hybrid_bit_vectors` and `pysdsl.bit_vector_interleaved`.
+
 ## Rank and select operations on bitvectors
 
 For bitvector `v` `rank(i)` for pattern `P` (by default `P` is a bitstring of
@@ -134,6 +157,22 @@ the results.
 mutable and was modified.
 
 
+## Wavelet trees
+
+The wavelet tree is a data structure that provides three efficient methods:
+
+* The `[]`-operator: `wt[i]` returns the `i`-th symbol of the vector for which the wavelet tree was built.
+* The rank method: `wt.rank(i, c)` returns the number of occurrences of symbol `c` in the prefix `[0..i-1]` of the vector for which the wavelet tree was built.
+* The select method: `wt.select(j, c)` returns the index `i` from `[0..size()-1]` of the `j`-th occurrence of symbol `c`.
+
+## Compressed suffix arrays
+
+A suffix array is a sorted array of all suffixes of a string.
+
+SDSL supports bitcompressed and compressed suffix arrays.
+
+The byte representation of the original IntVector must contain no zero symbols in order to construct a SuffixArray.
+
 ## Objects memory structure
 
 Any object has a `.structure` property with technical information about an
@@ -151,22 +190,6 @@ object into a file.
 All classes provide `.load_from_checkded_file()` static method allowing one to
 load object stored with `.store_to_checked_file()`
 
-## Wavelet trees
-
-The wavelet tree is a data structure that provides three efficient methods:
-
-* The `[]`-operator: `wt[i]` returns the `i`-th symbol of the vector for which the wavelet tree was built.
-* The rank method: `wt.rank(i, c)` returns the number of occurrences of symbol `c` in the prefix `[0..i-1]` of the vector for which the wavelet tree was built.
-* The select method: `wt.select(j, c)` returns the index `i` from `[0..size()-1]` of the `j`-th occurrence of symbol `c`.
-
-## Compressed suffix arrays
-
-A suffix array is a sorted array of all suffixes of a string.
-
-SDSL supports bitcompressed and compressed suffix arrays.
-
-The byte representation of the original IntVector must contain no zero symbols in order to construct a SuffixArray.
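
Note on the wavelet tree section moved by this diff: the meaning of `wt[i]`, `wt.rank(i, c)` and `wt.select(j, c)` can be pinned down with a naive reference model in plain Python. The sketch below is not a wavelet tree and does not use the pysdsl API; the function names `access`, `rank` and `select` are hypothetical, and it assumes that `j` in `select` is 1-based (the first occurrence is `j = 1`). It scans the sequence linearly, so it is only useful for checking expected results on small inputs.

```python
# Naive reference semantics for the three wavelet-tree queries.
# Hypothetical helper names; this is NOT the pysdsl API and NOT a wavelet tree.
# Every query here is a linear scan over a plain Python sequence.


def access(seq, i):
    """Counterpart of wt[i]: the i-th symbol of the sequence."""
    return seq[i]


def rank(seq, i, c):
    """Counterpart of wt.rank(i, c): occurrences of c in the prefix seq[0..i-1]."""
    return sum(1 for symbol in seq[:i] if symbol == c)


def select(seq, j, c):
    """Counterpart of wt.select(j, c): index of the j-th occurrence of c.

    Assumption: j is 1-based, so j = 1 asks for the first occurrence.
    """
    if j < 1:
        raise ValueError("j must be at least 1")
    seen = 0
    for index, symbol in enumerate(seq):
        if symbol == c:
            seen += 1
            if seen == j:
                return index
    raise ValueError("fewer than j occurrences of the symbol")


text = "abracadabra"
print(access(text, 4))       # 'c'
print(rank(text, 5, "a"))    # 2, because "abrac" contains two 'a'
print(select(text, 3, "a"))  # 5, the third 'a' sits at index 5
```

Under these assumptions the two queries are inverse to each other: `rank(seq, select(seq, j, c), c)` equals `j - 1`, which is a handy sanity check when comparing against a real wavelet tree.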
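
Note on the suffix array section moved by this diff: the definition "a sorted array of all suffixes" and the "no zero symbols" requirement can be illustrated with a small plain-Python sketch. It does not use pysdsl; `build_suffix_array` is a hypothetical name. The sketch assumes that the zero symbol is reserved as an end-of-text sentinel, which would explain why the input may not contain it, but the README text in this diff does not state the reason.

```python
# Naive suffix array construction over integer symbols.
# Hypothetical helper; NOT the pysdsl API. Sorting whole suffixes is
# quadratic or worse, so this is only suitable for small inputs.


def build_suffix_array(symbols):
    """Return the start positions of all suffixes in lexicographic order.

    Assumption: symbol 0 is reserved as a sentinel appended to the input,
    so the input itself must not contain it.
    """
    symbols = list(symbols)
    if 0 in symbols:
        raise ValueError("input must not contain the zero symbol")
    terminated = symbols + [0]  # the sentinel sorts before every other symbol
    return sorted(range(len(terminated)), key=lambda start: terminated[start:])


# bytes iterate as integers, so b"banana" becomes [98, 97, 110, 97, 110, 97]
print(build_suffix_array(b"banana"))  # [6, 5, 3, 1, 0, 4, 2]
```

Position 6 is the sentinel-only suffix, followed by `a`, `ana`, `anana`, `banana`, `na` and `nana`.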