|
1 | 1 | // Copyright Kani Contributors |
2 | 2 | // SPDX-License-Identifier: Apache-2.0 OR MIT |
3 | 3 | //! GOTO binary serializer. |
| 4 | +//! |
| 5 | +//! # Design overview |
| 6 | +//! |
| 7 | +//! When saving a [SymbolTable] to a binary file, the [Irep] describing each |
| 8 | +//! symbol's type, value and source location are structurally hashed and |
| 9 | +//! uniquely numbered so that structurally identical [Irep] only get written |
| 10 | +//! in full to the file the first time they are encountered and that ulterior |
| 11 | +//! occurrences are referenced by their unique number instead. |
| 12 | +//! The key concept at play is that of a numbering, ie a function that assigns |
| 13 | +//! numbers to values of a given type. |
| 14 | +//! |
| 15 | +//! The [IrepNumbering] struct offers high-level methods to number |
| 16 | +//! [InternedString], [IrepId] and [Irep] values: |
| 17 | +//! - [InternedString] objects get mapped to [NumberedString] objects based on |
| 18 | +//! the characters they contain. |
| 19 | +//! - [IrepId] objects get mapped to [NumberedString] objects based on the |
| 20 | +//! characters of their string representation, in the same number space |
| 21 | +//! as [InternedString]. |
| 22 | +//! - [Irep] objects get mapped to [NumberedIrep] based on: |
| 23 | +//! + the unique numbers assigned to their [Irep::id] attribute, |
| 24 | +//! + the unique numbers of [Irep] found in their [Irep::sub] attribute, |
| 25 | +//! + the pairs of unique numbers assigned to the ([IrepId],[Irep]) pairs |
| 26 | +//! found in their [Ipre::named_sub] attribute. |
| 27 | +//! |
| 28 | +//! In order to assign the same number to structurally identical [Irep] objects, |
| 29 | +//! [IrepNumbering] essentially implements a cache where each [NumberedIrep] is |
| 30 | +//! keyed under an [IrepKey] that describes its internal structure. |
| 31 | +//! |
| 32 | +//! An [IrepKey] is simply the vector of unique numbers describing the |
| 33 | +//! `id`, `sub` and `named_sub` attributes of a [Irep]. |
| 34 | +//! |
| 35 | +//! A [NumberedIrep] is conceptually a pair made of the [IrepKey] itself and the |
| 36 | +//! unique number assigned to that key. |
| 37 | +//! |
| 38 | +//! The cache implemented by [IrepNumbering] is bidirectional. It lets you |
| 39 | +//! compute the [NumberedIrep] of an [Irep], and lets you fetch a numbered |
| 40 | +//! [NumberedIrep] from its unique number. |
| 41 | +//! |
| 42 | +//! In practice: |
| 43 | +//! - the forward directon from [IrepKey] to unique numbers is implemented using a `HashMap<IrepKey,usize>` |
| 44 | +//! - the inverse direction from unique numbers to [NumberedIrep] is implemented usign a `Vec<NumberedIrep>` |
| 45 | +//! called the `index` that stores [NumberedIrep] under their unique number. |
| 46 | +//! |
| 47 | +//! Earlier we said that an [NumberedIrep] is conceptually a pair formed of |
| 48 | +//! an [IrepKey] and its unique number. It is represented using only |
| 49 | +//! a pair formed of a `usize` representing the unique number, and a `usize` |
| 50 | +//! representing the index at which the key can be found inside a single vector |
| 51 | +//! of type `Vec<usize>` called `keys` where all [IrepKey] are concatenated when |
| 52 | +//! they first get numbered. The inverse map of keys is represented this way |
| 53 | +//! because the Rust hash map that implements the forward cache owns |
| 54 | +//! its keys which would make it difficult to store keys references in inverse |
| 55 | +//! cache, which would introduce circular dependencies and require `Rc` and |
| 56 | +//! liftetimes annotations everywhere. |
| 57 | +//! |
| 58 | +//! Numberig an [Irep] consists in recursively traversing it and numbering its |
| 59 | +//! contents in a bottom-up fashion, then assembling its [IrepKey] and querying |
| 60 | +//! the cache to either return an existing [NumberedIrep] or creating a new one |
| 61 | +//! (in case that key has never been seen before). |
| 62 | +//! |
| 63 | +//! The [GotoBinarySerializer] internally uses a [IrepNumbering] and a cache |
| 64 | +//! of [NumberedIrep] and [NumberedString] it has already written to file. |
| 65 | +//! |
| 66 | +//! When given an [InternedString], [IrepId] or [Irep] to serialize, |
| 67 | +//! the [GotoBinarySerializer] first numbers that object using its internal |
| 68 | +//! [IrepNumbering] instance. Then it looks up that unique number in its cache |
| 69 | +//! of already written objects. If the object was seen before, only the unique |
| 70 | +//! number of that object is written to file. If the object was never seen |
| 71 | +//! before, then the unique number of that object is written to file, followed |
| 72 | +//! by the objects describing its contents (themselves only being written fully |
| 73 | +//! if they have never been seen before, or only referenced if they have been |
| 74 | +//! seen before, etc.) |
| 75 | +//! |
| 76 | +//! The [GotoBinaryDeserializer] also uses an [IrepNumbering] and a cache |
| 77 | +//! of [NumberedIrep] and [NumberedString] it has already read from file. |
| 78 | +//! Dually to the serializer, it will only attempt to decode the contents of an |
| 79 | +//! object from the byte stream on the first occurrence. |
4 | 80 |
|
5 | 81 | use crate::irep::{Irep, IrepId, Symbol, SymbolTable}; |
6 | 82 | use crate::{InternString, InternedString}; |
@@ -40,82 +116,6 @@ pub fn read_goto_binary_file(filename: &Path) -> io::Result<()> { |
40 | 116 | deserializer.read_file() |
41 | 117 | } |
42 | 118 |
|
43 | | -/// # Design overview |
44 | | -/// |
45 | | -/// When saving a [SymbolTable] to a binary file, the [Irep] describing each |
46 | | -/// symbol's type, value and source location are structurally hashed and |
47 | | -/// uniquely numbered so that structurally identical [Irep] only get written |
48 | | -/// in full to the file the first time they are encountered and that ulterior |
49 | | -/// occurrences are referenced by their unique number instead. |
50 | | -/// The key concept at play is that of a numbering, ie a function that assigns |
51 | | -/// numbers to values of a given type. |
52 | | -/// |
53 | | -/// The [IrepNumbering] struct offers high-level methods to number |
54 | | -/// [InternedString], [IrepId] and [Irep] values: |
55 | | -/// - [InternedString] objects get mapped to [NumberedString] objects based on |
56 | | -/// the characters they contain. |
57 | | -/// - [IrepId] objects get mapped to [NumberedString] objects based on the |
58 | | -/// characters of their string representation, in the same number space |
59 | | -/// as [InternedString]. |
60 | | -/// - [Irep] objects get mapped to [NumberedIrep] based on: |
61 | | -/// + the unique numbers assigned to their [Irep::id] attribute, |
62 | | -/// + the unique numbers of [Irep] found in their [Irep::sub] attribute, |
63 | | -/// + the pairs of unique numbers assigned to the ([IrepId],[Irep]) pairs |
64 | | -/// found in their [Ipre::named_sub] attribute. |
65 | | -/// |
66 | | -/// In order to assign the same number to structurally identical [Irep] objects, |
67 | | -/// [IrepNumbering] essentially implements a cache where each [NumberedIrep] is |
68 | | -/// keyed under an [IrepKey] that describes its internal structure. |
69 | | -/// |
70 | | -/// An [IrepKey] is simply the vector of unique numbers describing the |
71 | | -/// `id`, `sub` and `named_sub` attributes of a [Irep]. |
72 | | -/// |
73 | | -/// A [NumberedIrep] is conceptually a pair made of the [IrepKey] itself and the |
74 | | -/// unique number assigned to that key. |
75 | | -/// |
76 | | -/// The cache implemented by [IrepNumbering] is bidirectional. It lets you |
77 | | -/// compute the [NumberedIrep] of an [Irep], and lets you fetch a numbered |
78 | | -/// [NumberedIrep] from its unique number. |
79 | | -/// |
80 | | -/// In practice: |
81 | | -/// - the forward directon from [IrepKey] to unique numbers is implemented using a `HashMap<IrepKey,usize>` |
82 | | -/// - the inverse direction from unique numbers to [NumberedIrep] is implemented usign a `Vec<NumberedIrep>` |
83 | | -/// called the `index` that stores [NumberedIrep] under their unique number. |
84 | | -/// |
85 | | -/// Earlier we said that an [NumberedIrep] is conceptually a pair formed of |
86 | | -/// an [IrepKey] and its unique number. It is represented using only |
87 | | -/// a pair formed of a `usize` representing the unique number, and a `usize` |
88 | | -/// representing the index at which the key can be found inside a single vector |
89 | | -/// of type `Vec<usize>` called `keys` where all [IrepKey] are concatenated when |
90 | | -/// they first get numbered. The inverse map of keys is represented this way |
91 | | -/// because the Rust hash map that implements the forward cache owns |
92 | | -/// its keys which would make it difficult to store keys references in inverse |
93 | | -/// cache, which would introduce circular dependencies and require `Rc` and |
94 | | -/// liftetimes annotations everywhere. |
95 | | -/// |
96 | | -/// Numberig an [Irep] consists in recursively traversing it and numbering its |
97 | | -/// contents in a bottom-up fashion, then assembling its [IrepKey] and querying |
98 | | -/// the cache to either return an existing [NumberedIrep] or creating a new one |
99 | | -/// (in case that key has never been seen before). |
100 | | -/// |
101 | | -/// The [GotoBinarySerializer] internally uses a [IrepNumbering] and a cache |
102 | | -/// of [NumberedIrep] and [NumberedString] it has already written to file. |
103 | | -/// |
104 | | -/// When given an [InternedString], [IrepId] or [Irep] to serialize, |
105 | | -/// the [GotoBinarySerializer] first numbers that object using its internal |
106 | | -/// [IrepNumbering] instance. Then it looks up that unique number in its cache |
107 | | -/// of already written objects. If the object was seen before, only the unique |
108 | | -/// number of that object is written to file. If the object was never seen |
109 | | -/// before, then the unique number of that object is written to file, followed |
110 | | -/// by the objects describing its contents (themselves only being written fully |
111 | | -/// if they have never been seen before, or only referenced if they have been |
112 | | -/// seen before, etc.) |
113 | | -/// |
114 | | -/// The [GotoBinaryDeserializer] also uses an [IrepNumbering] and a cache |
115 | | -/// of [NumberedIrep] and [NumberedString] it has already read from file. |
116 | | -/// Dually to the serializer, it will only attempt to decode the contents of an |
117 | | -/// object from the byte stream on the first occurrence. |
118 | | -
|
119 | 119 | /// A numbered [InternedString]. The number is guaranteed to be in [0,N]. |
120 | 120 | /// Had to introduce this indirection because [InternedString] does not let you |
121 | 121 | /// access its unique id, so we have to build one ourselves. |
|
0 commit comments