From 7336d10694c10ed1939e0fc73b865eebc0501598 Mon Sep 17 00:00:00 2001 From: Gerwin Klein Date: Fri, 25 Jul 2025 18:20:22 +1000 Subject: [PATCH 1/5] init Signed-off-by: Gerwin Klein --- src/proposed/0200-domain-schedules.md | 212 ++++++++++++++++++++++++++ 1 file changed, 212 insertions(+) create mode 100644 src/proposed/0200-domain-schedules.md diff --git a/src/proposed/0200-domain-schedules.md b/src/proposed/0200-domain-schedules.md new file mode 100644 index 0000000..bac3191 --- /dev/null +++ b/src/proposed/0200-domain-schedules.md @@ -0,0 +1,212 @@ + + +# Runtime Domain Schedules + + + +- Author: Gerwin Klein, Rafal Kolanski, Indan Zupancic +- Proposed: [YYYY-MM-DD fill in] + +## Summary + +We propose to add the ability to set, maintain, and switch between domain +schedules at runtime, authorised by the existing `DomainCap`, retaining the +current information flow proof and behaviour when the `DomainCap` no longer +exists in the system or is not used. + +## Motivation + +Currently the domain schedule is provided as a `.h` file and compiled into the +kernel. This was an initial stop-gap implementation; it is inconvenient for SDK +approaches such as the Microkit, and is overall inflexible and hard to use. + +For instance, it is currently not possible to provide a verified kernel binary +and modify it with a new kernel schedule without invalidating the verification. + +It is also currently not possible to run a system in multiple modes, such as +"wheels up" and "wheels down" in aviation. + +The intended use for the proposed mechanism is that the initialiser sets up +domain schedules at boot time, for instance the two schedules in the example +above, or a single domain schedule as would now be provided as a `.h` file. After +that, if the system is to remain static, the `DomainCap` can be deleted as it is +now, or it can be given to a high-privilege mode-control component that can +atomically switch between schedules.
+ +Updating a running system to tweak an existing schedule or add a new mode at +runtime would also become possible. + +## Guide-level explanation + +At a conceptual level, we propose to mostly keep the current domain scheduler +implementation as it is: a compile-time sized array of entries consisting of +duration and domain. Instead of representing a single schedule as before, we +propose to use the array now for representing multiple schedules. The new +components are: + +- entries with duration 0 are end markers for a domain schedule +- there is a new kernel global that contains the start index for the current + domain schedule + +### Example + +The following diagram shows an example. + +```none +----------------------------------------------------------------- +| (t_0, d_0) | (t_1, d_1) | ... | (0, 0) | (t_x, d_x) | .. | .. | +----------------------------------------------------------------- + 0 ^ n n+1 + +ksDomainStart = 0 +ksDomScheduleIdx = 1 +``` + +The example has an overall array of length `config_DomScheduleLength`, +configured at compile time, a current domain start index of 0, and a currently +active domain schedule index of 1. The current entry will run domain `d_1` for a +duration of `t_1 > 0`, and the schedule will keep going until it hits index `n` +which has an entry with duration 0. Instead of running the entry at index `n`, +it proceeds at `ksDomainStart`, i.e. index 0. + +To populate the entries, a user thread with the `DomainCap` can invoke the +`setDomainEntry` method to set duration and domain for a specific index, +potentially with duration 0 to create an end marker. + +To atomically switch to the second schedule in the example, a user thread with +the `DomainCap` can set `ksDomainStart` to `n+1`, which will start running +domain `d_x` for a duration `t_x`. The schedule will keep going until either +another entry with duration `0` occurs, or the index hits the end of the array.
+In both cases, the index of the next entry will again be `ksDomainStart`. + +### Conditions and Invariants + +For both setting the domain start and setting the value of an entry, the kernel +will prevent the user from creating a schedule where the entry at +`ksDomainStart` has duration 0. This would signify an empty domain schedule. It +would be possible to give this situation a useful meaning: not switching domains +until a new schedule is set. However, implementing it would be more invasive +than the currently proposed changes and the same effect can already be achieved +with a one-element schedule of long duration. + +Since a duration of 0 signifies an end marker, all active schedule entries +automatically have a duration > 0. On MCS, the duration at the API level is in +microseconds and must be >= MIN_PERIOD. The stored duration is in timer ticks. +On non-MCS configurations, the duration is measured in number of ticks (time +slices). + +The kernel initialises with an array where all entries are `(0, 0)`, apart from +the entry at index 0, which will run domain 0 for the maximum expressible time. + +## Reference-level explanation + +### Invocations + +The current invocations of the `DomainCap` remain as they are. +There are two new invocations: + +#### setDomainStart + +TODO + + + +#### setDomainEntry + +TODO + + + +### Configuration Options + +The new config option `config_DomScheduleLength` determines the static size of +the overall domain schedule array and thereby the longest domain schedule that +is possible to configure at runtime + + + +## Drawbacks + +TODO: breaking change, need to convert `.h` files into initialiser code. Could +adapt capDL with a schedule section, or provide a separate schedule +specification for initialiser components. + + + +## Rationale and alternatives + +TODO + +- Why is this design the best in the space of possible designs? +- What other designs have been considered and what is the rationale for not + choosing them?
- What is the impact of not doing this? + +## Prior art + +TODO + +Discuss prior art, both the good and the bad, in relation to this proposal. A +few examples of what this can include are: + +- For ecosystem proposals: Does this feature exist in similar systems and what + experience has their community had? +- For community proposals: Is this done by some other community and what were + their experiences with it? +- What lessons can we learn from what other communities have done here? +- Are there any published papers or great posts that discuss this? If you have + some relevant papers to refer to, this can serve as a more detailed + theoretical background. + +This section is intended to encourage you as an author to think about the +lessons from other systems, and provide readers of your RFC with a fuller picture. +If there is no prior art, that is fine -- your ideas are interesting to us +whether they are brand new or if it is an adaptation from other systems. + +Note that while precedent set by other systems is some motivation, it does not +on its own motivate an RFC. + +## Unresolved questions + +TODO + +- What needs to be resolved in further discussion before the RFC is approved? +- What needs to be resolved during the implementation of this RFC? +- What related questions are beyond the scope of this RFC that should be + addressed beyond its implementation?
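The wrap-around behaviour described in the Guide-level example can be sketched as self-contained C. All names and sizes here are illustrative stand-ins for the proposal's `ksDomainStart` and `config_DomScheduleLength`, not actual kernel code:

```c
#include <stdint.h>

#define CONFIG_DOM_SCHEDULE_LENGTH 8 /* illustrative, normally a config option */

typedef struct {
    uint8_t  domain;
    uint64_t duration;
} dom_entry_t;

static dom_entry_t schedule[CONFIG_DOM_SCHEDULE_LENGTH]; /* zeroed: all (0, 0) */
static unsigned ks_domain_start;

/* Advance from the current index: an entry with duration 0 acts as an end
 * marker, and running off the end of the array behaves the same way; in both
 * cases the next index is ksDomainStart again. */
static unsigned next_schedule_index(unsigned current)
{
    unsigned next = current + 1;
    if (next >= CONFIG_DOM_SCHEDULE_LENGTH || schedule[next].duration == 0) {
        next = ks_domain_start;
    }
    return next;
}
```

Switching schedules then amounts to changing `ks_domain_start`, after which the walk wraps into the other schedule's entries.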
From 54a4bfed254e51521f7408e7f4df034dbdf8ab9e Mon Sep 17 00:00:00 2001 From: Gerwin Klein Date: Wed, 6 Aug 2025 20:17:25 +0200 Subject: [PATCH 2/5] domain-schedules: more detail - swap entries to be (d, t) instead (t, d) - only allow (0,0) end markers - add API descriptions - add initialiser description Signed-off-by: Gerwin Klein --- src/proposed/0200-domain-schedules.md | 133 +++++++++++++++++++++----- 1 file changed, 109 insertions(+), 24 deletions(-) diff --git a/src/proposed/0200-domain-schedules.md b/src/proposed/0200-domain-schedules.md index bac3191..db41380 100644 --- a/src/proposed/0200-domain-schedules.md +++ b/src/proposed/0200-domain-schedules.md @@ -52,11 +52,11 @@ runtime would also become possible. At a conceptual level, we propose to mostly keep the current domain scheduler implementation as it is: a compile-time sized array of entries consisting of -duration and domain. Instead of representing a single schedule as before, we +domain and duration. Instead of representing a single schedule as before, we propose to use the array now for representing multiple schedules. The new -components are: +components are: -- entries with duration 0 are end markers for a domain schedule +- entries with the value `(0, 0)` are end markers for a domain schedule - there is a new kernel global that contains the start index for the current domain schedule @@ -66,7 +66,7 @@ The following diagram shows an example. ```none ----------------------------------------------------------------- -| (t_0, d_0) | (t_1, d_1) | ... | (0, 0) | (t_x, d_x) | .. | .. | +| (d_0, t_0) | (d_1, t_1) | ... | (0, 0) | (d_x, t_x) | .. | .. | ----------------------------------------------------------------- 0 ^ n n+1 @@ -78,7 +78,7 @@ The example has an overall array of length `config_DomScheduleLength`, configured at compile time, a current domain start index of 0, and an currently active domain schedule index of 1. 
The current entry will run domain `d_1` for a duration of `t_1 > 0`, and the schedule will keep going until it hits index `n` -which has an entry with duration 0. Instead of running the entry at index `n`, +which has an entry with value `(0, 0)`. Instead of running the entry at index `n`, it proceeds at `ksDomainStart`, i.e. index 0. To populate the entries, a user thread with the `DomainCap` can invoke the `setDomainEntry` method to set duration and domain for a specific index, potentially with duration 0 to create an end marker. @@ -101,15 +101,23 @@ until a new schedule is set. However, implementing it would be more invasive than the currently proposed changes and the same effect can already be achieved with a one-element schedule of long duration. -Since all duration 0 signifies an end marker, all active schedule entries -automatically have a duration > 0. On MCS, the duration at the API level is in -microseconds and must be >= MIN_PERIOD. The stored duration is in timer ticks. -On non-MCS configurations, the duration is measured in number of ticks (time -slices). +Apart from the value `(0, 0)`, the kernel will prevent the creation of entries +with duration 0. On MCS, the duration at the API level is in microseconds and +must be >= MIN_PERIOD. The stored duration is in timer ticks. On non-MCS +configurations, the duration is measured in number of ticks (time slices). The kernel initialises with an array where all entries are `(0, 0)`, apart from the entry at index 0, which will run domain 0 for the maximum expressible time. +The kernel will preserve the following invariants: + +- `ksDomainStart < config_DomScheduleLength` +- `ksDomScheduleIdx < config_DomScheduleLength` +- `schedule[ksDomainStart].duration != 0` +- for all `i` with `0 <= i < config_DomScheduleLength`, + if `schedule[i].duration == 0` then `schedule[i].domain == 0` + + ## Reference-level explanation ### Invocations @@ -117,29 +125,106 @@ the entry at index 0, which will run domain 0 for the maximum expressible time. The current invocations of the `DomainCap` remain as they are.
There are two new invocations: -#### setDomainStart +#### Set_Domain_Start -TODO +`static inline int seL4_Set_Domain_Start` - +Set the start index of the current domain schedule. The schedule entry at this +index must not be the end marker `(0, 0)`. The domain at the schedule entry with +the specified index will start running immediately after the kernel call +completes. -#### setDomainEntry +| Type | Name | Description | +| ----------- | ---------- | ----------------------------------------------- | +| `seL4_DomainSet` | `_service` | Domain capability to authorise the call | +| `seL4_Word` | `index` | The new start index. Must not point to `(0,0)` and must be smaller than `config_DomScheduleLength` | -TODO +The function returns `seL4_NoError` for success as usual. The following error +codes are returned otherwise: - +| Error | Possible cause | +| ---------------------- | ----------------------------------------------- | +| `seL4_InvalidCapability` | the provided capability is not the domain capability | +| `seL4_InvalidArgument` | the entry at the provided index is an end marker | +| `seL4_RangeError` | the index is not less than `config_DomScheduleLength` | + + +#### Set_Domain_Entry + +`static inline int seL4_Set_Domain_Entry` + +Set an entry in the domain schedule at the specified index to the specified +domain and duration. If the duration is 0, the domain must also be 0 to indicate +an end marker. On MCS, the duration must be `>= MIN_BUDGET`. If the index is +the current schedule start index, the entry must not be an end marker `(0, 0)`. + +The change to the schedule takes effect when the entry is next read by the +kernel. In particular, when the duration or domain of the currently running +schedule index is changed, the change will only take effect after the current +domain time slice has expired and the schedule reaches the current index again.
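As a concrete reading of the validation rules for these two invocations, here is a self-contained C sketch. The types, constants, and error values are assumptions standing in for the proposed seL4 API; the capability check and the MCS microsecond-to-tick conversion are omitted:

```c
#include <stdint.h>

/* Illustrative stand-ins, not actual kernel definitions. */
#define CONFIG_DOM_SCHEDULE_LENGTH 8
#define CONFIG_NUM_DOMAINS 16
#define MIN_BUDGET 10 /* placeholder MCS minimum, already in ticks */

typedef struct {
    uint8_t  domain;
    uint64_t duration;
} dom_entry_t;

enum { NO_ERROR, INVALID_ARGUMENT, RANGE_ERROR };

static dom_entry_t schedule[CONFIG_DOM_SCHEDULE_LENGTH];
static unsigned ks_domain_start;

/* Checks corresponding to Set_Domain_Entry: index and domain must be in
 * range, durations are 0 or >= MIN_BUDGET, only (0, 0) end markers are
 * allowed, and never at the current start index. */
static int set_domain_entry(unsigned index, unsigned domain, uint64_t duration)
{
    if (index >= CONFIG_DOM_SCHEDULE_LENGTH || domain >= CONFIG_NUM_DOMAINS ||
        (duration != 0 && duration < MIN_BUDGET)) {
        return RANGE_ERROR;
    }
    if (duration == 0 && (domain != 0 || index == ks_domain_start)) {
        return INVALID_ARGUMENT;
    }
    schedule[index] = (dom_entry_t){ (uint8_t)domain, duration };
    return NO_ERROR;
}

/* Checks corresponding to Set_Domain_Start: the new start index must be in
 * range and must not point at an end marker. */
static int set_domain_start(unsigned index)
{
    if (index >= CONFIG_DOM_SCHEDULE_LENGTH) {
        return RANGE_ERROR;
    }
    if (schedule[index].duration == 0) {
        return INVALID_ARGUMENT;
    }
    ks_domain_start = index;
    return NO_ERROR;
}
```

The ordering of the checks mirrors the error tables: range violations report `RangeError`, while structurally invalid end markers report `InvalidArgument`.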
+ +Section [Initialisation](#initialisation) contains a scenario where setting the +entry at the currently running index may be useful, but generally one should +avoid updating the currently running schedule. Instead, create new schedules +beyond the current end marker and then set the schedule start once the system is +ready to switch. + +| Type | Name | Description | +| ----------- | ---------- | ----------------------------------------------- | +| `seL4_DomainSet` | `_service` | Domain capability to authorise the call | +| `seL4_Word` | `index` | The index of the entry to set. Must be smaller than `config_DomScheduleLength`. | +| `seL4_Uint8` | `domain` | The domain of the schedule entry. Must be smaller than `CONFIG_NUM_DOMAINS`. Must be 0 if the duration is 0. | +| `seL4_Word` | `duration` | The duration for the entry. On MCS, must be 0 or `>= MIN_BUDGET`. | + +The function returns `seL4_NoError` for success as usual. The following error +codes are returned otherwise: + +| Error | Possible cause | +| ---------------------- | ----------------------------------------------- | +| `seL4_InvalidCapability` | the provided capability is not the domain capability | +| `seL4_InvalidArgument` | the index is the current domain start index and the duration is 0, or the duration is 0 but the domain is not 0. | +| `seL4_RangeError` | the index is not less than `config_DomScheduleLength`, or the domain is not less than `CONFIG_NUM_DOMAINS`, or on MCS the duration is less than `MIN_BUDGET` | ### Configuration Options The new config option `config_DomScheduleLength` determines the static size of the overall domain schedule array and thereby the longest domain schedule that -is possible to configure at runtime +is possible to configure at runtime. The default value for +`config_DomScheduleLength` is 256. + +### Initialisation + +At system startup, the array contains two active entries: entry 0 with domain 0 +and a long duration, and entry 1 with `(0, 0)`.
With the following scheme it is +possible to use the full length of the array even though these two entries are +already in use. + +At kernel boot time, given a user-provided schedule [(d_0, t_0), (d_1, t_1), +...] that satisfies the requirements [described +above](#conditions-and-invariants), the root task could achieve the provided +schedule as follows: + +1. First, set up the rest of the system as before, including starting all threads. +2. Set all `(d_i, t_i)` according to the schedule where `i > 1`. +3. Then, overwrite the two active schedule entries 0 and 1 with + `(d_0, t_0)` and `(d_1, t_1)`. +4. Set the schedule start to the desired start value, e.g. index 0. +5. Suspend/stop the initialiser. + +With this, the first run of domain 0 after the initialiser will not get its full +time slice, because step 5 will already run in the user-provided schedule, but +after that, the user-provided schedule will be in force. + +This works if the initialiser can finish its work within the duration of entry +0. Since the duration is set to the maximum expressible time, this should in +practice never be an issue. Even if the time is not sufficient, the procedure +will still work unless the domain time of the initialiser happens to expire +during the execution of step 3. If that is a possibility, the initialiser could +include a suitable delay before step 3 to make this impossible. + +The reason the scheme works is that the kernel will not act on new values in the +schedule before the current domain slice has expired, whereas setting the start +index in step 4 comes into effect immediately. # Runtime Domain Schedules - - - Author: Gerwin Klein, Rafal Kolanski, Indan Zupancic -- Proposed: [YYYY-MM-DD fill in] +- Proposed: 2025-09-10 ## Summary @@ -225,33 +214,33 @@ The reason the scheme works is that the kernel will not act on new values in the schedule before the current domain slice has expired, whereas setting the start index in step 4 comes into effect immediately.
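The five-step handover can be illustrated with a small self-contained simulation in C, operating on a plain array instead of real seL4 invocations; all names and sizes are assumptions for the example:

```c
#include <stdint.h>
#include <string.h>

#define LEN 8 /* stands in for config_DomScheduleLength */

typedef struct {
    uint8_t  domain;
    uint64_t duration;
} dom_entry_t;

static dom_entry_t schedule[LEN];
static unsigned start_index;

/* Boot state: everything (0, 0) except entry 0, which runs domain 0 for
 * the maximum expressible time. */
static void boot_state(void)
{
    memset(schedule, 0, sizeof schedule);
    schedule[0] = (dom_entry_t){ 0, UINT64_MAX };
    start_index = 0;
}

/* Steps 2-4 of the handover; step 1 (system setup) happens before this,
 * step 5 (suspending the initialiser) after. */
static void install_user_schedule(const dom_entry_t *user, unsigned n)
{
    for (unsigned i = 2; i < n; i++) {   /* step 2: entries with i > 1 */
        schedule[i] = user[i];
    }
    schedule[0] = user[0];               /* step 3: the two live boot     */
    schedule[1] = user[1];               /*         entries, written last */
    start_index = 0;                     /* step 4: effective immediately */
}
```

The ordering matters: entries are only read when a domain slice expires, while the start index takes effect immediately, so overwriting the two live boot entries last keeps the boot schedule valid right up to the switch.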
- +### capDL initialiser + +The capDL initialiser will change to also initialise the domain schedule if one +is provided. + +The schedule is provided in a separate new section of the capDL input +specification, specifying the schedule as a comma-separated list of pairs +(domain, duration). If no end marker is provided at the end of the list, +an implicit end marker is assumed. ## Drawbacks -TODO: breaking change, need to convert `.h` files into initialiser code. Could -adapt capDL with a schedule section, or provide a separate schedule -specification for initialiser components. +The main drawback is that this is a breaking change for users. Setting and +initialising a domain schedule now works differently, and build scripts may +need to be updated to no longer generate/add the former `.h` file that contained +the schedule. - +In theory a tool could be provided that converts current `.h` file schedules +into the capDL sections, but since `.h` files can contain anything we are not +proposing such a tool as part of this RFC. + +Tests for the domain scheduler in sel4test will need to be changed to create the +test schedules at initialisation time instead of expecting them to be compiled +in. + +Tools and applications that do not use the domain scheduler should be +unaffected. ## Rationale and alternatives @@ -325,33 +314,28 @@ switch to the new schedule atomically. ## Prior art -TODO +Most separation kernels provide some form of configurable static scheduling. -Discuss prior art, both the good and the bad, in relation to this proposal. A -few examples of what this can include are: +The motivation for the original domain scheduler design in seL4 was an explicit +customer/user request for a static separation kernel configuration of seL4 with +static scheduling in a security and information flow context. This also +motivated the current form of the information flow theorem. 
For the application +at the time, the current API was sufficient, but for a more general use case it +is too restrictive and has too much build system impact, because it needs +recompilation for schedule changes. -- For ecosystem proposals: Does this feature exist in similar systems and what - experience have their community had? -- For community proposals: Is this done by some other community and what were - their experiences with it? -- What lessons can we learn from what other communities have done here? -- Are there any published papers or great posts that discuss this? If you have - some relevant papers to refer to, this can serve as a more detailed - theoretical background. +Another application area with similar requirements is aviation, in particular +the ARINC 653 standard, which would be better supported by this RFC than the +current implementation. -This section is intended to encourage you as an author to think about the -lessons from other systems, provide readers of your RFC with a fuller picture. -If there is no prior art, that is fine -- your ideas are interesting to us -whether they are brand new or if it is an adaptation from other systems. +This RFC is not attempting to provide a full implementation of any particular +standard, it merely aims to make the existing API more usable and improve the +overall developer experience on seL4. -Note that while precedent set by other systems is some motivation, it does not -on its own motivate an RFC. ## Unresolved questions -TODO - -- What needs to be resolved in further discussion before the RFC is approved? -- What needs to resolved during the implementation of this RFC? -- What related questions are beyond the scope of this RFC that should be - addressed beyond its implementation? +The API is expressed in terms of timer ticks in expectation of an additional +upcoming RFC to change the rest of the API to timer ticks as well. 
Depending on +the outcome of that discussion, the API could also be expressed in terms of `time_t` +instead, if required for consistency.