Requirements for a Kernel Safety Monitor
As discussed during the OSEP call of Week 40, let's try to have this as a collaborative effort, to provide the ARCH WG with OSEP-reviewed requirements.
Signed-off-by: Igor Stoppa <[email protected]>
| - An external HW watchdog. | ||
| - An external core, running safety-qualified code (e.g. a safety island) | ||
| - An external execution context running on the same system (e.g. a hypervisor) | ||
| - The failure analysis must consider that the meta-data used by the self-monitor is exposed to interference as well: |
I understand that you mean the Kernel meta-data here. Perhaps explicitly state that the meta-data to be monitored is the kernel's internal data structures, as opposed to userspace data?
I am also missing information on what the self-monitor is supposed to monitor. Can we place a requirement on that?
The kernel doesn't even have access - by design - to userspace data of any kind, shape, or form.
It would be a security violation.
| ## **Structure of the document** | ||
| - The document presents a brief overview of what safety problems can be found while using Linux, and how they can be mitigated through self-monitoring. | ||
| - The next sections, instead, discuss in more detail the typical pitfalls that must be avoided to make the monitoring useful, and what goals should be targeted. |
| This is where the concept of kernel self monitoring comes into the picture. | ||
| ## **The concept - Safety-oriented self monitoring** |
I do not think the title fits the content. The content could be a continuation of the previous section, as you are still introducing the subject. With this title I would expect you to explain the concept, which is not the case.
| - Watchdogs, to detect abnormal delays either in processing events or performing actions. | ||
| - Output vetting, to detect abnormal actuator control signals produced by the system. | ||
| These external safety measures, though, are not always sufficient, because they might have limited ability to observe the internal status of the kernel, or it might be desirable to exert a tighter control over the evolution of its internal states. |
In the statement above I would add "especially in scenarios where the Kernel is used to enable and manage complex safety workloads"
| This is where the concept of kernel self monitoring comes into the picture. | ||
| ## **The concept - Safety-oriented self monitoring** |
While I agree on the technical content, I think this doc is missing the theoretical principles, which, BTW, are quite simple:
1. The Self Monitoring must be developed or qualified with a systematic capability level that is equal to or higher than the ASIL or SIL associated with the safety claim that it supports.
2. The scope of the monitored part of the Kernel and the scope of the self-monitoring part shall be clearly identified, including any part of the design or code that is common to the two.
3. A safety analysis of the monitored code shall define the dangerous failure modes originating from it that would violate the target safety claim.
4. Assuming the lack of interference from the monitored code and from any other code running in the Kernel, the safety monitor shall be verified to be effective against a subset of the dangerous failure modes mentioned in 3). Any residual dangerous failure mode shall be covered by additional mitigations (that are out of the scope of this document).
5. A comprehensive analysis of the possible interference failure modes originating from the monitored code or any other code running in the Kernel shall be carried out. Based on this analysis, the dangerous interference failure modes (leading to a violation of the target safety claim) shall be identified and mitigated through adequate measures.
For example, adequate measures could range from Assumptions of Use or proper configurations that avoid such failure modes, through designing and implementing additional detection mechanisms, up to qualifying the code generating the interference to a level of systematic capability that is equal to or higher than the ASIL or SIL allocated to the target safety claim.
Note: When designing the safety monitors and any additional mitigation measures, availability, security and performance requirements shall also be considered.
E.g. a measure that makes the overall safety function unavailable is perfectly safe, but it would be useless.
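To make the bookkeeping behind the points on dangerous failure modes and monitor coverage a bit more concrete, here is a minimal sketch in C. It is only an illustration, not something proposed for the document or the kernel: the failure mode names, `FM_BIT`, `residual_modes()` and `claim_fully_covered()` are all hypothetical. It assumes the safety analysis yields a finite set of dangerous failure modes, represented as a bitmask, and computes which of them remain uncovered once the monitor's verified coverage is subtracted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical failure modes of the monitored kernel code.
 * The names are illustrative assumptions, not taken from any analysis. */
enum failure_mode {
    FM_STALE_DATA = 0,
    FM_CORRUPTED_PAGE_TABLE,
    FM_MISSED_DEADLINE,
    FM_WRONG_OUTPUT,
    FM_COUNT
};

/* Each failure mode occupies one bit of a 32-bit mask. */
static_assert(FM_COUNT <= 32, "failure modes must fit in a 32-bit mask");

#define FM_BIT(fm) (UINT32_C(1) << (fm))

/* Dangerous failure modes that the given coverage does not address. */
static inline uint32_t residual_modes(uint32_t dangerous, uint32_t covered)
{
    return dangerous & ~covered;
}

/* True when the monitor plus the additional mitigations leave no
 * dangerous failure mode uncovered. */
static inline bool claim_fully_covered(uint32_t dangerous,
                                       uint32_t monitor_covered,
                                       uint32_t additional)
{
    return residual_modes(dangerous, monitor_covered | additional) == 0;
}
```

In this sketch, a non-zero result from `residual_modes()` corresponds to the residual dangerous failure modes that need additional mitigations outside the monitor. Interference failure modes would need their own, separate analysis and are deliberately not modelled here.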