Quick questions:
Given that this data is self-reported and has not been validated against other questionnaires or medical records, some level of inconsistency is expected. Therefore, I would be cautious about applying overly strict criteria to confirm or rule out whether a specific event occurred. However, it is interesting to observe that the self-reported Lifelines data appears not to be fully reliable.
This issue seems mostly related to dropout during the subsequent survey rounds, and not so much to missing values within a questionnaire (see the screenshot below). I think how we handle this issue depends on the outcomes we would like to predict. In earlier discussions we settled on (an approximation of) the 10-year risk of having a certain disease, right? Looking at the data availability for stroke, for example, one can see that between 3A and 3B two thirds of the data is lost due to dropout. However, the interval between 1A (2007-2013) and 3A is still approximately 10 years. Would it therefore be an idea to use 3A as the last observation? This would hopefully lead to more cases than using 3B as the final endpoint.
Hello @baukearends and @KasiaSmietanka,
This is a follow-up discussion on pairing-rules issue #22 (MyDigiTwinNL/CDF2Medmij-Mapping-tool#22). This is the last 'issue' to be solved concerning data harmonisation (for the PoC), so I'd appreciate your input on this.
Following @baukearends's suggestion, I'm defining an alternative set of pairing rules that allows the inclusion of Conditions (Stroke, Diabetes, etc.) self-reported as not present, so that we can distinguish them from the Conditions that are not included in the target FHIR dataset due to missing data (e.g., skipped assessments) in Lifelines. However, this raised new questions, in particular about in which cases we can say that a participant didn't have a given condition across the study (i.e., the condition can be considered negative), and in which cases the condition can be considered neither active nor inactive, due to missing data.
My interpretation is the following:
IF a YES/TRUE is given on the baseline, or on any follow-up assessment, e.g.:
The output is: Status: Active. Verification: Unknown (this is how it is currently mapped)
ELSE IF there is an explicit NO/FALSE on the last follow-up (in this example, 3b), e.g.:
The output is: Status: Inactive. Confirmation: Known to be absent.
ELSE (any other case), e.g.:
The resource is not created (it can be identified as neither a positive nor a negative case).
As you can see, my rationale for the last two conditions is: when a participant has not self-reported a Condition as 'Active' (at baseline or in any follow-up assessment), this condition would be identified as 'Inactive/Absent' in the FHIR dataset if and only if the last follow-up assessment was included. Any other interpretation would require making assumptions about the outcome of the missing last assessment (the last assessment with respect to the overall cohort study).
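To make the three rules concrete, here is a minimal Python sketch of the interpretation above. It is only an illustration, not the mapping tool's actual code: the function name `classify_condition`, the `ASSESSMENTS` list (including the label `2A`), and the returned strings are all assumptions made for this example. The `last_assessment` parameter is also an assumption, added so that a different final wave can be tried without changing the logic.

```python
from typing import Mapping, Optional

# Hypothetical assessment labels in chronological order; the real Lifelines
# wave names (and how many there are) may differ.
ASSESSMENTS = ["1A", "2A", "3A", "3B"]


def classify_condition(
    answers: Mapping[str, Optional[bool]],
    last_assessment: str = "3B",
) -> Optional[str]:
    """Apply the three pairing rules to one participant and one condition.

    `answers` maps an assessment label to True (YES), False (NO) or None.
    A missing key is treated the same as None (no data for that assessment).
    """
    # Rule 1: a YES at baseline or at any follow-up makes the condition active.
    if any(answers.get(a) is True for a in ASSESSMENTS):
        return "active"
    # Rule 2: no YES anywhere and an explicit NO on the last assessment.
    if answers.get(last_assessment) is False:
        return "inactive"
    # Rule 3: anything else is undetermined, so no resource is created.
    return None


if __name__ == "__main__":
    print(classify_condition({"1A": False, "2A": True}))   # rule 1
    print(classify_condition({"1A": False, "3B": False}))  # rule 2
    print(classify_condition({"1A": False, "2A": False}))  # rule 3
```

Note that rule 1 is checked first, so a single YES takes precedence over a later NO, which matches the order of the IF / ELSE IF branches above.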
Using this, for example, the following subsets for 'stroke' cases could be identified (using SQL) from within the harmonised data:
Self-reported as positive: 6217
Self-reported as negative: 9013
Undetermined: 135883
Does this interpretation make sense to you?