Changing data model #29

Open
anas-elghafari wants to merge 106 commits into cross_mapping from changing_data_model

@anas-elghafari
Collaborator

Bringing in the small fixes from my branch to the cross_mapping branch.

anas-elghafari and others added 30 commits October 9, 2025 13:25
updating the datamodel branch with the latest changes from main
Latest SPARQL query for the new data model
…ery expectations

- Remove primary/secondary outcome specs from generic field mapping
- Add custom logic in handle_special_fields to create nested structure:
  protocol -> outcome_specification -> primary/secondary_outcome_specification
- This matches the structure expected by the studies metadata SPARQL query
- Fixes issue where outcomes were not being fetched from the triplestore
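
The nesting described in this commit can be pictured with a plain-dictionary sketch. This is only an illustration: the function name `nest_outcome_specs` and the flat input record are hypothetical, and the real `handle_special_fields` presumably builds RDF nodes rather than dictionaries.

```python
from typing import Any

# Keys taken from the commit message; the flat-record input shape is an assumption.
OUTCOME_KEYS = (
    "primary_outcome_specification",
    "secondary_outcome_specification",
)


def nest_outcome_specs(record: dict[str, Any]) -> dict[str, Any]:
    """Build protocol -> outcome_specification -> primary/secondary nesting."""
    outcome_spec: dict[str, Any] = {}
    for key in OUTCOME_KEYS:
        value = record.get(key)
        if value:  # skip outcomes that are missing or empty
            outcome_spec[key] = value
    return {"protocol": {"outcome_specification": outcome_spec}}
```

Keeping the outcome fields out of the generic mapping means they appear only under the nested path, which is where the commit says the studies metadata query looks for them.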
…ER BY)

- Change lines[2:290] to lines[2:291] to include line 290 (GROUP BY clause)
- Change lines[297:542] to lines[297:543] to include line 542 (ORDER BY clause)
- Python slice notation excludes the end index, causing queries to be malformed
- This was causing 'QueryBadFormed' errors when executing the queries
- Add query_endpoint.setMethod('POST') to explicitly use POST requests
- Prevents 'headers size should fit in 8kb' error for large queries
- GET method puts query in URL/headers which has size limitations
- POST method sends query in request body with no practical size limit
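
The off-by-one behind the slice change is easy to reproduce in isolation. The sketch below uses a made-up ten-element list and a hypothetical helper `extract_inclusive`; it is not the project's query extraction code.

```python
def extract_inclusive(lines: list[str], start: int, end: int) -> list[str]:
    """Return lines[start] through lines[end], including the end index."""
    # Python slices exclude the stop index, so add 1 to keep lines[end].
    return lines[start:end + 1]


lines = [f"line {i}" for i in range(10)]

exclusive = lines[2:5]                       # stops before index 5
inclusive = extract_inclusive(lines, 2, 5)   # keeps index 5

print(exclusive[-1])
print(inclusive[-1])
```

When the last wanted line holds a closing clause such as GROUP BY or ORDER BY, dropping it leaves the extracted query incomplete, which is consistent with the 'QueryBadFormed' errors the commit mentions.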
…_metadata

- Add logging at each major step to identify where processing is slow/stuck
- Optimize study_name to cohort_id matching using pre-built dictionary
- Avoid O(n*m) loop by creating cohort_id_map for fast lookups
- Log counts of studies, variables, and processing time at each step
- Helps diagnose performance issues with large datasets
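
A minimal sketch of the dictionary-based matching follows. The name `cohort_id_map` comes from the commit message; the function names and the case-insensitive, whitespace-trimmed normalization are assumptions made for illustration.

```python
from typing import Optional


def build_cohort_id_map(cohort_ids: list[str]) -> dict[str, str]:
    """Index cohort ids by a normalized key so each lookup is O(1) on average."""
    # Normalization rule (strip + lowercase) is an assumption, not the project's.
    return {cid.strip().lower(): cid for cid in cohort_ids}


def match_studies(
    study_names: list[str], cohort_ids: list[str]
) -> dict[str, Optional[str]]:
    """Match each study name to a cohort id, or None when there is no match."""
    cohort_id_map = build_cohort_id_map(cohort_ids)  # built once, O(m)
    return {
        name: cohort_id_map.get(name.strip().lower())  # O(1) per study
        for name in study_names
    }
```

Building the map once costs O(m) and each of the n lookups is constant time on average, so the total is O(n + m) rather than the O(n*m) of comparing every study name against every cohort id.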
Critical fixes to make uploaded data structure match what queries expect:
- Change study type to use iao:is_about relationship (was is_described_by)
- Change RDF type to sio:descriptor (was sio:study_descriptor)
- Add rdfs:label with actual study type value for query matching
- Add use_rdfs_label config option to support value-based labels

This fixes the issue where queries returned 0 results because:
Query expected: ?study iao:is_about ?descriptor . ?descriptor rdfs:label ?study_type
Upload created: ?study is_described_by ?descriptor (wrong predicate)
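
The mismatch can be emulated without a triplestore by treating triples as tuples. In this sketch the `ex:` identifiers and the label value "cohort study" are hypothetical, and `study_types` is only a stand-in for the two-hop pattern quoted above, not the project's query code.

```python
Triple = tuple[str, str, str]


def study_types(triples: list[Triple]) -> list[tuple[str, str]]:
    """Emulate: ?study iao:is_about ?descriptor . ?descriptor rdfs:label ?study_type"""
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    return [
        (s, labels[o])
        for s, p, o in triples
        if p == "iao:is_about" and o in labels
    ]


# Shape produced before the fix: wrong predicate and no label on the descriptor.
old_upload: list[Triple] = [
    ("ex:study1", "is_described_by", "ex:desc1"),
    ("ex:desc1", "rdf:type", "sio:study_descriptor"),
]

# Shape produced after the fix, following the bullet points in the commit.
new_upload: list[Triple] = [
    ("ex:study1", "iao:is_about", "ex:desc1"),
    ("ex:desc1", "rdf:type", "sio:descriptor"),
    ("ex:desc1", "rdfs:label", "cohort study"),
]
```

Running the pattern over `old_upload` yields no rows, while `new_upload` yields one row per study, which mirrors the "0 results" symptom and its fix.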
bringing latest changes and fixes from Komal's branch
anas-elghafari and others added 30 commits January 18, 2026 05:37
merging latest changes from the main branch
…-based query from main branch"

This reverts commit e60b9bb.
…e correct field names from CMEO query (study_name, var_name, var_label, etc.)
…studies_metadata graph, Query 2 retrieves variables from study graphs
…ELECT subquery and update line ranges in query extraction functions
…nly strings and skip categories with empty labels
…load.py data storage formats for varType and units
…le both CohortVarLinker and upload.py data storage formats