Conversation
…o not return the sections (section are not defined well as parts when the ref is range).
…` function on the node.
… normalized sections.
…s_top_down` and `parent_ref`.
…a` param, but call other functions that use `vstate`.
…n having private functions doing it.
…itor number of times mongo is queried.
There was a problem hiding this comment.
Pull request overview
Adds a new GET /api/ref/<tref> API endpoint to validate and introspect Sefaria refs, returning structured node/structure metadata and navigation refs, with accompanying OpenAPI documentation and tests. The PR also updates core Ref navigation helpers to better support virtual nodes and to reduce redundant DB work by allowing callers to pass a pre-fetched VersionState.
Changes:
- Add
RefView(/api/ref/<tref>) returning normalized/hebrew/url forms, node metadata, structure fields, and navigation refs. - Introduce a pymongo
QueryCounterlistener (test-only) to assert Mongo command counts in API tests. - Extend/refine
Refnavigation/state helpers (prev_segment_ref,next_segment_ref,first_available_section_ref,get_state_ja,is_empty) to accept an optionalvstate.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
sefaria/urls_shared.py |
Routes the new /api/ref/<tref> endpoint to RefView. |
api/views.py |
Implements RefView response construction and navigation metadata. |
sefaria/model/text.py |
Updates Ref navigation + state access to support vstate and virtual-node behavior. |
sefaria/system/database.py |
Adds QueryCounter and registers it as a pymongo listener in test environments. |
api/tests.py |
Adds comprehensive tests for the new endpoint + query-count assertions. |
docs/openAPI.json |
Documents /api/ref/{tref} and the RefJSON response schema. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sefaria/model/text.py
Outdated
| # return db.texts.find(self.condition_query(), {"_id": 1}).count() == 0 | ||
| if vstate and not self.index_node.is_virtual: | ||
| state_ja = self.get_state_ja(vstate=vstate) | ||
| return state_ja.sub_array_length([i - 1 for i in self.sections]) in (0, None) |
There was a problem hiding this comment.
is_empty()'s new vstate fast-path is incorrect for many refs. VersionState.state_node(...).ja() returns a JaggedIntArray whose leaf values are ints; calling sub_array_length() after indexing down to a leaf hits the TypeError path and returns 0, which makes segment-level refs (and other fully-specified refs) appear empty even when text exists. Use a content check that works at arbitrary depth (e.g., state_ja.subarray_with_ref(self).is_empty() or get_element() for segment-level) instead of sub_array_length(self.sections).
| return state_ja.sub_array_length([i - 1 for i in self.sections]) in (0, None) | |
| subarray = state_ja.subarray_with_ref(self) | |
| return subarray.is_empty() |
sefaria/model/text.py
Outdated
| if not r: | ||
| return None | ||
| if self.index_node.is_virtual: | ||
| return r.all_subrefs()[0] |
There was a problem hiding this comment.
In prev_segment_ref() for virtual nodes, when the current ref is the first segment of a section, the previous segment should be the last segment of the previous section. Returning r.all_subrefs()[0] returns the first segment instead (and can also raise IndexError if the previous section has no subrefs). Adjust this to return the last available subref (and handle empty subref lists).
| return r.all_subrefs()[0] | |
| subrefs = r.all_subrefs() | |
| if not subrefs: | |
| # No subrefs available in the previous section; fall back to the section ref itself | |
| return r | |
| return subrefs[-1] |
There was a problem hiding this comment.
changing to -1
the assumption that previous section has segments is also when not vurtual.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…egment of previous section.
…tion/handshake commands. Rather give the user to pass what commands he wants to count.
| "type": "object", | ||
| "properties": { | ||
| "lineage_refs_top_down": { | ||
| "description": "An array of references representing the hierarchical path from the book level down to the immediate parent. Empty array if at book level.", |
There was a problem hiding this comment.
I'm not crazy about the name 'lineage_refs_top_down'. What about 'ancestor_refs'? When do you foresee it being used by API users?
There was a problem hiding this comment.
It's essentially the shortest path to the root so that fact could also factor in the name: 'shortest_path_to_root', but 'ancestor_refs' might be sufficient
There was a problem hiding this comment.
i neither crazy, but i didn't want to use 'ancestor', for we nodes where we use the word ancestor, like Ramban on genesis is the father of Ramban on Genesis, Introuduction; so i'm not sure we want to call Genesis 1 the father of Genesis 1:5. what do you think?
| r = self.ending_ref() | ||
| if not r.is_segment_level(): | ||
| return r | ||
| if self.index_node.is_virtual: |
There was a problem hiding this comment.
Instead of special casing for VirtualNodes, I'm wondering if you could take advantage of the fact that VirtualNodes and JaggedArrayNodes inherit from TitledTreeNodes and create a function in TitledTreeNodes that handles the base case. Then you can override that function in VirtualNode. That way, here you can just call this function on self.index_node rather than having a special case both here and in prev_segment_ref.
There was a problem hiding this comment.
i don't think the node is the right place for function that finds the next segment. the place to find it is the vstate.
There was a problem hiding this comment.
That may be true, and we special case VirtualNodes a lot it seems. And in this function half of it is dealing with the special case of VirtualNodes. What do you mean the place to find it is the vstate? This is the Ref class.
There was a problem hiding this comment.
i mean that we're finding the next segment by checking the vstate, but the node is just the jagged array. it can tell you its depth. maybe you can expect it to tell you the next jagged array node. but it doesn't know what populates it (in some cases we do have a limitation on the top level, like 50 chapters in Genesis, but no regarding the verses in any chapter). so i don't think the node should have a method for the next segment.
Summary
/api/ref/<tref>endpoint that returns structured metadata for any SefariaRefprev_segment_ref/next_segment_refto correctly handle virtual nodes (e.g. Siddur)state_japarameter to avoid redundant DB calls when state is already availableQueryCounterlistener for asserting query counts in testsAPI Details
New endpoint:
GET /api/ref/<tref>Returns a JSON object with:
is_ref— whether the input resolves to a valid ref (returns{is_ref: false}for invalid input)normalized,hebrew,url_ref— normalized representationsindex_title,node_type— index and node metadatadepth,address_types,section_names— structure info (for JaggedArrayNode / DictionaryEntryNode)start_indexes,start_labels,end_indexes,end_labels— section positionnavigation_refs— contextual navigation:lineage_refs_top_down— ancestor refs from root to parentfirst_available_section_ref— first section with contentfirst_subref/last_subref— child navigation (non-segment, non-range)prev_section_ref/next_section_ref— section-level navigationprev_segment_ref/next_segment_ref— segment-level navigationchildren— child node titles (for SchemaNode)default_child_node— default child metadata when applicablesheet_id,lexicon_name,headword— type-specific fieldsConsiderations
prev_*andnext_*are only defined for section-level and segment-level refs.Navigation at higher levels is intentionally not exposed to avoid ambiguity. Consumers can traverse upward (via
lineage_refs_top_down) and derive such relationships if needed.Fields that are not applicable to a given ref type are omitted.
Fields that are applicable but have no value (e.g. no previous or next ref exists) are returned as
null.Changes in Ref
prev_segment_refandnext_segment_refto support DictionaryEntryNodestate_japarameter to selected methods (already supported in others) to improve performancepymongo listener
Adds
QueryCounter, a pymongoCommandListenerused in tests to:Tests reset the counter before each API call and assert on
QueryCounter.count. On failure, full query tracebacks are printed to help identify unnecessary database hits.The listener is only registered in test environments (
sys._called_from_test), so there is zero production overhead.Note on tests
api/tests.pyis currently not part of the CI suite (historical decision).All new tests were added there and can be run locally.