
Ref api#3191

Open
YishaiGlasner wants to merge 42 commits into master from ref-api

@YishaiGlasner
Contributor

Summary

  • Add a new /api/ref/<tref> endpoint that returns structured metadata for any Sefaria Ref
  • Provide consistent navigation and structure metadata across node types (JaggedArrayNode, SchemaNode, Dictionary, Sheets, etc.)
  • Avoid redundant MongoDB queries by ensuring a single vstate fetch per request for standard text refs (i.e. not virtual nodes)
  • Fix prev_segment_ref / next_segment_ref to correctly handle virtual nodes (e.g. Siddur)
  • Add optional state_ja parameter to avoid redundant DB calls when state is already available
  • Add a pymongo QueryCounter listener for asserting query counts in tests
  • Add comprehensive test coverage and OpenAPI documentation

API Details

New endpoint: GET /api/ref/<tref>

Returns a JSON object with:

  • is_ref — whether the input resolves to a valid ref (returns {is_ref: false} for invalid input)
  • normalized, hebrew, url_ref — normalized representations
  • index_title, node_type — index and node metadata
  • depth, address_types, section_names — structure info (for JaggedArrayNode / DictionaryEntryNode)
  • start_indexes, start_labels, end_indexes, end_labels — section position
  • navigation_refs — contextual navigation:
    • lineage_refs_top_down — ancestor refs from root to parent
    • first_available_section_ref — first section with content
    • first_subref / last_subref — child navigation (non-segment, non-range)
    • prev_section_ref / next_section_ref — section-level navigation
    • prev_segment_ref / next_segment_ref — segment-level navigation
  • children — child node titles (for SchemaNode)
  • default_child_node — default child metadata when applicable
  • sheet_id, lexicon_name, headword — type-specific fields
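
As a rough illustration of how a client might address the endpoint and read the fields listed above, here is a minimal sketch. The host `https://www.sefaria.org`, the helper names `ref_api_url` and `summarize`, and the sample payload are assumptions for illustration; the payload is hand-written to match the field list, not captured from the endpoint.

```python
from urllib.parse import quote

BASE_URL = "https://www.sefaria.org"  # assumed host, not stated in this PR


def ref_api_url(tref: str, base_url: str = BASE_URL) -> str:
    """Build the GET /api/ref/<tref> URL, percent-encoding the ref."""
    return f"{base_url}/api/ref/{quote(tref, safe='')}"


def summarize(payload: dict) -> str:
    """Return a one-line summary of a response dict (hypothetical helper)."""
    if not payload.get("is_ref"):
        return "not a ref"
    nav = payload.get("navigation_refs", {})
    return f"{payload['normalized']} (next segment: {nav.get('next_segment_ref')})"


# Hand-written sample shaped like the field list above; values are illustrative.
sample = {
    "is_ref": True,
    "normalized": "Genesis 1:1",
    "navigation_refs": {"prev_segment_ref": None, "next_segment_ref": "Genesis 1:2"},
}
```

Checking `is_ref` first matters because, per the description, invalid input yields only `{is_ref: false}` and none of the other keys.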

Considerations

  • Navigation scope
    prev_* and next_* are only defined for section-level and segment-level refs.
    Navigation at higher levels is intentionally not exposed to avoid ambiguity. Consumers can traverse upward (via lineage_refs_top_down) and derive such relationships if needed.
  • Field presence
    Fields that are not applicable to a given ref type are omitted.
    Fields that are applicable but have no value (e.g. no previous or next ref exists) are returned as null.
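
The omitted-versus-null distinction can be handled on the consumer side roughly as follows. This is a sketch only: `describe_field` is a hypothetical helper, and the sample dictionary is invented to show the three cases.

```python
def describe_field(payload: dict, key: str) -> str:
    """Distinguish an omitted field from one that is present but null."""
    if key not in payload:
        return "not applicable"        # field omitted for this ref type
    if payload[key] is None:
        return "applicable, no value"  # e.g. no previous ref exists
    return f"value: {payload[key]}"


# Invented navigation data for a section-level ref at the start of a book.
nav = {"prev_section_ref": None, "next_section_ref": "Genesis 2"}
```

Using `dict.get` alone would collapse the first two cases into `None`, which is why the membership test comes first.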

Changes in Ref

  • Fix prev_segment_ref and next_segment_ref to support DictionaryEntryNode
  • Add optional state_ja parameter to selected methods (already supported in others) to improve performance
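
The idea behind the optional state parameter can be sketched with a toy model. Nothing below is Sefaria code: `StateStore`, `is_empty`, and `first_available_section` are simplified stand-ins that only show how threading a pre-fetched state through helper calls keeps the fetch count at one.

```python
class StateStore:
    """Toy stand-in for a database-backed state lookup; counts fetches."""

    def __init__(self):
        self.fetches = 0

    def fetch(self, title):
        self.fetches += 1
        # Per-section content counts; section 1 is empty in this toy data.
        return {"title": title, "counts": [3, 0, 2]}


def is_empty(store, title, section, state=None):
    """Fetch the state only when the caller did not supply one."""
    state = state if state is not None else store.fetch(title)
    return state["counts"][section] == 0


def first_available_section(store, title, state=None):
    """Return the index of the first non-empty section, reusing one state."""
    state = state if state is not None else store.fetch(title)
    for i in range(len(state["counts"])):
        if not is_empty(store, title, i, state=state):
            return i
    return None
```

Calling `is_empty` three times without `state` would trigger three fetches in this model; passing the state down keeps it at one.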

pymongo listener

Adds QueryCounter, a pymongo CommandListener used in tests to:

  • Count MongoDB queries per request
  • Record query tracebacks for debugging

Tests reset the counter before each API call and assert on QueryCounter.count. On failure, full query tracebacks are printed to help identify unnecessary database hits.
The listener is only registered in test environments (sys._called_from_test), so there is zero production overhead.
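
A listener of this kind might look roughly like the sketch below. It is not the PR's `QueryCounter`: the class name `QueryCounterSketch`, its attributes, and the `reset` method are assumptions. It relies only on pymongo's documented `monitoring.CommandListener` interface (`started`, `succeeded`, `failed`) and on `monitoring.register`, and falls back to a plain base class so the sketch can run where pymongo is not installed.

```python
import sys
import traceback

try:
    from pymongo import monitoring
    _ListenerBase = monitoring.CommandListener
except ImportError:  # allows the sketch to run where pymongo is not installed
    monitoring = None
    _ListenerBase = object


class QueryCounterSketch(_ListenerBase):
    """Counts started MongoDB commands and keeps the stack that issued each."""

    def __init__(self):
        self.count = 0
        self.tracebacks = []

    def reset(self):
        self.count = 0
        self.tracebacks = []

    def started(self, event):
        # Called by pymongo once per command sent to the server.
        self.count += 1
        self.tracebacks.append((event.command_name, traceback.format_stack()))

    def succeeded(self, event):
        pass

    def failed(self, event):
        pass


counter = QueryCounterSketch()
# Register only under test; global listeners must be registered before the
# MongoClient is created in order to receive its events.
if monitoring is not None and getattr(sys, "_called_from_test", False):
    monitoring.register(counter)
```

A test would then call `counter.reset()`, issue the request, and assert on `counter.count`, printing `counter.tracebacks` when the assertion fails.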

Note on tests

api/tests.py is currently not part of the CI suite (historical decision).
All new tests were added there and can be run locally.

…o not return the sections (section are not defined well as parts when the ref is range).
…a` param, but call other functions that use `vstate`.
@YishaiGlasner YishaiGlasner requested a review from akiva10b March 25, 2026 12:27
Contributor

Copilot AI left a comment

Pull request overview

Adds a new GET /api/ref/<tref> API endpoint to validate and introspect Sefaria refs, returning structured node/structure metadata and navigation refs, with accompanying OpenAPI documentation and tests. The PR also updates core Ref navigation helpers to better support virtual nodes and to reduce redundant DB work by allowing callers to pass a pre-fetched VersionState.

Changes:

  • Add RefView (/api/ref/<tref>) returning normalized/hebrew/url forms, node metadata, structure fields, and navigation refs.
  • Introduce a pymongo QueryCounter listener (test-only) to assert Mongo command counts in API tests.
  • Extend/refine Ref navigation/state helpers (prev_segment_ref, next_segment_ref, first_available_section_ref, get_state_ja, is_empty) to accept an optional vstate.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

File                          Description
sefaria/urls_shared.py        Routes the new /api/ref/<tref> endpoint to RefView.
api/views.py                  Implements RefView response construction and navigation metadata.
sefaria/model/text.py         Updates Ref navigation + state access to support vstate and virtual-node behavior.
sefaria/system/database.py    Adds QueryCounter and registers it as a pymongo listener in test environments.
api/tests.py                  Adds comprehensive tests for the new endpoint + query-count assertions.
docs/openAPI.json             Documents /api/ref/{tref} and the RefJSON response schema.

# return db.texts.find(self.condition_query(), {"_id": 1}).count() == 0
if vstate and not self.index_node.is_virtual:
    state_ja = self.get_state_ja(vstate=vstate)
    return state_ja.sub_array_length([i - 1 for i in self.sections]) in (0, None)

Copilot AI Mar 25, 2026

is_empty()'s new vstate fast-path is incorrect for many refs. VersionState.state_node(...).ja() returns a JaggedIntArray whose leaf values are ints; calling sub_array_length() after indexing down to a leaf hits the TypeError path and returns 0, which makes segment-level refs (and other fully-specified refs) appear empty even when text exists. Use a content check that works at arbitrary depth (e.g., state_ja.subarray_with_ref(self).is_empty() or get_element() for segment-level) instead of sub_array_length(self.sections).

Suggested change
- return state_ja.sub_array_length([i - 1 for i in self.sections]) in (0, None)
+ subarray = state_ja.subarray_with_ref(self)
+ return subarray.is_empty()
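
The pitfall described in this comment can be reproduced with a toy model built on nested Python lists. This is not Sefaria's `JaggedIntArray`; the two helpers below are hypothetical and only mimic the behavior the comment describes, where asking for a length at an int leaf falls into a `TypeError` path and yields 0.

```python
def sub_array_length(counts, indexes):
    """Toy helper: length of the sub-array at `indexes`, or 0 at an int leaf."""
    node = counts
    for i in indexes:
        node = node[i]
    try:
        return len(node)
    except TypeError:  # node is an int leaf, not a list
        return 0


def has_content(counts, indexes):
    """Depth-agnostic check: true if any leaf at or below `indexes` is > 0."""
    node = counts
    for i in indexes:
        node = node[i]
    if isinstance(node, int):
        return node > 0
    return any(has_content(node, [j]) for j in range(len(node)))


# Toy state: section 0 has three segments (the last one empty); section 1 is empty.
counts = [[1, 1, 0], []]
```

In this model the length-based test reports segment `[0, 0]` as empty even though its count is 1, while the recursive check gives the expected answer at both section and segment depth.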

if not r:
    return None
if self.index_node.is_virtual:
    return r.all_subrefs()[0]

Copilot AI Mar 25, 2026

In prev_segment_ref() for virtual nodes, when the current ref is the first segment of a section, the previous segment should be the last segment of the previous section. Returning r.all_subrefs()[0] returns the first segment instead (and can also raise IndexError if the previous section has no subrefs). Adjust this to return the last available subref (and handle empty subref lists).

Suggested change
- return r.all_subrefs()[0]
+ subrefs = r.all_subrefs()
+ if not subrefs:
+     # No subrefs available in the previous section; fall back to the section ref itself
+     return r
+ return subrefs[-1]

Contributor Author

Changing to -1.
The assumption that the previous section has segments also holds in the non-virtual case.
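
For reference, the boundary case discussed in this thread can be sketched on plain lists. The function `prev_segment` is hypothetical and works on toy data, not on `Ref` objects. Unlike the suggestion above, it skips empty sections rather than falling back to the section itself, which is one possible design choice among several.

```python
def prev_segment(sections, sec, seg):
    """Return (section, segment) indexes of the previous segment, or None.

    `sections` is a list of lists of segment labels (toy data).
    """
    if seg > 0:
        return (sec, seg - 1)
    # First segment of its section: take the LAST segment of the nearest
    # earlier section that has any segments.
    for s in range(sec - 1, -1, -1):
        if sections[s]:
            return (s, len(sections[s]) - 1)
    return None


sections = [["a", "b", "c"], [], ["d"]]
```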

"type": "object",
"properties": {
    "lineage_refs_top_down": {
        "description": "An array of references representing the hierarchical path from the book level down to the immediate parent. Empty array if at book level.",
Contributor

I'm not crazy about the name 'lineage_refs_top_down'. What about 'ancestor_refs'? When do you foresee it being used by API users?

Contributor

It's essentially the shortest path to the root so that fact could also factor in the name: 'shortest_path_to_root', but 'ancestor_refs' might be sufficient

Contributor Author

I'm not crazy about it either, but I didn't want to use 'ancestor', because we already use that word for nodes: for example, Ramban on Genesis is the father of Ramban on Genesis, Introduction. So I'm not sure we want to call Genesis 1 the father of Genesis 1:5. What do you think?

r = self.ending_ref()
if not r.is_segment_level():
    return r
if self.index_node.is_virtual:
Contributor

@stevekaplan123 stevekaplan123 Mar 29, 2026

Instead of special casing for VirtualNodes, I'm wondering if you could take advantage of the fact that VirtualNodes and JaggedArrayNodes inherit from TitledTreeNodes and create a function in TitledTreeNodes that handles the base case. Then you can override that function in VirtualNode. That way, here you can just call this function on self.index_node rather than having a special case both here and in prev_segment_ref.

Contributor Author

I don't think the node is the right place for a function that finds the next segment. The place to find it is the vstate.

Contributor

That may be true, and we special case VirtualNodes a lot it seems. And in this function half of it is dealing with the special case of VirtualNodes. What do you mean the place to find it is the vstate? This is the Ref class.

Contributor Author

I mean that we find the next segment by checking the vstate, whereas the node is just the jagged array. It can tell you its depth, and maybe you can expect it to tell you the next jagged array node, but it doesn't know what populates it. (In some cases we do have a limit at the top level, like 50 chapters in Genesis, but not on the verses in any chapter.) So I don't think the node should have a method for the next segment.
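
The point about content state can be illustrated with a toy model: the same structure (depth 2) yields different "next segment" answers depending on which positions actually hold text. The function `next_segment` and the two count arrays are hypothetical and stand in for a lookup against the version state.

```python
def next_segment(counts, sec, seg):
    """Return indexes of the next segment that has content, or None.

    counts[sec][seg] is the number of versions with text there (toy state).
    """
    s, g = sec, seg + 1
    while s < len(counts):
        while g < len(counts[s]):
            if counts[s][g] > 0:
                return (s, g)
            g += 1
        s, g = s + 1, 0  # move to the start of the following section
    return None


full = [[1, 1, 1], [1, 1]]    # every segment has text
sparse = [[1, 0, 0], [0, 1]]  # same shape, mostly empty
```

Both arrays describe a depth-2 text with the same shape, yet starting from position `(0, 0)` the answer is `(0, 1)` for `full` and `(1, 1)` for `sparse`.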
