Merged

41 commits
e17dce5
feat: Add SheetScorer tool for analyzing Jewish study sheets with LLM…
morganizzzm Jul 31, 2025
b70c55f
style: clean up formatting and update imports per Yishai’s partial co…
morganizzzm Aug 5, 2025
9150cce
Merge pull request #44 from morganizzzm/add-sheet-scoring
nsantacruz Aug 6, 2025
fb6e509
chore: update requirements to include langchain_openai package
nsantacruz Aug 10, 2025
2a18332
chore: update tiktoken version specification in requirements
nsantacruz Aug 10, 2025
42f505d
style: changed input/output class names of sheet_scoring app to be co…
morganizzzm Aug 10, 2025
e52e938
chore: update sefaria_llm_interface version to 1.3.3 in requirements
nsantacruz Aug 10, 2025
21c91ad
chore: reduce celery worker concurrency from 50 to 4
nsantacruz Aug 10, 2025
2f5c6e3
fix: correct wording in system message for sentence extraction
nsantacruz Aug 10, 2025
b09725b
style: changed imports inside sefaria-llm-main from local to global
morganizzzm Aug 11, 2025
59245bb
Merge remote-tracking branch 'upstream/add-sheet-scoring' into add-sh…
morganizzzm Aug 11, 2025
380201d
Delete app/commentary_scoring directory
morganizzzm Aug 11, 2025
e95874d
style: fixed spelling mistake in PROCESSED_DATETIME_FIELD field and r…
morganizzzm Aug 11, 2025
8515e3f
Merge remote-tracking branch 'origin/add-sheet-scoring' into add-shee…
morganizzzm Aug 11, 2025
c67ead6
feat: updated requirements.txt to use sefaria-llm-interface of the ve…
morganizzzm Aug 12, 2025
642db8a
feat(llm/sheet_scoring): refactor scoring pipeline to use typed I/O a…
morganizzzm Aug 13, 2025
ac0e780
feat: released new package v1.3.6 and updated the requirements.txt
morganizzzm Aug 13, 2025
ee5e14d
feat:
morganizzzm Aug 19, 2025
4eee4d3
fix(style): updated comment for clarity in SheetScoringInput class
yodem Dec 10, 2025
a29c604
Merge branch 'main' into add-sheet-scoring
YishaiGlasner Dec 29, 2025
921ee53
chore: spacing
YishaiGlasner Dec 30, 2025
72c0de6
upgrade langchain
YishaiGlasner Jan 6, 2026
bcf6cee
upgrade langchain
YishaiGlasner Jan 6, 2026
3246b06
downgrade langchain
YishaiGlasner Jan 6, 2026
a0c853e
langchain versions
YishaiGlasner Jan 6, 2026
0fe819c
langchain versions
YishaiGlasner Jan 6, 2026
6d5c9e5
langchain versions
YishaiGlasner Jan 6, 2026
5c7cefe
use httpx to prevent ChatOpenAI from getting proxies
YishaiGlasner Jan 6, 2026
1e6247d
fix: right data type
YishaiGlasner Jan 8, 2026
96e4ab6
chore: remove import
YishaiGlasner Jan 8, 2026
d8e4227
docs: fix
YishaiGlasner Jan 8, 2026
1b38a69
docs: fix numbers
YishaiGlasner Jan 8, 2026
af5155e
chore: add spaces
YishaiGlasner Jan 8, 2026
c16caa5
use Client with a "with"
YishaiGlasner Jan 8, 2026
74e1f6f
chore: space
YishaiGlasner Jan 8, 2026
4c146af
chore: remove double comment
YishaiGlasner Jan 8, 2026
c02503a
chore: spaces
YishaiGlasner Jan 8, 2026
ec61429
temporarily raise error to kill tasks
YishaiGlasner Jan 8, 2026
0c3bbe9
revert
YishaiGlasner Jan 8, 2026
4b6c693
fix: remove mistaken line from prompt
YishaiGlasner Jan 8, 2026
1e1b380
Merge branch 'main' into add-sheet-scoring
YishaiGlasner Jan 20, 2026
2 changes: 1 addition & 1 deletion app/celery_setup/app.py
@@ -3,4 +3,4 @@

 app = Celery('llm')
 app.conf.update(**generate_config_from_env())
-app.autodiscover_tasks(packages=['topic_prompt'])
+app.autodiscover_tasks(packages=['topic_prompt', 'sheet_scoring'])
@@ -0,0 +1,3 @@
+from sefaria_llm_interface.sheet_scoring.sheet_scoring_input import *
+from sefaria_llm_interface.sheet_scoring.sheet_scoring_output import *
Comment on lines +1 to +3

Copilot AI Jan 7, 2026

Import pollutes the enclosing namespace, as the imported module sefaria_llm_interface.sheet_scoring.sheet_scoring_input does not define `__all__`.

Suggested change:

```python
import sefaria_llm_interface.sheet_scoring.sheet_scoring_input as sheet_scoring_input
import sefaria_llm_interface.sheet_scoring.sheet_scoring_output as sheet_scoring_output

__all__ = []
# Re-export all public (non-underscore) names from the submodules
for _mod in (sheet_scoring_input, sheet_scoring_output):
    for _name in dir(_mod):
        if not _name.startswith("_"):
            globals()[_name] = getattr(_mod, _name)
            __all__.append(_name)
```
Comment on lines +2 to +3

Copilot AI Jan 7, 2026

Import pollutes the enclosing namespace, as the imported module sefaria_llm_interface.sheet_scoring.sheet_scoring_output does not define `__all__`.

Suggested change:

```python
import sefaria_llm_interface.sheet_scoring.sheet_scoring_output as _sheet_scoring_output

# Re-export all public names from sheet_scoring_output without using a wildcard import.
for _name, _value in _sheet_scoring_output.__dict__.items():
    if not _name.startswith("_"):
        globals()[_name] = _value
```
@@ -0,0 +1,12 @@
+from dataclasses import dataclass
+from typing import List, Dict, Union
+
+
+@dataclass
+class SheetScoringInput:
+    # str version of id
+    sheet_id: str
+    title: str
+    sources: List[Dict[str, Union[str, Dict[str, str]]]]
+    expanded_refs: List[str]
@@ -0,0 +1,21 @@
+from dataclasses import dataclass
+from typing import Dict
+from datetime import datetime
+
+
+@dataclass
+class SheetScoringOutput:
+    sheet_id: str
+    processed_datetime: str
+    language: str
+    title_interest_level: int
+    title_interest_reason: str
+    creativity_score: float
+    ref_levels: Dict[str, int]
+    ref_scores: Dict[str, float]
+    request_status: int
+    request_status_message: str
+
+    def __post_init__(self):
+        if isinstance(self.processed_datetime, datetime):
+            self.processed_datetime = self.processed_datetime.isoformat()
9 changes: 6 additions & 3 deletions app/requirements.txt
@@ -1,18 +1,21 @@
-langchain[llms]~=0.2.1
+langchain==0.2.1
+langchain-core==0.2.2
+langchain-openai==0.1.8
Comment on lines +1 to +3

Copilot AI Jan 7, 2026

Dependency constraint change: Changed from flexible version langchain[llms]~=0.2.1 to exact version langchain==0.2.1 and split into separate packages (langchain-core, langchain-openai). While this provides more control, ensure this exact version pinning doesn't cause conflicts with other dependencies. The split into specific packages is good practice for reducing unnecessary dependencies.

Suggested change:

```
-langchain==0.2.1
-langchain-core==0.2.2
-langchain-openai==0.1.8
+langchain~=0.2.1
+langchain-core~=0.2.2
+langchain-openai~=0.1.8
```
Contributor comment:

making versions work took me a day...
 langsmith~=0.1.0
 anthropic~=0.26.1
 stanza~=1.5.0
 openai~=1.30.0
+httpx~=0.27.0
 typer~=0.4.1
 pydantic~=2.7.1
 loguru~=0.7.2
 tqdm~=4.66.1
 celery[redis]~=5.2.7
 diff-match-patch
 dnspython~=2.5.0
-tiktoken~=0.4.0
+tiktoken
Copilot AI Jan 7, 2026

The tiktoken dependency is now specified without a version, which means builds and runtime will pull the latest mutable release, and any compromised or malicious update could execute arbitrary code with the application's privileges (including access to secrets and data). This creates a concrete supply-chain risk: an attacker who gains control of the PyPI project or its release process can silently introduce a backdoored version that will be automatically deployed. Pin tiktoken to a specific, vetted version (and ideally manage upgrades explicitly) to ensure only known-good code is installed.
 readability_lxml
 tenacity==8.3.0
 requests
 numpy
-git+https://github.com/Sefaria/LLM@v1.0.3#egg=sefaria_llm_interface&subdirectory=app/llm_interface
+git+https://github.com/Sefaria/LLM@v1.3.6#egg=sefaria_llm_interface&subdirectory=app/llm_interface
231 changes: 231 additions & 0 deletions app/sheet_scoring/README.md
@@ -0,0 +1,231 @@
# SheetScorer - Jewish Study Sheet Analysis Tool

**SheetScorer** is a Python tool that uses **LLMs** to automatically analyze
and score Jewish study sheets for reference relevance and title interest.
It processes sheets, evaluates how well each cited reference
is discussed, and assigns engagement scores to sheet titles.

## Scores Extracted

- **Reference Discussion Scoring**: Analyzes how thoroughly each reference is discussed (**0-4 scale**)
- **Title Interest Scoring**: Evaluates how engaging sheet titles are to potential readers (**0-4 scale**)
- **Creativity Assessment**: Computes a creativity score based on the percentage of **user-generated content**
- **Title Interest Reason**: Brief explanation of the title score
- **Language**: Language of the sheet (all languages are supported, not only `he` and `en`)

## Quick Start

```python
from sheet_scoring.sheet_scoring import score_one_sheet
from sefaria_llm_interface.sheet_scoring import SheetScoringInput

input_data = SheetScoringInput(
    sheet_id="123",
    title="Understanding Genesis Creation",
    expanded_refs=["Genesis 1:1", "Genesis 1:2"],
    sources=[
        {"outsideText": "This commentary explores..."},
        {"ref": "Genesis 1:1", "text": {"en": "In the beginning..."}, "comment": "Analysis here..."}
    ]
)

result = score_one_sheet(input_data)
print(f"Title score: {result.title_interest_level}")
print(f"Ref scores: {result.ref_scores}")
print(result)
```

## Scoring System

### Architecture

#### sheet_scoring (package)
- `sheet_scoring.py` - Main API with the `score_one_sheet()` function
- `tasks.py` - Celery task wrapper for async processing (sketched below)
- `text_utils.py` - Content parsing and token-counting utilities
- `openai_sheets_scorer.py` - Core LLM scoring engine
- `README.md`
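
The Celery wrapper in `tasks.py` is picked up via `app.autodiscover_tasks(packages=[..., 'sheet_scoring'])` (see the `celery_setup` change above). A minimal sketch of what such a wrapper might look like; the task name and function name here are hypothetical, not the package's exact code:

```python
from dataclasses import asdict

from celery import shared_task
from sefaria_llm_interface.sheet_scoring import SheetScoringInput
from sheet_scoring.sheet_scoring import score_one_sheet


@shared_task(name="sheet_scoring.score_sheet")  # hypothetical task name
def score_sheet_task(raw_input: dict) -> dict:
    # Deserialize the payload, score the sheet, and return a JSON-safe dict.
    result = score_one_sheet(SheetScoringInput(**raw_input))
    return asdict(result)
```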

### Reference Discussion Levels

The tool evaluates how well each reference is discussed using a **0-4 scale**:

| Level | Description |
|-------|-------------|
| **0 - Not Discussed** | Reference is **quoted only**, no discussion or commentary |
| **1 - Minimal** | Mentioned only through **neighboring verses**, minimal engagement |
| **2 - Moderate** | Some discussion present with **basic commentary** |
| **3 - Significant** | **Substantial discussion** with detailed commentary |
| **4 - Central** | Reference is a **central focus** of the entire sheet |

### Title Interest Levels

Sheet titles are scored for **user engagement** on a **0-4 scale**:

| Level | Description |
|-------|-------------|
| **0 - Not Interesting** | **Off-topic** or unengaging for target users |
| **1 - Slight Relevance** | **Low appeal**, users unlikely to engage |
| **2 - Somewhat Interesting** | Users might **skim**, moderate appeal |
| **3 - Interesting** | Users **likely to open** and read |
| **4 - Very Compelling** | **Must-read content**, high engagement expected |

### Creativity Score

`creativity_score = user_tokens / total_tokens`. Higher values indicate more original user content relative to canonical quotes.
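
A minimal sketch of this ratio, assuming token counts come from `tiktoken` (which is in the requirements); the helper name and encoding choice are illustrative, not the package's exact code:

```python
import tiktoken


def creativity_score(user_text: str, canonical_text: str) -> float:
    # Illustrative helper; the real counting lives in text_utils.py.
    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
    user_tokens = len(enc.encode(user_text))
    total_tokens = user_tokens + len(enc.encode(canonical_text))
    return user_tokens / total_tokens if total_tokens else 0.0
```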

### Language
The ISO 639-1 language code of the sheet; if the sheet has no user-generated content, the language code of the title is used.

## Data Structures
#### Input (SheetScoringInput)

```python
{
    "sheet_id": "123",
    "title": "Sheet title",
    "expanded_refs": ["Genesis 1:1", "Exodus 2:3"],
    "sources": [
        {"outsideText": "User commentary"},
        {"outsideBiText": {"en": "English", "he": "Hebrew"}},
        {"ref": "Genesis 1:1", "text": {"en": "Quote"}, "comment": "Analysis"}
    ]
}
```
#### Output (SheetScoringOutput)
```python
{
    "sheet_id": "123",
    "ref_levels": {"Genesis 1:1": 3, "Exodus 2:3": 2},        # Raw 0-4 scores
    "ref_scores": {"Genesis 1:1": 60.0, "Exodus 2:3": 40.0},  # Normalized %
    "title_interest_level": 3,
    "title_interest_reason": "Compelling theological question",
    "language": "en",
    "creativity_score": 0.75,
    "processed_datetime": "2025-01-31T10:30:00Z",
    "request_status": 1,  # 1=success, 0=failure
    "request_status_message": ""
}
```
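
The mapping from raw `ref_levels` to normalized `ref_scores` in the example above is a proportional share; a sketch of that arithmetic (the helper name is illustrative):

```python
def normalize_levels(ref_levels: dict) -> dict:
    # Each reference's share of the total level mass, as a percentage.
    total = sum(ref_levels.values())
    if total == 0:
        return {ref: 0.0 for ref in ref_levels}
    return {ref: level / total * 100 for ref, level in ref_levels.items()}


# {"Genesis 1:1": 3, "Exodus 2:3": 2} -> {"Genesis 1:1": 60.0, "Exodus 2:3": 40.0}
```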

## Configuration Options

### Initialization Parameters

```python
import os

from sheet_scoring.openai_sheets_scorer import SheetScorer  # path per the package layout above

with SheetScorer(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-mini",        # Default model
    max_prompt_tokens=128000,   # Input token budget
    token_margin=16384,         # Reserved for output
    max_ref_to_process=800,     # Max number of refs that can be processed
    chunk_size=80               # Refs per LLM call
) as scorer:
    result = scorer.process_sheet_by_content(...)
```

The constants `DEFAULT_MAX_OUTPUT_TOKENS` and `DEFAULT_MAX_INPUT_OUTPUT_TOKENS` are model-specific; consult the model provider's documentation for the correct values.

## Content Processing Strategy

The tool uses an **adjustable approach** for canonical quotations:

1. **Always includes** all user commentary and **original content**
2. **Conditionally includes** canonical quotes, but only if the **entire bundle** fits within token limits and **`add_full_commentary` is set to `True`**
3. **Truncates intelligently** using **LLM summarization** when content exceeds limits

Truncation itself proceeds in three stages (see the sketch after this list):

1. ***LLM Summarization***: Uses a secondary LLM to compress content while preserving key information
2. ***Reference Preservation***: Maintains all biblical reference tags during compression
3. ***Character Fallback***: Falls back to character-based truncation if summarization fails
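
A minimal sketch of that summarize-then-truncate fallback; `llm_summarize` is a hypothetical helper standing in for the secondary LLM call in `openai_sheets_scorer.py`:

```python
def shrink_content(text: str, max_chars: int, llm_summarize) -> str:
    """Fit text into a character budget, preferring LLM compression."""
    if len(text) <= max_chars:
        return text
    try:
        # Stage 1: ask a secondary LLM to compress the content while
        # preserving reference tags (stage 2 is enforced by its prompt).
        summary = llm_summarize(text, max_chars=max_chars)
        if len(summary) <= max_chars:
            return summary
    except Exception:
        pass
    # Stage 3: character-based fallback if summarization fails or overshoots.
    return text[:max_chars]
```
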
## Grading Strategy

Processed content is sent to the LLM together with the references to be graded:

### Resilient Grading List Processing

- **Chunking**: Large reference lists are processed in **chunks** to stay within model limits
- **Overlap Handling**: Smart overlap between chunks prevents **reference boundary issues** (see the sketch below)
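
A sketch of overlapped chunking under stated assumptions: `chunk_size` defaults to 80 as in the configuration above, and the overlap width is illustrative:

```python
def chunk_refs(refs, chunk_size=80, overlap=5):
    # Yield successive windows; neighboring windows share `overlap` items
    # so references at chunk boundaries are always seen with context.
    step = chunk_size - overlap
    for start in range(0, len(refs), step):
        yield refs[start:start + chunk_size]
        if start + chunk_size >= len(refs):
            break
```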

### Resilient Reference Grading

- **Primary attempt**: Process **all references together**
- **Fallback**: Split reference list in **half** and process **recursively**
- **Final fallback**: Assign a **default score of 0** to problematic references (sketched below)
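
A sketch of this halve-and-recurse strategy; `score_refs_once` is a hypothetical stand-in for a single LLM grading call that may raise on failure:

```python
def score_with_fallback(refs, score_refs_once):
    if not refs:
        return {}
    # Primary attempt: grade the whole list in one call.
    try:
        return score_refs_once(refs)
    except Exception:
        if len(refs) == 1:
            # Final fallback: default score of 0 for a problematic reference.
            return {refs[0]: 0}
        # Split the list in half and recurse on each side.
        mid = len(refs) // 2
        left = score_with_fallback(refs[:mid], score_refs_once)
        right = score_with_fallback(refs[mid:], score_refs_once)
        return {**left, **right}
```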


### Resilient Score Extraction

Uses **OpenAI's function calling** feature with **strict schemas**:

#### Middle Chunk Scoring Schema
```python
{
    "name": "score_references",
    "parameters": {
        "ref_levels": {
            "Genesis 1:1": {"type": "integer", "minimum": 0, "maximum": 4},
            # ... one entry per reference
        }
    }
}
```

#### Title Scoring Schema
```python
{
    "name": "score_title",
    "parameters": {
        "language": {"type": "string"},
        "title_interest_level": {"type": "integer", "minimum": 0, "maximum": 4},
        "title_interest_reason": {"type": "string", "maxLength": 100}
    }
}
```
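
A sketch of how a schema like this can be passed through OpenAI's tool-calling interface; the prompt, model choice, and wiring are illustrative rather than the package's exact code:

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tool = {"type": "function", "function": {
    "name": "score_title",
    "parameters": {
        "type": "object",
        "properties": {
            "language": {"type": "string"},
            "title_interest_level": {"type": "integer", "minimum": 0, "maximum": 4},
            "title_interest_reason": {"type": "string", "maxLength": 100},
        },
        "required": ["language", "title_interest_level", "title_interest_reason"],
    },
}}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Score this sheet title: ..."}],
    tools=[tool],
    # Force the model to call the function so the reply is structured.
    tool_choice={"type": "function", "function": {"name": "score_title"}},
)
scores = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
```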


## Database Integration

Designed for **MongoDB integration** with expected document structure:

```python
{
    "id": "unique id",
    "title": "Sheet Title",
    "expandedRefs": ["Genesis 1:1", "Exodus 2:3"],
    # Additional sheet content fields...
}
```
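
A sketch of fetching such a document and building the scoring input; the connection URI, database, and collection names are assumptions for illustration:

```python
from pymongo import MongoClient
from sefaria_llm_interface.sheet_scoring import SheetScoringInput

client = MongoClient("mongodb://localhost:27017")        # illustrative URI
doc = client["sefaria"]["sheets"].find_one({"id": 123})  # assumed db/collection names

input_data = SheetScoringInput(
    sheet_id=str(doc["id"]),  # str version of id, per the dataclass
    title=doc["title"],
    sources=doc.get("sources", []),
    expanded_refs=doc.get("expandedRefs", []),
)
```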

## Output Fields

| Field | Description |
|------------------------------|------------------------------------------------|
| **`ref_levels`** | Raw **0-4 scores** for each reference |
| **`ref_scores`** | **Normalized percentage scores** (sum to 100%) |
| **`title_interest_level`** | Title **engagement score** (0-4) |
| **`title_interest_reason`** | **Brief explanation** of title score |
| **`language`** | **Detected language code** |
| **`creativity_score`** | **Percentage** of user-generated content |
| **`processed_datetime`** | **Processing timestamp** |
| **`request_status`**         | Whether scoring **succeeded (1) or failed (0)** |
| **`request_status_message`** | **The reason why scoring failed**, if any      |

## Logging

**Comprehensive logging** for monitoring and debugging:

- **Info**: Processing decisions and **content statistics**
- **Warning**: **Score validation** and fallback usage
- **Error**: **LLM failures** and processing errors

Configure logging level as needed:
```python
import logging
logging.getLogger('sheet_scorer').setLevel(logging.INFO)
```


Empty file added app/sheet_scoring/__init__.py