Commit cff8ac1

mwvolo, Copilot, and Copilot authored
add content to the page.html template for AI readability (#1657)
* add content to the page.html template for AI readability
* properly render richtext blocks
* use seo title
* Update pages/templatetags/pages_tags.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update pages/templatetags/pages_tags.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add test coverage for get_page_content template tag (#1658)
  * Initial plan
  * Add comprehensive test coverage for get_page_content template tag

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
  * Improve test quality based on code review feedback

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
  * Fix test robustness issues from code review

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
  * Remove unused import from test

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>

  ---------

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
* Replace duck typing with explicit type checking for RichText objects (#1659)
  * Initial plan
  * Replace duck typing with explicit isinstance check for RichText

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>

  ---------

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
* Make dict type handling explicit in extract_content function (#1660)
  * Initial plan
  * Make dict handling more explicit in extract_content function
    - Reorder type checking to handle dicts explicitly before general iterables
    - Add detailed comments explaining why dicts are checked first
    - Update docstring to be more explicit about dict vs StructValue handling
    - Add test for plain Python dict handling
    - Ensures dict values are processed (not keys as iterables would do)

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
  * Fix extra blank line in tests.py

    Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>

  ---------

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
  Co-authored-by: Michael Volo <volo@rice.edu>
* Remove openstax/settings/test_sqlite.py from .gitignore

  Remove test_sqlite.py from .gitignore

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mwvolo <3905516+mwvolo@users.noreply.github.com>
1 parent 5c19680 commit cff8ac1
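
The commit message's point about processing dict values "not keys as iterables would do" comes down to branch order. The following is a minimal sketch of that idea, not the committed code: `flatten_iterable_first` and `flatten_values_first` are hypothetical names, and both handle only strings, dicts, and list-like inputs.

```python
def flatten_iterable_first(value):
    """Treat every non-string iterable alike, so a dict yields its KEYS."""
    if isinstance(value, str):
        return value
    if hasattr(value, "__iter__"):
        return "\n".join(filter(None, (flatten_iterable_first(v) for v in value)))
    return str(value)


def flatten_values_first(value):
    """Check for dict before the generic iterable branch, so VALUES are used."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        return "\n".join(filter(None, (flatten_values_first(v) for v in value.values())))
    if hasattr(value, "__iter__"):
        return "\n".join(filter(None, (flatten_values_first(v) for v in value)))
    return str(value)


# Hypothetical sample data: one struct-like dict followed by a plain string.
block = [{"heading": "Welcome", "body": "Free textbooks"}, "Footer"]

print(repr(flatten_iterable_first(block)))  # 'heading\nbody\nFooter'
print(repr(flatten_values_first(block)))    # 'Welcome\nFree textbooks\nFooter'
```

With the generic iterable branch first, the field names leak into the output and the actual content is lost, which is the failure the reordering guards against.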

4 files changed: +381 -3 lines changed

pages/templates/page.html

Lines changed: 18 additions & 3 deletions
@@ -1,11 +1,12 @@
-{% load wagtailimages_tags wagtailcore_tags static %}
+{% load wagtailimages_tags wagtailcore_tags static pages_tags %}
+

 {% image page.promote_image original as promote_image %}

 <html>
 <head>

-<title>{{ page.title }}</title>
+<title>{{ page.seo_title|default:page.title }}</title>
 <meta name="description" content="{{ page.search_description }}">
 <link rel="canonical" href="{{page.get_full_url}}" />
 <meta http-equiv="refresh">
@@ -25,6 +26,20 @@
 <meta name="twitter:image:alt" content="OpenStax">
 </head>
 <body>
-{% block content %}{% endblock %}
+{% block content %}
+{% get_page_content page as page_content %}
+<div>
+<h1>{{ page.seo_title|default:page.title }}</h1>
+{% for item in page_content %}
+<div class="field-{{ item.name }}">
+{% if item.type == 'RichTextField' %}
+{{ item.value|richtext }}
+{% else %}
+{{ item.value|linebreaks }}
+{% endif %}
+</div>
+{% endfor %}
+</div>
+{% endblock %}
 </body>
 </html>
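
The two template decisions in this diff, the title fallback and the per-item filter choice, can be approximated in plain Python. This is an illustration only, under stated assumptions: `resolve_title`, `approx_linebreaks`, and `render_item` are hypothetical helpers, `approx_linebreaks` is a rough stand-in for Django's `linebreaks` filter with autoescaping on, and the real `richtext` filter also expands Wagtail's internal link and embed markup, which is not modeled here.

```python
import html
import re


def resolve_title(seo_title, title):
    """Mirror `{{ page.seo_title|default:page.title }}`: fall back when falsy."""
    return seo_title or title


def approx_linebreaks(text):
    """Escape the text, wrap blank-line-separated chunks in <p>, turn \\n into <br>."""
    escaped = html.escape(text.replace("\r\n", "\n").replace("\r", "\n"))
    paragraphs = re.split(r"\n{2,}", escaped)
    return "\n\n".join("<p>%s</p>" % p.replace("\n", "<br>") for p in paragraphs)


def render_item(item):
    """Choose the rendering path the template chooses from `item.type`."""
    if item["type"] == "RichTextField":
        body = item["value"]  # HTML source is passed through unescaped
    else:
        body = approx_linebreaks(item["value"])
    return '<div class="field-%s">%s</div>' % (item["name"], body)


# Hypothetical items shaped like the dicts that get_page_content returns.
rich_item = {"name": "intro", "type": "RichTextField", "value": "<p>Hi</p>"}
text_item = {"name": "tagline", "type": "Text", "value": "A & B\nC"}

print(resolve_title("", "About Us"))  # About Us
print(render_item(rich_item))         # <div class="field-intro"><p>Hi</p></div>
print(render_item(text_item))         # <div class="field-tagline"><p>A &amp; B<br>C</p></div>
```

Because StreamField output is tagged as `RichTextField`, any markup extracted from it takes the unescaped path, so that path is only as safe as the stored rich-text source.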

pages/templatetags/__init__.py

Whitespace-only changes.

pages/templatetags/pages_tags.py

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+from django import template
+from wagtail.fields import StreamField, RichTextField
+from wagtail.rich_text import RichText
+from django.db.models import CharField, TextField
+
+register = template.Library()
+
+def extract_content(value):
+    """
+    Recursively extract text content from various Wagtail/Django field values.
+
+    This helper is used to normalize content from complex structures such as
+    StreamField and StructBlock values into a plain text string.
+
+    Args:
+        value:
+            The value to extract text from. This can be:
+            - Primitive types such as ``str``, ``int``, or ``float``.
+            - Wagtail ``RichText`` objects (from ``wagtail.rich_text``).
+            - Iterable values representing StreamField data (e.g. ``StreamValue``,
+              ``ListValue`` or plain lists/tuples). Items that look like
+              StreamField children (having ``value`` and ``block_type`` attributes)
+              are unwrapped via their ``.value`` attribute.
+            - Struct-like values (e.g. Wagtail's ``StructValue``) or plain Python
+              ``dict`` objects, which are processed by extracting their values
+              via the ``.values()`` method and recursively extracting content
+              from each value.
+            - ``None``, which is treated as empty content.
+
+    Returns:
+        str: A string containing the concatenated text content extracted from
+        ``value`` and any nested structures. Returns an empty string if no
+        textual content is found.
+
+    Example:
+        >>> extract_content("Title")
+        'Title'
+        >>> extract_content(None)
+        ''
+        >>> extract_content([{"foo": "bar"}, "baz"])
+        'bar\nbaz'
+
+    Edge cases:
+        - Empty iterables or structures yield an empty string.
+        - Unknown object types are converted to text with ``str(value)``.
+    """
+    if value is None:
+        return ""
+
+    if isinstance(value, (str, int, float)):
+        return str(value)
+
+    # RichText objects
+    if isinstance(value, RichText):
+        return value.source
+
+    # Handle StructValue and plain dicts before checking for general iterables.
+    # This ensures dict-like objects are processed via their values() method
+    # rather than being treated as iterable key sequences.
+    # Note: Both Wagtail's StructValue and Python's dict have a values() method.
+    if isinstance(value, dict) or (hasattr(value, 'values') and callable(getattr(value, 'values', None))):
+        parts = []
+        for v in value.values():
+            parts.append(extract_content(v))
+        return "\n".join(filter(None, parts))
+
+    # Handle StreamValue / ListValue (lists, tuples, and other iterables)
+    # We've already handled dicts above, and we exclude str/bytes here.
+    if hasattr(value, '__iter__') and not isinstance(value, (str, bytes)):
+        parts = []
+        for item in value:
+            # If it's a StreamChild (has 'value' and 'block_type'), use .value
+            if hasattr(item, 'value') and hasattr(item, 'block_type'):
+                parts.append(extract_content(item.value))
+            else:
+                parts.append(extract_content(item))
+        return "\n".join(filter(None, parts))
+
+    return str(value)
+
+@register.simple_tag(takes_context=True)
+def get_page_content(context, page):
+    """
+    Returns a list of renderable content chunks from the page's fields.
+    Checks for StreamField, RichTextField, CharField, and TextField.
+    """
+    if not page:
+        return []
+
+    # Access the specific subclass instance to get all defined fields
+    specific_page = page.specific
+    content_items = []
+
+    # Get all fields from the model
+    fields = specific_page._meta.get_fields()
+
+    for field in fields:
+        # Skip internal/relation fields usually not content (simple heuristic)
+        if field.name in ['id', 'path', 'depth', 'numchild', 'live', 'has_unpublished_changes',
+                          'first_published_at', 'last_published_at', 'go_live_at', 'expire_at',
+                          'content_type', 'owner', 'seo_title', 'search_description', 'slug', 'title',
+                          'locked', 'locked_at', 'locked_by', 'latest_revision_created_at',
+                          'live_revision', 'url_path', 'content_type_id', 'owner_id',
+                          'locked_by_id', 'live_revision_id', 'generic_comments', 'wagtail_admin_comments']:
+            continue
+
+        # Check field types we want to render
+        is_content_field = isinstance(field, (StreamField, RichTextField, CharField, TextField))
+
+        if is_content_field:
+            try:
+                value = getattr(specific_page, field.name)
+                if value:
+                    try:
+                        # Extract text content recursively
+                        extracted_text = extract_content(value)
+
+                        # Determine type for template rendering
+                        # Treat StreamField as RichTextField because extract_content returns raw HTML for RichBlocks
+                        if isinstance(field, (RichTextField, StreamField)):
+                            item_type = 'RichTextField'
+                        else:
+                            item_type = 'Text'
+
+                        if extracted_text.strip():
+                            content_items.append({
+                                'name': field.name,
+                                'value': extracted_text,
+                                'type': item_type
+                            })
+                    except (TypeError, ValueError) as e:
+                        print(f"Error extracting content from field {field.name}: {e}")
+            except AttributeError:
+                pass
+
+    return content_items
+
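
The selection loop in `get_page_content` can be exercised without a Django project. The sketch below is a stand-in under stated assumptions, not the committed code: the `Fake*` classes and `select_content_fields` are hypothetical, `SKIPPED` is a shortened version of the skip list, and values go through `str()` instead of `extract_content`. `FakeRichTextField` subclasses `FakeTextField` to mirror Wagtail's `RichTextField`, which extends Django's `TextField`.

```python
from dataclasses import dataclass


@dataclass
class FakeField:
    """Hypothetical stand-in for a Django model field; only `name` is modeled."""
    name: str


class FakeCharField(FakeField):
    pass


class FakeTextField(FakeField):
    pass


class FakeRichTextField(FakeTextField):
    pass


class FakeStreamField(FakeField):
    pass


class FakeForeignKey(FakeField):
    pass


SKIPPED = {"id", "title", "slug", "seo_title", "search_description"}
CONTENT_TYPES = (FakeStreamField, FakeRichTextField, FakeCharField, FakeTextField)


def select_content_fields(fields, values):
    """Keep non-skipped content fields with non-blank values, tagged for rendering."""
    items = []
    for field in fields:
        if field.name in SKIPPED or not isinstance(field, CONTENT_TYPES):
            continue
        text = str(values.get(field.name) or "")
        if not text.strip():
            continue  # whitespace-only values are dropped, as in the tag
        rich = isinstance(field, (FakeRichTextField, FakeStreamField))
        items.append({"name": field.name, "value": text,
                      "type": "RichTextField" if rich else "Text"})
    return items


fields = [FakeCharField("title"), FakeCharField("tagline"),
          FakeRichTextField("body"), FakeTextField("notes"), FakeForeignKey("owner")]
values = {"title": "About", "tagline": "Free books",
          "body": "<p>Hello</p>", "notes": "   "}

for item in select_content_fields(fields, values):
    print(item)
```

Only `tagline` and `body` survive here: `title` is on the skip list, `notes` is whitespace only, and `owner` is not one of the content field types.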
