Desktop: Fixes #12877: Viewer word count includes CSS-hidden text causing editor/viewer count mismatch#14893
Desktop: Fixes #12877: Viewer word count includes CSS-hidden text causing editor/viewer count mismatch#14893trueharmonyalan wants to merge 2 commits intolaurent22:devfrom
Conversation
… text causing editor/viewer count mismatch
|
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
📝 WalkthroughWalkthroughThe changes implement detection and exclusion of hidden HTML elements from plain-text output in the HTML stripping utility. A new helper function identifies CSS styles that hide content, and the HTML parsing logic now tracks nested hidden elements via a counter and skips their text emission. Changes
Suggested labels
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/renderer/htmlUtils.test.ts (1)
141-145: Add a sibling regression case for hidden-parent contentPlease add a case where a hidden parent contains multiple children, so the suite catches hidden-depth desynchronisation regressions.
Suggested test addition
// nested hidden — child inside hidden parent should also be excluded [ '<div style="display: none"><span>still hidden</span></div>', '', ], + // hidden parent with multiple children should exclude all descendants + [ + '<div style="display: none"><span>hidden one</span><span>hidden two</span></div>', + '', + ],As per coding guidelines, focus on testing essential behaviour and edge cases — avoid adding tests for every minor detail.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/renderer/htmlUtils.test.ts` around lines 141 - 145, Add a sibling regression test to the existing "nested hidden — child inside hidden parent should also be excluded" case by replacing the single-child HTML sample with one where the hidden parent contains multiple children (e.g. '<div style="display: none"><span>still hidden</span><p>also hidden</p></div>') and assert the expected output remains '' so the suite catches hidden-depth desynchronisation; update the test array entry in htmlUtils.test.ts accordingly (keep the same test description and expected value).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/renderer/htmlUtils.ts`:
- Around line 174-179: The hidden-depth accounting currently increments
hiddenDepth in onopentag but decrements it for every close tag, causing hidden
state to end too early; modify the logic so onopentag records whether the pushed
tag was hidden (e.g., push a boolean or push the tag name alongside hidden flag
using tagStack) and then in onclosetag pop that same record and only decrement
hiddenDepth if the popped value indicates it was hidden; update references to
tagStack, hiddenDepth, isHiddenByStyle, onopentag and onclosetag so they use
this paired push/pop approach to keep hidden tracking correct for nested
children.
- Around line 62-76: The visibility checks in the conditional that inspects the
style string variable `s` are too permissive (using s.includes) and will match
partial values like `opacity:0.5` or `height:0.5px`; update those checks in
packages/renderer/htmlUtils.ts to use precise pattern matching or parsed CSS
values instead of includes: replace simple s.includes checks for `opacity`,
`height`, `width`, `max-height`, and `text-indent` with regular expressions or a
tiny parser that require exact zero values (e.g. `opacity:\s*0(?:;|$|\s)`,
`(?:height|width|max-height):\s*0(?:px|em|rem|%|)?(?:;|$|\s)`) and require units
for text-indent negatives (e.g. `text-indent:\s*-9999(?:px)?(?:;|$)`) and
similarly tighten checks for `color:transparent`, `display:none`,
`visibility:hidden`, `overflow:hidden && (height|width|max-height === 0)`,
`clip-path:inset(100%)`, and `transform:scale(0)` so they match whole
declarations only; this ensures visible fractional values aren’t misclassified
while preserving the intent of the original condition.
---
Nitpick comments:
In `@packages/renderer/htmlUtils.test.ts`:
- Around line 141-145: Add a sibling regression test to the existing "nested
hidden — child inside hidden parent should also be excluded" case by replacing
the single-child HTML sample with one where the hidden parent contains multiple
children (e.g. '<div style="display: none"><span>still hidden</span><p>also
hidden</p></div>') and assert the expected output remains '' so the suite
catches hidden-depth desynchronisation; update the test array entry in
htmlUtils.test.ts accordingly (keep the same test description and expected
value).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 225ef36c-5905-440a-b0cf-97a0a09e719c
📒 Files selected for processing (2)
packages/renderer/htmlUtils.test.tspackages/renderer/htmlUtils.ts
packages/renderer/htmlUtils.ts
Outdated
| return ( | ||
| s.includes('visibility:hidden') || | ||
| s.includes('display:none') || | ||
| s.includes('opacity:0') || | ||
| s.includes('font-size:0') || | ||
| s.includes('color:transparent') || | ||
| s.includes('color:rgba(0,0,0,0)') || | ||
| // Moves element far off-screen | ||
| s.includes('text-indent:-9999') || | ||
| s.includes('text-indent:-999') || | ||
| // Zero dimensions with hidden overflow | ||
| (s.includes('overflow:hidden') && (s.includes('height:0') || s.includes('width:0') || s.includes('max-height:0'))) || | ||
| // Clip path fully hidden | ||
| s.includes('clip-path:inset(100%)') || | ||
| s.includes('transform:scale(0)') |
There was a problem hiding this comment.
Style matching is overly permissive and drops visible text
The includes(...) checks can misclassify visible content as hidden. Example: Line 65 matches opacity:0.5; Line 73 matches height:0.5px. That will undercount words in visible content.
Proposed fix
const isHiddenByStyle = (style: string): boolean => {
if (!style) return false;
- const s = style.replace(/\s/g, '').toLowerCase();
+ const s = style.replace(/\s/g, '').toLowerCase();
+ const has = (re: RegExp) => re.test(s);
return (
- s.includes('visibility:hidden') ||
- s.includes('display:none') ||
- s.includes('opacity:0') ||
- s.includes('font-size:0') ||
- s.includes('color:transparent') ||
- s.includes('color:rgba(0,0,0,0)') ||
- // Moves element far off-screen
- s.includes('text-indent:-9999') ||
- s.includes('text-indent:-999') ||
- // Zero dimensions with hidden overflow
- (s.includes('overflow:hidden') && (s.includes('height:0') || s.includes('width:0') || s.includes('max-height:0'))) ||
- // Clip path fully hidden
- s.includes('clip-path:inset(100%)') ||
- s.includes('transform:scale(0)')
+ has(/(^|;)visibility:hidden(?:!important)?(;|$)/) ||
+ has(/(^|;)display:none(?:!important)?(;|$)/) ||
+ has(/(^|;)opacity:0(?:!important)?(;|$)/) ||
+ has(/(^|;)font-size:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) ||
+ has(/(^|;)color:transparent(?:!important)?(;|$)/) ||
+ has(/(^|;)color:rgba\(0,0,0,0\)(?:!important)?(;|$)/) ||
+ has(/(^|;)text-indent:-9999(?:px)?(?:!important)?(;|$)/) ||
+ has(/(^|;)text-indent:-999(?:px)?(?:!important)?(;|$)/) ||
+ (
+ has(/(^|;)overflow:hidden(?:!important)?(;|$)/) &&
+ (
+ has(/(^|;)height:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) ||
+ has(/(^|;)width:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) ||
+ has(/(^|;)max-height:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/)
+ )
+ ) ||
+ has(/(^|;)clip-path:inset\(100%\)(?:!important)?(;|$)/) ||
+ has(/(^|;)transform:scale\(0\)(?:!important)?(;|$)/)
);
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return ( | |
| s.includes('visibility:hidden') || | |
| s.includes('display:none') || | |
| s.includes('opacity:0') || | |
| s.includes('font-size:0') || | |
| s.includes('color:transparent') || | |
| s.includes('color:rgba(0,0,0,0)') || | |
| // Moves element far off-screen | |
| s.includes('text-indent:-9999') || | |
| s.includes('text-indent:-999') || | |
| // Zero dimensions with hidden overflow | |
| (s.includes('overflow:hidden') && (s.includes('height:0') || s.includes('width:0') || s.includes('max-height:0'))) || | |
| // Clip path fully hidden | |
| s.includes('clip-path:inset(100%)') || | |
| s.includes('transform:scale(0)') | |
| const has = (re: RegExp) => re.test(s); | |
| return ( | |
| has(/(^|;)visibility:hidden(?:!important)?(;|$)/) || | |
| has(/(^|;)display:none(?:!important)?(;|$)/) || | |
| has(/(^|;)opacity:0(?:!important)?(;|$)/) || | |
| has(/(^|;)font-size:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) || | |
| has(/(^|;)color:transparent(?:!important)?(;|$)/) || | |
| has(/(^|;)color:rgba\(0,0,0,0\)(?:!important)?(;|$)/) || | |
| has(/(^|;)text-indent:-9999(?:px)?(?:!important)?(;|$)/) || | |
| has(/(^|;)text-indent:-999(?:px)?(?:!important)?(;|$)/) || | |
| ( | |
| has(/(^|;)overflow:hidden(?:!important)?(;|$)/) && | |
| ( | |
| has(/(^|;)height:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) || | |
| has(/(^|;)width:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) || | |
| has(/(^|;)max-height:0(?:px|em|rem|%|pt|vw|vh)?(?:!important)?(;|$)/) | |
| ) | |
| ) || | |
| has(/(^|;)clip-path:inset\(100%\)(?:!important)?(;|$)/) || | |
| has(/(^|;)transform:scale\(0\)(?:!important)?(;|$)/) | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/renderer/htmlUtils.ts` around lines 62 - 76, The visibility checks
in the conditional that inspects the style string variable `s` are too
permissive (using s.includes) and will match partial values like `opacity:0.5`
or `height:0.5px`; update those checks in packages/renderer/htmlUtils.ts to use
precise pattern matching or parsed CSS values instead of includes: replace
simple s.includes checks for `opacity`, `height`, `width`, `max-height`, and
`text-indent` with regular expressions or a tiny parser that require exact zero
values (e.g. `opacity:\s*0(?:;|$|\s)`,
`(?:height|width|max-height):\s*0(?:px|em|rem|%|)?(?:;|$|\s)`) and require units
for text-indent negatives (e.g. `text-indent:\s*-9999(?:px)?(?:;|$)`) and
similarly tighten checks for `color:transparent`, `display:none`,
`visibility:hidden`, `overflow:hidden && (height|width|max-height === 0)`,
`clip-path:inset(100%)`, and `transform:scale(0)` so they match whole
declarations only; this ensures visible fractional values aren’t misclassified
while preserving the intent of the original condition.
| onopentag: (name: string, attrs: Record<string, string>) => { | ||
| tagStack.push(name.toLowerCase()); | ||
| if (isHiddenByStyle(attrs['style'])) { | ||
| hiddenDepth++; | ||
| } | ||
| }, |
There was a problem hiding this comment.
Hidden-depth bookkeeping is incorrect for nested non-hidden children
Line 189 decrements hiddenDepth for every close tag, not just tags that were hidden on open. This can re-enable text collection too early inside a hidden parent (e.g., hidden <div> with multiple child elements), so hidden words leak into the count.
Proposed fix
- let hiddenDepth = 0;
+ let hiddenDepth = 0;
+ const hiddenTagStack: boolean[] = [];
@@
onopentag: (name: string, attrs: Record<string, string>) => {
tagStack.push(name.toLowerCase());
- if (isHiddenByStyle(attrs['style'])) {
- hiddenDepth++;
- }
+ const isHidden = isHiddenByStyle(attrs['style']);
+ hiddenTagStack.push(isHidden);
+ if (isHidden) hiddenDepth++;
},
@@
onclosetag: (name: string) => {
- if (currentTag() === name.toLowerCase()) tagStack.pop();
- if (hiddenDepth > 0) hiddenDepth--;
+ if (currentTag() === name.toLowerCase()) {
+ tagStack.pop();
+ const wasHidden = hiddenTagStack.pop() ?? false;
+ if (wasHidden) hiddenDepth--;
+ }
},Also applies to: 187-190
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/renderer/htmlUtils.ts` around lines 174 - 179, The hidden-depth
accounting currently increments hiddenDepth in onopentag but decrements it for
every close tag, causing hidden state to end too early; modify the logic so
onopentag records whether the pushed tag was hidden (e.g., push a boolean or
push the tag name alongside hidden flag using tagStack) and then in onclosetag
pop that same record and only decrement hiddenDepth if the popped value
indicates it was hidden; update references to tagStack, hiddenDepth,
isHiddenByStyle, onopentag and onclosetag so they use this paired push/pop
approach to keep hidden tracking correct for nested children.
|
CodeRabbit's comments need to be adressed |
sure, working on it. |
This PR fixes #12877.
When a note contains HTML elements hidden via CSS (e.g.
visibility: hidden,display: none), the Viewer word count was incorrectly including that hidden text in its total — causing a mismatch with what the user actually sees in the preview.The Editor count is intentionally unaffected, as it reflects everything the user has typed regardless of styling.
Fix:
stripHtmlnow detects and skips CSS-hidden elements during textextraction, so the Viewer count only reflects visible, rendered content.
Tests: Added test cases to
htmlUtils.test.tscovering all supportedCSS hiding methods (
visibility: hidden,display: none, etc.),nested hidden elements, and mixed visible/hidden content.
Before
Joplin 3.5.13 (prod, linux)
Device: linux, Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
Client ID: e3dcf4bbb98c4873b7229cd3a859a4aa
Sync Version: 3
Profile Version: 49
Keychain Supported: No
Alternative instance ID: -
Revision: 0c1511f
Backup: 1.5.1
Freehand Drawing: 4.2.0
issue_vid.mp4
After
Joplin 3.6.6 (dev, linux)
Device: linux, Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
Client ID: 463bfd1c6f0e487298782bdebf4a6d09
Sync Version: 3
Profile Version: 49
Keychain Supported: No
Alternative instance ID: -
Sync target: (None)
Editor: Markdown
Revision: dfdc0f3 (dev)
Backup: 1.5.1
Freehand Drawing: 4.3.0
issue_fixed_vid.mp4