add HTML checks to i18n linting #20199

Merged
ornicar merged 1 commit into lichess-org:master from Ijtihed:html-lint-checks on Apr 8, 2026

Conversation

Ijtihed (Contributor) commented Apr 6, 2026

For #20196.

The linter now checks translation strings for XSS vectors:

  • dangerous tags (script, iframe, svg, etc.) -> error
  • event handler attributes and style/srcdoc attributes -> error
  • javascript:/data: hrefs (including percent-encoded bypasses) -> error
  • http:// -> warning

The allowed tags go beyond your a, i, strong: I added em, b, br (already in source strings), span (forward compat), and kbd, code, samp (per review). I can make it tighter as well, please let me know.
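For readers who want to see the shape of such a check, here is a minimal sketch built on Python's `html.parser`. It is not the code from this PR: the names `TagLinter` and `lint_tags` are made up for illustration, the dangerous-tag set contains only the three tags named above, and treating unknown tags as warnings is an assumption based on the `<ctrl>` warnings mentioned in this thread.

```python
from html.parser import HTMLParser

# Assumed lists that mirror the description above; the real script may differ.
ALLOWED_TAGS = {"a", "i", "strong", "em", "b", "br", "span", "kbd", "code", "samp"}
DANGEROUS_TAGS = {"script", "iframe", "svg"}  # the description says "etc."; only the named ones are listed
FORBIDDEN_ATTRS = {"style", "srcdoc"}


class TagLinter(HTMLParser):
    """Hypothetical helper: collects (level, message) pairs for one string."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in DANGEROUS_TAGS:
            self.findings.append(("error", f"dangerous tag <{tag}>"))
        elif tag not in ALLOWED_TAGS:
            self.findings.append(("warning", f"unexpected tag <{tag}>"))
        for name, _value in attrs:
            # Event handlers are attributes whose name starts with "on".
            if name.startswith("on") or name in FORBIDDEN_ATTRS:
                self.findings.append(("error", f"forbidden attribute {name}"))


def lint_tags(text):
    """Parse one translation string and return its findings."""
    linter = TagLinter()
    linter.feed(text)
    linter.close()
    return linter.findings
```

With these assumed lists, an allowed tag such as `<strong>` produces no findings, while a pseudo-tag such as `<ctrl>` produces a warning rather than an error.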

I skipped domain allowlisting since strings legitimately link externally and https:// enforcement covers the actual attack surface. It's easy to add in lint_href if we want later.
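As an illustration of the href rules, the sketch below shows one way such a check could look. The function name `lint_href_sketch`, its return values, and the five-round decoding limit are assumptions for this example; the actual `lint_href` in the PR may have a different signature and different rules.

```python
from urllib.parse import unquote


def lint_href_sketch(href):
    """Hypothetical check: return "error", "warning", or None for one href value."""
    # Percent-decode repeatedly so that double-encoded schemes are also seen.
    decoded = href
    for _ in range(5):
        once_more = unquote(decoded)
        if once_more == decoded:
            break
        decoded = once_more
    # Drop whitespace and control characters before comparing. This is
    # deliberately stricter than a browser's URL parser.
    cleaned = "".join(ch for ch in decoded if ch > " ").lower()
    if cleaned.startswith(("javascript:", "data:")):
        return "error"
    if cleaned.startswith("http://"):
        return "warning"
    # A domain allowlist, if wanted later, could be checked at this point.
    return None
```

Decoding before comparing errs on the side of flagging too much, which seems acceptable for a linter that only reports findings.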

The 67 new CI warnings come from <ctrl>/<strg>/<enter> keyboard-key pseudo-tags in the preferences string. They are warnings, not errors, so they don't block. Let me know if you'd prefer a carve-out or notices.

Edit 1: I also fixed a gap where double-encoded entities (&amp;#60;) bypassed the early return. html.unescape now always runs before checking for <.
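To make the gap concrete, here is a small sketch. It assumes the early return was a quick test for `<` in the string, which is an inference from the description rather than something taken from the diff. The helper names `may_contain_markup` and `string_text` and the `<string name="demo">` element are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def may_contain_markup(text):
    """Hypothetical pre-check: True if a tag could appear once entities are decoded."""
    return "<" in html.unescape(text)


def string_text(xml_snippet):
    """Return the text of a single XML element, as an XML parser hands it over."""
    return ET.fromstring(xml_snippet).text or ""


# The XML layer decodes "&amp;" once, which leaves an HTML entity behind.
double_encoded = string_text('<string name="demo">&amp;#60;script&amp;#62;</string>')

# A raw test for "<" misses it, while the unescaped test does not.
raw_check = "<" in double_encoded
safe_check = may_contain_markup(double_encoded)
```

Here `raw_check` is false and `safe_check` is true, which is the difference the fix relies on.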

Tested locally:

  • python bin/trans-lint translation/source/site.xml -> 0 errors
  • python bin/trans-lint translation/source/study.xml -> 0 errors
  • python bin/trans-lint translation/dest/*/*.xml -> 0 errors, 363 warnings, exit code 0

superuser-does (Collaborator) commented Apr 6, 2026

I'd be grateful if you could add the other inline elements I mentioned in #20196 (kbd, code, samp) to the allow-list, as you are already thinking towards future developments.

I am concerned about browsers' quirks mode. Suppose a malicious actor entered &lt; script &gt;, that is, with a space after the < or before the >. Would a browser evaluate that as a full <script> element?

There is also the scenario of constructing it with numeric character references (&#nnn;). Hopefully the parser picks this up.

Ijtihed (Contributor, Author) commented Apr 6, 2026

Added kbd, code, samp.

I don't think < script > with spaces is a risk. HTMLParser needs a letter right after < to recognize a tag, so spaced versions would just get treated as text. Quirks mode doesn't change this.
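A quick way to check this claim, assuming the HTMLParser in question is Python's `html.parser.HTMLParser`, is to record what the parser reports for a spaced and an unspaced tag. The `Recorder` class and `parse` helper are made up for this sketch.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Hypothetical helper that records start tags and text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(source):
    """Return (list of start tags, concatenated text) for one string."""
    recorder = Recorder()
    recorder.feed(source)
    recorder.close()
    return recorder.tags, "".join(recorder.text)


spaced_tags, spaced_text = parse("< script >alert(1)")
plain_tags, _ = parse("<script>alert(1)</script>")
```

In this sketch `spaced_tags` stays empty and the whole input comes back as text, while `plain_tags` contains `script`.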

I think the &#nnn; thing was a real issue though. Double-encoded entities (&amp;#60;) slipped past the early return. I fixed this by always running html.unescape before checking for <.

Ijtihed force-pushed the html-lint-checks branch from 41a201b to b07fd66 on April 6, 2026 17:33
ornicar force-pushed the html-lint-checks branch from b07fd66 to f48b0f0 on April 8, 2026 07:22
ornicar merged commit 794460a into lichess-org:master on Apr 8, 2026
8 checks passed
superuser-does linked an issue on Apr 8, 2026 that may be closed by this pull request