borkdude
Contributor

@borkdude borkdude commented May 2, 2025

No description provided.

@borkdude borkdude merged commit 0c164f8 into master May 2, 2025
4 checks passed
@borkdude borkdude deleted the jsoup-upgrade branch May 2, 2025 09:45
@danielcompton
Member

@borkdude note that jsoup 1.20.1 introduces some breaking changes around self-closing tags: jhy/jsoup#2322, https://github.com/jhy/jsoup/releases/tag/jsoup-1.20.1.

It might be good to add parse-xml and parse-xml-fragment functions to hickory.core, matching the existing parse/parse-fragment API, for those who use Hickory to parse SVGs (as we were).

In the meantime, I've created this function for our own usage:

;; Assumes the enclosing ns form imports the jsoup parser class:
;;   (:import [org.jsoup.parser Parser])
(defn parse-xml-fragment
  ;; Version of hickory/parse-fragment specialised for XML.
  ;; https://github.com/jhy/jsoup/discussions/2322
  "Parse an XML fragment (some group of tags that might be at home somewhere
   in the tag hierarchy) into a vector of jsoup nodes that can
   each be passed as input to as-hiccup or as-hickory."
  [s]
  (into [] (Parser/parseXmlFragment s "https://example.com")))
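
For anyone wanting to try this, here is a rough sketch of how the function could be combined with as-hiccup. The namespace name and the `parse-svg` helper are hypothetical and only for illustration; this is not necessarily how our own `parse-svg` is implemented. Note that the XML parser keeps whitespace between tags as text nodes, so formatted input may yield extra strings in the result.

```clojure
(ns example.svg-xml
  (:require [hickory.core :as c])
  (:import [org.jsoup.parser Parser]))

(defn parse-xml-fragment
  "Same function as above, repeated so this sketch is self-contained."
  [s]
  ;; jsoup's XML parser does not apply HTML tree-building rules,
  ;; so it does not depend on how the HTML parser treats `<rect/>`.
  (into [] (Parser/parseXmlFragment s "https://example.com")))

(defn parse-svg
  "Hypothetical helper: parses an SVG/XML fragment and converts each
   top-level node to hiccup."
  [s]
  (mapv c/as-hiccup (parse-xml-fragment s)))

(comment
  ;; Two self-closing tags side by side; with the XML parser they are
  ;; expected to come back as siblings rather than nested.
  (parse-svg "<rect x=\"10.5\" y=\"2\"/><circle cx=\"12\" cy=\"20\" r=\"2\"/>"))
```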

@borkdude
Contributor Author

borkdude commented Aug 5, 2025

@danielcompton Thanks for the note! I didn't notice any breakage since no tests were breaking. Does Jsoup break for HTML with svg elements in them? Or is that only a concern when you're using Jsoup to parse an XML fragment standalone?

@danielcompton
Member

danielcompton commented Aug 6, 2025

Here is an example test I wrote with two self-closing tags side-by-side.

  (testing "parse-svg returns consistent hiccup structure"
    (let [svg-input "<rect x=\"10.5\" y=\"2\" width=\"3\" height=\"14\" rx=\"1\" ry=\"1\" fill=\"#444\"/><circle data-color=\"color-2\" cx=\"12\" cy=\"20\" r=\"2\" fill=\"#444\"/>"
          result (parse/parse-svg svg-input)
          expected-result [[:rect {:x "10.5" :y "2" :width "3" :height "14" :rx "1" :ry "1" :fill "#444"}]
                           [:circle {:data-color "color-2" :cx "12" :cy "20" :r "2" :fill "#444"}]]]
      (is (= expected-result result)
          "SVG parsing should produce stable structure across jsoup/hickory upgrades")))

With the new version of jsoup, the result has the circle nested inside the rect:

[[:rect {:x "10.5" :y "2" :width "3" :height "14" :rx "1" :ry "1" :fill "#444"}
  [:circle {:data-color "color-2" :cx "12" :cy "20" :r "2" :fill "#444"}]]]

I would expect that jsoup would handle SVG inside HTML correctly; the issue is that I'm parsing an SVG/XML fragment as HTML. That previously worked by accident, but was never a valid use-case.

@borkdude
Contributor Author

borkdude commented Aug 7, 2025

Yeah it seems wrapping the input in an svg element does the right thing:

user=> (c/as-hiccup (first (c/parse-fragment "<svg><rect x=\"10.5\" y=\"2\" width=\"3\" height=\"14\" rx=\"1\" ry=\"1\" fill=\"#444\"/><circle data-color=\"color-2\" cx=\"12\" cy=\"20\" r=\"2\" fill=\"#444\"/></svg>")))
[:svg {} [:rect {:x "10.5", :y "2", :width "3", :height "14", :rx "1", :ry "1", :fill "#444"}] [:circle {:data-color "color-2", :cx "12", :cy "20", :r "2", :fill "#444"}]]

so maybe that's a good workaround for your use case as well.
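
If you want the children back without the wrapper, something like the following might work. It is only a sketch: `parse-svg-children` and the namespace name are made up for illustration and are not part of Hickory, and it relies on as-hiccup returning `[tag attrs & children]` as in the REPL output above.

```clojure
(ns example.svg-wrap
  (:require [hickory.core :as c]))

(defn parse-svg-children
  "Hypothetical helper (not part of Hickory): wraps `s` in an <svg> element,
   parses it as an HTML fragment, and returns the children of that wrapper
   as a vector of hiccup forms."
  [s]
  (let [[_tag _attrs & children]
        (c/as-hiccup (first (c/parse-fragment (str "<svg>" s "</svg>"))))]
    ;; Drop the :svg tag and its attribute map, keep only what was inside.
    (vec children)))

(comment
  (parse-svg-children
   "<rect x=\"10.5\" y=\"2\"/><circle cx=\"12\" cy=\"20\" r=\"2\"/>"))
```

Going by the output above, the rect and circle should then come back as siblings, but I haven't checked this against inputs that contain text or whitespace between the tags.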
