Deserialization of a doctype with very long content fails #533

@benoitryder

Description

quick_xml::de::from_reader() fails to parse XML that contains a doctype whose content is larger than the internal buffer capacity of the BufReader passed to it.
For instance:

<!DOCTYPE [
<!-- A very very long comment *snipped* -->
]>
<X></X>

Here is a minimal program that reproduces the issue. It fails with an ExpectedStart error.

use std::io::Write;
use serde::Deserialize;

#[derive(Deserialize)]
struct X {}

fn main() {
    // Write the test document, then drop the file handle before reading it back.
    {
        let mut file = std::fs::File::create("test.xml").unwrap();
        let header = "<!DOCTYPE X [<!--";
        let footer = "-->]><X></X>";
        // header + padding is 8190 bytes, so the closing "-->" starts
        // 2 bytes before offset 8192.
        let padding = 8192 - (header.len() + 2);
        // `{:1$}` prints "" padded with spaces to a width of `padding`.
        write!(file, "{header}{:1$}{footer}", "", padding).unwrap();
    }

    let file = std::fs::File::open("test.xml").unwrap();
    let reader = std::io::BufReader::new(file);
    let _: X = quick_xml::de::from_reader(reader).unwrap();
}

Cargo.toml content

[package]
name = "test"
version = "0.1.0"
edition = "2021"

[dependencies]
quick-xml = { version = "0.27.1", features = ["serialize"] }
serde = { version = "1.0", features = ["derive"] }
  • When decreasing the padding size, or using BufReader::with_capacity() to increase the buffer size, even by 1 byte, there is no error.
  • Other BufRead implementations don't have this issue (checked with &[u8] and stdin).
  • Content does not have to be in one "block". The same issue occurs for a doctype split into multiple declarations and comments.
  • With a longer doctype with real content, the error may be different. For instance, it may complain about an invalid ! from a !ENTITY tag.
  • No issue with serde-xml-rs, even for larger comments.
  • Tested on Windows, with rustc 1.66.0.
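
To make the byte layout of the reproduction easier to see, here is a small standard-library-only sketch. It does not call quick-xml and does not establish where the bug lies in the parser. It only shows which bytes a BufReader exposes per fill_buf() call for the document above. The helper names build_document and chunks are hypothetical, and the capacity of 8192 is passed explicitly on the assumption that it matches the default BufReader capacity on the tested platform.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Builds the same bytes that the reproduction program writes to `test.xml`.
/// (Hypothetical helper, for illustration only.)
fn build_document() -> Vec<u8> {
    let header = "<!DOCTYPE X [<!--";
    let footer = "-->]><X></X>";
    let padding = 8192 - (header.len() + 2);
    format!("{header}{:1$}{footer}", "", padding).into_bytes()
}

/// Returns the successive chunks that a `BufReader` with the given capacity
/// exposes through `fill_buf()` when reading `doc` from memory.
/// (Hypothetical helper, for illustration only.)
fn chunks(doc: &[u8], capacity: usize) -> Vec<Vec<u8>> {
    let mut reader = BufReader::with_capacity(capacity, Cursor::new(doc));
    let mut out = Vec::new();
    loop {
        let buf = reader.fill_buf().expect("in-memory reads do not fail");
        if buf.is_empty() {
            break; // end of input
        }
        let len = buf.len();
        out.push(buf.to_vec());
        // Mark the whole chunk as consumed so the next call refills the buffer.
        reader.consume(len);
    }
    out
}
```

With a capacity of 8192, the first chunk ends in the middle of the closing "-->" of the comment and the second chunk starts with the remaining ">". With a capacity of 8193, the whole "-->" fits in the first chunk, which is consistent with the observation that growing the buffer by 1 byte makes the error disappear.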
