SelectingNodes
Sometimes you are not interested in the whole of a document but only
want to compare parts of it, for example when your document contains a
lot of boilerplate XML and you are just filling in a small part of
it. You can use a NodeFilter to tell XMLUnit which parts it should
ignore and which to compare.
Once XMLUnit is focused on the interesting parts, it may need help to
pick the correct pairs of XML nodes to compare. The strictest
scenario is one where the trees must be completely identical and the
order of nodes is significant at every level - but there is a
surprisingly large number of use cases where order is completely
irrelevant. NodeMatcher is responsible for telling XMLUnit which
nodes of the two documents it compares need to be matched with each
other.
In order to properly use NodeFilter and NodeMatcher it is crucial
to understand that XMLUnit traverses the document depth-first, from
its root element to the leaves. Whenever it encounters an XML
element, it consults NodeFilter to prune the child nodes that are
not interesting and NodeMatcher to pick the branches of the two XML
documents that should get compared. Once a branch has been chosen,
there is no going back.
For example, assume a control document of
<table>
<tbody>
<tr>
<th>some key</th>
<td>some value</td>
</tr>
<tr>
<th>another key</th>
<td>another value</td>
</tr>
</tbody>
</table>
and a test document of
<table>
<tbody>
<tr>
<th>another key</th>
<td>another value</td>
</tr>
<tr>
<th>some key</th>
<td>some value</td>
</tr>
</tbody>
</table>
If your requirement is to ignore the order of <tr>s but identify
matching rows based on the textual content of the <th> nodes, then
NodeMatcher must already select the "correct" <tr> elements when
it gets passed in the children of <tbody>. Once XMLUnit is set on
the <tr> branches, there is no way to match nodes from one branch to
those of another one.
This is why you can't just say "match elements based on their name
and textual content": any two <tr>s have the same element name
and the same textual content - none at all if ignoring element content
whitespace. Therefore XMLUnit would simply match the <tr>s in
document order and not select the rows the way you want them to be
selected.
So when deciding what to prune in NodeFilter and in particular which
parts to match in NodeMatcher you have to follow your structure
towards the root of the document tree and find the common ancestor
that needs to make the right decision for the order of branches you
need.
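As a rough illustration of making the decision at the <tr> level, the sketch below compares rows by the text of their <th> child and everything else by name only. It uses only the JDK's DOM API. The class RowSelectorSketch and its helper methods are hypothetical names, and the method merely mirrors the shape of ElementSelector.canBeCompared(Element, Element), assuming documents without namespaces.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowSelectorSketch {
    // Parses a snippet and returns its root element (helper for the demo only).
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first direct <th> child of a row, or null if there is none.
    static String headerText(Element row) {
        NodeList children = row.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && "th".equals(child.getNodeName())) {
                return child.getTextContent().trim();
            }
        }
        return null;
    }

    // Same parameters and return type as ElementSelector.canBeCompared.
    static boolean canBeCompared(Element control, Element test) {
        if (!control.getNodeName().equals(test.getNodeName())) {
            return false;
        }
        if (!"tr".equals(control.getNodeName())) {
            return true; // all other elements: matching name is enough
        }
        // Rows are only paired if their <th> texts agree.
        return Objects.equals(headerText(control), headerText(test));
    }

    public static void main(String[] args) {
        Element some = parse("<tr><th>some key</th><td>some value</td></tr>");
        Element another = parse("<tr><th>another key</th><td>another value</td></tr>");
        System.out.println(canBeCompared(some, another));
        System.out.println(canBeCompared(some, some));
    }
}
```

Because the decision is taken when the <tr> elements are paired, the <th> and <td> children below them can afterwards be matched simply by name.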
NodeFilter isn't an interface in its own right but just a
Predicate<(Xml)Node> functional interface or delegate.
When XMLUnit visits an element, it will invoke the configured
NodeFilter for each of the child nodes and ignore all nodes where
the filter returned false.
By default - if no NodeFilter has been configured at all - all child
nodes are part of the comparison process.
As of XMLUnit 2.0.0 there is no public built-in implementation of
NodeFilter.
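Since nothing is built in, a filter has to be written by hand. The following is a minimal sketch of such a predicate using only JDK types: it drops comments and a hypothetical <meta> element and keeps everything else. The class name NodeFilterSketch and the sample document are made up for illustration. In XMLUnit for Java a predicate like this would typically be handed to DiffBuilder.withNodeFilter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeFilterSketch {
    // Same shape as a NodeFilter: false means "ignore this node".
    static final Predicate<Node> IGNORE_COMMENTS_AND_META =
        node -> node.getNodeType() != Node.COMMENT_NODE
            && !"meta".equals(node.getNodeName());

    // Returns the names of the root's child nodes that the filter keeps.
    static List<String> keptChildren(String xml, Predicate<Node> filter) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (filter.test(child)) {
                    kept.add(child.getNodeName());
                }
            }
            return kept;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><!-- generated --><meta/><payload/></doc>";
        System.out.println(keptChildren(xml, IGNORE_COMMENTS_AND_META)); // prints [payload]
    }
}
```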
(I)NodeMatcher selects the nodes that should be compared from the
lists of control and test nodes. It is invoked with the children of
the current elements of the control and test documents and returns the
matching pairs of nodes. Any node not returned as part of a matching
pair is considered "unmatched" and will result in a failed
CHILD_LOOKUP comparison.
Usually you won't implement (I)NodeMatcher itself but rather use the
default implementation DefaultNodeMatcher and configure it to your
needs.
The DefaultNodeMatcher implementation delegates the decision for
each node to the ElementSelector and NodeTypeMatcher
implementations passed in as arguments to its constructor.
- ElementSelector: is used for all nodes of type (Xml)Element. The default implementation always returns true, which makes XMLUnit compare all elements in document order.
- NodeTypeMatcher: is used for any other nodes that are not (Xml)Elements. The default implementation matches nodes by their node type with one exception: CDATA and Text nodes are considered the same kind of node.
ElementSelector receives a single element node from the control and
the test document and decides whether those two elements should be
compared with each other by XMLUnit. DefaultNodeMatcher will try to
match each control element with each test element that hasn't been
matched already, trying to stay in document order.
For example, when comparing
<root>
<a/>
<b/>
<c/>
<d/>
</root>
with
<root>
<d/>
<a/>
<e/>
<b/>
</root>
Assuming the configured ElementSelector returns true if the
element names match, DefaultNodeMatcher would invoke
ElementSelector with the following pairs (the first one from the
control, the second from the test document):
control | test | remark
a | d |
a | a | => matching pair found
b | e | tries to keep element order, so doesn't start over again
b | b | => matching pair found
c | d | hit end of list, start from the front
c | e | list exhausted, no match for c at all
d | d | hit end of list, start from the front => match
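The search order in this walkthrough can be reproduced with a small simulation. This is only a sketch of the behaviour described above, not XMLUnit's actual code: elements are reduced to their names, and MatchOrderSketch and pairsTried are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

class MatchOrderSketch {
    // Returns every "control|test" pair handed to the selector, in order.
    static List<String> pairsTried(List<String> control, List<String> test,
                                   BiPredicate<String, String> selector) {
        List<String> tried = new ArrayList<>();
        Set<Integer> matched = new HashSet<>(); // test indexes already paired
        int lastMatch = -1;                     // test index of the latest match
        int n = test.size();
        for (String c : control) {
            // Start right after the last match, then wrap around to the front.
            for (int step = 0; step < n; step++) {
                int i = (lastMatch + 1 + step) % n;
                if (matched.contains(i)) {
                    continue; // already paired with an earlier control element
                }
                tried.add(c + "|" + test.get(i));
                if (selector.test(c, test.get(i))) {
                    matched.add(i);
                    lastMatch = i;
                    break;
                }
            }
        }
        return tried;
    }

    public static void main(String[] args) {
        List<String> control = Arrays.asList("a", "b", "c", "d");
        List<String> test = Arrays.asList("d", "a", "e", "b");
        // Selector that accepts a pair when the element names are equal.
        System.out.println(pairsTried(control, test, String::equals));
        // prints [a|d, a|a, b|e, b|b, c|d, c|e, d|d]
    }
}
```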
It is possible to configure DefaultNodeMatcher to use more than one
ElementSelector when matching elements. If you do so,
DefaultNodeMatcher will first try to find a matching test node for a
given control node by consulting the first ElementSelector. If it
didn't find any match it uses the second ElementSelector and so on.
ElementSelector is most likely the part that needs to get customized
most often since the exact logic of matching branches with each other
is very specific to each single use case.
XMLUnit comes with several ElementSelector implementations that
are available as static members of the ElementSelectors class.
ElementSelectors.Default is the ElementSelector used by DefaultNodeMatcher if no
ElementSelector has been configured explicitly. It simply matches
elements in document order, i.e. the first child element of any given
control element is compared to the first child element of any given
test element, the second to the second and so on.
Actually document order is ensured by DefaultNodeMatcher itself,
this ElementSelector simply always returns true.
This implementation doesn't care about element names at all.
ElementSelectors.byName matches two elements if their qualified name -
i.e. the local name and the namespace URI (if any) - is the same. It
doesn't care about namespace prefixes at all, and neither do any of
the other built-in ElementSelectors.
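To make the "prefixes don't matter" rule concrete, here is a small JDK-only sketch that compares two elements by local name and namespace URI. It is not the code behind ElementSelectors.byName. QNameMatchSketch, sameQualifiedName and the namespace urn:example are hypothetical.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class QNameMatchSketch {
    // Parses a snippet with namespace support and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName/getNamespaceURI
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if local name and namespace URI agree; the prefix is never consulted.
    static boolean sameQualifiedName(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
            && Objects.equals(a.getLocalName(), b.getLocalName());
    }

    public static void main(String[] args) {
        Element x = parse("<x:item xmlns:x='urn:example'/>");
        Element y = parse("<y:item xmlns:y='urn:example'/>");
        Element plain = parse("<item/>");
        System.out.println(sameQualifiedName(x, y));     // same URI, different prefix
        System.out.println(sameQualifiedName(x, plain)); // no namespace on one side
    }
}
```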
ElementSelectors.byNameAndText matches two elements if their qualified name - i.e. the local name and the namespace URI (if any) - is the same and their textual content matches.
Example:
Control XML:
<flowers>
<flower>Roses</flower>
<flower>Daisy</flower>
<flower>Crocus</flower>
</flowers>
Test XML:
<flowers>
<flower>Daisy</flower>
<flower>Roses</flower>
<flower>Crocus</flower>
</flowers>
Without a custom ElementSelector you will get a difference "Expected
text value 'Roses' but was 'Daisy' ... ".
With ElementSelectors.byNameAndText you can ensure the
"right" nodes are compared with each other:
import org.junit.Assert;
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.ElementSelectors;

String controlXml = "<flowers><flower>Roses</flower><flower>Daisy</flower><flower>Crocus</flower></flowers>";
String testXml = "<flowers><flower>Daisy</flower><flower>Roses</flower><flower>Crocus</flower></flowers>";
Diff myDiff = DiffBuilder.compare(controlXml).withTest(testXml)
    .checkForSimilar() // a different order is always 'similar', not 'equal'
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
    .build();
Assert.assertFalse("XML similar " + myDiff.toString(), myDiff.hasDifferences());
for Java, or for .NET:
using Org.XmlUnit.Builder;
using Org.XmlUnit.Diff;

string controlXml = "<flowers><flower>Roses</flower><flower>Daisy</flower><flower>Crocus</flower></flowers>";
string testXml = "<flowers><flower>Daisy</flower><flower>Roses</flower><flower>Crocus</flower></flowers>";
var myDiff = DiffBuilder.Compare(controlXml).WithTest(testXml)
    .CheckForSimilar() // a different order is always 'similar', not 'equal'
    .WithNodeMatcher(new DefaultNodeMatcher(ElementSelectors.ByNameAndText))
    .Build();
Assert.IsFalse(myDiff.HasDifferences(), "XML similar " + myDiff.ToString());
ElementSelectors.byNameAndAllAttributes matches two elements if their qualified name - i.e. the local name and the namespace URI (if any) - is the same and all attributes (as identified by their local name and namespace URI) have the same value.
ElementSelectors.byNameAndAttributes matches two elements if their qualified name - i.e. the local name and the namespace URI (if any) - is the same and all attributes whose names have been given as parameters have the same value.
There are two overloads of ElementSelectors.byNameAndAttributes: one
accepts Strings and one accepts QNames or XmlQualifiedNames. The
string-arg version only considers attributes in the null-namespace
(i.e. those with only a local name and no associated namespace URI).
ElementSelectors.byNameAndAttributesControlNS is a variant of
ElementSelectors.byNameAndAttributes where attribute local names are
given as strings and the namespace URI is expected to be the one
defined for the attribute on the control element - this only works
properly if the local names of the attributes are unique for the
given elements.
ElementSelectors.byXPath expects an XPath expression yielding elements
(where the XPath context "." is the current control or test element)
and another ElementSelector as arguments. An additional overload
allows you to provide the namespace context for the XPath expression.
When comparing two elements, the XPath expression is applied to the
test and control elements and the resulting node lists are compared to
each other using the given ElementSelector. The control and test
elements match, if the given ElementSelector finds matching pairs
for all node lists returned by the XPath expression.
This is a (partial) option for a case like the <table> example from
the beginning of this chapter.
ElementSelectors.byXPath(".//th", ElementSelectors.byNameAndText)
would match the "correct" <tr>s to each other. It is only a partial
solution since it also works for <th> and <td> only by accident
(the node lists are empty, so they match trivially) and blindly using
byXPath in more complex scenarios is likely to fail.
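The "by accident" part becomes visible when the same expression is evaluated against different context elements. The sketch below uses the JDK's javax.xml.xpath API rather than XMLUnit itself. XPathContextSketch and count are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathContextSketch {
    // Parses a snippet and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes the expression selects relative to the context element.
    static int count(Element context, String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element tr = parse("<tr><th>some key</th><td>some value</td></tr>");
        Element th = (Element) tr.getFirstChild(); // no whitespace, so this is <th>
        System.out.println(count(tr, ".//th")); // prints 1
        System.out.println(count(th, ".//th")); // prints 0
    }
}
```

With <tr> as context the expression yields the header cell, so rows are told apart. With <th> or <td> as context it yields nothing on either side, and two empty lists are trivially "equal".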
ElementSelectors.not, or, and and xor are combiners for other
ElementSelectors: not negates an ElementSelector, or returns true if
any of the given selectors does, and returns true if all of the given
selectors do, and xor returns true if one of the two given selectors
returns true and the other one returns false. To be honest, xor is
only there for completeness; so far we haven't seen a use case for it.
There is an important difference between ElementSelectors.or and
passing several ElementSelectors to the constructor of
DefaultNodeMatcher. or will apply all ElementSelectors to each
pair of elements immediately, while DefaultNodeMatcher tries all
control elements for the first ElementSelector before consulting the
second.
Example
Assuming
<root>
<a>x</a>
<b/>
<a>y</a>
</root>
and
<root>
<a>y</a>
<b>some text</b>
<a>x</a>
</root>
and you want to match by element name and nested textual content - but fall back to just the element's name if there is no match including the textual content.
Using DefaultNodeMatcher(ElementSelectors.byNameAndText, ElementSelectors.byName) will match the <a>s with matching textual
content, just as required. Using
ElementSelectors.or(ElementSelectors.byNameAndText, ElementSelectors.byName) the byNameAndText will return false for
the first <a> elements, but byName will return true and so the
"wrong" <a>s get compared to each other.
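The difference can be sketched without XMLUnit by modelling selectors as BiPredicates. Everything here is a hand-rolled stand-in (OrVersusFallbackSketch, BY_NAME, BY_NAME_AND_TEXT, firstMatch, firstMatchWithFallback are hypothetical names), and the fallback loop only follows the behaviour described earlier for a single control element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrVersusFallbackSketch {
    static final BiPredicate<Element, Element> BY_NAME =
        (c, t) -> c.getNodeName().equals(t.getNodeName());
    static final BiPredicate<Element, Element> BY_NAME_AND_TEXT =
        BY_NAME.and((c, t) -> c.getTextContent().trim().equals(t.getTextContent().trim()));

    // Element children of the root of the given XML snippet.
    static List<Element> children(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getChildNodes();
            List<Element> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (nodes.item(i) instanceof Element) {
                    result.add((Element) nodes.item(i));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Index of the first test element the selector accepts, or -1.
    static int firstMatch(Element control, List<Element> tests,
                          BiPredicate<Element, Element> selector) {
        for (int i = 0; i < tests.size(); i++) {
            if (selector.test(control, tests.get(i))) {
                return i;
            }
        }
        return -1;
    }

    // Tries each selector against all test elements before moving to the next.
    @SafeVarargs
    static int firstMatchWithFallback(Element control, List<Element> tests,
                                      BiPredicate<Element, Element>... selectors) {
        for (BiPredicate<Element, Element> selector : selectors) {
            int index = firstMatch(control, tests, selector);
            if (index >= 0) {
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Element firstA = children("<root><a>x</a><b/><a>y</a></root>").get(0);
        List<Element> tests = children("<root><a>y</a><b>some text</b><a>x</a></root>");
        // "or" accepts the very first <a>, although its text is y.
        System.out.println(firstMatch(firstA, tests, BY_NAME_AND_TEXT.or(BY_NAME))); // prints 0
        // The fallback chain finds the <a> with text x first.
        System.out.println(firstMatchWithFallback(firstA, tests, BY_NAME_AND_TEXT, BY_NAME)); // prints 2
    }
}
```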