Fix/issue 3214 obligation formatting #3541
Open
Description
Fixes issue #3214 by preventing unwanted formatting, such as HTML entities (&quot;) and tab characters (\t), from being stored in CouchDB when saving Obligations with nested text nodes. Data is stored clean, with no change to existing business logic or UI behavior.
Root Cause
Tab characters (\t) were used for indentation in the buildObligationText() method, which resulted in control characters being persisted in CouchDB.
HTML entities (&quot;) were introduced because XssStringDeserializer HTML-encoded quotation marks before the value was persisted to the database.
Changes
Backend (licenses-core)
File: LicenseDatabaseHandler.java
Replaced tab characters (\t) with two spaces in buildObligationText() to avoid persisting formatting artifacts.
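A minimal sketch of the indentation change, for illustration only. The `Node` type, the `buildText` name, and the recursion are assumptions made for this example; the actual `buildObligationText()` signature and node model in LicenseDatabaseHandler.java are not shown in this description.

```java
import java.util.ArrayList;
import java.util.List;

public class ObligationTextSketch {
    // Hypothetical stand-in for a nested obligation text node (not the real SW360 type).
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) { this.text = text; }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Two spaces per nesting level instead of "\t", so no control characters are persisted.
    private static final String INDENT = "  ";

    static String buildText(Node root) {
        StringBuilder sb = new StringBuilder();
        append(root, 0, sb);
        return sb.toString();
    }

    private static void append(Node node, int depth, StringBuilder sb) {
        sb.append(INDENT.repeat(depth)).append(node.text).append('\n');
        for (Node child : node.children) {
            append(child, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("Obligation")
                .add(new Node("You must").add(new Node("include the license text")));
        System.out.print(buildText(root));
    }
}
```

Each nesting level adds two spaces, so the stored text keeps its visual hierarchy without containing `\t`.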
REST Layer (rest-common)
File: XssStringDeserializer.java
Added missing import for StringEscapeUtils.
Introduced sanitizeWithoutEncoding() to remove XSS patterns without HTML entity encoding.
Fixed regex escaping issues.
Improved security by using case-insensitive pattern matching, so mixed-case variants of dangerous patterns are also caught.
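The approach described above could look roughly like the sketch below. The class name and the pattern list are assumptions chosen for illustration; the patterns actually used in XssStringDeserializer.java are not listed in this description and may differ.

```java
import java.util.List;
import java.util.regex.Pattern;

public class SanitizeSketch {
    // Case-insensitive so <SCRIPT> and <ScRiPt> are matched; DOTALL so tag bodies may span lines.
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // Illustrative patterns only (assumed, not taken from the PR).
    private static final List<Pattern> XSS_PATTERNS = List.of(
            Pattern.compile("<script[^>]*>.*?</script>", FLAGS),
            Pattern.compile("<iframe[^>]*>.*?</iframe>", FLAGS),
            Pattern.compile("</?(script|iframe)[^>]*>", FLAGS), // unpaired tags
            Pattern.compile("\\son\\w+\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)", FLAGS), // inline handlers
            Pattern.compile("javascript:", FLAGS));

    // Removes matches but does not HTML-encode, so quotes stay as literal '"' characters.
    static String sanitizeWithoutEncoding(String input) {
        if (input == null) {
            return null;
        }
        String result = input;
        String previous;
        do {
            // Repeat until stable so that removing one match cannot expose another.
            previous = result;
            for (Pattern p : XSS_PATTERNS) {
                result = p.matcher(result).replaceAll("");
            }
        } while (!result.equals(previous));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sanitizeWithoutEncoding("He said \"hi\" <SCRIPT>alert(1)</SCRIPT>"));
    }
}
```

Because matches are replaced with the empty string rather than encoded, the sanitized value can be stored as plain text.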
Testing
Tested manually by inspecting CouchDB entries directly.
Verified that quotes are stored as " instead of &quot;.
Confirmed that no tab characters are persisted.
XSS protection remains intact (script tags and dangerous patterns are removed).
Full integration testing will be handled by CI/CD.
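The manual checks above could also be captured as a small helper for an automated test. This is a sketch under assumptions: the class and method names are hypothetical, and it treats any HTML entity in stored text as a formatting artifact, which may be too strict if obligations can legitimately contain entity-like text.

```java
import java.util.regex.Pattern;

public class StoredTextCheckSketch {
    // Matches named or numeric HTML entities such as &quot; or &#34; or &#x22;
    private static final Pattern HTML_ENTITY =
            Pattern.compile("&(#\\d+|#x[0-9a-fA-F]+|[a-zA-Z]+);");

    // True when the stored text contains neither tab characters nor HTML entities.
    static boolean isClean(String stored) {
        return stored.indexOf('\t') < 0 && !HTML_ENTITY.matcher(stored).find();
    }

    public static void main(String[] args) {
        System.out.println(isClean("say \"hi\"\n  nested line"));
        System.out.println(isClean("say &quot;hi&quot;"));
    }
}
```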
Security
XSS protection is preserved using pattern-based sanitization rather than encoding.
Potentially dangerous inputs such as script tags, iframes, and inline event handlers are still blocked.
Clean text is stored in the database, with encoding deferred to rendering where applicable.
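To illustrate what deferring encoding to rendering means, here is a hand-rolled escaping helper that would be applied only when stored text is written into HTML. It is an assumption for illustration: the class and method are hypothetical, and this description does not say where or how rendering-side encoding happens in the project.

```java
public class RenderEscapeSketch {
    // Escapes the characters that are significant in HTML text and attribute values.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = "He said \"hi\"";        // clean text as persisted
        System.out.println(escapeHtml(stored)); // encoded only at output time
    }
}
```

Keeping the stored value unencoded means each consumer can apply the encoding appropriate to its own output context.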
Notes
No API changes.
No UI changes.
Fix is minimal and backward compatible.