feat(website): add feature to search multiple string fields#5881
feat(website): add feature to search multiple string fields#5881chaoran-chen wants to merge 3 commits intomainfrom
Conversation
|
This PR may be related to #1985 (Search: standardised approach for multi-entry search) |
|
Very cool - but I think this actually doesn't relate to #1985 unless I misunderstand: the idea there is about the type of search where you can enter multiple accessions and you're not trying to find all of them in a single sequence entry, but each one in a different sequence entry |
|
aah, right, I misread, thanks @theosanderson! |
46cd32e to
8fed763
Compare
|
Claude finished @chaoran-chen's task in 3m 9s —— View job Code Review Complete ✓I've reviewed this PR and found it to be a well-implemented feature with comprehensive testing. The implementation is clean and follows existing patterns in the codebase. Below is my detailed assessment: SummaryThis PR adds multi-field search functionality that allows users to search across multiple metadata fields using a single input. Two search fields are configured:
The implementation constructs LAPIS advanced queries using regex patterns to search across multiple fields with OR logic. Strengths✓ Good test coverage: Comprehensive Playwright tests covering search, URL params, filter removal, and downloads Issues FoundI've left inline comments on specific areas that need attention: Security (1 issue):
Code Quality (3 issues):
Testing (3 issues):
Documentation (2 issues):
Accessibility (1 issue):
RecommendationsMust address before merge:
Should address: Nice to have: Testing NotesThe PR includes thorough integration tests that verify:
Manual testing was performed on the preview environment for WNV searches. |
website/src/components/SearchPage/DownloadDialog/SequenceFilters.tsx
Outdated
Show resolved
Hide resolved
integration-tests/tests/specs/features/search/multi-field-search.dependent.spec.ts
Outdated
Show resolved
Hide resolved
integration-tests/tests/specs/features/search/multi-field-search.dependent.spec.ts
Outdated
Show resolved
Hide resolved
integration-tests/tests/specs/features/search/multi-field-search.dependent.spec.ts
Outdated
Show resolved
Hide resolved
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: affc28b6cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (value && typeof value === 'string' && value.trim()) { | ||
| const regex = makeCaseInsensitiveLiteralSubstringRegex(value.trim()); | ||
| const fieldQueries = mfs.fields.map((f) => `${f}.regex='${regex}'`); |
There was a problem hiding this comment.
Escape quotes in advancedQuery regex literals
The new multi-field search builds advancedQuery by interpolating the user’s raw input into a single-quoted string (${f}.regex='${regex}'). makeCaseInsensitiveLiteralSubstringRegex escapes regex metacharacters but does not escape single quotes, so inputs containing apostrophes (e.g., “O'Connor”, “King’s College”) will terminate the string literal and produce a malformed advancedQuery, causing the search to fail or be parsed incorrectly. Please escape ' (or switch to a safer encoding/quoting strategy) before embedding user input in the advanced query string.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
this sounds like it could be important - using O'Connor in the field does indeed crash the page!
There was a problem hiding this comment.
or well not crash but we get the red error bar and lapis returns
"error": {
"type": "about:blank",
"title": "Bad Request",
"status": 400,
"detail": "Failed to parse advanced query (line 1:282): mismatched input 'C' expecting {')', '|', '&', AND, OR}."
},
There was a problem hiding this comment.
That's indeed a very good point, I'm not sure whether we have a way to escape in LAPIS, I'll check.
There was a problem hiding this comment.
I just tested this out by pasting an accession into the identifier search field for CCHf and I think it is sadly broken for multi segmented organisms! The page freezes:


and then I get an axios error
{"message":"Request failed with status code 400","name":"AxiosError","stack":"AxiosError: Request failed with status code 400\n at It (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:3:1073)\n at XMLHttpRequest.w (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:3:5793)\n at _.request (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:5:2079)\n at async Gn.request (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:11:4348)","config":{"transitional":{"silentJSONParsing":true,"forcedJSONParsing":true,"clarifyTimeoutError":false},"adapter":["xhr","http","fetch"],"transformRequest":[null],"transformResponse":[null],"timeout":0,"xsrfCookieName":"XSRF-TOKEN","xsrfHeaderName":"X-XSRF-TOKEN","maxContentLength":-1,"maxBodyLength":-1,"env":{},"headers":{"Accept":"application/json, text/plain, */*","Content-Type":"application/json"},"baseURL":"https://lapis-multifieldsearch.loculus.org/cchf","method":"post","url":"/sample/details","data":"{\"versionStatus\":\"LATEST_VERSION\",\"isRevocation\":\"false\",\"nucleotideMutations\":[],\"aminoAcidMutations\":[],\"nucleotideInsertions\":[],\"aminoAcidInsertions\":[],\"advancedQuery\":\"(accessionVersion.regex='(?i)PV920624' or submissionId.regex='(?i)PV920624' or insdcAccessionFull.regex='(?i)PV920624' or bioprojectAccession.regex='(?i)PV920624' or gcaAccession.regex='(?i)PV920624' or biosampleAccession.regex='(?i)PV920624' or insdcRawReadsAccession.regex='(?i)PV920624')\",\"fields\":[\"authorAffiliations\",\"authors\",\"geoLocAdmin1\",\"geoLocCountry\",\"hostNameScientific\",\"length_L\",\"length_M\",\"length_S\",\"lineage\",\"ncbiReleaseDate\",\"sampleCollectionDate\",\"accessionVersion\"],\"limit\":100,\"offset\":0,\"orderBy\":[{\"field\":\"sampleCollectionDate\",\"type\":\"descending\"}]}","params":{},"allowAbsoluteUrls":true},"code":"ERR_BAD_REQUEST","status":400}
Update: sorry when I first copied the error I somehow copied the whole page with all the sequence metadata
affc28b to
472d72a
Compare
|
Thanks for identifying the bug, @anna-parker! The |
kubernetes/loculus/values.yaml
Outdated
| defaultOrder: descending | ||
| multiFieldSearches: | ||
| - name: identifier | ||
| displayName: Identifier |
There was a problem hiding this comment.
maybe Sample Identifier would be clearer?
anna-parker
left a comment
There was a problem hiding this comment.
this is looking really great but it would be good to fix the ' escape issue before merging. I also think it would be nice to add a little info button next to the field (like for mutations) and explain which fields are included in this search
improve test fix perSegment Add integration test for CCHF
0e78b38 to
450b214
Compare
| const value = this.fieldValues[mfs.name]; | ||
| if (value && typeof value === 'string' && value.trim()) { | ||
| const regex = makeCaseInsensitiveLiteralSubstringRegex(value.trim()); | ||
| const escapedRegex = regex.replace(/'/g, "\\'"); |
Check failure
Code scanning / CodeQL
Incomplete string escaping or encoding High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 14 days ago
In general, the problem should be fixed by ensuring that when we escape characters for embedding in a quoted context, we escape both the quote character and any backslashes, so that we do not end up with sequences where an attacker-controlled backslash changes the meaning of the following character. Instead of only replacing ' with \', we should first escape \ as \\, then escape ' as \', or use a small helper that does this in one step with a regular expression that covers both characters.
In this specific file, the only problematic code is on line 154:
const escapedRegex = regex.replace(/'/g, "\\'");We should change it so that both backslashes and single quotes are escaped. To avoid partial escaping (e.g. double-escaping already escaped quotes), we can consistently transform the raw regex string in one pass. A simple, clear solution that does not change existing behavior except to handle backslashes correctly is:
const escapedRegex = regex.replace(/['\\]/g, (c) => `\\${c}`);This replaces each occurrence of ' or \ with a backslash-prefixed version (' → \', \ → \\). The rest of the surrounding logic (forming fieldQueries and advancedQueryParts) stays the same. No new imports or external dependencies are required; this uses built-in JavaScript string/regex features only. The change is localized to the line where escapedRegex is computed in website/src/components/SearchPage/DownloadDialog/SequenceFilters.tsx.
| @@ -151,7 +151,7 @@ | ||
| const value = this.fieldValues[mfs.name]; | ||
| if (value && typeof value === 'string' && value.trim()) { | ||
| const regex = makeCaseInsensitiveLiteralSubstringRegex(value.trim()); | ||
| const escapedRegex = regex.replace(/'/g, "\\'"); | ||
| const escapedRegex = regex.replace(/['\\]/g, (c) => `\\${c}`); | ||
| const fieldQueries = mfs.fields.map((f) => `${f}.regex='${escapedRegex}'`); | ||
| advancedQueryParts.push(`(${fieldQueries.join(' or ')})`); | ||
| } |
resolves nothing
This PR adds a feature that allows the configuration of search fields that search multiple fields by constructing a LAPIS advanced query. In the preview, there are now two new search fields
identifier: searches accessionVersion, submissionId, insdcAccessionFull, bioprojectAccession, gcaAccession, biosampleAccession, insdcRawReadsAccessioncontributor: searches authors, authorAffiliations, sequencedByOrganization, sequencedByContactName, submitter, groupNameThis can be used to resolve [gap to prevent automated linking] pathoplexus/pathoplexus#766.
Screenshot
PR Checklist
[ ] All necessary documentation has been adapted.🚀 Preview: Add
previewlabel to enable