|
| 1 | +# Galach query language |
| 2 | + |
| 3 | +Galach is based on a syntax that seems to be the unofficial standard for search queries entered by users. |
| 4 | +You're probably already somewhat familiar with it, as the same basic syntax is used by virtually all |
| 5 | +popular web search engines out there. It is also very similar to [Lucene Query Parser syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html), |
| 6 | +used by both Solr and Elasticsearch. |
| 7 | + |
| 8 | +## Terms |
| 9 | + |
| 10 | +1. `Word` term is a string not containing whitespace, unless that whitespace is escaped. |
| 11 | + |
| 12 | + ``` |
| 13 | + word |
| 14 | + ``` |
| 15 | + ``` |
| 16 | + another\ word |
| 17 | + ``` |
| 18 | +
|
| 19 | +2. `Phrase` term is formed by enclosing words within quotation marks. |
| 20 | +
|
| 21 | + Both single and double quotes are supported: `"`, `'` |
| 22 | +
|
| 23 | + ``` |
| 24 | + 'reality exists' |
| 25 | + ``` |
| 26 | + ``` |
| 27 | + "what's not real doesn't exist" |
| 28 | + ``` |
| 29 | +
|
| 30 | +3. `User` term is defined by a leading `@` character, followed by at least one alphanumeric or |
| 31 | + underscore character, followed by an arbitrary sequence of alphanumeric characters, hyphens, |
| 32 | + underscores and dots. |
| 33 | +
|
| 34 | + Regular expression: |
| 35 | +
|
| 36 | + ``` |
| 37 | + @[a-zA-Z0-9_][a-zA-Z0-9_\-.]* |
| 38 | + ``` |
| 39 | +
|
| 40 | + Examples: |
| 41 | +
|
| 42 | + ``` |
| 43 | + @joe.watt |
| 44 | + ``` |
| 45 | + ``` |
| 46 | + @_alice83 |
| 47 | + ``` |
| 48 | + ``` |
| 49 | + @The-Ronald |
| 50 | + ``` |
| 51 | +
|
| 52 | +4. `Tag` term is defined by a leading `#` character, followed by at least one alphanumeric or |
| 53 | + underscore character, followed by an arbitrary sequence of alphanumeric characters, hyphens, |
| 54 | + underscores and dots. |
| 55 | +
|
| 56 | + Regular expression: |
| 57 | +
|
| 58 | + ``` |
| 59 | + \#[a-zA-Z0-9_][a-zA-Z0-9_\-.]* |
| 60 | + ``` |
| 61 | +
|
| 62 | + Examples: |
| 63 | +
|
| 64 | + ``` |
| 65 | + #php |
| 66 | + ``` |
| 67 | + ``` |
| 68 | + #PHP-7.1 |
| 69 | + ``` |
| 70 | + ``` |
| 71 | + #query_parser |
| 72 | + ``` |
| 73 | +
|
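The two regular expressions above can be tried out directly. The following is a minimal Python sketch, not part of Galach itself; the function name `classify_term` is hypothetical and the sketch assumes it receives a single token that contains no whitespace.

```python
import re

# Patterns copied from the regular expressions given above.
USER_RE = re.compile(r"@[a-zA-Z0-9_][a-zA-Z0-9_\-.]*")
TAG_RE = re.compile(r"\#[a-zA-Z0-9_][a-zA-Z0-9_\-.]*")


def classify_term(text: str) -> str:
    """Return 'user', 'tag' or 'other' for a single whitespace-free token.

    Hypothetical helper for illustration only; the real tokenizer may
    apply additional rules (escaping, domains) that are ignored here.
    """
    if USER_RE.fullmatch(text):
        return "user"
    if TAG_RE.fullmatch(text):
        return "tag"
    return "other"
```

With these patterns, `@joe.watt` and `@_alice83` are classified as `User` terms and `#PHP-7.1` as a `Tag` term, while `@-bad` is not, because a hyphen cannot be the first character after the marker.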
| 74 | +## Operators |
| 75 | +
|
| 76 | +Terms can be combined or modified using binary and unary operators: |
| 77 | +
|
| 78 | +1. `Logical and` is a binary operator that combines left and right operands so that both must |
| 79 | + match. |
| 80 | +
|
| 81 | + It comes in two forms: `AND`, `&&` |
| 82 | +
|
| 83 | + In both cases it must be separated from its operands by whitespace. |
| 84 | +
|
| 85 | + ``` |
| 86 | + coffee AND milk |
| 87 | + ``` |
| 88 | + ``` |
| 89 | + tea && lemon |
| 90 | + ``` |
| 91 | +
|
| 92 | +2. `Logical or` is a binary operator that combines left and right operands so that at least one of |
| 93 | + them has to match. |
| 94 | +
|
| 95 | + It comes in two forms: `OR`, `||` |
| 96 | +
|
| 97 | + In both cases it must be separated from its operands by whitespace. |
| 98 | +
|
| 99 | + ``` |
| 100 | + potato OR tomato |
| 101 | + ``` |
| 102 | + ``` |
| 103 | + true || false |
| 104 | + ``` |
| 105 | +
|
| 106 | +3. `Logical not` is a unary operator that modifies its operand so that it must not match. |
| 107 | +
|
| 108 | + It comes in two forms: `NOT`, `!` |
| 109 | +
|
| 110 | + When the `NOT` form is used, it must be separated from its operand by whitespace: |
| 111 | +
|
| 112 | + ``` |
| 113 | + NOT important |
| 114 | + ``` |
| 115 | +
|
| 116 | + When the shorthand form `!` is used, it must be adjacent to its operand: |
| 117 | +
|
| 118 | + ``` |
| 119 | + !important |
| 120 | + ``` |
| 121 | +
|
| 122 | +4. `Mandatory` is a unary operator that modifies its operand so that it must match. |
| 123 | + It's represented by the plus sign `+` and must be placed adjacent to its operand. |
| 124 | +
|
| 125 | + ``` |
| 126 | + +coffee |
| 127 | + ``` |
| 128 | +
|
| 129 | +5. `Prohibited` is a unary operator that modifies its operand so that it must not match. |
| 130 | + It's represented by the minus sign `-` and must be placed adjacent to its operand. |
| 131 | +
|
| 132 | + ``` |
| 133 | + -cake |
| 134 | + ``` |
| 135 | +
|
| 136 | +### Operator precedence |
| 137 | +
|
| 138 | +Unary operators are applied first, followed by binary operators. When it comes to binary operators, `Logical and` precedes `Logical or`: |
| 139 | +
|
| 140 | +1. `Logical not`, `Mandatory`, `Prohibited` |
| 141 | +2. `Logical and` |
| 142 | +3. `Logical or` |
| 143 | +
|
| 144 | +## Grouping |
| 145 | +
|
| 146 | +Terms and expressions can be grouped using round brackets. A group is processed as a whole. |
| 147 | +The following two examples are processed identically, since the implicit grouping follows operator precedence: |
| 148 | +
|
| 149 | +``` |
| 150 | +one OR NOT two AND three |
| 151 | +``` |
| 152 | +``` |
| 153 | +one OR ((NOT two) AND three) |
| 154 | +``` |
| 155 | +
|
| 156 | +But you can also use grouping to override the meaning that would follow from operator precedence: |
| 157 | +
|
| 158 | +``` |
| 159 | +(one OR NOT two) AND three |
| 160 | +``` |
| 161 | +``` |
| 162 | +one OR NOT (two AND three) |
| 163 | +``` |
| 164 | +
|
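To see how precedence and grouping interact, here is a small recursive-descent sketch in Python. It is an illustration only, not Galach's parser: the function names and the tuple-based tree are assumptions, it handles only plain words, `NOT`, `AND`/`&&`, `OR`/`||` and round brackets, and it expects well-formed input with an explicit operator between every pair of terms.

```python
import re


def tokenize(query):
    # Brackets are their own tokens; everything else is split on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)


def parse(query):
    """Parse a query into nested tuples such as ('OR', 'one', ('NOT', 'two'))."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        # Lowest precedence: a chain of AND-expressions joined by OR.
        node = parse_and()
        while peek() in ("OR", "||"):
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        # Binds tighter than OR: a chain of unary expressions joined by AND.
        node = parse_not()
        while peek() in ("AND", "&&"):
            advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        # Highest precedence: the unary operator applies to what follows it.
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        return parse_primary()

    def parse_primary():
        token = advance()
        if token == "(":
            node = parse_or()  # a group is parsed as a whole
            advance()  # consume the closing ")"
            return node
        return token

    return parse_or()
```

In this sketch, `parse("one OR NOT two AND three")` and `parse("one OR ((NOT two) AND three)")` build the same tree, while `parse("(one OR NOT two) AND three")` puts `AND` at the root instead.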
| 165 | +## Domains |
| 166 | +
|
| 167 | +A domain is an abstract category to which a term or group applies. It's defined by prefixing the |
| 168 | +term or group with a domain string, followed by a colon `:`. The domain string must start with a |
| 169 | +letter or underscore character, followed by an arbitrary sequence of alphanumeric |
| 170 | +characters, hyphens `-` and underscores `_`. |
| 171 | +
|
| 172 | +Note that a domain cannot be used on `Tag` and `User` terms. These two in fact define implicit domains |
| 173 | +of their own. |
| 174 | +
|
| 175 | +Regular expression for domain string: |
| 176 | +
|
| 177 | +``` |
| 178 | +[a-zA-Z_][a-zA-Z0-9_\-]* |
| 179 | +``` |
| 180 | +
|
| 181 | +Examples: |
| 182 | +
|
| 183 | +``` |
| 184 | +type:aeroplane |
| 185 | +``` |
| 186 | +``` |
| 187 | +title:"Language processor" |
| 188 | +``` |
| 189 | +``` |
| 190 | +description:(wings AND propeller) |
| 191 | +``` |
| 192 | +
|
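As a rough illustration of the domain prefix, the Python sketch below splits a single term or group into its domain string and the remainder, using the regular expression above. The helper name `split_domain` is hypothetical, and the sketch deliberately ignores escaping and whitespace handling.

```python
import re

# Domain string pattern from the text, followed by the colon and a non-empty rest.
DOMAIN_RE = re.compile(r"([a-zA-Z_][a-zA-Z0-9_\-]*):(.+)", re.DOTALL)


def split_domain(term):
    """Return (domain, rest) for a prefixed term, or (None, term) otherwise.

    Hypothetical helper: it looks at one term or group in isolation and
    does not implement the full tokenizer rules.
    """
    match = DOMAIN_RE.fullmatch(term)
    if match is None:
        return None, term
    return match.group(1), match.group(2)
```

For example, `split_domain("type:aeroplane")` yields the domain `type` and the word `aeroplane`, while a term without a colon comes back unchanged with `None` as its domain.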
| 193 | +## Special characters |
| 194 | +
|
| 195 | +Characters that are part of the language syntax must be escaped in order not to be recognized as |
| 196 | +such by the engine. These are: |
| 197 | +
|
| 198 | +- `(` left paren |
| 199 | +- `)` right paren |
| 200 | +- `+` plus |
| 201 | +- `-` minus |
| 202 | +- `!` exclamation mark |
| 203 | +- `"` double quote |
| 204 | +- `'` single quote |
| 205 | +- `#` hash |
| 206 | +- `@` at sign |
| 207 | +- `:` colon |
| 208 | +- `\` backslash |
| 209 | +- `␣` blank space |
| 210 | +
|
| 211 | +The character used for escaping is the backslash `\`: |
| 212 | +
|
| 213 | +``` |
| 214 | +joined\ word |
| 215 | +``` |
| 216 | +``` |
| 217 | +"escaped \"double quote\"" |
| 218 | +``` |
| 219 | +``` |
| 220 | +'escaped \'single quote\'' |
| 221 | +``` |
| 222 | +``` |
| 223 | +escaped \+operator domain\:word \@user \#tag \(and so on\) |
| 224 | +``` |
| 225 | +``` |
| 226 | +double backslash \\ is a backslash escaped |
| 227 | +``` |
| 228 | +
|
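When a query is assembled from arbitrary user text, the escaping rule above can be applied mechanically. The following Python sketch is an assumption-laden illustration rather than Galach's own API: `escape` and `unescape` are hypothetical helper names, and `escape` simply prefixes every listed special character with a backslash, even in positions where the tokenizer would not strictly require it.

```python
import re

# The special characters listed above, including backslash and blank space.
SPECIAL_CHARACTERS = set("()+-!\"'#@:\\ ")


def escape(text):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


def unescape(text):
    """Remove the backslash in front of each escaped special character."""
    return re.sub(r"\\([()+\-!\"'#@:\\ ])", r"\1", text)
```

In this sketch, `escape("joined word")` produces `joined\ word`, and `unescape(escape(s))` returns the original string `s`.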
| 229 | +Aside from the quotation marks themselves, escaping is not required inside phrases. Since quotes are |
| 230 | +used as delimiters, everything between them is taken as-is. Hence, the following two are processed |
| 231 | +identically: |
| 232 | +
|
| 233 | +``` |
| 234 | +"+one -two" |
| 235 | +``` |
| 236 | +``` |
| 237 | +"\+one \-two" |
| 238 | +``` |
| 239 | +
|
| 240 | +In some cases the tokenizer will automatically treat a special character as if |
| 241 | +it were escaped. The following pairs are processed identically: |
| 242 | +
|
| 243 | +1. Colon at the end of a `Word` is considered part of the `Word` |
| 244 | +
|
| 245 | + ``` |
| 246 | + word: |
| 247 | + ``` |
| 248 | + ``` |
| 249 | + word\: |
| 250 | + ``` |
| 251 | +
|
| 252 | +2. Colon placed after a domain colon is considered part of the `Word` |
| 253 | +
|
| 254 | + ``` |
| 255 | + domain:domain:domain |
| 256 | + ``` |
| 257 | + ``` |
| 258 | + domain:domain\:domain |
| 259 | + ``` |
| 260 | +
|
| 261 | +3. A domain can't be used on `Tag` and `User` terms |
| 262 | +
|
| 263 | + ``` |
| 264 | + domain:#tag domain:@user |
| 265 | + ``` |
| 266 | + ``` |
| 267 | + domain:\#tag domain:\@user |
| 268 | + ``` |
| 269 | +
|
| 270 | +4. Characters used for `Mandatory`, `Prohibited` and shorthand `Logical not` operators can be |
| 271 | + considered part of the `Word`: |
| 272 | +
|
| 273 | + - When placed after domain colon |
| 274 | +
|
| 275 | + ``` |
| 276 | + domain:+word domain:-word domain:!word |
| 277 | + ``` |
| 278 | + ``` |
| 279 | + domain:\+word domain:\-word domain:\!word |
| 280 | + ``` |
| 281 | +
|
| 282 | + - When placed in the middle of the word |
| 283 | +
|
| 284 | + ``` |
| 285 | + one+two one-two one!two |
| 286 | + ``` |
| 287 | + ``` |
| 288 | + one\+two one\-two one\!two |
| 289 | + ``` |
| 290 | +
|
| 291 | + - When placed at the end of the `Word` |
| 292 | +
|
| 293 | + ``` |
| 294 | + one+ two- three! |
| 295 | + ``` |
| 296 | + ``` |
| 297 | + one\+ two\- three\! |
| 298 | + ``` |