Commit a5aa2cd

Add readme file for Galach
1 parent 0b06c96 commit a5aa2cd

lib/Languages/Galach/README.md

Lines changed: 298 additions & 0 deletions
@@ -0,0 +1,298 @@
1+
# Galach query language
2+
3+
Galach is based on a syntax that seems to be the unofficial standard for search queries entered as user input.
4+
You're probably already somewhat familiar with it, as the same basic syntax is used by virtually all
5+
popular web search engines out there. It is also very similar to [Lucene Query Parser syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html),
6+
used by both Solr and Elasticsearch.
7+
8+
## Terms
9+
10+
1. `Word` term is a string not containing whitespace, unless that whitespace is escaped.
11+
12+
```
13+
word
14+
```
15+
```
16+
another\ word
17+
```
18+
19+
2. `Phrase` term is formed by enclosing words within quotation marks.
20+
21+
Both single and double quotes are supported: `"`, `'`
22+
23+
```
24+
'reality exists'
25+
```
26+
```
27+
"what's not real doesn't exist"
28+
```
29+
30+
3. `User` term is defined by a leading `@` character, followed by at least one alphanumeric or
31+
underscore character, followed by an arbitrary sequence of alphanumeric characters, hyphens,
32+
underscores and dots.
33+
34+
Regular expression:
35+
36+
```
37+
@[a-zA-Z0-9_][a-zA-Z0-9_\-.]*
38+
```
39+
40+
Examples:
41+
42+
```
43+
@joe.watt
44+
```
45+
```
46+
@_alice83
47+
```
48+
```
49+
@The-Ronald
50+
```
51+
52+
4. `Tag` term is defined by a leading `#` character, followed by at least one alphanumeric or
53+
underscore character, followed by an arbitrary sequence of alphanumeric characters, hyphens,
54+
underscores and dots.
55+
56+
Regular expression:
57+
58+
```
59+
\#[a-zA-Z0-9_][a-zA-Z0-9_\-.]*
60+
```
61+
62+
Examples:
63+
64+
```
65+
#php
66+
```
67+
```
68+
#PHP-7.1
69+
```
70+
```
71+
#query_parser
72+
```
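
As an illustration only, the `User` and `Tag` patterns above can be tried out in Python. This is a minimal sketch, not the tokenizer itself: the `classify` helper is a hypothetical name, and anchoring the patterns to the whole string with `fullmatch` is an assumption made for the example.

```python
import re

# Patterns copied from the regular expressions above. Using fullmatch() to
# anchor them to the whole term is an assumption of this sketch.
USER_RE = re.compile(r"@[a-zA-Z0-9_][a-zA-Z0-9_\-.]*")
TAG_RE = re.compile(r"\#[a-zA-Z0-9_][a-zA-Z0-9_\-.]*")


def classify(term: str) -> str:
    """Return 'user', 'tag' or 'other' for a single whitespace-free term."""
    if USER_RE.fullmatch(term):
        return "user"
    if TAG_RE.fullmatch(term):
        return "tag"
    return "other"


if __name__ == "__main__":
    for term in ["@joe.watt", "@_alice83", "#PHP-7.1", "@-dash", "word"]:
        print(term, "->", classify(term))
```

Here `@-dash` falls through to `other` because the first character after the marker must be alphanumeric or an underscore.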
73+
74+
## Operators
75+
76+
Terms can be combined or modified using binary and unary operators:
77+
78+
1. `Logical and` is a binary operator that combines left and right operands so that both must
79+
match.
80+
81+
It comes in two forms: `AND`, `&&`
82+
83+
In both cases it must be separated from its operands by whitespace.
84+
85+
```
86+
coffee AND milk
87+
```
88+
```
89+
tea && lemon
90+
```
91+
92+
2. `Logical or` is a binary operator that combines left and right operands so that at least one of
93+
them has to match.
94+
95+
It comes in two forms: `OR`, `||`
96+
97+
In both cases it must be separated from its operands by whitespace.
98+
99+
```
100+
potato OR tomato
101+
```
102+
```
103+
true || false
104+
```
105+
106+
3. `Logical not` is a unary operator that modifies its operand so that it must not match.
107+
108+
It comes in two forms: `NOT`, `!`
109+
110+
When the `NOT` form is used, it must be separated from its operand by whitespace:
111+
112+
```
113+
NOT important
114+
```
115+
116+
When the shorthand form `!` is used, it must be adjacent to its operand:
117+
118+
```
119+
!important
120+
```
121+
122+
4. `Mandatory` is a unary operator that modifies its operand so that it must match.
123+
It's represented by the plus sign `+` and must be placed adjacent to its operand.
124+
125+
```
126+
+coffee
127+
```
128+
129+
5. `Prohibited` is a unary operator that modifies its operand so that it must not match.
130+
It's represented by the minus sign `-` and must be placed adjacent to its operand.
131+
132+
```
133+
-cake
134+
```
135+
136+
### Operator precedence
137+
138+
Unary operators are applied first, followed by binary operators. When it comes to binary operators, `Logical and` precedes `Logical or`:
139+
140+
1. `Logical not`, `Mandatory`, `Prohibited`
141+
2. `Logical and`
142+
3. `Logical or`
143+
144+
## Grouping
145+
146+
Terms and expressions can be grouped using round brackets. A group is processed as a whole.
147+
The following two examples will be processed the same way, since implicit grouping follows operator precedence:
148+
149+
```
150+
one OR NOT two AND three
151+
```
152+
```
153+
one OR ((NOT two) AND three)
154+
```
155+
156+
But you can also use grouping to override the meaning that would follow from operator precedence:
157+
158+
```
159+
(one OR NOT two) AND three
160+
```
161+
```
162+
one OR NOT (two AND three)
163+
```
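
To make the interaction between precedence and grouping concrete, here is a minimal recursive-descent sketch in Python. It is not the Galach parser: the `parse` function is a hypothetical helper written for this illustration, it handles only `AND`/`&&`, `OR`/`||`, `NOT` and round brackets, and it ignores escaping, phrases, domains and the adjacent unary operators. It returns the query with every operator application made explicit through brackets.

```python
import re

# Brackets are tokens of their own; everything else is split on whitespace.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
RESERVED = {"AND", "&&", "OR", "||", "NOT", ")"}


def parse(query: str) -> str:
    """Return the query fully bracketed according to operator precedence."""
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        left = parse_and()
        while peek() in ("OR", "||"):
            take()
            left = f"({left} OR {parse_and()})"
        return left

    def parse_and():  # binds tighter than OR
        left = parse_not()
        while peek() in ("AND", "&&"):
            take()
            left = f"({left} AND {parse_not()})"
        return left

    def parse_not():  # unary operators are applied first
        if peek() == "NOT":
            take()
            return f"(NOT {parse_not()})"
        return parse_primary()

    def parse_primary():  # a single term or a bracketed group
        token = take()
        if token == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("missing closing bracket")
            return inner
        if token in RESERVED:
            raise ValueError(f"unexpected token {token!r}")
        return token

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


if __name__ == "__main__":
    print(parse("one OR NOT two AND three"))
    print(parse("(one OR NOT two) AND three"))
```

Under these assumptions, `one OR NOT two AND three` and `one OR ((NOT two) AND three)` produce the same bracketed string, while the two explicitly grouped queries above each produce a different one.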
164+
165+
## Domains
166+
167+
A domain is an abstract category to which a term or group applies. It's defined by prefixing the
168+
term or group with a domain string, followed by a colon `:`. Domain string must start with at least
169+
one letter or underscore character, followed by an arbitrary sequence of alphanumeric
170+
characters, hyphens `-` and underscores `_`.
171+
172+
Note that a domain cannot be used on `Tag` and `User` terms. These two in fact define implicit domains
173+
of their own.
174+
175+
Regular expression for domain string:
176+
177+
```
178+
[a-zA-Z_][a-zA-Z0-9_\-]*
179+
```
180+
181+
Examples:
182+
183+
```
184+
type:aeroplane
185+
```
186+
```
187+
title:"Language processor"
188+
```
189+
```
190+
description:(wings AND propeller)
191+
```
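
As a rough illustration of the domain rule, the following Python sketch splits a domain prefix from the rest of a term using the regular expression for the domain string given above. The `split_domain` helper is a hypothetical name, and the sketch deliberately ignores escaping. It does follow the note that a domain cannot be applied to `Tag` and `User` terms.

```python
import re

# Domain pattern copied from the regular expression above, followed by the
# colon and a non-empty remainder (a word, a phrase or a group).
DOMAIN_RE = re.compile(r"([a-zA-Z_][a-zA-Z0-9_\-]*):(.+)", re.DOTALL)


def split_domain(term: str):
    """Return (domain, rest), or (None, term) when no domain applies."""
    match = DOMAIN_RE.fullmatch(term)
    if match is None:
        return None, term
    domain, rest = match.group(1), match.group(2)
    # Tag and User terms define implicit domains of their own.
    if rest[0] in "#@":
        return None, term
    return domain, rest


if __name__ == "__main__":
    for term in ["type:aeroplane", 'title:"Language processor"',
                 "description:(wings AND propeller)", "plain"]:
        print(term, "->", split_domain(term))
```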
192+
193+
## Special characters
194+
195+
Characters that are part of the language syntax must be escaped in order not to be recognized as
196+
such by the engine. These are:
197+
198+
- `(` left paren
199+
- `)` right paren
200+
- `+` plus
201+
- `-` minus
202+
- `!` exclamation mark
203+
- `"` double quote
204+
- `'` single quote
205+
- `#` hash
206+
- `@` at sign
207+
- `:` colon
208+
- `\` backslash
209+
- `␣` blank space
210+
211+
The character used for escaping is the backslash `\`:
212+
213+
```
214+
joined\ word
215+
```
216+
```
217+
"escaped \"double quote\""
218+
```
219+
```
220+
'escaped \'single quote\''
221+
```
222+
```
223+
escaped \+operator domain\:word \@user \#tag \(and so on\)
224+
```
225+
```
226+
double backslash \\ is a backslash escaped
227+
```
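
When raw user text has to be embedded in a query as a single `Word`, the escaping rule can be applied programmatically. The following Python sketch is an assumption-laden illustration rather than part of any library: `escape` is a hypothetical helper that prefixes every character from the list above with a backslash.

```python
# Special characters from the list above, including backslash and blank space.
SPECIAL_CHARS = set('()+-!"\'#@:\\ ')


def escape(text: str) -> str:
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


if __name__ == "__main__":
    print(escape("joined word"))   # joined\ word
    print(escape("domain:word"))   # domain\:word
    print(escape("(and so on)"))   # \(and\ so\ on\)
```

Escaping the backslash itself matters here: without it, a literal backslash in the input would be read as the start of an escape sequence.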
228+
229+
Aside from quotation marks themselves, escaping is not required inside phrases. Since quotes are
230+
used as delimiters, everything between them is taken as-is. Hence, these will be processed as the
231+
same:
232+
233+
```
234+
"+one -two"
235+
```
236+
```
237+
"\+one \-two"
238+
```
239+
240+
In some cases the tokenizer will automatically assume that a special character is to be interpreted as if
241+
it was escaped. The following pairs will be processed as the same:
242+
243+
1. Colon at the end of a `Word` is considered part of the `Word`
244+
245+
```
246+
word:
247+
```
248+
```
249+
word\:
250+
```
251+
252+
2. Colon placed after a domain colon is considered part of the `Word`
253+
254+
```
255+
domain:domain:domain
256+
```
257+
```
258+
domain:domain\:domain
259+
```
260+
261+
3. A domain can't be used on `Tag` and `User` terms
262+
263+
```
264+
domain:#tag domain:@user
265+
```
266+
```
267+
domain:\#tag domain:\@user
268+
```
269+
270+
4. Characters used for `Mandatory`, `Prohibited` and shorthand `Logical not` operators can be
271+
considered part of the `Word`:
272+
273+
- When placed after domain colon
274+
275+
```
276+
domain:+word domain:-word domain:!word
277+
```
278+
```
279+
domain:\+word domain:\-word domain:\!word
280+
```
281+
282+
- When placed in the middle of the word
283+
284+
```
285+
one+two one-two one!two
286+
```
287+
```
288+
one\+two one\-two one\!two
289+
```
290+
291+
- When placed at the end of the `Word`
292+
293+
```
294+
one+ two- three!
295+
```
296+
```
297+
one\+ two\- three\!
298+
```
