-
Notifications
You must be signed in to change notification settings - Fork 1
Syntax
This page introduces the syntax for both the current state of our proposed GSM query language for patten matching and rewriting, and the associated script for testing properties at the data, and schema level, thus providing for metamodel capabilities.
Each graph rule is semi-colon separated, where the last match shall not terminate with a semicolon. The grammar requires at least one graph rule (centralmatch)
Each graph grammar comes with a rule name (OTHERS) and is introduced by an equivalence sign (EQ). It requires providing a node match with a name, potential ingoing or outgoing edges (many_edges) and some joining edges not necessarily referring to the ego-net of the matching node (edge_joining). Data matching conditions on the nodes or the edges being considered are introduced by a where non-terminal. The rewriting is optional: if not specified, both the node and the descendant nodes are unchanged. Otherwise, we can selectively update the nodes in the graph being matched (rewrite_to). As the matching node might be replaced by a newly created node specified in the rewriting phase, this last one must also be specified explicitly at the end of the rewriting.
This statement refers to a generic node match. When appearing at the beginning of a pattern match query, it refers to the entry-point of the graph query. Node declarations are bounded by rounded parentheses (L/RPAR), and might be associated to one or more variables ||-separated across the patterns (multiple_labels), so to determine the order of the graph grammar's rules application depending on the relationship between such nodes.
The following parts of this declaration are only expressed by the query language at this stage but are left to future work for implementation:
-
STARsemantics: expressing the optionality/multi-node match without necessarily nesting the morphism. - specifying an optional match referring to the node's label (
COL (:) OTHERS). At these stage, these are only supported via subsequent data conditions (seeWHERE), and not by pruning the retrieved edges within the morphism.
This matches many optional single edges arriving or departing from the query entry point. These edges can refer to an outgoing edge (#1), an ongoing edge (#2) or a hook (#3). When the node is morphism-aggregated, this refers to an edge connecting all the nodes being grouped by, but not necessarily forming a clique.
We now discuss the edgelabel not-terminal component: edges are introduced by squared brackets (Q/PPAR): the question mark remarks an edge being optionally matched (QM), while the ∀ symbol (FORALL) indicates that the target node of the current directed edge will be nested within a nested morphism. Edges can be optionally referred to by a variable name (OTHERS) that should always be followed by a colon (:). We can there express the possible options allowed for the edge's label via multiple_labels.
Last, we can specify free edges to connect the already-described nodes through multiple optional edge_joining edges within the same pattern.
The data conditions are expressed as a further refinement of the current graph pattern and are optionally introduced by a where token. We can then exploit straightforward equality or inequality conditions (=,≠,<,≤) between expressions identified in rewrite_expr, express a conjunction (∧) or disjunction (∨) against the previous boolean predicates, to use an advanced data predicate being expressed in Script v3.0 syntax in double-quotes. We can also further determine whether to keep an entire morphism from the current pattern as valid if one of the nodes or edges associated to a variable OTHER₁ do (not) appear as part of a morphism of the pattern OTHER₂ associated to a variable OTHER₃.
The data condition can be either a simple non double-quoted string (OTHER) or some specific data retrieved through the morphism (test_expr_side), which can be the following:
- The
OTHERS-th value (𝜉) associated to an object or variablerewrite_expr - The
OTHERS-th label (ℓ) associated to an object or variablerewrite_expr - A key or property
rewrite_expr₁belonging to an object or variablerewrite_expr₂ - A node-id accessible through an outgoing edge (or GSM containment) of
rewrite_expr₂with label (or containment name)rewrite_expr₁ - The edge label for an edge associated with a variable
rewrite_expr - The source (
src) or an edge associated with a variablerewrite_expr - The target (
dst) of an edge associated with a variablerewrite_expr - An if-then-else condition, where the alternative condition is optional
- Yet again a boolean interpretation of a Script v3.0 expression within double quotes.
The graph rewriting operations are optional and are introduced by a ↪ sign (REWRITE_TO). Please observe that matching a graph without necessarily restructuring the graph will, in this query language, return the same graph as initially loaded and will not return a set of subgraphs of the original graph; to obtain the latter, you also have to specify which edges or nodes have to be deleted (del). new will temporally instantiate a variable OTHERS holding a newly created object, while set will update the values in rewrite_expr₁ with the ones determined in rewrite_expr₂. If the current entry-point query has been removed and we want to replace such a node with a newly created one, we can then express our intent to return a specific node associated with a given variable with an optional final node statement.
The full Syntax Diagram generated from this ANTLR4 grammar provides the full specification of this language.
The full syntax of multi-line Script allows the definition of distinct semicolon-separated statements, while the current implementation of DatagramDB relies on one single expression. Among the basic types, we allow decimals with exponent notation, as well as (un)signed integers and double-quoted strings. Booleans are either tt (true) or ff (false). ┬ represents Java's Object (the most abstract type available), while ┴ refers to the type of NULL. Named functions are declared as "lambdas" assigned to a variable, where lambdas can be defined as fun <varname> → <block>. Line comments are introduced with # and block comments are à la C: /* ... */.
As this language does also support dependant types, type terms are also allowed in the syntax. Still, this feature is not a key component for this lanugage, but it is described in better detailed in the following pubblication:
G. Bergami, W. Zegadło. 2023. "Towards a Generalised Semistructured Data Model and Query Language" SIGWEB Newsl. 2023, Summer, Article 4 (Summer 2023), 22 pages.
Still, the current syntax has been improved to easily be parsed with an LL(1) grammar, thus allowing a more efficient code analysis at run time. Therefore, the specific syntax from that paper was used in this, another version of our project.
Among others, we also distinguish the following simple terms:
- The cardinality of an object can be expressed using the vertical bars,
| expr |. - If we want to refer to an externally-declared variable having a free syntax being different from the one allowed by the current language (i.e., always in lowercase), we can refer to them by wrapping a string around the following type of parentheses:
⦃ "expr" ⦄. - We can interpret a specific string as a Script v3.0 code by wrapping the string or variable holding such string with the following parentheses:
⦋ expr ⦌ - Tuples, i.e. key-value maps, are enclosed by
⎨...⎬parentheses, each pair is comma separated, and each key should be always a double quoted string mapping to (↦) an arbitrary expression. - Arrays are defined by semicolon-separated lists of expressions enclosed by curly brackets.
We can observe the following unary operators:
- Boolean Operators (Negation
¬) - Trigonometric functions (
sin,cos,tan) - GSM operations over integers representing the object reference (label array
ℓ, value array𝜉, list of all the outgoing nodes𝜑) - Functional operations over lists (injection
inj, flattening of lists of lists:flat) - Object dereferencing (from integer references,
J). - Assertion for (meta)schema compliance (
assert)
All binary operators are expressed in pre-fix notation, and are the following:
- Mathematical operations (
+,-,÷, modulo%, multiplication·, natural logarithmlog, exponentiationpow) - Collection (append
@, map°, filtering or selectionσormod, cross product⨯) - String concatenation
^ - Operations over booleans (and
∧, or∨) - Comparison operators (
=,≠,≤,≥,>,<) - Returning the value on the right only if the condition on the left happens,
⇒ - Obtaining the i-th element of an array or a specific field of an object:
[ expr₁ expr₂ ] - Selecting only the components for an object in the first argument as expressed in the second:
⟦ expr₁ expr₂ ⟧ - Object containment
∈, i.e. whether the first object is contained in the second. - Object removal
γ, i.e., removing the first object from the second. - Containment for an object
termover a labelexpr:𝜙
We also use the pre-fix notation for variable declaration (≝) and application (.), where the last also encompasses accessing objects, tuples, or arrays at specific locations.
Ternary operators are expressed as binary operators, where the first two operands are wrapped by a cp pre-fix bogus operator. This includes:
- If-then-else statements
e, where the condition and the else are the first two arguments incp. - Subtring, subarray, or sub-typle
ς, where the first argument is the collection, the second (and third) element refer to the begin and end index (exclusive) to consider. - Assignment
⥆in the object in the first position given the specific location on the second argument, of the value specified in the third. - Fold operations (right fold
r, left foldl)