From dce8dab74174ff66410a2bf8881c7cd21fbb2faa Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 7 Apr 2026 22:03:37 +0000
Subject: [PATCH 1/2] Initial plan

From 5fa07021322f0f0a4fc8c0cc2c77eb44f4c7f53a Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 7 Apr 2026 22:13:36 +0000
Subject: [PATCH 2/2] Add wiki documentation for frequency_table, frequency_table_with_null, entropy, and average_entropy

Agent-Logs-Url: https://github.com/ObjectVision/GeoDMS/sessions/c199675d-17f4-40b6-b8b2-de0ac1a8a9b9
Co-authored-by: MaartenHilferink <2284361+MaartenHilferink@users.noreply.github.com>
---
 wiki/average_entropy.md           | 86 +++++++++++++++++++++++++++++++
 wiki/entropy.md                   | 83 +++++++++++++++++++++++++++++
 wiki/frequency_table.md           | 74 ++++++++++++++++++++++++++
 wiki/frequency_table_with_null.md | 74 ++++++++++++++++++++++++++
 4 files changed, 317 insertions(+)
 create mode 100644 wiki/average_entropy.md
 create mode 100644 wiki/entropy.md
 create mode 100644 wiki/frequency_table.md
 create mode 100644 wiki/frequency_table_with_null.md

diff --git a/wiki/average_entropy.md b/wiki/average_entropy.md
new file mode 100644
index 00000000..59e911ba
--- /dev/null
+++ b/wiki/average_entropy.md
@@ -0,0 +1,86 @@
+*[[Aggregation functions]] average_entropy*
+
+## syntax
+
+- average_entropy(*a*)
+- average_entropy(*a*, *relation*)
+
+## definition
+
+- average_entropy(*a*) results in a [[parameter]] with the **Shannon entropy** (in bits) of the non-[[null]] values of [[attribute]] *a*.
+- average_entropy(*a*, *relation*) results in an attribute with the Shannon entropy (in bits) of the non-null values of attribute *a*, grouped by *[[relation]]*. The [[domain unit]] of the resulting attribute is the [[values unit]] of the *relation*.
+
+## description
+
+The Shannon entropy of a set of N observations is defined as:
+
+```
+average_entropy(a) = H(a) = -∑ pᵢ · log₂(pᵢ)
+```
+
+where pᵢ = nᵢ / N is the relative frequency of each distinct non-null value and N = ∑ nᵢ is the total number of non-null observations.
+
+This is also known as the *average* Shannon entropy, because it equals the [[entropy]] divided by N (the total count):
+
+```
+average_entropy(a) = entropy(a) / N
+```
+
+For a uniform distribution over k distinct values, `average_entropy(a)` equals `log₂(k)`.
+
+The result is 0 when all observations have the same value (no uncertainty), or when N = 0 (empty partition).
+
+## applies to
+
+- attribute *a* with any scalar [[value type]]
+- *relation* with value type of the group CanBeDomainUnit
+
+## conditions
+
+1. The domain of [[argument]] *a* and *relation* must match.
+
+## since version
+
+14.4.0
+
+## example
+
+```
+parameter avgEntropyLifeStyleCode := average_entropy(City/LifeStyleCode);
+// result ≈ 1.459
+
+attribute avgEntropyLifeStyleCodePerRegion (Region) := average_entropy(City/LifeStyleCode, City/Region_rel);
+```
+
+| City/LifeStyleCode | City/Region_rel |
+|-------------------:|----------------:|
+|                  2 |               0 |
+|                  0 |               1 |
+|                  1 |               2 |
+|                  0 |               1 |
+|                  1 |               3 |
+|                  1 |            null |
+|               null |               3 |
+
+*domain City, nr of rows = 7*
+
+For the total: non-null values are [2, 0, 1, 0, 1, 1], so N = 6, counts: 0→2, 1→3, 2→1.
+`average_entropy = -(2/6·log₂(2/6) + 3/6·log₂(3/6) + 1/6·log₂(1/6)) ≈ 1.459`
+
+| **avgEntropyLifeStyleCodePerRegion** |
+|-------------------------------------:|
+|                                **0** |
+|                                **0** |
+|                                **0** |
+|                                **0** |
+|                                **0** |
+
+*domain Region, nr of rows = 5*
+
+Each region has only one unique non-null value (or no non-null data), so average_entropy = 0 for all regions. Region 3 has City 6 with null LifeStyleCode (excluded) and City 4 with LifeStyleCode=1 (only one unique value → average_entropy 0). Region 4 has no cities at all, so N=0 and average_entropy = 0.
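The worked figures above can be reproduced with a short Python sketch. It only illustrates the definition and is not the GeoDMS implementation: the helper names `average_entropy` and `average_entropy_per_group`, and the use of `None` to stand in for null, are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def average_entropy(values):
    """Shannon entropy in bits of the non-None values; 0.0 when none remain."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    # p * log2(1/p) is the same term as -p * log2(p), written to avoid -0.0.
    return sum((c / n * math.log2(n / c) for c in counts.values()), 0.0)

def average_entropy_per_group(values, relation, n_groups):
    """One entropy per group index; rows with a None relation are skipped."""
    groups = defaultdict(list)
    for value, group in zip(values, relation):
        if group is not None:
            groups[group].append(value)
    return [average_entropy(groups[g]) for g in range(n_groups)]

# Data copied from the City table above (None stands in for null).
life_style_code = [2, 0, 1, 0, 1, 1, None]
region_rel = [0, 1, 2, 1, 3, None, 3]

total = average_entropy(life_style_code)
per_region = average_entropy_per_group(life_style_code, region_rel, 5)
print(round(total, 3))  # 1.459
print(per_region)       # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Grouping first and reusing the ungrouped function keeps the null handling in one place: null values of *a* are dropped inside `average_entropy`, while rows with a null relation never reach any group.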
+
+## see also
+
+- [[entropy]] - the total Shannon entropy (N · H), i.e. the sum of individual information contributions
+- [[modus]] - the most frequently occurring value
+- [[unique_count]] - number of distinct non-null values
diff --git a/wiki/entropy.md b/wiki/entropy.md
new file mode 100644
index 00000000..f1369ae5
--- /dev/null
+++ b/wiki/entropy.md
@@ -0,0 +1,83 @@
+*[[Aggregation functions]] entropy*
+
+## syntax
+
+- entropy(*a*)
+- entropy(*a*, *relation*)
+
+## definition
+
+- entropy(*a*) results in a [[parameter]] with the **total Shannon entropy** (in bits) of the non-[[null]] values of [[attribute]] *a*.
+- entropy(*a*, *relation*) results in an attribute with the total Shannon entropy (in bits) of the non-null values of attribute *a*, grouped by *[[relation]]*. The [[domain unit]] of the resulting attribute is the [[values unit]] of the *relation*.
+
+## description
+
+The total Shannon entropy of a set of N observations is defined as:
+
+```
+entropy(a) = N · H(a)
+           = -∑ nᵢ · log₂(nᵢ / N)
+```
+
+where nᵢ is the count of each distinct non-null value and N = ∑ nᵢ is the total number of non-null observations.
+
+This equals N times the average (per-element) Shannon entropy H(a). See [[average_entropy]] for the average Shannon entropy H(a).
+
+For a uniform distribution over k distinct values, `entropy(a)` equals `N · log₂(k)`.
+
+The result is 0 when all observations have the same value (no uncertainty), or when N = 0 (empty partition).
+
+## applies to
+
+- attribute *a* with any scalar [[value type]]
+- *relation* with value type of the group CanBeDomainUnit
+
+## conditions
+
+1. The domain of [[argument]] *a* and *relation* must match.
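As a numerical check on the description above, the following Python sketch compares the total entropy with N times the per-element entropy and evaluates the uniform case. It is an illustration under stated assumptions, not GeoDMS code: the helper names `total_entropy` and `per_element_entropy` are hypothetical, `None` stands in for null, and the sample data is invented for this example.

```python
import math
from collections import Counter

def total_entropy(values):
    """Sum of n_i * log2(N / n_i) over the distinct non-None values, in bits."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    return sum((c * math.log2(n / c) for c in counts.values()), 0.0)

def per_element_entropy(values):
    """Standard Shannon entropy H in bits of the non-None values."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    return sum((c / n * math.log2(n / c) for c in counts.values()), 0.0)

# Invented sample: N = 3 non-None observations.
sample = ["a", "a", "b", None]
n_sample = sum(v is not None for v in sample)
print(math.isclose(total_entropy(sample), n_sample * per_element_entropy(sample)))  # True

# Uniform case: N = 12 observations over k = 4 values gives N * log2(k).
uniform = [0, 1, 2, 3] * 3
print(total_entropy(uniform))  # 24.0

# Degenerate cases: a single distinct value, and no observations at all.
print(total_entropy([5, 5, 5]), total_entropy([]))  # 0.0 0.0
```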
+
+## since version
+
+14.4.0
+
+## example
+
+```
+parameter entropyLifeStyleCode := entropy(City/LifeStyleCode);
+// result ≈ 8.755
+
+attribute entropyLifeStyleCodePerRegion (Region) := entropy(City/LifeStyleCode, City/Region_rel);
+```
+
+| City/LifeStyleCode | City/Region_rel |
+|-------------------:|----------------:|
+|                  2 |               0 |
+|                  0 |               1 |
+|                  1 |               2 |
+|                  0 |               1 |
+|                  1 |               3 |
+|                  1 |            null |
+|               null |               3 |
+
+*domain City, nr of rows = 7*
+
+For the total: non-null values are [2, 0, 1, 0, 1, 1], so N = 6, counts: 0→2, 1→3, 2→1.
+`entropy = -(2·log₂(2/6) + 3·log₂(3/6) + 1·log₂(1/6)) ≈ 8.755`
+
+| **entropyLifeStyleCodePerRegion** |
+|----------------------------------:|
+|                             **0** |
+|                             **0** |
+|                             **0** |
+|                             **0** |
+|                             **0** |
+
+*domain Region, nr of rows = 5*
+
+Each region has only one unique non-null value (or no non-null data), so entropy = 0 for all regions. Region 3 has City 6 with null LifeStyleCode (excluded) and City 4 with LifeStyleCode=1 (only one unique value → entropy 0). Region 4 has no cities at all, so N=0 and entropy = 0.
+
+## see also
+
+- [[average_entropy]] - the Shannon entropy per element (H = entropy / N), i.e. the standard Shannon entropy formula
+- [[modus]] - the most frequently occurring value
+- [[unique_count]] - number of distinct non-null values
diff --git a/wiki/frequency_table.md b/wiki/frequency_table.md
new file mode 100644
index 00000000..6af6a698
--- /dev/null
+++ b/wiki/frequency_table.md
@@ -0,0 +1,74 @@
+*[[Aggregation functions]] frequency_table*
+
+## syntax
+
+- frequency_table(*a*)
+- frequency_table(*a*, *relation*)
+
+## definition
+
+- frequency_table(*a*) results in a [[parameter]] with a string listing all non-[[null]] values of [[attribute]] *a* together with how often each value occurs, separated by "; ".
+- frequency_table(*a*, *relation*) results in an attribute with such strings, one per partition defined by *[[relation]]*. The [[domain unit]] of the resulting attribute is the [[values unit]] of the *relation*. Each partition string contains the value-count pairs for the non-null values of *a* belonging to that partition.
+
+## description
+
+The result per partition is a string of the form `value1: count1; value2: count2; ...`, where:
+
+- values are listed in ascending order (the order defined by the [[values unit]] of attribute *a*),
+- only values with a non-zero count are included,
+- null values in *a* are **excluded** from the counts.
+
+To include null values in the frequency table, use [[frequency_table_with_null]] instead.
+
+## applies to
+
+- attribute *a* with any scalar [[value type]]
+- *relation* with value type of the group CanBeDomainUnit
+
+## conditions
+
+1. The domain of [[argument]] *a* and *relation* must match.
+
+## since version
+
+14.4.0
+
+## example
+
+```
+parameter freqLifeStyleCode := frequency_table(City/LifeStyleCode);
+// result = "0: 2; 1: 3; 2: 1"
+
+attribute freqLifeStyleCodePerRegion (Region) := frequency_table(City/LifeStyleCode, City/Region_rel);
+```
+
+| City/LifeStyleCode | City/Region_rel |
+|-------------------:|----------------:|
+|                  2 |               0 |
+|                  0 |               1 |
+|                  1 |               2 |
+|                  0 |               1 |
+|                  1 |               3 |
+|                  1 |            null |
+|               null |               3 |
+
+*domain City, nr of rows = 7*
+
+| **freqLifeStyleCodePerRegion** |
+|-------------------------------|
+| **"2: 1"**                     |
+| **"0: 2"**                     |
+| **"1: 1"**                     |
+| **"1: 1"**                     |
+| **""**                         |
+
+*domain Region, nr of rows = 5*
+
+City 6 (LifeStyleCode = null) is excluded. City 5 (Region_rel = null) is excluded from all groups.
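The example above can be mimicked in a few lines of Python. This is a sketch of the documented behaviour, not the GeoDMS implementation: the helper names `frequency_table` and `frequency_table_per_group` are hypothetical, `None` stands in for null, and Python's natural ordering of the values is assumed to match the order of the values unit.

```python
from collections import Counter, defaultdict

def frequency_table(values):
    """'value: count' pairs for the non-None values, ascending, joined by '; '."""
    counts = Counter(v for v in values if v is not None)
    return "; ".join(f"{v}: {counts[v]}" for v in sorted(counts))

def frequency_table_per_group(values, relation, n_groups):
    """One frequency string per group; rows with a None relation are skipped."""
    groups = defaultdict(list)
    for value, group in zip(values, relation):
        if group is not None:
            groups[group].append(value)
    return [frequency_table(groups[g]) for g in range(n_groups)]

# Data copied from the City table above (None stands in for null).
life_style_code = [2, 0, 1, 0, 1, 1, None]
region_rel = [0, 1, 2, 1, 3, None, 3]

total_table = frequency_table(life_style_code)
per_region = frequency_table_per_group(life_style_code, region_rel, 5)
print(total_table)  # 0: 2; 1: 3; 2: 1
print(per_region)   # ['2: 1', '0: 2', '1: 1', '1: 1', '']
```

A group with no non-null values yields the empty string, which corresponds to the `""` row for Region 4.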
+
+## see also
+
+- [[frequency_table_with_null]] - variant that includes null values of *a* in the frequency table
+- [[as_unique_list]] - like frequency_table but only lists the distinct values, without the counts
+- [[modus]] - returns only the most frequently occurring value
+- [[unique_count]] - returns the number of distinct non-null values
diff --git a/wiki/frequency_table_with_null.md b/wiki/frequency_table_with_null.md
new file mode 100644
index 00000000..c0a37a23
--- /dev/null
+++ b/wiki/frequency_table_with_null.md
@@ -0,0 +1,74 @@
+*[[Aggregation functions]] frequency_table_with_null*
+
+## syntax
+
+- frequency_table_with_null(*a*)
+- frequency_table_with_null(*a*, *relation*)
+
+## definition
+
+- frequency_table_with_null(*a*) results in a [[parameter]] with a string listing **all** values of [[attribute]] *a* — including [[null]] values — together with how often each value occurs, separated by "; ".
+- frequency_table_with_null(*a*, *relation*) results in an attribute with such strings, one per partition defined by *[[relation]]*. The [[domain unit]] of the resulting attribute is the [[values unit]] of the *relation*. Each partition string contains the value-count pairs for all values of *a* (including null) belonging to that partition.
+
+## description
+
+The result per partition is a string of the form `value1: count1; value2: count2; ...`, where:
+
+- values are listed in ascending order (the order defined by the [[values unit]] of attribute *a*),
+- only values with a non-zero count are included,
+- null values in *a* are **included** in the frequency table and are shown as `: count`.
+
+This function is identical to [[frequency_table]] except that null values in *a* are counted and included in the result string. Elements mapped to a null partition (null *relation* value) are still excluded from all groups.
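The null handling described above can be sketched in Python as follows. This is an illustration, not the GeoDMS implementation: the helper name `frequency_table_with_null` and its `null_label` parameter are hypothetical, `None` stands in for null, the empty default label follows the `: count` form given in the description, and placing the null entry last is an assumption taken from this page's example result.

```python
from collections import Counter

def frequency_table_with_null(values, null_label=""):
    """Like a plain frequency table, but None is counted and listed last."""
    counts = Counter(values)
    null_count = counts.pop(None, 0)  # separate the null count from the rest
    parts = [f"{v}: {counts[v]}" for v in sorted(counts)]
    if null_count:
        # Assumption: the null entry follows the regular values.
        parts.append(f"{null_label}: {null_count}")
    return "; ".join(parts)

# Values of City/LifeStyleCode from this page's example (None stands in for null).
with_null = frequency_table_with_null([2, 0, 1, 0, 1, 1, None])
print(with_null)  # 0: 2; 1: 3; 2: 1; : 1

# Without any null values the result matches the plain frequency table.
print(frequency_table_with_null([2, 0, 0]))  # 0: 2; 2: 1
```

Keeping the label as a parameter makes it easy to try another rendering of null without touching the counting logic.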
+
+## applies to
+
+- attribute *a* with any scalar [[value type]]
+- *relation* with value type of the group CanBeDomainUnit
+
+## conditions
+
+1. The domain of [[argument]] *a* and *relation* must match.
+
+## since version
+
+14.4.0
+
+## example
+
+```
+parameter freqLifeStyleCodeWithNull := frequency_table_with_null(City/LifeStyleCode);
+// result = "0: 2; 1: 3; 2: 1; : 1"
+
+attribute freqLifeStyleCodeWithNullPerRegion (Region) := frequency_table_with_null(City/LifeStyleCode, City/Region_rel);
+```
+
+| City/LifeStyleCode | City/Region_rel |
+|-------------------:|----------------:|
+|                  2 |               0 |
+|                  0 |               1 |
+|                  1 |               2 |
+|                  0 |               1 |
+|                  1 |               3 |
+|                  1 |            null |
+|               null |               3 |
+
+*domain City, nr of rows = 7*
+
+| **freqLifeStyleCodeWithNullPerRegion** |
+|---------------------------------------|
+| **"2: 1"**                             |
+| **"0: 2"**                             |
+| **"1: 1"**                             |
+| **"1: 1; : 1"**                        |
+| **""**                                 |
+
+*domain Region, nr of rows = 5*
+
+City 6 (LifeStyleCode = null, Region_rel = 3) is included in Region 3's count as `: 1`. City 5 (Region_rel = null) is excluded from all groups.
+
+## see also
+
+- [[frequency_table]] - variant that excludes null values of *a* from the frequency table
+- [[as_unique_list]] - like frequency_table but only lists the distinct values, without the counts
+- [[modus]] - returns only the most frequently occurring value
+- [[unique_count]] - returns the number of distinct non-null values