This file serves two purposes:
- Challenge type system designers
- Set up a reference for comparing programming media on their
  - expressiveness: is an operator provided in one medium but not the other?
  - enforcement of constraints: how many of the required constraints are enforced? How many of the ensured constraints are communicated to the type system?
Real-world programming media contain many operations. Collecting all of them would be neither practical nor necessary for the purposes of this file. Instead, we strive to gather at least all operators that are necessary for real-world data analysis. (Please let us know if you think a necessary operator is missing.) Furthermore, some operators impose interesting constraints that might be challenging for type systems. We selectively include some of these operators, hoping that together they illustrate all the constraints that a type system needs to handle. In short, an operator is included if it meets one of the following criteria:
- necessary for realistic table programming
- illustrating interesting constraints not illustrated by other operators in this file
Operators are collected from the following resources:
- Python pandas
- R dplyr cheatsheets
- R tibbles
- R Tidy data
- Julia DataFrames
- LINQ
- MySQL
- PostgreSQL
- Pyret taught in Brown CS111
- Pyret taught in the Bootstrap DS
- Compare Python pandas with R TidyVerse
- Compare Python pandas with SQL
- Compare Julia DataFrame with Python pandas and R TidyVerse
For our convenience, we sometimes apply table operators to rows (e.g. selectColumns(r, ["foo", "bar"])). An implementation of the Table API can either view rows as a subtype of tables, overload those operators, or give different names to the row variants of the operators.
Column names must be first-class and manufacturable to support the full B2T2 specification. This API and the example programs assume that column names behave like strings to keep the specification simple. However, other designs are possible.
Required column operations:
- concat: append two column names
- colNameOfNumber: convert a Number to a ColName
- split: divide a column name into pieces (used to implement startsWith)

- even: consumes an integer and returns a boolean
- length: consumes a sequence and measures its length
- schema: extracts the schema of a table
- subTable: extracts a combination of rows (selectRows) and columns (selectColumns) from a table
- range: consumes a number and produces a sequence of valid indices
- concat: concatenates two sequences or two strings
- startsWith: checks whether a string starts with another string
- average: computes the average of a sequence of numbers
- filter: the conventional sequence (e.g. lists) filter
- map: the conventional sequence (e.g. lists) map
- removeDuplicates: consumes a sequence and produces a subsequence with all duplicated elements removed
- removeAll: consumes two sequences and produces a subsequence of the first input, removing all elements that also appear in the second input

- x has no duplicates
- x is equal to y
- x is (not) in y
- x is a subsequence of y
- x is of sort y
- x is y
- x is a categorical sort
- x is (non-)negative
- x is equal to the sort of y
- x is the sort of elements of y
- x is equal to y with all a_i replaced with b_i
- schema(t) is equal to []
- nrows(t) is equal to 0
Create an empty table.
- for all r in rs, schema(r) is equal to schema(t1)

- schema(t2) is equal to schema(t1)
- nrows(t2) is equal to nrows(t1) + length(rs)
Consumes a Table and a sequence of Row to add, and produces a new Table with the rows from the original table followed by the given Rows.
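As a rough illustration of these constraints (not part of the specification), the following Python sketch models a table as a dict with a `header` list and a `rows` list of dicts. The name `add_rows` and this representation are assumptions made for the example, and only column names are checked, not sorts.

```python
def add_rows(table, rows):
    """Return a new table with `rows` appended after the existing rows."""
    header = table["header"]
    for r in rows:
        # Required: each new row has the same column names, in the same order.
        if list(r.keys()) != header:
            raise ValueError(f"row columns {list(r.keys())} do not match {header}")
    # Ensured: same header, nrows grows by len(rows); the input is not mutated.
    return {"header": list(header), "rows": table["rows"] + list(rows)}


students = {
    "header": ["name", "age", "favorite color"],
    "rows": [
        {"name": "Bob", "age": 12, "favorite color": "blue"},
        {"name": "Alice", "age": 17, "favorite color": "green"},
        {"name": "Eve", "age": 13, "favorite color": "red"},
    ],
}
longer = add_rows(students, [{"name": "Colton", "age": 19, "favorite color": "blue"}])
```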
> addRows(
students,
[
[row:
("name", "Colton"), ("age", 19),
("favorite color", "blue")]
])
| name | age | favorite color |
| -------- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Alice" | 17 | "green" |
| "Eve" | 13 | "red" |
| "Colton" | 19 | "blue" |
> addRows(gradebook, [])
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |

- c is not in header(t1)
- length(vs) is equal to nrows(t1)

- header(t2) is equal to concat(header(t1), [c])
- for all c' in header(t1), schema(t2)[c'] is equal to schema(t1)[c']
- schema(t2)[c] is the sort of elements of vs
- nrows(t2) is equal to nrows(t1)
Consumes a column name and a Seq of values and produces a new Table with the columns of the input Table followed by a column with the given name and values. Note that the length of vs must equal the length of the Table.
> hairColor = ["brown", "red", "blonde"]
> addColumn(students, "hair-color", hairColor)
| name | age | favorite color | hair-color |
| ------- | --- | -------------- | ---------- |
| "Bob" | 12 | "blue" | "brown" |
| "Alice" | 17 | "green" | "red" |
| "Eve" | 13 | "red" | "blonde" |
> presentation = [9, 9, 6]
> addColumn(gradebook, "presentation", presentation)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final | presentation |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- | ------------ |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 | 9 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 | 9 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 | 6 |

- c is not in header(t1)

- schema(r) is equal to schema(t1)
- header(t2) is equal to concat(header(t1), [c])
- for all c' in header(t1), schema(t2)[c'] is equal to schema(t1)[c']
- schema(t2)[c] is equal to the sort of v
- nrows(t2) is equal to nrows(t1)
Consumes an existing Table and produces a new Table containing an additional column with the given ColName, using f to compute the values for that column, once for each row.
> isTeenagerBuilder =
function(r):
12 < getValue(r, "age") and getValue(r, "age") < 20
end
> buildColumn(students, "is-teenager", isTeenagerBuilder)
| name | age | favorite color | is-teenager |
| ------- | --- | -------------- | ----------- |
| "Bob" | 12 | "blue" | false |
| "Alice" | 17 | "green" | true |
| "Eve" | 13 | "red" | true |
> didWellInFinal =
function(r):
85 <= getValue(r, "final")
end
> buildColumn(gradebook, "did-well-in-final", didWellInFinal)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final | did-well-in-final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- | ----------------- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 | true |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 | true |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 | false |

- schema(t1) is equal to schema(t2)

- schema(t3) is equal to schema(t1)
- nrows(t3) is equal to nrows(t1) + nrows(t2)
Combines two tables vertically. The output table starts with rows from the first input table, followed by the rows from the second input table.
> increaseAge =
function(r):
[row: ("age", 1 + getValue(r, "age"))]
end
> vcat(students, update(students, increaseAge))
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Alice" | 17 | "green" |
| "Eve" | 13 | "red" |
| "Bob" | 13 | "blue" |
| "Alice" | 18 | "green" |
| "Eve" | 14 | "red" |
> curveMidtermAndFinal =
function(r):
curve =
function(n):
n + 5
end
[row:
("midterm", curve(getValue(r, "midterm"))),
("final", curve(getValue(r, "final")))]
end
> vcat(gradebook, update(gradebook, curveMidtermAndFinal))
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |
| "Bob" | 12 | 8 | 9 | 82 | 7 | 9 | 92 |
| "Alice" | 17 | 6 | 8 | 93 | 8 | 7 | 90 |
| "Eve" | 13 | 7 | 9 | 89 | 8 | 8 | 82 |

- concat(header(t1), header(t2)) has no duplicates
- nrows(t1) is equal to nrows(t2)

- schema(t3) is equal to concat(schema(t1), schema(t2))
- nrows(t3) is equal to nrows(t1)
Combines two tables horizontally. The output table starts with columns from the first input, followed by the columns from the second input.
> hcat(students, dropColumns(gradebook, ["name", "age"]))
| name | age | favorite color | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | -------------- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | "blue" | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | "green" | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | "red" | 7 | 9 | 84 | 8 | 8 | 77 |
> hcat(dropColumns(students, ["name", "age"]), gradebook)
| favorite color | name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| -------------- | ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "blue" | "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |
| "green" | "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "red" | "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |

- length(rs) is positive
- for all r in rs, schema(r) is equal to schema(rs[0])

- schema(t) is equal to schema(rs[0])
- nrows(t) is equal to length(rs)
Returns a sequence of one or more rows as a table.
> values([
[row: ("name", "Alice")],
[row: ("name", "Bob")]])
| name |
| ------- |
| "Alice" |
| "Bob" |
> values([
[row: ("name", "Alice"), ("age", 12)],
[row: ("name", "Bob"), ("age", 13)]])
| name | age |
| ------- | --- |
| "Alice" | 12 |
| "Bob" | 13 |

- concat(header(t1), header(t2)) has no duplicates

- schema(t3) is equal to concat(schema(t1), schema(t2))
- nrows(t3) is equal to nrows(t1) * nrows(t2)
Computes the cartesian product of two tables.
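A minimal Python sketch of the cartesian product, assuming a table is a dict with a `header` list and a `rows` list of dicts. The name `cross_join` and the trimmed sample tables are illustrative assumptions, not part of the specification.

```python
from itertools import product


def cross_join(t1, t2):
    """Pair every row of t1 with every row of t2, t1 varying slowest."""
    header = t1["header"] + t2["header"]
    # Required: the concatenated header has no duplicates.
    if len(set(header)) != len(header):
        raise ValueError("the two tables share a column name")
    rows = [{**r1, **r2} for r1, r2 in product(t1["rows"], t2["rows"])]
    return {"header": header, "rows": rows}


students = {
    "header": ["name", "age"],
    "rows": [{"name": "Bob", "age": 12}, {"name": "Alice", "age": 17}],
}
petite_jelly = {
    "header": ["get acne", "red"],
    "rows": [{"get acne": True, "red": False}, {"get acne": True, "red": False}],
}
joined = cross_join(students, petite_jelly)
```

With a table that has no columns and no rows as the first argument, the result keeps the second table's header and has zero rows, which matches nrows(t1) * nrows(t2).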
> petiteJelly = subTable(jellyAnon, [0, 1], [0, 1, 2])
> petiteJelly
| get acne | red | black |
| -------- | ----- | ----- |
| true | false | false |
| true | false | true |
> crossJoin(students, petiteJelly)
| name | age | favorite color | get acne | red | black |
| ------- | --- | -------------- | -------- | ----- | ----- |
| "Bob" | 12 | "blue" | true | false | false |
| "Bob" | 12 | "blue" | true | false | true |
| "Alice" | 17 | "green" | true | false | false |
| "Alice" | 17 | "green" | true | false | true |
| "Eve" | 13 | "red" | true | false | false |
| "Eve" | 13 | "red" | true | false | true |
> crossJoin(emptyTable, petiteJelly)
| get acne | red | black |
| -------- | ----- | ----- |

- cs has no duplicates
- for all c in cs, c is in header(t1)
- for all c in cs, c is in header(t2)
- for all c in cs, schema(t1)[c] is equal to schema(t2)[c]
- concat(header(t1), removeAll(header(t2), cs)) has no duplicates

- header(t3) is equal to concat(header(t1), removeAll(header(t2), cs))
- for all c in header(t1), schema(t3)[c] is equal to schema(t1)[c]
- for all c in removeAll(header(t2), cs), schema(t3)[c] is equal to schema(t2)[c]
- nrows(t3) is equal to nrows(t1) if distinct(selectColumns(t2, cs)) is equal to selectColumns(t2, cs), otherwise each row of t1 may have several matches
Looks up more information on the rows of the first table and adds that information to create a new table. The named columns define the keys for the lookup. If there is no corresponding row in t2, the extra columns will be filled with empty cells.
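The lookup described above can be sketched in Python as follows. This assumes a table is a dict with a `header` list and a `rows` list of dicts, and that an empty cell is represented by `None`; the name `left_join` and the trimmed sample data are hypothetical.

```python
def left_join(t1, t2, keys):
    """Extend each row of t1 with the non-key columns of its matches in t2."""
    extra = [c for c in t2["header"] if c not in keys]
    rows = []
    for r1 in t1["rows"]:
        matches = [r2 for r2 in t2["rows"] if all(r1[k] == r2[k] for k in keys)]
        if not matches:
            # No corresponding row in t2: fill the extra columns with empty cells.
            rows.append({**r1, **{c: None for c in extra}})
        for r2 in matches:
            # Several matches produce several output rows for the same r1.
            rows.append({**r1, **{c: r2[c] for c in extra}})
    return {"header": t1["header"] + extra, "rows": rows}


employees = {
    "header": ["Last Name", "Department ID"],
    "rows": [
        {"Last Name": "Rafferty", "Department ID": 31},
        {"Last Name": "Jones", "Department ID": 32},
        {"Last Name": "Smith", "Department ID": 34},
    ],
}
departments = {
    "header": ["Department ID", "Department Name"],
    "rows": [
        {"Department ID": 31, "Department Name": "Sales"},
        {"Department ID": 34, "Department Name": "Clerical"},
    ],
}
joined = left_join(employees, departments, ["Department ID"])
```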
> leftJoin(students, gradebook, ["name", "age"])
| name | age | favorite color | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | -------------- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | "blue" | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | "green" | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | "red" | 7 | 9 | 84 | 8 | 8 | 77 |
> leftJoin(employees, departments, ["Department ID"])
| Last Name | Department ID | Department Name |
| ------------ | ------------- | --------------- |
| "Rafferty" | 31 | "Sales" |
| "Jones" | 32 | |
| "Heisenberg" | 33 | "Engineering" |
| "Robinson" | 34 | "Clerical" |
| "Smith" | 34 | "Clerical" |
| "Williams" | | |

- n is equal to nrows(t)
Returns a Number representing the number of rows in the Table.
> nrows(emptyTable)
0
> nrows(studentsMissing)
3

- n is equal to ncols(t)
Returns a Number representing the number of columns in the Table.
> ncols(students)
3
> ncols(studentsMissing)
3

- cs is equal to header(t)
Returns a Seq representing the column names in the Table.
> header(students)
["name", "age", "favorite color"]
> header(gradebook)
["name", "age", "quiz1", "quiz2", "midterm", "quiz3", "quiz4", "final"]

- n is in range(nrows(t))
Extracts a row out of a table by a numeric index.
> getRow(students, 0)
[row: ("name", "Bob"), ("age", 12), ("favorite color", "blue")]
> getRow(gradebook, 1)
[row:
("name", "Alice"), ("age", 17),
("quiz1", 6), ("quiz2", 8), ("midterm", 88),
("quiz3", 8), ("quiz4", 7), ("final", 85)]

- c is in header(r)

- v is of sort schema(r)[c]
Retrieves the value for the column c in the row r.
> getValue([row: ("name", "Bob"), ("age", 12)], "name")
"Bob"
> getValue([row: ("name", "Bob"), ("age", 12)], "age")
12

- n is in range(ncols(t))

- length(vs) is equal to nrows(t)
- for all v in vs, v is of sort schema(t)[header(t)[n]]
Returns a Seq of the values in the indexed column in t.
> getColumn(students, 1)
[12, 17, 13]
> getColumn(gradebook, 0)
["Bob", "Alice", "Eve"]

- c is in header(t)

- for all v in vs, v is of sort schema(t)[c]
- length(vs) is equal to nrows(t)
Returns a Seq of the values in the named column in t.
> getColumn(students, "age")
[12, 17, 13]
> getColumn(gradebook, "name")
["Bob", "Alice", "Eve"]

- for all n in ns, n is in range(nrows(t1))

- schema(t2) is equal to schema(t1)
- nrows(t2) is equal to length(ns)
Given a Table and a Seq<Number> containing row indices, produces a new Table containing only those rows.
> selectRows(students, [2, 0, 2, 1])
| name | age | favorite color |
| ------- | --- | -------------- |
| "Eve" | 13 | "red" |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
| "Alice" | 17 | "green" |
> selectRows(gradebook, [2, 1])
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |

- length(bs) is equal to nrows(t1)

- schema(t2) is equal to schema(t1)
- nrows(t2) is equal to length(removeAll(bs, [false]))
Given a Table and a Seq<Boolean> that represents a predicate on rows, returns a Table with only the rows for which the predicate returns true.
> selectRows(students, [true, false, true])
| name | age | favorite color |
| ----- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
> selectRows(gradebook, [false, false, true])
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ----- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |

- length(bs) is equal to ncols(t1)

- header(t2) is a subsequence of header(t1)
- for all i in range(ncols(t1)), header(t1)[i] is in header(t2) if and only if bs[i] is equal to true
- schema(t2) is a subsequence of schema(t1)
- nrows(t2) is equal to nrows(t1)
Consumes a Table and a Seq<Boolean> deciding whether each column should be kept, and produces a new Table containing only those columns. The order of the columns is as in the input table.
> selectColumns(students, [true, true, false])
| name | age |
| ------- | --- |
| "Bob" | 12 |
| "Alice" | 17 |
| "Eve" | 13 |
> selectColumns(gradebook, [true, false, false, false, true, false, false, true])
| name | midterm | final |
| ------- | ------- | ----- |
| "Bob" | 77 | 87 |
| "Alice" | 88 | 85 |
| "Eve" | 84 | 77 |

- ns has no duplicates
- for all n in ns, n is in range(ncols(t1))

- ncols(t2) is equal to length(ns)
- for all i in range(length(ns)), header(t2)[i] is equal to header(t1)[ns[i]]
- for all c in header(t2), schema(t2)[c] is equal to schema(t1)[c]
- nrows(t2) is equal to nrows(t1)
Consumes a Table and a Seq<Number> containing column indices, and produces a new Table containing only those columns. The order of the columns is as given in the input Seq.
> selectColumns(students, [2, 1])
| favorite color | age |
| -------------- | --- |
| "blue" | 12 |
| "green" | 17 |
| "red" | 13 |
> selectColumns(gradebook, [7, 0, 4])
| final | name | midterm |
| ----- | ------- | ------- |
| 87 | "Bob" | 77 |
| 85 | "Alice" | 88 |
| 77 | "Eve" | 84 |

- cs has no duplicates
- for all c in cs, c is in header(t1)

- header(t2) is equal to cs
- for all c in header(t2), schema(t2)[c] is equal to schema(t1)[c]
- nrows(t2) is equal to nrows(t1)
Consumes a Table and a Seq<ColName> containing column names, and produces a new Table containing only those columns. The order of the columns is as given in the input Seq.
> selectColumns(students, ["favorite color", "age"])
| favorite color | age |
| -------------- | --- |
| "blue" | 12 |
| "green" | 17 |
| "red" | 13 |
> selectColumns(gradebook, ["final", "name", "midterm"])
| final | name | midterm |
| ----- | ------- | ------- |
| 87 | "Bob" | 77 |
| 85 | "Alice" | 88 |
| 77 | "Eve" | 84 |

- if n is non-negative then n is in range(nrows(t1))
- if n is negative then - n is in range(nrows(t1))

- schema(t2) is equal to schema(t1)
- if n is non-negative then nrows(t2) is equal to n
- if n is negative then nrows(t2) is equal to nrows(t1) + n
Returns the first n rows of the table based on position. For negative values of n, this function returns all rows except the last n rows.
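The positive and negative cases can be sketched with Python slicing, which already treats a negative stop index as "all but the last -n". The table representation (a dict with a `header` list and a `rows` list of dicts) and the name `head` as a Python function are assumptions for illustration.

```python
def head(table, n):
    """Return the first n rows, or all but the last -n rows when n is negative."""
    rows = table["rows"]
    # Required, as stated in the constraints: n (or -n) is in range(nrows).
    if abs(n) not in range(len(rows)):
        raise ValueError(f"{n} is out of range for a table with {len(rows)} rows")
    # rows[:n] has n rows for n >= 0 and nrows + n rows for n < 0.
    return {"header": list(table["header"]), "rows": rows[:n]}


students = {
    "header": ["name", "age"],
    "rows": [
        {"name": "Bob", "age": 12},
        {"name": "Alice", "age": 17},
        {"name": "Eve", "age": 13},
    ],
}
first_one = head(students, 1)
all_but_last_two = head(students, -2)
```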
> head(students, 1)
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |
> head(students, -2)
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |

- schema(t2) is equal to schema(t1)
Retains only unique/distinct rows from an input Table.
> distinct(students)
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Alice" | 17 | "green" |
| "Eve" | 13 | "red" |
> distinct(selectColumns(gradebook, ["quiz3"]))
| quiz3 |
| ----- |
| 7 |
| 8 |

- c is in header(t1)

- nrows(t2) is equal to nrows(t1)
- header(t2) is equal to removeAll(header(t1), [c])
- schema(t2) is a subsequence of schema(t1)
Returns a Table that is the same as t, except without the named column.
> dropColumn(students, "age")
| name | favorite color |
| ------- | -------------- |
| "Bob" | "blue" |
| "Alice" | "green" |
| "Eve" | "red" |
> dropColumn(gradebook, "final")
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 |
| ------- | --- | ----- | ----- | ------- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 |

- for all c in cs, c is in header(t1)
- cs has no duplicates

- nrows(t2) is equal to nrows(t1)
- header(t2) is equal to removeAll(header(t1), cs)
- schema(t2) is a subsequence of schema(t1)
Returns a Table that is the same as t, except without the named columns.
> dropColumns(students, ["age"])
| name | favorite color |
| ------- | -------------- |
| "Bob" | "blue" |
| "Alice" | "green" |
| "Eve" | "red" |
> dropColumns(gradebook, ["final", "midterm"])
| name | age | quiz1 | quiz2 | quiz3 | quiz4 |
| ------- | --- | ----- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 7 | 9 |
| "Alice" | 17 | 6 | 8 | 8 | 7 |
| "Eve" | 13 | 7 | 9 | 8 | 8 |

- schema(r) is equal to schema(t1)
- schema(t2) is equal to schema(t1)
Given a Table and a predicate on rows, returns a Table with only the rows for which the predicate returns true.
> ageUnderFifteen =
function(r):
getValue(r, "age") < 15
end
> tfilter(students, ageUnderFifteen)
| name | age | favorite color |
| ----- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
> nameLongerThan3Letters =
function(r):
length(getValue(r, "name")) > 3
end
> tfilter(gradebook, nameLongerThan3Letters)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |

- c is in header(t1)
- schema(t1)[c] is Number

- nrows(t2) is equal to nrows(t1)
- schema(t2) is equal to schema(t1)
Given a Table and one of its column names, returns a Table with the same rows ordered based on the named column. If b is true, the Table will be sorted in ascending order, otherwise it will be in descending order.
> tsort(students, "age", true)
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
| "Alice" | 17 | "green" |
> tsort(gradebook, "final", false)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |

- cs has no duplicates
- for all c in cs, c is in header(t1)
- for all c in cs, schema(t1)[c] is Number

- nrows(t2) is equal to nrows(t1)
- schema(t2) is equal to schema(t1)
Given a Table and a sequence of column names in that Table, returns a Table with the same rows sorted in ascending order based on the named columns.
> sortByColumns(students, ["age"])
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
| "Alice" | 17 | "green" |
> sortByColumns(gradebook, ["quiz2", "quiz1"])
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |

orderBy :: t1:Table * Seq<Exists K . getKey:(r:Row -> k:K) * compare:(k1:K * k2:K -> Boolean)> -> t2:Table

- schema(r) is equal to schema(t1)
- schema(t2) is equal to schema(t1)
- nrows(t2) is equal to nrows(t1)
Sorts the rows of a Table in ascending order by using a sequence of specified comparers.
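One plausible reading of a sequence of comparers is lexicographic tie-breaking: a later pair is consulted only when the earlier `compare` holds in both directions. The Python sketch below follows that reading, which is an assumption, as are the name `order_by` and the table representation (a dict with a `header` list and a `rows` list of dicts).

```python
from functools import cmp_to_key


def order_by(table, comparers):
    """Sort rows using (get_key, compare) pairs, earlier pairs taking priority."""
    def cmp(r1, r2):
        for get_key, compare in comparers:
            k1, k2 = get_key(r1), get_key(r2)
            forward, backward = compare(k1, k2), compare(k2, k1)
            if forward and not backward:
                return -1  # r1 strictly precedes r2 under this comparer
            if backward and not forward:
                return 1   # r2 strictly precedes r1
            # Tie under this comparer: fall through to the next pair.
        return 0
    return {"header": list(table["header"]),
            "rows": sorted(table["rows"], key=cmp_to_key(cmp))}


gradebook = {
    "header": ["name", "midterm", "final"],
    "rows": [
        {"name": "Bob", "midterm": 77, "final": 87},
        {"name": "Alice", "midterm": 88, "final": 85},
        {"name": "Eve", "midterm": 84, "final": 77},
    ],
}
name_length = lambda r: len(r["name"])
exam_average = lambda r: (r["midterm"] + r["final"]) / 2
ge = lambda a, b: a >= b
le = lambda a, b: a <= b
ordered = order_by(gradebook, [(name_length, ge), (exam_average, le)])
```

Here the longest name comes first, and the two three-letter names are ordered by their exam average.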
> nameLength =
function(r):
length(getValue(r, "name"))
end
> le =
function(n1, n2):
n1 <= n2
end
> orderBy(students, [(nameLength, le)])
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
| "Alice" | 17 | "green" |
> midtermAndFinal =
function(r):
[getValue(r, "midterm"), getValue(r, "final")]
end
> compareGrade =
function(g1, g2):
le(average(g1), average(g2))
end
> ge =
function(n1, n2):
n1 >= n2
end
> orderBy(gradebook, [(nameLength, ge), (midtermAndFinal, compareGrade)])
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |

- c is in header(t1)
- schema(t1)[c] is a categorical sort

- header(t2) is equal to ["value", "count"]
- schema(t2)["value"] is equal to schema(t1)[c]
- schema(t2)["count"] is equal to Number
- nrows(t2) is equal to length(removeDuplicates(getColumn(t1, c)))

Note that if there are missing values in the input, this constraint requires one row for missing values in the output.
Takes a Table and a ColName representing the name of a column in that Table. Produces a Table that summarizes how many rows have each value in the given column.
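A small Python sketch of this summary, assuming a table is a dict with a `header` list and a `rows` list of dicts. The name `count` as a Python function is illustrative, and listing values in order of first appearance is a choice of this sketch rather than something the text prescribes.

```python
def count(table, col):
    """Summarize how many rows hold each distinct value of `col`."""
    tallies = {}
    for r in table["rows"]:
        # dicts keep insertion order, so values appear in first-seen order.
        tallies[r[col]] = tallies.get(r[col], 0) + 1
    return {
        "header": ["value", "count"],
        "rows": [{"value": v, "count": n} for v, n in tallies.items()],
    }


pets = {
    "header": ["owner", "pet"],
    "rows": [
        {"owner": "Bob", "pet": "cat"},
        {"owner": "Alice", "pet": "dog"},
        {"owner": "Eve", "pet": "cat"},
    ],
}
summary = count(pets, "pet")
```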
> count(students, "favorite color")
| value | count |
| ------- | ----- |
| "blue" | 1 |
| "green" | 1 |
| "red" | 1 |
> count(gradebook, "age")
| value | count |
| ----- | ----- |
| 12 | 1 |
| 17 | 1 |
| 13 | 1 |

- c is in header(t1)
- schema(t1)[c] is Number

- header(t2) is equal to ["group", "count"]
- schema(t2)["group"] is String
- schema(t2)["count"] is Number
Groups the values of a numeric column into bins. The parameter n specifies the bin width. This function is useful in creating histograms and converting continuous random variables to categorical ones.
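A hedged Python sketch of fixed-width binning follows. It assumes a non-empty column of integers, bins aligned to multiples of the width, empty bins between the smallest and largest value kept in the output, and labels built from the column name. These choices, the name `bin_column`, and the table representation (a dict with a `header` list and a `rows` list of dicts) are assumptions.

```python
def bin_column(table, col, width):
    """Count how many values of `col` fall into each bin of size `width`."""
    values = [r[col] for r in table["rows"]]
    # Align the first and last bins to multiples of the bin width.
    start = (min(values) // width) * width
    last = (max(values) // width) * width
    rows = []
    while start <= last:
        n = sum(1 for v in values if start <= v < start + width)
        rows.append({"group": f"{start} <= {col} < {start + width}", "count": n})
        start += width
    return {"header": ["group", "count"], "rows": rows}


students = {
    "header": ["name", "age"],
    "rows": [
        {"name": "Bob", "age": 12},
        {"name": "Alice", "age": 17},
        {"name": "Eve", "age": 13},
    ],
}
binned = bin_column(students, "age", 5)
```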
> bin(students, "age", 5)
| group | count |
| ---------------- | ----- |
| "10 <= age < 15" | 2 |
| "15 <= age < 20" | 1 |
> bin(gradebook, "final", 5)
| group | count |
| ---------------- | ----- |
| "75 <= final < 80" | 1 |
| "80 <= final < 85" | 0 |
| "85 <= final < 90" | 2 |

Let ci1 and ci2 and fi be the components of aggs[i] for all i in range(length(aggs))

- for all c in cs, c is in header(t1)
- for all c in cs, schema(t1)[c] is a categorical sort
- ci2 is in header(t1)
- concat(cs, [c11, ... , cn1]) has no duplicates

- fi consumes Seq<schema(t1)[ci2]>
- header(t2) is equal to concat(cs, [c11, ... , cn1])
- for all c in cs, schema(t2)[c] is equal to schema(t1)[c]
- schema(t2)[ci1] is equal to the sort of outputs of fi for all i
Partitions rows into groups and summarizes each group with the functions in aggs. Each element of aggs specifies the output column, the input column, and the function that computes the summarizing value (e.g. average, sum, and count).
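The partition-then-summarize step might look like the following in Python, assuming a table is a dict with a `header` list and a `rows` list of dicts. The names `pivot_table` and `average` are illustrative, and groups are emitted in order of first appearance, which the text does not specify.

```python
def average(xs):
    return sum(xs) / len(xs)


def pivot_table(table, cols, aggs):
    """Group rows by the values in `cols`, then apply each aggregation."""
    groups = {}
    for r in table["rows"]:
        groups.setdefault(tuple(r[c] for c in cols), []).append(r)
    rows = []
    for key, members in groups.items():
        out = dict(zip(cols, key))
        for out_col, in_col, f in aggs:
            # Each function consumes the sequence of values of its input column.
            out[out_col] = f([m[in_col] for m in members])
        rows.append(out)
    return {"header": list(cols) + [a[0] for a in aggs], "rows": rows}


students = {
    "header": ["name", "age", "favorite color"],
    "rows": [
        {"name": "Bob", "age": 12, "favorite color": "blue"},
        {"name": "Alice", "age": 17, "favorite color": "green"},
        {"name": "Eve", "age": 13, "favorite color": "blue"},
    ],
}
summary = pivot_table(students, ["favorite color"], [("age-average", "age", average)])
```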
> pivotTable(students, ["favorite color"], [("age-average", "age", average)])
| favorite color | age-average |
| -------------- | ----------- |
| "blue" | 12 |
| "green" | 17 |
| "red" | 13 |
> proportion =
function(bs):
n = length(filter(bs, function(b): b end))
n / length(bs)
end
> pivotTable(
jellyNamed,
["get acne", "brown"],
[
("red proportion", "red", proportion),
("pink proportion", "pink", proportion)
])
| get acne | brown | red proportion | pink proportion |
| -------- | ----- | -------------- | --------------- |
| false | false | 0 | 3/4 |
| false | true | 1 | 1 |
| true | false | 0 | 1/4 |
| true | true | 0 | 0 |

groupBy<K,V> :: t1:Table * key:(r1:Row -> k1:K) * project:(r2:Row -> v:V) * aggregate:(k2:K * vs:Seq<V> -> r3:Row) -> t2:Table

- schema(r1) is equal to schema(t1)
- schema(r2) is equal to schema(t1)
- schema(t2) is equal to schema(r3)
- nrows(t2) is equal to length(removeDuplicates(ks)), where ks is the results of applying key to each row of t1. ks can be defined with select and getColumn.
Note that these constraints assume a first class representation for missing values.
Groups the rows of a table according to a specified key selector function and creates a result value from each group and its key. The rows of each group are projected by using a specified function.
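The three function arguments can be illustrated with a short Python sketch. It assumes a table is a dict with a `header` list and a `rows` list of dicts, takes the output header from the first aggregated row, and emits groups in order of first appearance; none of these choices, nor the name `group_by`, come from the text.

```python
def group_by(table, key, project, aggregate):
    """Group projected rows by key, then build one output row per group."""
    groups = {}
    for r in table["rows"]:
        groups.setdefault(key(r), []).append(project(r))
    rows = [aggregate(k, vs) for k, vs in groups.items()]
    header = list(rows[0].keys()) if rows else []
    return {"header": header, "rows": rows}


students = {
    "header": ["name", "age", "favorite color"],
    "rows": [
        {"name": "Bob", "age": 12, "favorite color": "blue"},
        {"name": "Alice", "age": 17, "favorite color": "green"},
        {"name": "Eve", "age": 13, "favorite color": "red"},
    ],
}
color_temp = lambda r: "warm" if r["favorite color"] == "red" else "cool"
name_length = lambda r: len(r["name"])
aggregate = lambda k, vs: {"key": k, "average": sum(vs) / len(vs)}
grouped = group_by(students, color_temp, name_length, aggregate)
```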
> colorTemp =
function(r):
if getValue(r, "favorite color") == "red":
"warm"
else:
"cool"
end
end
> nameLength =
function(r):
length(getValue(r, "name"))
end
> aggregate =
function(k, vs):
[row: ("key", k), ("average", average(vs))]
end
> groupBy(students, colorTemp, nameLength, aggregate)
| key | average |
| ------ | ------- |
| "warm" | 3 |
| "cool" | 4 |
> abstractAge =
function(r):
if (getValue(r, "age") <= 12):
"kid"
else if (getValue(r, "age") <= 19):
"teenager"
else:
"adult"
end
end
> finalGrade =
function(r):
getValue(r, "final")
end
> groupBy(gradebook, abstractAge, finalGrade, aggregate)
| key | average |
| ---------- | ------- |
| "kid" | 87 |
| "teenager" | 81 |

- c is in header(t)

- length(bs) is equal to nrows(t)
Returns a Seq<Boolean> whose true entries indicate the rows of table t that have no missing value in column c (complete cases).
> completeCases(students, "age")
[true, true, true]
> completeCases(studentsMissing, "age")
[false, true, true]

- schema(t2) is equal to schema(t1)

Removes rows that have some values missing.
> dropna(studentsMissing)
| name | age | favorite color |
| ------- | --- | -------------- |
| "Alice" | 17 | "green" |
> dropna(gradebookMissing)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ----- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |

- c is in header(t1)
- v is of sort schema(t1)[c]

- schema(t2) is equal to schema(t1)
- nrows(t2) is equal to nrows(t1)

Scans the named column and fills in v wherever a cell's value is missing.
> fillna(studentsMissing, "favorite color", "white")
| name | age | favorite color |
| ------- | --- | -------------- |
| "Bob" | | "blue" |
| "Alice" | 17 | "green" |
| "Eve" | 13 | "white" |
> fillna(gradebookMissing, "quiz1", 0)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | 6 | 8 | 88 | | 7 | 85 |
| "Eve" | 13 | 0 | 9 | 84 | 8 | 8 | 77 |

- length(cs) is positive
- cs has no duplicates
- for all c in cs, c is in header(t1)
- for all c in cs, schema(t1)[c] is equal to schema(t1)[cs[0]]
- concat(removeAll(header(t1), cs), [c1, c2]) has no duplicates

- header(t2) is equal to concat(removeAll(header(t1), cs), [c1, c2])
- for all c in removeAll(header(t1), cs), schema(t2)[c] is equal to schema(t1)[c]
- schema(t2)[c1] is equal to ColName
- schema(t2)[c2] is equal to schema(t1)[cs[0]]
Reshapes the input table and makes it longer. The data kept in the named columns are moved to two new columns, one for the column names and the other for the cell values.
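The reshaping can be sketched in Python as below, assuming a table is a dict with a `header` list and a `rows` list of dicts. The name `pivot_longer` and the trimmed gradebook are illustrative assumptions.

```python
def pivot_longer(table, cols, names_to, values_to):
    """Move the columns in `cols` into a name column and a value column."""
    kept = [c for c in table["header"] if c not in cols]
    rows = []
    for r in table["rows"]:
        for c in cols:
            # One output row per (input row, pivoted column) pair.
            out = {k: r[k] for k in kept}
            out[names_to] = c
            out[values_to] = r[c]
            rows.append(out)
    return {"header": kept + [names_to, values_to], "rows": rows}


gradebook = {
    "header": ["name", "midterm", "final"],
    "rows": [
        {"name": "Bob", "midterm": 77, "final": 87},
        {"name": "Alice", "midterm": 88, "final": 85},
    ],
}
longer = pivot_longer(gradebook, ["midterm", "final"], "exam", "score")
```

The output has one row per input row and pivoted column, so its row count is nrows(t1) * length(cs).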
> pivotLonger(gradebook, ["midterm", "final"], "exam", "score")
| name | age | quiz1 | quiz2 | quiz3 | quiz4 | exam | score |
| ------- | --- | ----- | ----- | ----- | ----- | --------- | ----- |
| "Bob" | 12 | 8 | 9 | 7 | 9 | "midterm" | 77 |
| "Bob" | 12 | 8 | 9 | 7 | 9 | "final" | 87 |
| "Alice" | 17 | 6 | 8 | 8 | 7 | "midterm" | 88 |
| "Alice" | 17 | 6 | 8 | 8 | 7 | "final" | 85 |
| "Eve" | 13 | 7 | 9 | 8 | 8 | "midterm" | 84 |
| "Eve" | 13 | 7 | 9 | 8 | 8 | "final" | 77 |
> pivotLonger(gradebook, ["quiz1", "quiz2", "quiz3", "quiz4", "midterm", "final"], "test", "score")
| name | age | test | score |
| ------- | --- | --------- | ----- |
| "Bob" | 12 | "quiz1" | 8 |
| "Bob" | 12 | "quiz2" | 9 |
| "Bob" | 12 | "quiz3" | 7 |
| "Bob" | 12 | "quiz4" | 9 |
| "Bob" | 12 | "midterm" | 77 |
| "Bob" | 12 | "final" | 87 |
| "Alice" | 17 | "quiz1" | 6 |
| "Alice" | 17 | "quiz2" | 8 |
| "Alice" | 17 | "quiz3" | 8 |
| "Alice" | 17 | "quiz4" | 7 |
| "Alice" | 17 | "midterm" | 88 |
| "Alice" | 17 | "final" | 85 |
| "Eve" | 13 | "quiz1" | 7 |
| "Eve" | 13 | "quiz2" | 9 |
| "Eve" | 13 | "quiz3" | 8 |
| "Eve" | 13 | "quiz4" | 8 |
| "Eve" | 13 | "midterm" | 84 |
| "Eve" | 13 | "final" | 77 |

- c1 is in header(t1)
- c2 is in header(t1)
- schema(t1)[c1] is ColName
- concat(removeAll(header(t1), [c1, c2]), removeDuplicates(getColumn(t1, c1))) has no duplicates

- header(t2) is equal to concat(removeAll(header(t1), [c1, c2]), removeDuplicates(getColumn(t1, c1)))
- for all c in removeAll(header(t1), [c1, c2]), schema(t2)[c] is equal to schema(t1)[c]
- for all c in removeDuplicates(getColumn(t1, c1)), schema(t2)[c] is equal to schema(t1)[c2]
The inverse of pivotLonger.
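A Python sketch of the widening direction, assuming a table is a dict with a `header` list and a `rows` list of dicts and that an empty cell is `None`. The name `pivot_wider` and the choice to merge rows that agree on all remaining columns are assumptions of this sketch.

```python
def pivot_wider(table, names_from, values_from):
    """Spread the values of `values_from` across columns named by `names_from`."""
    kept = [c for c in table["header"] if c not in (names_from, values_from)]
    # New column names, de-duplicated in order of first appearance.
    new_cols = list(dict.fromkeys(r[names_from] for r in table["rows"]))
    merged = {}
    for r in table["rows"]:
        key = tuple(r[c] for c in kept)
        if key not in merged:
            merged[key] = {**{c: r[c] for c in kept}, **{c: None for c in new_cols}}
        merged[key][r[names_from]] = r[values_from]
    return {"header": kept + new_cols, "rows": list(merged.values())}


students = {
    "header": ["name", "age", "favorite color"],
    "rows": [
        {"name": "Bob", "age": 12, "favorite color": "blue"},
        {"name": "Alice", "age": 17, "favorite color": "green"},
        {"name": "Eve", "age": 13, "favorite color": "red"},
    ],
}
wider = pivot_wider(students, "name", "age")
```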
> pivotWider(students, "name", "age")
| favorite color | Bob | Alice | Eve |
| -------------- | --- | ----- | --- |
| "blue" | 12 | | |
| "green" | | 17 | |
| "red" | | | 13 |
> longerTable =
pivotLonger(
gradebook,
["quiz1", "quiz2", "quiz3", "quiz4", "midterm", "final"],
"test",
"score")
> pivotWider(longerTable, "test", "score")
| name | age | quiz1 | quiz2 | quiz3 | quiz4 | midterm | final |
| ------- | --- | ----- | ----- | ----- | ----- | ------- | ----- |
| "Bob" | 12 | 8 | 9 | 7 | 9 | 77 | 87 |
| "Alice" | 17 | 6 | 8 | 8 | 7 | 88 | 85 |
| "Eve" | 13 | 7 | 9 | 8 | 8 | 84 | 77 |

- cs has no duplicates
- for all c in cs, c is in header(t1)
- for all c in cs, schema(t1)[c] is Seq<X> for some sort X
- for all i in range(nrows(t1)), for all c1 and c2 in cs, length(getValue(getRow(t1, i), c1)) is equal to length(getValue(getRow(t1, i), c2))

- header(t2) is equal to header(t1)
- for all c in header(t2)
  - if c is in cs then schema(t2)[c] is equal to the element sort of schema(t1)[c]
  - otherwise, schema(t2)[c] is equal to schema(t1)[c]
When the columns cs of table t1 hold sequences, returns a Table in which each c in cs is flattened: the column corresponding to c becomes a longer column in which the original entries are concatenated. If all sequences to be flattened are empty, the behavior is unspecified. The values in row i of t1 in columns other than cs are repeated according to the length of getValue(getRow(t1, i), c1). These lengths must therefore be the same for each c in cs.
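The repetition of the other columns can be sketched in Python as follows, assuming a table is a dict with a `header` list and a `rows` list of dicts. The name `flatten` and the sample table are illustrative; a row whose sequences are empty simply contributes no output rows here, which is one possible choice for the unspecified case.

```python
def flatten(table, cols):
    """Expand sequence-valued columns so each element gets its own row."""
    out = []
    for r in table["rows"]:
        lengths = {len(r[c]) for c in cols}
        # Required: within one row, all flattened sequences have equal length.
        if len(lengths) > 1:
            raise ValueError("sequences in one row must have equal lengths")
        n = lengths.pop() if lengths else 1
        for i in range(n):
            # Flattened columns take element i; other columns are repeated.
            out.append({c: (r[c][i] if c in cols else r[c]) for c in table["header"]})
    return {"header": list(table["header"]), "rows": out}


gradebook_seq = {
    "header": ["name", "quizzes", "final"],
    "rows": [
        {"name": "Bob", "quizzes": [8, 9], "final": 87},
        {"name": "Alice", "quizzes": [6, 8], "final": 85},
    ],
}
flat = flatten(gradebook_seq, ["quizzes"])
```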
> flatten(gradebookSeq, ["quizzes"])
| name | age | quizzes | midterm | final |
| ------- | --- | ------- | ------- | ----- |
| "Bob" | 12 | 8 | 77 | 87 |
| "Bob" | 12 | 9 | 77 | 87 |
| "Bob" | 12 | 7 | 77 | 87 |
| "Bob" | 12 | 9 | 77 | 87 |
| "Alice" | 17 | 6 | 88 | 85 |
| "Alice" | 17 | 8 | 88 | 85 |
| "Alice" | 17 | 8 | 88 | 85 |
| "Alice" | 17 | 7 | 88 | 85 |
| "Eve" | 13 | 7 | 84 | 77 |
| "Eve" | 13 | 9 | 84 | 77 |
| "Eve" | 13 | 8 | 84 | 77 |
| "Eve" | 13 | 8 | 84 | 77 |
> t = buildColumn(gradebookSeq, "quiz-pass?",
function(r):
isPass =
function(n):
n >= 8
end
map(getValue(r, "quizzes"), isPass)
end)
> t
| name | age | quizzes | midterm | final | quiz-pass? |
| ------- | --- | ------------ | ------- | ----- | -------------------------- |
| "Bob" | 12 | [8, 9, 7, 9] | 77 | 87 | [true, true, false, true] |
| "Alice" | 17 | [6, 8, 8, 7] | 88 | 85 | [false, true, true, false] |
| "Eve" | 13 | [7, 9, 8, 8] | 84 | 77 | [false, true, true, true] |
> flatten(t, ["quiz-pass?", "quizzes"])
| name | age | quizzes | midterm | final | quiz-pass? |
| ------- | --- | ------- | ------- | ----- | ---------- |
| "Bob" | 12 | 8 | 77 | 87 | true |
| "Bob" | 12 | 9 | 77 | 87 | true |
| "Bob" | 12 | 7 | 77 | 87 | false |
| "Bob" | 12 | 9 | 77 | 87 | true |
| "Alice" | 17 | 6 | 88 | 85 | false |
| "Alice" | 17 | 8 | 88 | 85 | true |
| "Alice" | 17 | 8 | 88 | 85 | true |
| "Alice" | 17 | 7 | 88 | 85 | false |
| "Eve" | 13 | 7 | 84 | 77 | false |
| "Eve" | 13 | 9 | 84 | 77 | true |
| "Eve" | 13 | 8 | 84 | 77 | true |
| "Eve" | 13 | 8 | 84 | 77 | true |

- c is in header(t1)
- v1 is of sort schema(t1)[c]

- header(t2) is equal to header(t1)
- for all c' in header(t2),
  - if c' is equal to c then schema(t2)[c'] is equal to the sort of v2
  - otherwise, schema(t2)[c'] is equal to schema(t1)[c']
- nrows(t2) is equal to nrows(t1)

Consumes a Table, a ColName, and a transformation function, and produces a new Table in which the transformation function has been applied to every value in the named column.
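A minimal Python sketch of this behavior follows. It assumes a table is modeled as a list of dicts (an assumption of this example, not of the specification); transform_column and the sample students data are illustrative names.

```python
def transform_column(table, col, f):
    """Return a new table with `f` applied to every value in column `col`."""
    # {**row, col: ...} overwrites the existing key in place, so the column
    # order of each row is preserved and the input table is not mutated.
    return [{**row, col: f(row[col])} for row in table]


# Assumed sample data, trimmed from the students table.
students = [
    {"name": "Bob", "age": 12},
    {"name": "Alice", "age": 17},
    {"name": "Eve", "age": 13},
]
with_last_name = transform_column(students, "name", lambda n: n + " Smith")
print(with_last_name[0])  # {'name': 'Bob Smith', 'age': 12}
```

The function may change the sort of the column (for example, numbers to strings), which is the case a type system has to track.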
> addLastName =
    function(name):
      concat(name, " Smith")
    end
> transformColumn(students, "name", addLastName)
| name | age | favorite color |
| ------------- | --- | -------------- |
| "Bob Smith" | 12 | "blue" |
| "Alice Smith" | 17 | "green" |
| "Eve Smith" | 13 | "red" |
> quizScoreToPassFail =
    function(score):
      if score <= 6:
        "fail"
      else:
        "pass"
      end
    end
> transformColumn(gradebook, "quiz1", quizScoreToPassFail)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ------ | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | "pass" | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | "fail" | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | "pass" | 9 | 84 | 8 | 8 | 77 |

Let n be the length of ccs. Let c11 ... c1n be the first components of the elements of ccs and c21 ... c2n be the second components.

- c1i is in header(t1) for all i
- [c11 ... c1n] has no duplicates
- concat(removeAll(header(t1), [c11 ... c1n]), [c21 ... c2n]) has no duplicates

- header(t2) is equal to header(t1) with all c1i replaced with c2i
- for all c in header(t2),
  - if c is equal to c2i for some i then schema(t2)[c2i] is equal to schema(t1)[c1i]
  - otherwise, schema(t2)[c] is equal to schema(t1)[c]
- nrows(t2) is equal to nrows(t1)

Updates column names. Each element of ccs is a pair that specifies the old name and the new name. All renames are applied simultaneously, so two columns can swap names.
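The simultaneous-renaming behavior can be sketched in Python. This is a hedged illustration under the assumption that a table is a list of dicts; rename_columns and the sample data are names introduced for this example.

```python
def rename_columns(table, pairs):
    """Rename columns according to (old, new) pairs, all at once."""
    mapping = dict(pairs)
    if len(mapping) != len(pairs):
        raise ValueError("old names must not repeat")
    out = []
    for row in table:
        missing = [old for old in mapping if old not in row]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        # Every key is looked up in the original mapping, so a swap such as
        # midterm <-> final does not clobber either column.
        new_row = {mapping.get(c, c): v for c, v in row.items()}
        if len(new_row) != len(row):
            raise ValueError("renaming would create duplicate column names")
        out.append(new_row)
    return out


# Assumed sample data, trimmed from the gradebook table.
gradebook = [{"name": "Bob", "midterm": 77, "final": 87}]
swapped = rename_columns(gradebook, [("midterm", "final"), ("final", "midterm")])
print(swapped[0])  # {'name': 'Bob', 'final': 77, 'midterm': 87}
```

The three checks mirror the three input constraints: old names exist, old names are distinct, and the resulting header has no duplicates.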
> renameColumns(students, [("favorite color", "preferred color"), ("name", "first name")])
| first name | age | preferred color |
| ---------- | --- | --------------- |
| "Bob" | 12 | "blue" |
| "Alice" | 17 | "green" |
| "Eve" | 13 | "red" |
> renameColumns(gradebook, [("midterm", "final"), ("final", "midterm")])
| name | age | quiz1 | quiz2 | final | quiz3 | quiz4 | midterm |
| ------- | --- | ----- | ----- | ----- | ----- | ----- | ------- |
| "Bob" | 12 | 8 | 9 | 77 | 7 | 9 | 87 |
| "Alice" | 17 | 6 | 8 | 88 | 8 | 7 | 85 |
| "Eve" | 13 | 7 | 9 | 84 | 8 | 8 | 77 |

- for all c in header(r), c is in header(t)
- for all c in header(r), schema(r)[c] is equal to schema(t)[c]

- either n is equal to error("not found") or n is in range(nrows(t))

Finds the index of the first row of t that matches r, that is, the first row that agrees with r on every column in header(r).
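A small Python sketch of the lookup, assuming a table is a list of dicts and a row is a dict. Returning None for a missing row is a stand-in chosen by this example for the error("not found") value of the specification.

```python
def find(table, r):
    """Return the index of the first row agreeing with `r` on all of r's columns."""
    for i, row in enumerate(table):
        if all(row[c] == v for c, v in r.items()):
            return i
    return None  # stand-in for error("not found"); an assumption of this sketch


# Assumed sample data mirroring the students table.
students = [
    {"name": "Bob", "age": 12, "favorite color": "blue"},
    {"name": "Alice", "age": 17, "favorite color": "green"},
    {"name": "Eve", "age": 13, "favorite color": "red"},
]
print(find(students, {"age": 13}))  # 2
print(find(students, {"age": 14}))  # None
```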
> find(students, [row: ("age", 13)])
2
> find(students, [row: ("age", 14)])
error("not found")

- c is in header(t1)
- schema(t1)[c] is a categorical sort

- header(t2) is equal to ["key", "groups"]
- schema(t2)["key"] is equal to schema(t1)[c]
- schema(t2)["groups"] is Table
- getColumn(t2, "key") has no duplicates
- for all t in getColumn(t2, "groups"), schema(t) is equal to schema(t1)
- nrows(t2) is equal to length(removeDuplicates(getColumn(t1, c)))

Categorizes rows of the input table into groups by the key of each row. The key is computed by accessing the named column.
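The grouping can be sketched in Python as follows. The sketch assumes a table is a list of dicts, and it emits groups in order of first appearance of each key; the specification above does not state an ordering, so that is an assumption of this example, as are the names used.

```python
def group_by_retentive(table, col):
    """Group rows by the value in `col`, keeping `col` inside each group."""
    groups = {}
    for row in table:
        # dicts preserve insertion order, so keys appear in first-seen order.
        groups.setdefault(row[col], []).append(row)
    return [{"key": k, "groups": rows} for k, rows in groups.items()]


# Assumed sample data: a made-up table with a repeated key.
pets = [
    {"name": "Rex", "kind": "dog"},
    {"name": "Tom", "kind": "cat"},
    {"name": "Fido", "kind": "dog"},
]
grouped = group_by_retentive(pets, "kind")
print([g["key"] for g in grouped])  # ['dog', 'cat']
```

The output has one row per distinct key, and every nested table keeps the full schema of the input.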
> groupByRetentive(students, "favorite color")
| key | groups |
| ------- | ---------------------------------- |
| "blue" | | name | age | favorite color | |
| | | ------- | --- | -------------- | |
| | | "Bob" | 12 | "blue" | |
| "green" | | name | age | favorite color | |
| | | ------- | --- | -------------- | |
| | | "Alice" | 17 | "green" | |
| "red" | | name | age | favorite color | |
| | | ------- | --- | -------------- | |
| | | "Eve" | 13 | "red" | |
> groupByRetentive(jellyAnon, "brown")
| key | groups |
| ----- | --------------------------------------------------------------------------------------- |
| false | | get acne | red | black | white | green | yellow | brown | orange | pink | purple | |
| | | -------- | ----- | ----- | ----- | ----- | ------ | ----- | ------ | ----- | ------ | |
| | | true | false | false | false | true | false | false | true | false | false | |
| | | true | false | true | false | true | true | false | false | false | false | |
| | | false | false | false | false | true | false | false | false | true | false | |
| | | false | false | false | false | false | true | false | false | false | false | |
| | | false | false | false | false | false | true | false | false | true | false | |
| | | true | false | true | false | false | false | false | true | true | false | |
| | | false | false | true | false | false | false | false | false | true | false | |
| | | true | false | false | false | false | false | false | true | false | false | |
| true | | get acne | red | black | white | green | yellow | brown | orange | pink | purple | |
| | | -------- | ----- | ----- | ----- | ----- | ------ | ----- | ------ | ----- | ------ | |
| | | true | false | false | false | false | false | true | true | false | false | |
| | | false | true | false | false | false | true | true | false | true | false | |

- c is in header(t1)
- schema(t1)[c] is a categorical sort

- header(t2) is equal to ["key", "groups"]
- schema(t2)["key"] is equal to schema(t1)[c]
- schema(t2)["groups"] is Table
- getColumn(t2, "key") has no duplicates
- for all t in getColumn(t2, "groups"), header(t) is equal to removeAll(header(t1), [c])
- for all t in getColumn(t2, "groups"), schema(t) is a subsequence of schema(t1)
- nrows(t2) is equal to length(removeDuplicates(getColumn(t1, c)))

Similar to groupByRetentive, but the named column is removed from every group in the output.
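A hedged Python sketch of the subtractive variant, again assuming a list-of-dicts table and first-appearance ordering of keys (neither is mandated by the specification); the names and data are illustrative.

```python
def group_by_subtractive(table, col):
    """Group rows by the value in `col`, dropping `col` from each group."""
    groups = {}
    for row in table:
        rest = {c: v for c, v in row.items() if c != col}  # row without the key column
        groups.setdefault(row[col], []).append(rest)
    return [{"key": k, "groups": rows} for k, rows in groups.items()]


# Assumed sample data: a made-up table with a repeated key.
pets = [
    {"name": "Rex", "kind": "dog"},
    {"name": "Tom", "kind": "cat"},
    {"name": "Fido", "kind": "dog"},
]
grouped = group_by_subtractive(pets, "kind")
print(grouped[0])  # {'key': 'dog', 'groups': [{'name': 'Rex'}, {'name': 'Fido'}]}
```

No information is lost by dropping the column, since its value is recorded once per group in the key column.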
> groupBySubtractive(students, "favorite color")
| key | groups |
| ------- | ----------------- |
| "blue" | | name | age | |
| | | ------- | --- | |
| | | "Bob" | 12 | |
| "green" | | name | age | |
| | | ------- | --- | |
| | | "Alice" | 17 | |
| "red" | | name | age | |
| | | ------- | --- | |
| | | "Eve" | 13 | |
> groupBySubtractive(jellyAnon, "brown")
| key | groups |
| ----- | ------------------------------------------------------------------------------- |
| false | | get acne | red | black | white | green | yellow | orange | pink | purple | |
| | | -------- | ----- | ----- | ----- | ----- | ------ | ------ | ----- | ------ | |
| | | true | false | false | false | true | false | true | false | false | |
| | | true | false | true | false | true | true | false | false | false | |
| | | false | false | false | false | true | false | false | true | false | |
| | | false | false | false | false | false | true | false | false | false | |
| | | false | false | false | false | false | true | false | true | false | |
| | | true | false | true | false | false | false | true | true | false | |
| | | false | false | true | false | false | false | false | true | false | |
| | | true | false | false | false | false | false | true | false | false | |
| true | | get acne | red | black | white | green | yellow | orange | pink | purple | |
| | | -------- | ----- | ----- | ----- | ----- | ------ | ------ | ----- | ------ | |
| | | true | false | false | false | false | false | true | false | false | |
| | | false | true | false | false | false | true | false | true | false | |

- for all c in header(r2), c is in header(t1)

- schema(r1) is equal to schema(t1)
- header(t2) is equal to header(t1)
- for all c in header(t2)
  - if c is in header(r2) then schema(t2)[c] is equal to schema(r2)[c]
  - otherwise, schema(t2)[c] is equal to schema(t1)[c]
- nrows(t2) is equal to nrows(t1)

Consumes an existing Table and produces a new Table in which the columns named in the Row returned by f are updated. f is called once for each row to produce the new values for those columns.
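The row-wise update can be sketched in Python. The sketch assumes a table is a list of dicts and that the function returns a dict holding only the columns to overwrite; update, abstract_age, and the sample data are names used for illustration.

```python
def update(table, f):
    """Overwrite, in each row, the columns present in the row returned by `f`."""
    out = []
    for row in table:
        patch = f(row)
        unknown = [c for c in patch if c not in row]
        if unknown:
            # Mirrors the input constraint that f may only name existing columns.
            raise KeyError(f"columns not in table: {unknown}")
        out.append({**row, **patch})  # existing keys keep their position
    return out


def abstract_age(r):
    """Replace a numeric age by a coarse label."""
    if r["age"] <= 12:
        return {"age": "kid"}
    if r["age"] <= 19:
        return {"age": "teenager"}
    return {"age": "adult"}


# Assumed sample data, trimmed from the students table.
students = [{"name": "Bob", "age": 12}, {"name": "Alice", "age": 17}]
print(update(students, abstract_age))
```

As in the specification, the header is unchanged while the sorts of the updated columns follow the row returned by the function.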
> abstractAge =
    function(r):
      if (getValue(r, "age") <= 12):
        [row: ("age", "kid")]
      else if (getValue(r, "age") <= 19):
        [row: ("age", "teenager")]
      else:
        [row: ("age", "adult")]
      end
    end
> update(students, abstractAge)
| name | age | favorite color |
| ------- | ---------- | -------------- |
| "Bob" | "kid" | "blue" |
| "Alice" | "teenager" | "green" |
| "Eve" | "teenager" | "red" |
> didWellInFinal =
    function(r):
      [row:
        ("midterm", 85 <= getValue(r, "midterm")),
        ("final", 85 <= getValue(r, "final"))]
    end
> update(gradebook, didWellInFinal)
| name | age | quiz1 | quiz2 | midterm | quiz3 | quiz4 | final |
| ------- | --- | ----- | ----- | ------- | ----- | ----- | ----- |
| "Bob" | 12 | 8 | 9 | false | 7 | 9 | true |
| "Alice" | 17 | 6 | 8 | true | 8 | 7 | true |
| "Eve" | 13 | 7 | 9 | false | 8 | 8 | false |

- schema(r1) is equal to schema(t1)
- n is in range(nrows(t1))
- schema(t2) is equal to schema(r2)
- nrows(t2) is equal to nrows(t1)

Projects each Row of a Table into a new Table.
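A minimal Python sketch of the projection, assuming a table is a list of dicts and that the projection function receives the row and its zero-based index; the names and sample data are this example's own.

```python
def select(table, f):
    """Build a new table from f(row, index) for every row of `table`."""
    return [f(row, n) for n, row in enumerate(table)]


# Assumed sample data mirroring the students table.
students = [
    {"name": "Bob", "age": 12, "favorite color": "blue"},
    {"name": "Alice", "age": 17, "favorite color": "green"},
    {"name": "Eve", "age": 13, "favorite color": "red"},
]
projected = select(
    students,
    lambda r, n: {"ID": n, "COLOR": r["favorite color"], "AGE": r["age"]},
)
print(projected[2])  # {'ID': 2, 'COLOR': 'red', 'AGE': 13}
```

The output schema is determined entirely by the rows the function returns, so it need not share any column with the input.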
> select(
    students,
    function(r, n):
      [row:
        ("ID", n),
        ("COLOR", getValue(r, "favorite color")),
        ("AGE", getValue(r, "age"))]
    end)
| ID | COLOR | AGE |
| -- | ------- | --- |
| 0 | "blue" | 12 |
| 1 | "green" | 17 |
| 2 | "red" | 13 |
> select(
    gradebook,
    function(r, n):
      [row:
        ("full name", concat(getValue(r, "name"), " Smith")),
        ("(midterm + final) / 2", (getValue(r, "midterm") + getValue(r, "final")) / 2)]
    end)
| full name | (midterm + final) / 2 |
| ------------- | --------------------- |
| "Bob Smith" | 82 |
| "Alice Smith" | 86.5 |
| "Eve Smith" | 80.5 |

selectMany :: t1:Table * project:(r1:Row * n:Number -> t2:Table) * result:(r2:Row * r3:Row -> r4:Row) -> t3:Table

- schema(r1) is equal to schema(t1)
- n is in range(nrows(t1))
- schema(r2) is equal to schema(t1)
- schema(r3) is equal to schema(t2)
- schema(r4) is equal to schema(t3)

Projects each row of a table to a new table, flattens the resulting tables into one table, and invokes a result selector function on each row therein. The projection function receives the index of each source row as its second argument, and the result selector receives the source row together with each row of its projected table.
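The project-then-flatten steps can be sketched in Python. The sketch assumes a table is a list of dicts and that the projection returns a list of rows; select_many and the sample data are names used only here.

```python
def select_many(table, project, result):
    """Project each row to a table, flatten, and map `result` over the pairs."""
    out = []
    for n, row in enumerate(table):
        for inner in project(row, n):        # rows of the intermediate table
            out.append(result(row, inner))   # source row paired with inner row
    return out


# Assumed sample data, trimmed from the students table.
students = [{"name": "Bob"}, {"name": "Alice"}, {"name": "Eve"}]

# Keep rows at even indices: odd rows project to an empty table.
kept = select_many(
    students,
    lambda r, n: [r] if n % 2 == 0 else [],
    lambda r1, r2: r2,
)
print(kept)  # [{'name': 'Bob'}, {'name': 'Eve'}]
```

Because a projection may return zero, one, or many rows, the number of output rows is not determined by the number of input rows.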
> selectMany(
    students,
    function(r, n):
      if even(n):
        r
      else:
        head(r, 0)
      end
    end,
    function(r1, r2):
      r2
    end)
| name | age | favorite color |
| ----- | --- | -------------- |
| "Bob" | 12 | "blue" |
| "Eve" | 13 | "red" |
> repeatRow =
    function(r, n):
      if n == 0:
        r
      else:
        addRows(repeatRow(r, n - 1), [r])
      end
    end
> selectMany(
    gradebook,
    repeatRow,
    function(r1, r2):
      selectColumns(r2, ["midterm"])
    end)
| midterm |
| ------- |
| 77 |
| 88 |
| 88 |
| 84 |
| 84 |
| 84 |

groupJoin<K> :: t1:Table * t2:Table * getKey1:(r1:Row -> k1:K) * getKey2:(r2:Row -> k2:K) * aggregate:(r3:Row * t3:Table -> r4:Row) -> t4:Table

- schema(r1) is equal to schema(t1)
- schema(r2) is equal to schema(t2)
- schema(r3) is equal to schema(t1)
- schema(t3) is equal to schema(t2)
- schema(t4) is equal to schema(r4)
- nrows(t4) is equal to nrows(t1)

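Correlates the rows of two tables based on equality of keys and groups the results: for each row of t1, aggregate receives that row together with the table of all rows of t2 whose key is equal, so t4 has exactly one row per row of t1.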
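A hedged Python sketch of the grouping join, assuming tables are lists of dicts and keys are compared with ==; group_join, name_length, and the trimmed sample data are illustrative names.

```python
def group_join(t1, t2, get_key1, get_key2, aggregate):
    """For each row of t1, aggregate it with all rows of t2 sharing its key."""
    out = []
    for r1 in t1:
        key = get_key1(r1)
        matches = [r2 for r2 in t2 if get_key2(r2) == key]  # may be empty
        out.append(aggregate(r1, matches))
    return out


def name_length(r):
    """Key function: the length of the name column."""
    return len(r["name"])


# Assumed sample data, trimmed from the students and gradebook tables.
students = [{"name": "Bob"}, {"name": "Alice"}, {"name": "Eve"}]
gradebook = [
    {"name": "Bob", "final": 87},
    {"name": "Alice", "final": 85},
    {"name": "Eve", "final": 77},
]
counts = group_join(
    students, gradebook, name_length, name_length,
    lambda r, t: {**r, "nrows": len(t)},
)
print([r["nrows"] for r in counts])  # [2, 1, 2]
```

A row of the first table with no matches still produces an output row, because aggregate is then called with an empty table.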
> getName =
    function(r):
      getValue(r, "name")
    end
> averageFinal =
    function(r, t):
      addColumn(r, "final", [average(getColumn(t, "final"))])
    end
> groupJoin(students, gradebook, getName, getName, averageFinal)
| name | age | favorite color | final |
| ------- | --- | -------------- | ----- |
| "Bob" | 12 | "blue" | 87 |
| "Alice" | 17 | "green" | 85 |
| "Eve" | 13 | "red" | 77 |
> nameLength =
    function(r):
      length(getValue(r, "name"))
    end
> tableNRows =
    function(r, t):
      addColumn(r, "nrows", [nrows(t)])
    end
> groupJoin(students, gradebook, nameLength, nameLength, tableNRows)
| name | age | favorite color | nrows |
| ------- | --- | -------------- | ----- |
| "Bob" | 12 | "blue" | 2 |
| "Alice" | 17 | "green" | 1 |
| "Eve" | 13 | "red" | 2 |

join<K> :: t1:Table * t2:Table * getKey1:(r1:Row -> k1:K) * getKey2:(r2:Row -> k2:K) * combine:(r3:Row * r4:Row -> r5:Row) -> t3:Table

- schema(r1) is equal to schema(t1)
- schema(r2) is equal to schema(t2)
- schema(r3) is equal to schema(t1)
- schema(r4) is equal to schema(t2)
- schema(t3) is equal to schema(r5)

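Correlates the rows of two tables based on matching keys: combine is applied to each pair of a row of t1 and a row of t2 whose keys are equal, so a row with several matches appears several times in the result.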
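The pairing behavior can be sketched in Python as a nested loop. The sketch assumes tables are lists of dicts and keys are compared with ==; join, name_length, and the trimmed sample data are names chosen for this example.

```python
def join(t1, t2, get_key1, get_key2, combine):
    """Combine every pair (r1, r2) whose keys are equal, in t1-major order."""
    return [
        combine(r1, r2)
        for r1 in t1
        for r2 in t2
        if get_key1(r1) == get_key2(r2)
    ]


def name_length(r):
    """Key function: the length of the name column."""
    return len(r["name"])


# Assumed sample data, trimmed from the students and gradebook tables.
students = [{"name": "Bob"}, {"name": "Alice"}, {"name": "Eve"}]
gradebook = [
    {"name": "Bob", "final": 87},
    {"name": "Alice", "final": 85},
    {"name": "Eve", "final": 77},
]
joined = join(
    students, gradebook, name_length, name_length,
    lambda r1, r2: {**r1, "grade": r2["final"]},
)
print([r["grade"] for r in joined])  # [87, 77, 85, 87, 77]
```

Unlike the grouping variant, a row of the first table with no match contributes nothing, and the number of output rows depends on the data rather than on either input size alone.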
> getName =
    function(r):
      getValue(r, "name")
    end
> addGradeColumn =
    function(r1, r2):
      addColumn(r1, "grade", [getValue(r2, "final")])
    end
> join(students, gradebook, getName, getName, addGradeColumn)
| name | age | favorite color | grade |
| ------- | --- | -------------- | ----- |
| "Bob" | 12 | "blue" | 87 |
| "Alice" | 17 | "green" | 85 |
| "Eve" | 13 | "red" | 77 |
> nameLength =
    function(r):
      length(getValue(r, "name"))
    end
> join(students, gradebook, nameLength, nameLength, addGradeColumn)
| name | age | favorite color | grade |
| ------- | --- | -------------- | ----- |
| "Bob" | 12 | "blue" | 87 |
| "Bob" | 12 | "blue" | 77 |
| "Alice" | 17 | "green" | 85 |
| "Eve" | 13 | "red" | 87 |
| "Eve" | 13 | "red" | 77 |