
  Return the total number of observations contained in `data`.

- If `data` does not have `numobs` defined, then this function
- falls back to `length(data)`.
+ If `data` does not have `numobs` defined, then this function
+ returns the number of rows when `Tables.istable(data)` is `true`,
+ and `length(data)` otherwise.
+
  Authors of custom data containers should implement
  `Base.length` for their type instead of `numobs`.
  `numobs` should only be implemented for types where there is a
  difference between `numobs` and `Base.length`
  (such as multi-dimensional arrays).

- See also [`getobs`](@ref)
+ By default, `numobs` supports nested combinations of arrays, tuples,
+ named tuples, and dictionaries.
+
+ See also [`getobs`](@ref).
+
+ # Examples
+ ```jldoctest
+
+ # named tuples
+ x = (a = [1, 2, 3], b = rand(6, 3))
+ numobs(x) == 3
+
+ # dictionaries
+ x = Dict(:a => [1, 2, 3], :b => rand(6, 3))
+ numobs(x) == 3
+ ```
+ All internal containers must have the same number of observations:
+ ```juliarepl
+ julia> x = (a = [1, 2, 3, 4], b = rand(6, 3));
+
+ julia> numobs(x)
+ ERROR: DimensionMismatch: All data containers must have the same number of observations.
+ Stacktrace:
+  [1] _check_numobs_error()
+    @ MLUtils ~/.julia/dev/MLUtils/src/observation.jl:163
+  [2] _check_numobs
+    @ ~/.julia/dev/MLUtils/src/observation.jl:130 [inlined]
+  [3] numobs(data::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Matrix{Float64}}})
+    @ MLUtils ~/.julia/dev/MLUtils/src/observation.jl:177
+  [4] top-level scope
+    @ REPL[35]:1
+ ```
  """
  function numobs end

  # Generic Fallbacks
- numobs(data) = length(data)
+ @traitfn numobs(data::X) where {X; IsTable{X}} = DataAPI.nrow(data)
+ @traitfn numobs(data::X) where {X; !IsTable{X}} = length(data)
+

  """
      getobs(data, [idx])

- Return the observations corresponding to the observation-index `idx`.
+ Return the observations corresponding to the observation index `idx`.
  Note that `idx` can be any type as long as `data` has defined
- `getobs` for that type.
+ `getobs` for that type. If `idx` is not provided, then materialize
+ all observations in `data`.
+
+ If `data` does not have `getobs` defined, then this function
+ returns the row(s) in position `idx` when `Tables.istable(data)` is `true`,
+ and `data[idx]` otherwise.

- If `data` does not have `getobs` defined, then this function
- falls back to `data[idx]`.
  Authors of custom data containers should implement
  `Base.getindex` for their type instead of `getobs`.
  `getobs` should only be implemented for types where there is a
@@ -40,13 +78,37 @@ Every author behind some custom data container can make this
  decision themselves.
  The output should be consistent when `idx` is a scalar vs vector.

- See also [`getobs!`](@ref) and [`numobs`](@ref)
+ By default, `getobs` supports nested combinations of arrays, tuples,
+ named tuples, and dictionaries.
+
+ See also [`getobs!`](@ref) and [`numobs`](@ref).
+
+ # Examples
+
+ ```jldoctest
+ # named tuples
+ x = (a = [1, 2, 3], b = rand(6, 3))
+
+ getobs(x, 2) == (a = 2, b = x.b[:, 2])
+ getobs(x, [1, 3]) == (a = [1, 3], b = x.b[:, [1, 3]])
+
+
+ # dictionaries
+ x = Dict(:a => [1, 2, 3], :b => rand(6, 3))
+
+ getobs(x, 2) == Dict(:a => 2, :b => x[:b][:, 2])
+ getobs(x, [1, 3]) == Dict(:a => [1, 3], :b => x[:b][:, [1, 3]])
+ ```
  """
  function getobs end

  # Generic Fallbacks
+
  getobs(data) = data
- getobs(data, idx) = data[idx]
+
+ @traitfn getobs(data::X, idx) where {X; IsTable{X}} = Tables.subset(data, idx, viewhint=false)
+ @traitfn getobs(data::X, idx) where {X; !IsTable{X}} = data[idx]
+

  """
      getobs!(buffer, data, idx)
@@ -61,6 +123,8 @@ method is provided for the type of `data`, then `buffer` will be
  because the type of `data` may not lend itself to the concept
  of `copy!`. Thus, supporting a custom `getobs!` is optional
  and not required.
+
+ See also [`getobs`](@ref) and [`numobs`](@ref).
  """
  function getobs! end
  # getobs!(buffer, data) = getobs(data)
@@ -161,3 +225,5 @@ function getobs!(buffers, data::Dict, i)

      return buffers
  end
+
+
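
The docstrings in this diff describe how `numobs` and `getobs` treat arrays and nested containers. The following is a minimal sketch of that behavior using only Base Julia, so it can be run without MLUtils installed. The names `sketch_numobs` and `sketch_getobs` are hypothetical stand-ins introduced here for illustration, and the table branch (`IsTable`) is left out; this is not the package's actual implementation.

```julia
# Hypothetical stand-ins for illustration only; `sketch_numobs` and
# `sketch_getobs` are not part of MLUtils, and the table branch is omitted.

# Plain fallback: anything without a more specific method uses length / getindex.
sketch_numobs(data) = length(data)
sketch_getobs(data, idx) = data[idx]

# Arrays: the last dimension indexes observations, as in the docstring examples.
sketch_numobs(data::AbstractArray) = size(data, ndims(data))
sketch_getobs(data::AbstractArray, idx) =
    data[ntuple(_ -> Colon(), ndims(data) - 1)..., idx]

# Nested containers: every child must report the same number of observations.
function sketch_numobs(data::Union{Tuple,NamedTuple,AbstractDict})
    counts = [sketch_numobs(d) for d in values(data)]
    isempty(counts) && return 0
    all(==(first(counts)), counts) ||
        throw(DimensionMismatch("All data containers must have the same number of observations."))
    return first(counts)
end

# Indexing a nested container indexes each child and keeps the container shape.
sketch_getobs(data::Union{Tuple,NamedTuple}, idx) = map(d -> sketch_getobs(d, idx), data)
sketch_getobs(data::AbstractDict, idx) = Dict(k => sketch_getobs(v, idx) for (k, v) in data)

# Usage, mirroring the docstring examples with a deterministic matrix.
x = (a = [1, 2, 3], b = reshape(collect(1.0:18.0), 6, 3))
@assert sketch_numobs(x) == 3
@assert sketch_getobs(x, 2) == (a = 2, b = x.b[:, 2])
@assert sketch_getobs(x, [1, 3]) == (a = [1, 3], b = x.b[:, [1, 3]])
```

Dispatching on the container type keeps the generic fallback to a single line, which is the same split the docstrings recommend: implement `Base.length` and `Base.getindex` for custom containers, and specialize only where the observation dimension differs from plain indexing.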