@@ -16,8 +16,6 @@ The dropout function. If `active` is `true`,
 for each input, either sets that input to `0` (with probability
 `p`) or scales it by `1 / (1 - p)`. `dims` specifies the unbroadcasted dimensions,
 e.g. `dims=1` applies dropout along columns and `dims=2` along rows.
-This is used as a regularisation, i.e. it reduces overfitting during training.
-
 If `active` is `false`, it just returns the input `x`.
 
 Specify `rng` for custom RNGs instead of the default RNG.
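As a quick illustration of the functional form documented in this hunk, here is a minimal sketch, assuming the `Flux.dropout(x, p; dims, active)` keywords described above; the array shape and probability are arbitrary:

```julia
using Flux

x = ones(Float32, 4, 3)

# With `active = true`, each element is either set to 0 (with probability p)
# or rescaled by 1 / (1 - p), keeping the expected value of the output unchanged.
y = Flux.dropout(x, 0.5; dims = :, active = true)

# With `active = false` the input is returned as-is.
@assert Flux.dropout(x, 0.5; active = false) == x

# `dims = 1` draws the random pattern along dimension 1 only and reuses it
# for the remaining dimensions, so whole slices are kept or dropped together.
y1 = Flux.dropout(x, 0.5; dims = 1)
```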
@@ -55,11 +53,16 @@ ChainRulesCore.@non_differentiable dropout_mask(::Any, ::Any, ::Any)
5553"""
5654 Dropout(p; dims=:, rng = rng_from_array())
5755
58- Dropout layer. In the forward pass, applies the [`Flux.dropout`](@ref) function on the input.
56+ Dropout layer.
57+
58+ While training, for each input, this layer either sets that input to `0` (with probability
59+ `p`) or scales it by `1 / (1 - p)`. To apply dropout along certain dimension(s), specify the
60+ `dims` keyword. e.g. `Dropout(p; dims = 3)` will randomly zero out entire channels on WHCN input
61+ (also called 2D dropout). This is used as a regularisation, i.e. it reduces overfitting during
62+ training.
5963
60- To apply dropout along certain dimension(s), specify the `dims` keyword.
61- e.g. `Dropout(p; dims = 3)` will randomly zero out entire channels on WHCN input
62- (also called 2D dropout).
64+ In the forward pass, this layer applies the [`Flux.dropout`](@ref) function. See that for more
65+ details.
6366
6467Specify `rng` to use a custom RNG instead of the default.
6568Custom RNGs are only supported on the CPU.
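And a minimal usage sketch of the layer form, assuming standard Flux building blocks (`Chain`, `Conv`, `trainmode!`, `testmode!`); the model and input sizes are made up to match the WHCN example in the docstring:

```julia
using Flux

# `Dropout(0.5; dims = 3)` draws one keep/drop decision per channel of the
# WHCN activations, zeroing entire feature maps (2D dropout).
m = Chain(Conv((3, 3), 1 => 8, relu), Dropout(0.5; dims = 3))

x = rand(Float32, 28, 28, 1, 16)   # a WHCN batch

Flux.trainmode!(m)   # force dropout on outside of a training loop
y = m(x)

Flux.testmode!(m)    # Dropout becomes the identity, so inference is deterministic
ŷ = m(x)
```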