[discussion] Improve usage of Random numbers in PCL

This is a meta-issue concerning the uses of random number generation facilities, as discussed in the Discord. Since several Boost.Random components are being replaced with their standard library counterparts (ex: #2803, #3994), I thought that it was a good time to open a general library-wide discussion about random handling in PCL.

General information: https://codingnest.com/generating-random-numbers-using-c-standard-library-the-problems/

## Random number facilities used in PCL

PCL uses a variety of random number facilities. We will start by reviewing those.

### `std::rand()`/`std::srand()`

The legacy C functions `std::rand()` and `std::srand()` are still used quite a lot in PCL: in [examples](https://github.com/PointCloudLibrary/pcl/blob/75a943f5bf0270e605f25757d0699f949b90a531/examples/surface/test_nurbs_fitting_surface.cpp#L17), in [tests](https://github.com/PointCloudLibrary/pcl/blob/b3af6c12487ce9f7723399c7c54ec69ff5ca2fd7/test/io/test_octree_compression.cpp), in [tutorials](https://github.com/PointCloudLibrary/pcl/blob/8eb47684da26d42cd8cd69c79976e8aee04a2831/doc/tutorials/content/sources/remove_outliers/remove_outliers.cpp), in [apps](https://github.com/PointCloudLibrary/pcl/blob/8eb47684da26d42cd8cd69c79976e8aee04a2831/apps/src/test_search.cpp), in [embedded 3rd-party code](https://github.com/PointCloudLibrary/pcl/blob/8eb47684da26d42cd8cd69c79976e8aee04a2831/surface/src/3rdparty/opennurbs/opennurbs_rand.cpp) but also in [general library code](https://github.com/PointCloudLibrary/pcl/blob/237bc24be19fd92f85aace32144ba5afb37fb68d/features/include/pcl/features/impl/board.hpp) itself.

There is plenty of literature on the internet (TODO: links) showing why those are not good enough, but here are a few issues to sum it up:
* `rand` relies on a single global state, which prevents finer control over the different places that use it.
* `rand` is not guaranteed to be thread-safe.
* `rand` has historically had a number of poor implementations, and from what I understand the situation has not improved much.
* The common idioms built on `rand` (such as `rand() % n`) [often have subtle numerical issues](http://www.azillionmonkeys.com/qed/random.html).

*See also issue #3781.*

Fortunately there has been Boost.Random for a long time and `<random>` in the standard library since C++11.
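As a rough illustration of what replacing a `std::rand() % n` call could look like with `<random>`, here is a minimal sketch. The helper name `random_int_in_range` is hypothetical and is not an existing PCL function.

```cpp
#include <random>

// Hypothetical helper (not part of PCL): returns a uniformly distributed
// integer in the closed range [lo, hi].
// The engine is passed in explicitly, so there is no hidden global state.
template <typename Engine>
int random_int_in_range(Engine& engine, int lo, int hi)
{
  // uniform_int_distribution avoids the modulo bias of `std::rand() % n`.
  std::uniform_int_distribution<int> dist(lo, hi);
  return dist(engine);
}
```

A caller would create an engine once, for example `std::mt19937 engine(42);`, and then call `random_int_in_range(engine, 1, 6)` wherever `std::rand() % 6 + 1` was used before.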

### `random_device`

`random_device` is the only class from `<random>` that can produce non-deterministic random numbers, and it is used here and there in PCL to initialize PRNGs. However, the `std::random_device` implementation in libstdc++ is known to produce a fixed sequence of numbers when targeting Windows via MinGW-w64, and its `entropy` function can't reliably be used to check whether the underlying implementation returns actual random numbers or a deterministic sequence - there is even a [standard proposal](https://github.com/cplusplus/papers/issues/786) to remove `entropy`. This will be fixed in GCC10, but that means that the problem will still exist for years.

If the closest thing to an actual random number is needed, `boost::random_device` is still a better tool today.
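For reference, initializing a PRNG from a random device usually looks something like the sketch below. It uses `std::random_device` so that it stays self-contained; `boost::random_device` could presumably be substituted given the caveat above. The helper name `make_seeded_engine` and the choice of eight seed words are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <random>

// Hypothetical helper (not part of PCL): builds a std::mt19937 seeded from
// several words of random_device output instead of a single 32-bit value.
inline std::mt19937 make_seeded_engine()
{
  std::random_device device;
  std::array<std::random_device::result_type, 8> seed_data;
  // Fill the array by calling the device once per element.
  std::generate(seed_data.begin(), seed_data.end(), std::ref(device));
  std::seed_seq seq(seed_data.begin(), seed_data.end());
  return std::mt19937(seq);
}
```

Note that if the device itself is deterministic, as in the MinGW-w64 case described above, the resulting engine will start from the same state on every run.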

### Pseudo-random number generators

There are plenty of PRNGs in both `<random>` and Boost.Random. They are pretty good despite their shortcomings (TODO: links), and are guaranteed to produce the same sequence of pseudo-random numbers on every implementation for a given seed. I don't know whether the Boost.Random ones have any advantages over the `<random>` ones, so either can be used.
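The portability guarantee for engines can be checked directly: the standard requires that the 10000th consecutive output of a default-constructed `std::mt19937` is 4123659995. A small sketch (the function name is made up for this example):

```cpp
#include <cstdint>
#include <random>

// Returns the 10000th output of a default-constructed std::mt19937.
// The standard requires this value to be 4123659995 on every implementation.
inline std::uint_fast32_t ten_thousandth_mt19937_output()
{
  std::mt19937 engine;   // default-constructed, uses the default seed
  engine.discard(9999);  // skip the first 9999 outputs
  return engine();       // the 10000th output
}
```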

Whether a Mersenne Twister should be used everywhere is debatable, since it seems to be a rather heavy PRNG, and users might want to replace it depending on their needs (see the first answer in that thread). PCL facilities that use random number generators could very well allow users to optionally provide their own PRNG when one is needed, just like some standard library algorithms do (ex: [`std::shuffle`](https://en.cppreference.com/w/cpp/algorithm/random_shuffle)). Issue #3724 already mentions the wish to let users choose their PRNG.
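A `std::shuffle`-style interface could look roughly like the following sketch. `pick_random_indices` is a hypothetical function used only to illustrate the pattern; it is not an existing PCL API, and it assumes `size > 0`.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical function (not part of PCL): picks `count` indices in
// [0, size) with replacement, drawing from a caller-provided generator.
// Precondition (assumed): size > 0.
template <typename URBG>
std::vector<std::size_t>
pick_random_indices(std::size_t size, std::size_t count, URBG&& generator)
{
  std::vector<std::size_t> indices;
  indices.reserve(count);
  std::uniform_int_distribution<std::size_t> dist(0, size - 1);
  for (std::size_t i = 0; i < count; ++i)
    indices.push_back(dist(generator));
  return indices;
}
```

With this shape, a user who finds `std::mt19937` too heavy can pass a `std::minstd_rand` or any other type that satisfies the uniform random bit generator requirements, and a test can pass an engine with a known seed.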

### Distributions

Unlike for PRNGs, the standard does not mandate a specific algorithm for each distribution in `<random>`; it only describes the properties of the distribution. In practice this means that the different standard library implementations use different algorithms, and that a distribution won't necessarily return the same results across implementations even when given the same PRNG seeded with the same value. As we will see in another section, this might be a problem for reproducible tests on different platforms.

The simplest way to achieve reproducible results for distributions is to use the Boost.Random ones: the algorithm for a given distribution will be the same on all platforms. It is worth noting that the standard committee recently [rejected a proposal](https://github.com/cplusplus/papers/issues/787) to make distribution results portable between platforms.
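To make concrete what "the same algorithm on all platforms" means, here is a deliberately simple hand-written mapping from `std::mt19937` output to a `double` in `[0, 1)`. This is only an illustration of the idea, not Boost.Random's algorithm and not an existing PCL function; the name `portable_uniform01` is hypothetical, and the sketch assumes IEEE 754 `double`.

```cpp
#include <random>

// Hypothetical helper (not part of PCL or Boost): maps one 32-bit output of
// std::mt19937 to a double in [0, 1). Since the engine output is specified
// by the standard and the division by 2^32 is exact in IEEE 754 double,
// the same seed gives the same values on every platform (assumption:
// double is an IEEE 754 binary64 type).
inline double portable_uniform01(std::mt19937& engine)
{
  return static_cast<double>(engine()) / 4294967296.0; // divide by 2^32
}
```

The resolution is limited to 32 bits, which is one of the trade-offs a real distribution implementation would have to address.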

### Eigen random number functions

Apparently Eigen random functions use `rand()`, with all its problems. I have to check whether it is still the case and what can be done about it.

## Reproducible issues in code using random numbers

One thing that might be of value is the ability to reproduce errors in algorithms that use random-number generators even when those errors happen during the continuous integration or if someone opens an issue. The need for reproducible results in the presence of RNGs was already mentioned in #3715. This probably requires a bunch of changes, but it's not infeasible.

The first thing it requires is the ability to log the seed used to initialize the PRNG in the algorithm where the issue appeared. We can't really log from the algorithms themselves, so the most flexible solution is to do it like the standard library does: allow users to pass an already seeded PRNG to algorithms that need one, and use that to pass manually seeded PRNGs in the test suite.

The test suite can either use a single seed for the whole project (for example GTest's `UnitTest::random_seed()`) or one seed per test case. What matters is that this seed is, at the very least, logged when the corresponding test fails.

For truly reproducible failures in the test suite, the distributions from Boost.Random have to be used. If the standard ones are used instead, a failure observed with libstdc++ might not be reproducible on the tester's computer when it runs another standard library, because the two may use different algorithms for the same distribution.

## Thread safety

I don't know how much PCL cares about thread safety: currently it uses `std::rand()` (not thread-safe) in some places, but [instantiates PRNGs in some functions](https://github.com/PointCloudLibrary/pcl/blob/master/filters/include/pcl/filters/impl/voxel_grid_covariance.hpp#L414), which is thread-safe but pays the cost of constructing and seeding the engine on every call. A good middle ground is to have per-function `thread_local` PRNGs, which gives thread safety without paying the cost of initialization every time.
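A per-function `thread_local` PRNG could look like this minimal sketch. The function name `random_offset` is hypothetical and the seeding from `std::random_device` is only one possible choice.

```cpp
#include <random>

// Hypothetical function (not part of PCL): returns a value in
// [0, max_offset]. Each thread constructs and seeds its own engine the first
// time it calls the function, then reuses it on later calls, so no state is
// shared between threads.
inline int random_offset(int max_offset)
{
  thread_local std::mt19937 engine(std::random_device{}());
  std::uniform_int_distribution<int> dist(0, max_offset);
  return dist(engine);
}
```

The drawback of this form is that the seed is no longer controllable by the caller, which is why it would probably need to be combined with an optional user-provided PRNG.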

## Easy-to-use interface

The random number facilities in `<random>` are flexible enough but far from easy to use. PCL might want to have its own facilities built on top of them to avoid the burden of manually going through the standard interface every time - this kind of burden is what pushes people to use `std::rand()` even today. Having a simplified interface and making it publicly available would also make it possible to write tutorials that don't use `std::rand()`.

Currently the only "simple" functions that were tentatively standardized are [`std::experimental::randint`](https://en.cppreference.com/w/cpp/experimental/randint) and [`std::experimental::reseed`](https://en.cppreference.com/w/cpp/experimental/reseed) in the Library Fundamentals TS v2, along with overloads of `std::sample` and `std::shuffle` that don't explicitly take a PRNG but rely on a "global" per-thread engine (which `reseed()` seeds). PCL could have something along these lines too: each function needing a PRNG would then either use a user-provided one or use the library's global per-thread engine.
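A rough sketch of such an interface, modeled on the `randint`/`reseed` pair from the TS, is shown below. The `sketch` namespace and all names in it are hypothetical; this is not an existing PCL API, and the choice of `std::default_random_engine` simply mirrors the TS.

```cpp
#include <random>

namespace sketch {  // hypothetical namespace, for illustration only

// One engine per thread, constructed and seeded on first use.
inline std::default_random_engine& per_thread_engine()
{
  thread_local std::default_random_engine engine(std::random_device{}());
  return engine;
}

// Reseeds the calling thread's engine with a fixed value
// (useful for reproducible runs).
inline void reseed(std::default_random_engine::result_type value)
{
  per_thread_engine().seed(value);
}

// Reseeds the calling thread's engine from std::random_device.
inline void reseed()
{
  per_thread_engine().seed(std::random_device{}());
}

// Returns a uniformly distributed integer in the closed range [a, b].
template <typename Int>
Int randint(Int a, Int b)
{
  return std::uniform_int_distribution<Int>(a, b)(per_thread_engine());
}

}  // namespace sketch
```

A tutorial could then write `sketch::randint(0, 1023)` instead of `std::rand() % 1024`, and a test could call `sketch::reseed(42)` first to get a repeatable sequence on a given platform.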
