
[discussion] Improve usage of Random numbers in PCL #3996

@Morwenn


This is a meta-issue about the use of random number generation facilities in PCL, as discussed on Discord. Since several Boost.Random components are being replaced with their standard library counterparts (e.g. #2803, #3994), I thought it was a good time to open a general, library-wide discussion about how PCL handles randomness.

General information: https://codingnest.com/generating-random-numbers-using-c-standard-library-the-problems/

Random number facilities used in PCL

PCL uses a variety of random number facilities. We will start by reviewing those.

std::rand()/std::srand()

The legacy C functions std::rand() and std::srand() are still used quite a lot in PCL: in examples, tests, tutorials, apps and embedded third-party code, but also in the general library code itself.

There is plenty of literature on the internet (TODO: links) explaining why those functions are not good enough, but here are a few issues to sum it up:

  • rand has a single global state, which prevents finer control over the different places that use it.
  • rand is not required to be thread-safe.
  • rand has historically had many poor implementations (for example, RAND_MAX is only 32767 with MSVC), and from what I understand the situation is still not great.
  • Common practices built on rand, such as rand() % n, often have subtle numerical issues.
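To make the last point concrete: the common idiom `std::rand() % n` is biased whenever `RAND_MAX + 1` is not a multiple of `n`. The sketch below (both helper names are hypothetical, for illustration only) counts how many raw outputs land on a given residue when a generator with 32768 possible outputs, which is `RAND_MAX + 1` on MSVC, is reduced modulo 10000, and shows the `<random>` replacement that avoids the problem.

```cpp
#include <cassert>
#include <random>

// Counts how many raw generator outputs in [0, range) are mapped to the
// residue `r` by the expression `raw % n`. When `range` is not a multiple
// of `n`, small residues get one more preimage than large ones: this is
// the modulo bias of `std::rand() % n`.
int preimage_count(int range, int n, int r) {
    assert(range > 0 && n > 0 && r >= 0 && r < n);
    int count = 0;
    for (int raw = 0; raw < range; ++raw) {
        if (raw % n == r) {
            ++count;
        }
    }
    return count;
}

// The <random> replacement: an explicitly seeded engine plus a distribution
// that maps the engine output to [1, 6] without this bias.
int roll_die(std::mt19937& engine) {
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}
```

With a range of 32768 and n = 10000, residues 0 to 2767 have 4 preimages each while residues 2768 to 9999 have only 3, so the low values come up about a third more often.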

See also issue #3781.

Fortunately, Boost.Random has been around for a long time, and <random> has been part of the standard library since C++11.

random_device

random_device is the only class in <random> meant to produce non-deterministic random numbers, and PCL uses it here and there to seed PRNGs. However, the libstdc++ implementation of std::random_device is known to produce a fixed sequence of numbers when targeting Windows via MinGW-w64. Its entropy() member function cannot really be used to check whether the underlying implementation will return actual random numbers or not-random-at-all ones; there is even a standard proposal to remove entropy(). This will be fixed in GCC 10, but the problem will remain with older toolchains for years.

If something as close as possible to actual random numbers is needed, boost::random_device is still the better tool today.
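As an illustration of seeding a PRNG from random_device, here is a minimal sketch; make_engine_from_device is a hypothetical helper, not an existing PCL function. It feeds several random_device draws plus a clock reading into std::seed_seq, since a single 32-bit draw is a small seed for std::mt19937. On toolchains affected by the MinGW-w64 issue, std::random_device could presumably be swapped for boost::random_device (which requires linking against Boost.Random).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not an existing PCL function. Seeds a Mersenne
// Twister from several std::random_device draws instead of a single 32-bit
// value, and mixes in a clock reading so that a deterministic random_device
// does not yield the exact same engine state on every run. The clock is only
// a weak fallback, not a real source of entropy.
std::mt19937 make_engine_from_device(std::size_t device_draws = 8) {
    assert(device_draws > 0);
    std::random_device rd;
    std::vector<std::uint32_t> seed_data;
    seed_data.reserve(device_draws + 2);
    for (std::size_t i = 0; i < device_draws; ++i) {
        seed_data.push_back(static_cast<std::uint32_t>(rd()));
    }
    const auto ticks = static_cast<std::uint64_t>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count());
    seed_data.push_back(static_cast<std::uint32_t>(ticks));
    seed_data.push_back(static_cast<std::uint32_t>(ticks >> 32));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```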

Pseudo-random number generators

There are plenty of PRNGs in both <random> and Boost.Random. They are pretty good despite their shortcomings (TODO: links), and are guaranteed to produce the same sequence of pseudo-random numbers on every implementation for a given seed. I don't know whether the Boost.Random ones have any advantages over the <random> ones, so either can be used.
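The cross-implementation guarantee for engines is written into the standard itself: [rand.predef] requires, for example, that the 10000th invocation of a default-constructed std::mt19937 returns 4123659995, and that of a default-constructed std::minstd_rand returns 399268537. The small helper below (its name is made up for this sketch) is enough to check this on any conforming implementation.

```cpp
#include <cassert>
#include <random>

// Returns the n-th output (1-based) of a default-constructed engine.
// The default seed of std::mt19937 is 5489.
template <typename Engine>
typename Engine::result_type nth_output(unsigned long long n) {
    assert(n > 0);
    Engine engine;
    engine.discard(n - 1);  // skip the first n - 1 outputs
    return engine();
}
```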

Whether a Mersenne Twister should be used everywhere is debatable: it is a rather heavy PRNG (the state of std::mt19937 is 624 32-bit words), and users might want to replace it depending on their needs (see the first answer in that thread). PCL facilities that need a random number generator could very well let users optionally provide their own PRNG, just like some standard library algorithms do (e.g. std::shuffle). Issue #3724 already mentions the wish to let users choose their PRNG.
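A minimal sketch of what "let the caller provide the PRNG" could look like, following the std::shuffle convention of taking the generator as a forwarding-reference template parameter. sample_indices is a hypothetical function, loosely modeled on the kind of index sampling a sample-consensus step performs; it is not an existing PCL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical PCL-style function: draws k distinct indices out of [0, n).
// Like std::shuffle, it takes the generator from the caller, so the caller
// chooses both the engine type and its seed.
template <typename URBG>
std::vector<int> sample_indices(int n, int k, URBG&& gen) {
    assert(n >= 0 && k >= 0 && k <= n);
    std::vector<int> indices(static_cast<std::size_t>(n));
    std::iota(indices.begin(), indices.end(), 0);
    // A partial Fisher-Yates shuffle would be cheaper when k << n;
    // a full std::shuffle keeps the sketch short.
    std::shuffle(indices.begin(), indices.end(), gen);
    indices.resize(static_cast<std::size_t>(k));
    return indices;
}
```

A caller who finds std::mt19937 too heavy can pass a lighter engine such as std::minstd_rand, and a test can pass an engine with a known seed to get a repeatable sample on a given standard library.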

Distributions

Unlike PRNGs, the standard does not mandate a particular algorithm for each distribution in <random>; it only describes the properties of the distribution. In practice, this means that the distributions in the different standard libraries use different algorithms and won't return the same results when given the same PRNG seeded with the same value. As we will see in another section, this can be a problem for reproducible tests across platforms.

The simplest way to get reproducible results from distributions is to use the Boost.Random ones: the algorithm for a given distribution is the same on all platforms. It is worth noting that the standard committee recently rejected a proposal to make distribution results portable across platforms.
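Boost.Random's distributions remain the straightforward option. Where pulling in Boost is undesirable, another possibility is to fix the algorithm ourselves. The function below is a hand-rolled sketch, not a Boost or standard API: it maps raw std::mt19937 outputs to [lo, hi] by rejection sampling, so its results depend only on the engine's output sequence, which the standard does pin down.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns a uniformly distributed integer in [lo, hi] using a fixed
// rejection-sampling algorithm on the raw 32-bit engine outputs, so the
// result is the same with every standard library for a given seed.
std::uint32_t portable_uniform(std::mt19937& engine, std::uint32_t lo,
                               std::uint32_t hi) {
    assert(lo <= hi);
    const std::uint64_t range = std::uint64_t{hi} - lo + 1;  // 1 .. 2^32
    const std::uint64_t span = std::uint64_t{1} << 32;       // number of engine outputs
    // Largest multiple of `range` not exceeding 2^32; outputs at or above it
    // are rejected so that every residue has the same number of preimages.
    const std::uint64_t limit = span - span % range;
    std::uint64_t x;
    do {
        x = engine();
    } while (x >= limit);
    return static_cast<std::uint32_t>(lo + x % range);
}
```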

Eigen random number functions

Apparently, Eigen's random functions use rand(), with all its problems. I have to check whether this is still the case and what can be done about it.

Reproducing issues in code that uses random numbers

One thing that might be valuable is the ability to reproduce errors in algorithms that use random number generators, even when those errors happen during continuous integration or are reported by someone in an issue. The need for reproducible results in the presence of RNGs was already mentioned in #3715. This probably requires a number of changes, but it is not infeasible.

The first requirement is the ability to log the seed used to initialize the PRNG in the algorithm where the issue appeared. We can't really log from the algorithms themselves, so the most flexible solution is to do what the standard library does: allow users to pass an already seeded PRNG to algorithms that need one, and use that to pass manually seeded PRNGs in the test suite.

The test suite can either use a single seed for the whole project (for example GTest's UnitTest::random_seed()) or a seed per test case. What matters is that this seed is, at the very least, logged when the corresponding test fails.
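A minimal sketch of the seed-logging idea in plain C++; run_seeded and replay_example are hypothetical names, not PCL or GTest APIs. In a GTest-based suite the seed could presumably come from `::testing::UnitTest::GetInstance()->random_seed()` or be chosen per test case; here it is simply a parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <random>

// Hypothetical test helper. It seeds the engine handed to the code under
// test and prints the seed when the check fails, so that the failure can be
// replayed by rerunning with that seed.
bool run_seeded(std::uint32_t seed,
                const std::function<bool(std::mt19937&)>& check) {
    std::mt19937 engine(seed);
    const bool ok = check(engine);
    if (!ok) {
        std::fprintf(stderr, "check failed, rerun with seed %lu\n",
                     static_cast<unsigned long>(seed));
    }
    return ok;
}

// Replaying: the same seed gives the code under test the same random stream.
void replay_example() {
    std::mt19937::result_type first_draw = 0;
    run_seeded(12345, [&](std::mt19937& e) { first_draw = e(); return true; });
    const bool same =
        run_seeded(12345, [&](std::mt19937& e) { return e() == first_draw; });
    assert(same);
}
```

Note that this only replays the engine stream; if the code under test also uses standard distributions, the replay is only exact on the same standard library.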

For truly reproducible failures in the test suite, the distributions from Boost.Random have to be used. If the standard ones are used, an error observed with libstdc++ might not be reproducible on a tester's computer that uses another standard library, because the distributions may rely on different algorithms.

Thread safety

I don't know how much PCL cares about thread safety: it currently uses std::rand() (not thread-safe) in some places, but in some functions it instantiates a new PRNG on every call, which is thread-safe but costly. A good middle ground is to have per-function thread_local PRNGs, which provide thread safety without paying the initialization cost on every call.
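A minimal sketch of the per-function thread_local approach; random_index is a hypothetical function standing in for any PCL routine that needs randomness.

```cpp
#include <cassert>
#include <random>

// Hypothetical function that needs randomness. The engine is thread_local:
// each thread constructs and seeds its own copy on first use, so there is no
// shared state to lock and the seeding cost is paid once per thread rather
// than once per call.
int random_index(int n) {
    assert(n > 0);
    // A single random_device draw is a small seed; see the random_device notes above.
    thread_local std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist(0, n - 1);
    return dist(engine);
}
```

The trade-off is that such a hidden engine cannot be reseeded from the outside, so on its own it does not help with reproducing failures; it fits best as a fallback when the caller does not supply an engine.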

Easy-to-use interface

The random number facilities in <random> are flexible but far from easy to use. PCL might want its own facilities built on top of them to avoid the burden of going through the standard interface manually every time; this kind of burden is what still pushes people to use std::rand() today. Having such a simplified interface available would also make it possible to write tutorials that don't use std::rand().

Currently, the only "simple" functions that were tentatively standardized are std::experimental::randint and std::experimental::reseed in the Library Fundamentals TS v2, along with overloads of sample and shuffle (also in std::experimental) that don't explicitly take a PRNG but rely on a "global" per-thread engine, the one that reseed() seeds. PCL could have something along these lines too: each function needing a PRNG would then use either a user-provided one or the library's global per-thread engine.
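A minimal sketch of such an interface, loosely modeled on the TS functions. The namespace pcl_random_sketch and everything in it are hypothetical, not an existing PCL API, and the sketch does not use `<experimental/random>` since that header is not available with every standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical namespace: a sketch modeled on the Library Fundamentals TS v2
// functions std::experimental::randint/reseed, not an existing PCL API.
namespace pcl_random_sketch {

// Library-wide per-thread engine, seeded non-deterministically on first use.
inline std::mt19937& thread_engine() {
    thread_local std::mt19937 engine{std::random_device{}()};
    return engine;
}

// Reseeds the calling thread's engine, e.g. with a seed logged by a failing test.
inline void reseed(std::uint32_t seed) {
    thread_engine().seed(seed);
}

// Overload for callers who want to supply (and seed) their own engine.
template <typename IntType, typename URBG>
IntType randint(IntType a, IntType b, URBG& gen) {
    assert(a <= b);
    std::uniform_int_distribution<IntType> dist(a, b);
    return dist(gen);
}

// Convenience overload that uses the calling thread's engine.
template <typename IntType>
IntType randint(IntType a, IntType b) {
    return randint(a, b, thread_engine());
}

}  // namespace pcl_random_sketch
```

Tutorials could then write `randint(1, 6)` instead of `std::rand() % 6 + 1`, while library code and tests keep the option of passing an explicitly seeded engine.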
