Skip to content

Parallel computing: comparison and update #92

@option72

Description

@option72

I would like to develop a random matrix generator and I found very interesting example at https://cran.r-project.org/web/packages/dqrng/vignettes/parallel.html
Since I am rather new at parallel computation, I am unable to choose between the various approaches and to modify the examples.
I would like to generate a random matrix from an uniform distribution changing the following code:

-----------------------------1-PCG: multiple streams with RcppParallel---------------------------------------------------

#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_generator.h>
#include <dqrng_sample.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>

struct RandomFill : public RcppParallel::Worker {
  RcppParallel::RMatrix<int> output;
  uint64_t seed;

  RandomFill(Rcpp::IntegerMatrix output, const uint64_t seed) : output(output), seed(seed) {};

  void operator()(std::size_t begin, std::size_t end) {
    auto rng = dqrng::generator<pcg64>(seed, end);
    for (std::size_t col = begin; col < end; ++col) {
      auto sampled = dqrng::sample::sample<std::vector<int>, uint32_t>(*rng, 100000, output.nrow(), true);
      RcppParallel::RMatrix<int>::Column column = output.column(col);
      std::copy(sampled.begin(), sampled.end(), column.begin());
    }
  }
};

// [[Rcpp::export]]
Rcpp::IntegerMatrix parallel_random_matrix(const int n, const int m, const int ncores) {
  Rcpp::IntegerMatrix res(n, m);
  RandomFill randomFill(res, 42);
  RcppParallel::parallelFor(0, m, randomFill, m/ncores + 1);
  return res;
}

and I also would like to understand the difference between the former (1) and the latter (2) approach

-----------------------------2-Xo(ro)shiro: jump ahead with OpenMP-------------------------------------

#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>

// [[Rcpp::export]]
std::vector<double> random_sum(int n, int m) {
  dqrng::uniform_distribution dist(0.0, 1.0);               // Uniform distribution [0,1)
  auto rng = dqrng::generator<dqrng::xoshiro256plusplus>(); // seeded from R's RNG
  std::vector<double> res(m);
  for (int i = 0; i < m; ++i) {
    double lres(0);
    for (int j = 0; j < n; ++j) {
      lres += dist(*rng);
    }
    res[i] = lres / n;
  }
  return res;
}

// [[Rcpp::export]]
std::vector<double> parallel_random_sum(int n, int m, int ncores) {
  dqrng::uniform_distribution dist(0.0, 1.0);               // Uniform distribution [0,1)
  auto rng = dqrng::generator<dqrng::xoshiro256plusplus>(); // seeded from R's RNG
  std::vector<double> res(m);
  // ok to use rng here
  
  #pragma omp parallel num_threads(ncores)
  {
    // make thread local copy of rng and advance it by 1 ... ncores jumps
    auto lrng = rng->clone(omp_get_thread_num() + 1);

    #pragma omp for
    for (int i = 0; i < m; ++i) {
      double lres(0);
      for (int j = 0; j < n; ++j) {
        lres += dist(*lrng);
      }
      res[i] = lres / n;
    }
  }
  // ok to use rng here
  return res;
}

in order to understand which is the more efficient and faster of the two solutions.
Thank you very much.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions