Parallelisation for speedier CPU inference? #12
Replies: 3 comments 2 replies
I would love to figure out parallelisation. We could have a use_parallel parameter, and if it's TRUE, check that furrr or future is installed; that way we wouldn't need a hard dependency. I really like the furrr package; it's super simple to use.
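A minimal R sketch of that idea, assuming hypothetical embed_texts()/embed_one() names that stand in for whatever function actually calls the model; requireNamespace() keeps furrr in Suggests rather than Imports:

embed_texts <- function(texts, use_parallel = FALSE) {
  # Only go parallel if the user asked for it AND furrr is installed;
  # otherwise fall back to a plain serial lapply().
  if (use_parallel && requireNamespace("furrr", quietly = TRUE)) {
    future::plan(future::multisession)
    furrr::future_map(texts, embed_one)
  } else {
    lapply(texts, embed_one)
  }
}

# Stand-in for the real model call; see the next reply for the reticulate caveat.
embed_one <- function(x) nchar(x)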
Actually, R parallelism doesn't work with reticulate objects, so it's not as slick as it usually is in R. I've tried this for speed-ups and you have to load the model in each process. https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html
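For concreteness, a rough sketch of that workaround; the transformers import and pipeline() call are assumptions about how the model is reached through reticulate, not this package's internals. Because the pipeline object can't be exported to workers, each worker has to import transformers and rebuild it, which is exactly the overhead described above:

library(future)
library(furrr)

plan(multisession, workers = 2)

texts  <- c("love this package", "cpu inference is slow", "gpu please", "nice api")
chunks <- split(texts, cut(seq_along(texts), 2, labels = FALSE))

results <- future_map(chunks, function(chunk) {
  # Loaded inside the worker: reticulate objects can't cross process boundaries.
  transformers <- reticulate::import("transformers")
  pipe <- transformers$pipeline("sentiment-analysis")
  pipe(as.list(chunk))
})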
Latest update on this: we're using torch to send models and pipelines to the GPU, and we're currently testing/benchmarking to get a better gauge of the speed-up. It's not quite as fast as expected so far, but the thinking is that when it comes to training (future versions of this package will look to utilise the Trainer API and the datasets library) the speed-up would be higher, since gradients have to be calculated etc. We also need to test different batch sizes, push the GPU to its limit, and so on.
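A hedged sketch of what that GPU path can look like through reticulate (the Hugging Face pipeline and device arguments are assumptions about the setup, not this package's exact code), with batch_size as the main benchmarking knob:

transformers <- reticulate::import("transformers")
torch        <- reticulate::import("torch")

# device = 0L targets the first CUDA device; -1L keeps the pipeline on the CPU.
device <- if (torch$cuda$is_available()) 0L else -1L

pipe <- transformers$pipeline("sentiment-analysis", device = device)

texts <- rep("benchmark sentence", 64L)
out   <- pipe(as.list(texts), batch_size = 16L)  # vary batch_size when benchmarking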
Should we add a dependency on furrr/future and provide functions for cutting datasets into chunks and processing them in parallel, or should this be left to the user?