Hello,
Owl.Dense.Ndarray.S.copy_ is ~3-4x slower than Owl_base_dense_ndarray_generic.copy_ because it doesn't call Genarray.blit. Is this intentional/necessary? Can Owl_dense_ndarray_generic.copy_ be updated to use Owl_base_dense_ndarray_generic.copy_? Below are some benchmarks:
open Core
open Core_bench

let sizes = [ 10; 100; 1000; 10000; 100000; 1000000; 10000000 ]

let make_bench size =
  let name = Printf.sprintf "Copy 1D array of size %d" size in
  let src = Owl.Dense.Ndarray.S.empty [| size |] in
  let dst = Owl.Dense.Ndarray.S.empty [| size |] in
  Bench.Test.create ~name (fun () -> Owl.Dense.Ndarray.S.copy_ src ~out:dst)
;;

let () =
  let tests = List.map sizes ~f:make_bench in
  Command_unix.run (Bench.make_command tests)
;;
Owl_dense_ndarray_generic.copy_
┌────────────────────────────────┬────────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├────────────────────────────────┼────────────────┼─────────┼────────────┤
│ Copy 1D array of size 10 │ 103.61ns │ 4.00w │ │
│ Copy 1D array of size 100 │ 114.55ns │ 4.00w │ │
│ Copy 1D array of size 1000 │ 291.18ns │ 4.00w │ 0.01% │
│ Copy 1D array of size 10000 │ 2_006.74ns │ 4.00w │ 0.09% │
│ Copy 1D array of size 100000 │ 18_705.19ns │ 4.00w │ 0.80% │
│ Copy 1D array of size 1000000 │ 185_585.75ns │ 4.00w │ 7.97% │
│ Copy 1D array of size 10000000 │ 2_327_796.01ns │ 4.00w │ 100.00% │
└────────────────────────────────┴────────────────┴─────────┴────────────┘
Owl_base_dense_ndarray_generic.copy_
┌────────────────────────────────┬──────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├────────────────────────────────┼──────────────┼─────────┼────────────┤
│ Copy 1D array of size 10 │ 94.01ns │ 36.00w │ 0.02% │
│ Copy 1D array of size 100 │ 97.32ns │ 36.00w │ 0.02% │
│ Copy 1D array of size 1000 │ 115.62ns │ 36.00w │ 0.02% │
│ Copy 1D array of size 10000 │ 639.84ns │ 36.00w │ 0.11% │
│ Copy 1D array of size 100000 │ 5_578.15ns │ 36.00w │ 0.95% │
│ Copy 1D array of size 1000000 │ 61_613.86ns │ 36.00w │ 10.49% │
│ Copy 1D array of size 10000000 │ 587_095.10ns │ 36.00w │ 100.00% │
└────────────────────────────────┴──────────────┴─────────┴────────────┘
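For reference, here is a minimal sketch of the kind of blit-based copy I have in mind. It uses only the standard Bigarray module, so it does not depend on Owl; copy_via_blit is a hypothetical helper name, not an existing Owl function, and the explicit shape check is my own assumption about what a safe version would want.

```ocaml
open Bigarray

(* Hypothetical helper (not part of Owl): copy [src] into [out] by
   delegating to Genarray.blit instead of a custom copy routine. *)
let copy_via_blit ~out src =
  if Genarray.dims src <> Genarray.dims out then
    invalid_arg "copy_via_blit: shape mismatch";
  Genarray.blit src out

(* Small float32 arrays, the same element kind as the S module. *)
let n = 1_000
let src = Genarray.create float32 c_layout [| n |]
let dst = Genarray.create float32 c_layout [| n |]

let () =
  Genarray.fill src 1.5;
  Genarray.fill dst 0.0;
  copy_via_blit ~out:dst src
```

Since Owl ndarrays are Bigarray.Genarray.t values, I would expect the same call to work on the arrays from the benchmark above, but I have not checked whether there are cases where the current implementation is needed.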
