-
Notifications
You must be signed in to change notification settings - Fork 25
Multi weight storage #1008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Multi weight storage #1008
Conversation
Some numbers comparing a 8*8 2D MultiWeight histogram with 20k weights vs 20k normal (single weights) histograms: |
For a 8*8 2D histogram a = np.zeros((8,8, 20000))
for i in range(8):
for j in range(8):
a[i,j] = np.array(h[i,j]) |
include/bh_python/multi_weight.hpp
Outdated
namespace histogram { | ||
|
||
template <class T> | ||
struct multi_weight_value : public boost::span<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is misleading. This inherits from boost::span, which is a view type, but the name X_value
suggest it is a value type. A value type holds its data, while a view does not. So this is a multi_weight_span or multi_weight_view. Probably the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you need a multi_weight_value, a multi_weight_reference, and multi_weight_const_reference. You can make the multi_weight_reference derive from multi_weight_const_reference to avoid some duplication. The multi_weight_value should hold a copy of the data, while the references are views like your current 'multi_weight_value'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should try to make a multi_weight_base with common code for all of these, like the operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, if I understand your suggestion right, my idea would be to have
multi_weight_value
derived from astd::vector
that adds assignment and sum operatorsmulti_weight_const_reference
derived from aboost::span
that does not define any operators to modify the datamulti_weight_reference
derived frommulti_weight_const_reference
that adds the assignment and sum operators
Should I better use a std::valarray
instead of std::vector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, should it not be enough to define
using const_reference = const reference
(without the &
)
because this would automatically prohibit the use of any function that is not defined const
?
Therefore, it should be sufficient to rename multi_weight_value -> multi_weight_reference
and implement a separate multi_weight_value
class maybe based on std::vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, should it not be enough to define
using const_reference = const reference
(without the&
)
because this would automatically prohibit the use of any function that is not definedconst
?
Maybe. Usually, I had a reason why I did things in a certain way, but you might be right here. Implementing const_reference without the mutable methods and then inherit a reference from const_reference is cleaner, because then the methods that shouldn't be called aren't there. That being said, I can see the appeal of implementing only one kind of reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I read my own comment vom 2022 again
boostorg/histogram#211 (comment)
and now I realize again why the multi_weight_value inherited from boost::span. That was not an oversight, this choice is explained there. Now I have to rethink this whole approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we have to implement a copy-on-write mechanism here to meet the requirements, I don't know.
I think that's why this got stuck, the design wasn't clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about it, I think the approach we are developing is still fine. The storage holds all the weights for all cells in one large contiguous memory block. When you create a multi_weight_value, it has to return a copy, and that's fine, because the interfaces are generally designed to avoid copies, returning const_reference and reference where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, should it not be enough to define
using const_reference = const reference
(without the&
)
because this would automatically prohibit the use of any function that is not definedconst
?Maybe. Usually, I had a reason why I did things in a certain way, but you might be right here. Implementing const_reference without the mutable methods and then inherit a reference from const_reference is cleaner, because then the methods that shouldn't be called aren't there. That being said, I can see the appeal of implementing only one kind of reference.
Something was bothering me about this and so I checked. And I was right, you cannot implement a const reference like that, look:
#include <boost/core/span.hpp>
#include <vector>
#include <cassert>
using mutable_span = boost::span<double>;
using const_span = const boost::span<double>; // not really const
const_span foo(const_span x) {
x[1] = 0; // oopsie
return x;
}
int main()
{
std::vector<double> x = {1, 2, 3};
mutable_span y = foo(x);
y[0] = 0;
assert(x[0] == 0); // oopsie
assert(x[1] == 0); // oopsie
}
The const
modifier on a value type (no &) doesn't prevent you from calling methods which mutate its contents. But even if that were true, there is no way in C++ to prevent initializing a mutable_span from a const_span, because you can also do:
const double x = 3;
double y = x;
C++ implicitly assumes that all types which are not references (no &) are value types, meaning assignment creates a copy.
include/bh_python/multi_weight.hpp
Outdated
return std::equal(this->begin(), this->end(), values.begin()); | ||
} | ||
bool operator!=(const boost::span<T> values) const { return !operator==(values); } | ||
void operator+=(const std::vector<T> values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not accepting a span as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to address this in b2cffcf.
However, I can not template +=
to accept a class S
because this results in a huge compiler error message which says
/cvmfs/sft.cern.ch/lcg/views/LCG_107a/x86_64-el9-gcc14-opt/include/boost/histogram/detail/accumulator_traits.hpp:81:37: error: call of overloaded 'accumulator_traits_impl(boost::histogram::multi_weight_value<double>&, boost::histogram::detail::priority<2>)' is ambiguous
81 | decltype(accumulator_traits_impl(std::declval<T&>(), priority<2>{}));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, I have now kept the overload of the operator+=
but the vector
version now also calls the span
version to not duplicate code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this might be solved by switching to the split into multi_weight_value
and multi_weight_reference
as proposed in #1008 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should get it to work with boost::span
, because then it works with any contiguous memory container without a copy. Also it should be const std::vector<T>& values
instead of const std::vector<T> values
, the latter is passing by value so a copy is created even when the vector type matches exactly.
Wow, that's quite an impressive patch. I really appreciate the benchmarks, which nicely confirm the expected benefits. I hope you have more time to implement the changes. I suggest we work on the implementation here and backport it to boostorg/histogram later. In the end, both libraries should be in sync. |
Hi @HDembinski . |
Within a PR you don't need to do clean commits, I won't look at the commit history. Feel free to fix multiple issues in one commit. |
By the way, once this feature is done, I suggest you present this at the next PyHEP https://indico.cern.ch/event/1515852/ or the one next year, depending on how quickly we get this done. It is a major feature, and you deserve recognition for implementing this. I left science, so I don't go to any of these workshops anymore. |
Hi @HDembinski, |
…togram into multi_weight_storage
I didn't notice before, but you use sample to pass the weights, that's breaking interface assumptions. Weights should be passed with the weight keyword. I see that I did that, too, in my demo. I guess it was the easiest hack to make it work, but that's breaking all kinds of interface contracts. |
I completely agree. |
However, it is kinda nice that it basically works with |
@Superharz I considering moving development of this feature back to boostorg/histogram. I find it easier to develop C++ code there. I will merge your changes here to that branch. |
This PR adds a multi weight storage type (called
MultiWeight
) to boost-histogram. It addresses #83 and is bases on a prototype provided by @HDembinski in boostorg/histogram#211.It allows one to create a histogram that can store multiple independent weights per bin (the number of weights per bin has to be specified when creating the histogram).
Example
Create and fill a 1 dim histogram that stores 3 weights per bin:
Status of this PR
The MultiWeight histograms can be created in python and they can be filled.
Pickle also works.
The buffer and view for the histograms has yet to be implemented.