
How to bin integral vs floating point correctly (precision issue and general question) #336

@emmenlau


Dear @HDembinski and other authors, thanks for this awesome library. We have just started using it, and the speed and versatility are impressive! We have a problem that manifests as a precision issue, but it also raises a more general question.

The problem: We use the floating-point histogram to bin all kinds of data. When we bin integer-valued data (gray-value camera images), we set the lower and upper bounds from the data range to [min, max+1] and the number of bins to upper bound minus lower bound. However, in some cases limited numeric precision leaves the last bin empty. This is due to rounding error: the values fall into bin max-1 instead of bin max. We checked, and the bin index is computed as (max-1).999..., which is then truncated to max-1. Is there a good solution to this? The empty last bin causes confusion, and it also makes writing tests somewhat problematic.
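To illustrate the mechanism, here is a minimal sketch that uses only the standard library. It assumes the index is computed roughly as `floor((x - lower) / (upper - lower) * bins)`; that formula and the function names `naive_bin_index` and `exact_bin_index` are hypothetical, and the library's real implementation may differ. The sketch does not reproduce our exact last-bin case; it only shows the same kind of rounding.

```cpp
#include <cmath>

// Hypothetical model of a regular floating-point axis lookup.
// Assumption: the position is normalised to [0, 1) and then scaled by the
// number of bins. The library's actual code may be different.
int naive_bin_index(double x, double lower, double upper, int bins) {
    const double z = (x - lower) / (upper - lower);  // rounded once here
    return static_cast<int>(std::floor(z * bins));   // rounded again, then truncated
}

// For integer data no division is needed: the offset from the lower bound
// already is the bin index, so no rounding can occur.
int exact_bin_index(int x, int lower) {
    return x - lower;
}
```

With IEEE double precision, `naive_bin_index(1.0, 0.0, 49.0, 49)` lands in bin 0, because `1.0 / 49.0 * 49.0` evaluates to 0.9999999999999999 rather than 1, while `exact_bin_index(1, 0)` gives 1.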

The underlying question: We are not certain what the "recommended" way is to set histogram bounds for integral data. Are there common guidelines? For floating-point data it is easier to understand which data range each bin covers. But for integral data, it seems the value 0 could be better represented by the bin [-0.5, 0.5) than by the more obvious [0, 1). Any help, links, or pointers to documentation would be highly appreciated!
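The half-integer idea can be sketched as follows, again with the standard library only. The struct `IntegerBinning` and the functions `centered_binning` and `centered_bin_index` are hypothetical names for illustration, and the index formula is the same assumed model as above, not the library's confirmed behaviour. Each integer value sits in the middle of its bin, so a rounding error far smaller than 0.5 cannot push it across an edge.

```cpp
#include <cmath>

// Hypothetical description of a regular axis: [lower, upper) split into `bins`.
struct IntegerBinning {
    double lower;
    double upper;
    int bins;
};

// Centre every integer in [min_value, max_value] inside its own unit-width bin:
// value v is covered by [v - 0.5, v + 0.5).
IntegerBinning centered_binning(int min_value, int max_value) {
    return {min_value - 0.5, max_value + 0.5, max_value - min_value + 1};
}

// Same assumed index formula as a regular floating-point axis.
int centered_bin_index(double x, const IntegerBinning& b) {
    const double z = (x - b.lower) / (b.upper - b.lower);
    return static_cast<int>(std::floor(z * b.bins));
}
```

For 8-bit images, `centered_binning(0, 255)` gives 256 bins over [-0.5, 255.5), and under this model `centered_bin_index(255.0, ...)` returns 255, so the last bin is filled.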
