Cannot learn from large numbers of interactions #719

@MariaZharova

Hi there!
I'm trying to fit a LightFM model on a fairly large dataset: it contains ~13 million items and ~38.5 million users. The main problem is that the number of interactions is more than 3 billion, which exceeds 2^31 - 1, the largest value a signed 32-bit integer can hold. I get the following error when calling the fit method:

File ~/.cache/pypoetry/virtualenvs/complementary-items-GrOWNq8P-py3.10/lib/python3.10/site-packages/lightfm/lightfm.py:684, in LightFM._run_epoch(self, item_features, user_features, interactions, sample_weight, num_threads, loss)
    677 """
    678 Run an individual epoch.
    679 """
    681 if loss in ("warp", "bpr", "warp-kos"):
    682     # The CSR conversion needs to happen before shuffle indices are created.
    683     # Calling .tocsr may result in a change in the data arrays of the COO matrix,
--> 684     positives_lookup = CSRMatrix(
    685         self._get_positives_lookup_matrix(interactions)
    686     )
    688 # Create shuffle indexes.
    689 shuffle_indices = np.arange(len(interactions.data), dtype=np.int32)

File lightfm/_lightfm_fast_openmp.pyx:167, in lightfm._lightfm_fast_openmp.CSRMatrix.__init__()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

I guess this problem is caused by the int type being used for the number of interactions in the Cython-generated file _lightfm_fast_openmp.c. Is there a plan to widen the data type to long?
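
For reference, here is a minimal sketch of the size check behind this guess. It assumes the index arrays are signed 32-bit integers, as the `dtype=np.int32` in the traceback suggests. The helper name `fits_int32_index` is hypothetical and not part of LightFM.

```python
import numpy as np

# Largest value a signed 32-bit integer can hold: 2**31 - 1.
INT32_MAX = np.iinfo(np.int32).max


def fits_int32_index(n_interactions: int) -> bool:
    """Return True if a count of n_interactions fits in a signed 32-bit int.

    Hypothetical helper for illustration only; assumes the index arrays
    are np.int32.
    """
    return n_interactions <= INT32_MAX


print(fits_int32_index(2_000_000_000))  # False would mean int32 is too small
print(fits_int32_index(3_000_000_000))  # roughly the interaction count here
```

With about 3 billion interactions the check fails, which would be consistent with the index arrays arriving as 64-bit integers ('long') while the compiled code expects 'int'.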
