Hi there!
I'm trying to fit a LightFM model on a fairly large dataset: it contains ~13 million items and ~38.5 million users. The main problem is that the number of interactions is more than 3 billion, which exceeds the signed 32-bit integer limit of 2^31 - 1. I get the following error when calling the fit method:
File ~/.cache/pypoetry/virtualenvs/complementary-items-GrOWNq8P-py3.10/lib/python3.10/site-packages/lightfm/lightfm.py:684, in LightFM._run_epoch(self, item_features, user_features, interactions, sample_weight, num_threads, loss)
677 """
678 Run an individual epoch.
679 """
681 if loss in ("warp", "bpr", "warp-kos"):
682 # The CSR conversion needs to happen before shuffle indices are created.
683 # Calling .tocsr may result in a change in the data arrays of the COO matrix,
--> 684 positives_lookup = CSRMatrix(
685 self._get_positives_lookup_matrix(interactions)
686 )
688 # Create shuffle indexes.
689 shuffle_indices = np.arange(len(interactions.data), dtype=np.int32)
File lightfm/_lightfm_fast_openmp.pyx:167, in lightfm._lightfm_fast_openmp.CSRMatrix.__init__()
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'
I guess this problem is caused by the use of the int type for the number of interactions in the Cython-generated file _lightfm_fast_openmp.c. Is there a plan to widen the data type to long?
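To illustrate why the buffer could arrive as 'long' instead of 'int', here is a minimal sketch. It assumes that the sparse matrix library switches its index arrays to 64-bit integers once a dimension or the number of stored interactions no longer fits in a signed 32-bit integer. The helper name `required_index_dtype` is hypothetical and is not part of LightFM or SciPy; the dataset sizes are the approximate figures quoted above.

```python
# Largest value representable by a signed 32-bit integer (C 'int' on most platforms).
INT32_MAX = 2**31 - 1


def required_index_dtype(n_users, n_items, n_interactions):
    """Return the narrowest index dtype name that can address the matrix.

    Hypothetical helper: it mirrors the assumed rule that 64-bit indices
    are needed as soon as any size exceeds the signed 32-bit range.
    """
    largest = max(n_users, n_items, n_interactions)
    return "int32" if largest <= INT32_MAX else "int64"


# Approximate sizes from this report: ~38.5M users, ~13M items, >3B interactions.
dtype_name = required_index_dtype(38_500_000, 13_000_000, 3_000_000_000)
print(dtype_name)
```

Under this assumption the users and items alone would fit in 32-bit indices, and it is only the interaction count that forces the wider type, which would match the 'expected int but got long' message in the traceback.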