
LB-1936: Add option to import listening history Tidal streaming files #3643

Open
MrKomodoDragon wants to merge 3 commits into metabrainz:master from MrKomodoDragon:tidal-import

Conversation

@MrKomodoDragon
Contributor

Problem

This PR adds the option to import listening history from Tidal import files, as requested in LB-1936.

Solution

The PR parses the streaming.csv file that a user can request from Tidal by emailing support, generates listens, and adds them to the database.

It also updates the UI and flask API route to handle the new import service.

  • I have run the code and manually tested the changes

AI usage

  • I did not use any AI

Action

On merge, please run admin/sql/updates/2026-03-13-add-tidal-data-import-type.sql to add the streaming service type to the database.

I know tests are necessary, but I was unfortunately stumped on whether it would be a fruitful task to write lines for all 27(!) entries that are added in the sample CSV file. Please let me know what I should do regarding that.

Thanks!


@shirsakm
Contributor

Setting other notes aside for now (rest assured, I will comment on those soon):

I know tests are necessary, but I was unfortunately stumped on whether it would be a fruitful task to write lines for all 27(!) entries that are added in the sample CSV file.

I would suggest trimming the sample down to at most 5 entries that cover the widest possible range of cases, and writing tests accordingly. Writing tests against all 27 entries is likely unnecessary, as long as your tests cover adequate ground.

Contributor

@shirsakm left a comment


I would like to begin the review by asking: did you test this feature? I am afraid the answer is no, though I sincerely hope you have an explanation. Either way, my point is: please don't mark a PR as tested if you haven't tested it. Checking the "I have run the code and manually tested the changes" box means you have done what it says. Running it against the streaming.csv file provided in the ticket, the importer seems unable to find the Tidal header at all.

[screenshot: the importer's error output]

Even to reach that error, I first had to correct a typo in the code you submitted for review. Reviews are for helping catch issues, yes, but surely the code should at least be functional before it is submitted?

Either way. While it's quite clear you didn't use AI, which is commendable, I would have expected a higher standard than this.

As for your comment on timezones, the provided data is already in UTC, which is what LB accepts for timestamps. No conversion should be needed, as far as I can tell. Whether the timezone information should be stored as a new field in the additional_info field is a whole 'nother conversation, and is probably your prerogative. :)

As a side note, while I can tell exactly which importer you tried to work off, I think it might have been the wrong one. The Audioscrobbler importer would have given you a better idea, as there we know exactly which fields are meant to be used. Using DictReader() with specifically those header names saves us quite a lot of hassle, in my opinion: no need to check for headers, or worry about it breaking if the file is corrupted. Does this mean you need to rewrite the whole file? Not necessarily; it might still be fine, if you can make it work. Might want to wait for @MonkeyDo's opinion on that.

P.S.: Rewritten to be less scathing than my first draft.
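The DictReader approach suggested above could look roughly like this; note the header names here are illustrative guesses, not the confirmed Tidal export schema:

```python
import csv

# Hypothetical header names -- the real Tidal export schema may differ.
EXPECTED_FIELDS = {"artist_name", "track_name", "entry_date"}

def iter_rows(path: str):
    """Yield each row of the export as a dict, validating the header first."""
    with open(path, mode="r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)
        # DictReader reads the header row for us; just sanity-check it.
        if reader.fieldnames is None or not EXPECTED_FIELDS.issubset(reader.fieldnames):
            raise ValueError("file does not look like a Tidal streaming export")
        yield from reader
```

With this shape there is no hand-rolled header detection at all; a malformed file fails fast with one clear error.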

elif service == "spinitron":
importer = SpinitronListensImporter(db_conn, ts_conn)
elif service == "tidal":
imporer = TidalListensImporter(db_conn, ts_conn)
Contributor


This typo is presumably the reason why the import task keeps waiting forever. I am quite curious how you managed to test the feature, when the import never completes.

current_app.logger.debug("Invalid Timestamp in item: %s", item, exc_info=True)
continue

if not (self._from_date <= datetime.fromtimestamp <= self._to_date):
Contributor


This compares against the datetime.fromtimestamp function object itself, not the result of a function call. This is presumably why you defined the ts variable; that is what should be compared here.
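A minimal sketch of the intended check, assuming ts holds the parsed Unix timestamp (the helper name is illustrative):

```python
from datetime import datetime

# Compare the parsed datetime (rebuilt from ts), not the
# datetime.fromtimestamp function object itself.
def in_range(ts: float, from_date: datetime, to_date: datetime) -> bool:
    return from_date <= datetime.fromtimestamp(ts) <= to_date
```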

current_app.logger.debug("Invalid Timestamp in item: %s", item, exc_info=True)
continue

if not (self._from_date <= datetime.fromtimestamp <= self._to_date):
Contributor


I would suggest moving this block to the process_import_file(...) function, because the number of successfully imported listens is reported based on what is passed to that function. If a user imports over a one-day period that has, say, 13 listens, and the UI shows 13 successful listens imported out of ~3k, that would concern them for no reason.
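One way to sketch the suggested restructuring: filter by date range before anything is counted, so the reported total only reflects in-range listens. The function and the "listened_at" key are illustrative, not the actual ListenBrainz internals:

```python
from datetime import datetime

def filter_by_range(listens: list[dict], from_date: datetime, to_date: datetime) -> list[dict]:
    """Drop out-of-range listens *before* handing them to the importer,
    so the reported 'X of Y imported' total only counts in-range entries."""
    return [l for l in listens if from_date <= l["listened_at"] <= to_date]
```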


additional_info: dict[str, Any] = {
"submission_client": self.importer_name,
"original_submission_client": "Tidal",
Contributor


Perhaps music_service would be the better choice here, as Tidal is a music service provider, not a scrobbler. For reference, please refer to the additional_info table.
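As a hedged sketch of what the suggested field might look like (the client name is illustrative; music_service conventionally takes the service's domain):

```python
# Illustrative shape only -- not the PR's actual dict.
additional_info = {
    "submission_client": "ListenBrainz Tidal importer",
    # Tidal is a music service, not a scrobbler, so music_service fits
    # better than original_submission_client.
    "music_service": "tidal.com",
}
```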

},
}

additional_info: dict[str, Any] = {
Contributor


Please also add duration_ms to the additional_info field, as that seems to be provided in the export.

@MrKomodoDragon
Contributor Author

MrKomodoDragon commented Mar 16, 2026

hey @shirsakm, terribly sorry, I had some local changes that I forgot to push as part of my PR. I checked the box because I did, in fact, have the changes working locally.

please test again and let me know if there are any issues

@MrKomodoDragon requested a review from shirsakm, March 16, 2026 21:07
Contributor

@shirsakm left a comment


The importer does seem to be working now!

Just added a couple more notes that come to mind, nothing too major. Please add the tests whenever you can, as I don't think the PR will be complete without tests.

There are also some formatting changes and stray whitespace changes that I haven't pointed out individually; I hope you will clean up the files after incorporating the other changes.

Other than that, good work so far!


from flask import current_app
from more_itertools import chunked
from dateutil import parser as dateutil_parser
Contributor


Nitpick: This import is not used.

listens = []
for item in batch:
try:
date_time = datetime.strptime(item["entry_date"], "%d/%m/%Y %H:%M") # TODO: Do we use the timezone column of the streaming csv file to append info about the timezone?
Contributor


If my knowledge serves me correctly, doesn't this ignore the local timezone problem? What I mean is: strptime produces a naive datetime, so if the server processing this is not set to UTC, won't the resulting timestamp be calculated incorrectly?
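The concern can be illustrated: strptime yields a naive datetime, and calling .timestamp() on a naive datetime interprets it in the server's local timezone. Since the export data is already UTC, attaching timezone.utc explicitly before converting makes the result independent of the server's timezone setting:

```python
from datetime import datetime, timezone

raw = "13/03/2026 14:30"

# strptime returns a *naive* datetime; .timestamp() on it would be
# interpreted in the server's local timezone.
naive = datetime.strptime(raw, "%d/%m/%Y %H:%M")

# The export is already UTC, so mark it as such before converting.
utc_ts = naive.replace(tzinfo=timezone.utc).timestamp()
```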


@staticmethod
def _looks_like_header(line: str) -> list[str] | None:
maybe_header = line.split(",")
Contributor


Why did you remove the cleanup and make this a simple .split() call again? I suppose the cleanup isn't strictly necessary, but it would still probably be better to keep the more robust version anyway.

self._from_date = import_task["from_date"]
self._to_date = import_task["to_date"]

with open(import_task["file_path"], mode="r", encoding="utf-8-sig") as file:
Contributor


What exactly does this change do? I did try my best to read up on this encoding format, but I am afraid I am still quite clueless about what it does and how it changes things. Was it specified in the README? Rather, I should ask: what was your rationale behind this change?

I think a comment about the encoding format and why it is utf-8-sig should probably be added for posterity sake, as well.

Contributor Author


The CSV file has a byte order mark. Using "utf-8" means the BOM is read as part of the contents of the file, while "utf-8-sig", as far as I understand it, strips it during decoding. Using utf-8 previously led to the header errors you got, because the first header entry would be something like "\ufeffartist_name" and therefore fail the equality check.
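The behaviour described above can be reproduced in a few lines:

```python
import csv
import io

bom_bytes = "\ufeffartist_name,track_name\nRadiohead,Creep\n".encode("utf-8")

# Decoding with plain utf-8 keeps the BOM glued to the first header name,
# so an equality check against "artist_name" fails.
plain_header = next(csv.reader(io.StringIO(bom_bytes.decode("utf-8"))))

# utf-8-sig strips a leading BOM, if present, during decoding.
clean_header = next(csv.reader(io.StringIO(bom_bytes.decode("utf-8-sig"))))
```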

@anshg1214 self-requested a review, April 4, 2026 07:33