LB-1936: Add option to import listening history Tidal streaming files #3643
MrKomodoDragon wants to merge 3 commits into metabrainz:master
Conversation
Other notes aside (rest assured, I will comment on them soon): I would suggest trimming the fat down to maybe 5 entries at most, covering the widest possible range of test cases, and writing tests accordingly. Writing tests for all 27 entries is likely unnecessary, provided your tests cover adequate ground, if I make sense.
I would like to begin the review with, did you test this feature? I am afraid the answer is no, though I sincerely hope you have an explanation. Either way, my point is, please don't mark a PR as tested if you haven't tested it. Checking the "I have run the code and manually tested the changes" box means you have done what it says. The importer seems unable to find the Tidal header at all, running it through the streaming.csv file provided in the ticket.
Even that error I only reached after correcting a typo you submitted for review. Reviews are for helping catch issues, yes, but surely the code should at least be tested to be functional, no?
Either way. While it's quite clear you didn't use AI, which is commendable, I would have expected a higher standard than this.
As for your comment on timezones, the provided data is already in UTC, which is what LB accepts for timestamps. No conversion should be needed, as far as I can tell. Whether the timezone information should be stored as a new field in the additional_info field is a whole 'nother conversation, and is probably your prerogative. :)
As a side note, while I can tell exactly which importer you tried to work off, I think it might have been the wrong one. I think the Audioscrobbler importer would have given you a better idea, as we know exactly which fields are meant to be used. So using DictReader() with specifically those header names saves us quite a lot of hassle, in my opinion. No need to check for headers, or worry about it breaking if the file is corrupted. Does this mean you need to rewrite the whole file? Not necessarily; it might still be fine, if you can make it work. Might want to wait for @MonkeyDo's opinion on that.
P.S.: Rewritten to be less scathing than my first draft.
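To illustrate the suggestion above, here is a minimal sketch of the DictReader() approach. The field names used here (`entry_date`, `artist_name`, `track_title`) are assumptions for illustration; the real header names should be taken from the sample streaming.csv in the ticket.

```python
import csv
import io

# Hypothetical Tidal export field names; replace with the actual headers
# from the streaming.csv sample file.
TIDAL_FIELDS = ["entry_date", "artist_name", "track_title"]

def parse_tidal_rows(file_obj):
    """Yield each CSV row as a dict keyed by the expected header names."""
    reader = csv.DictReader(file_obj)
    for row in reader:
        # Skip rows missing any expected field instead of failing outright.
        if all(row.get(field) for field in TIDAL_FIELDS):
            yield row

sample = io.StringIO(
    "entry_date,artist_name,track_title\n"
    "01/02/2024 12:34,Some Artist,Some Track\n"
)
rows = list(parse_tidal_rows(sample))
```

With DictReader() there is no manual header detection: the first line becomes the keys, and malformed rows can simply be skipped.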
elif service == "spinitron":
    importer = SpinitronListensImporter(db_conn, ts_conn)
elif service == "tidal":
    imporer = TidalListensImporter(db_conn, ts_conn)
This typo is presumably the reason why the import task keeps waiting forever. I am quite curious how you managed to test the feature when the import never completes.
current_app.logger.debug("Invalid Timestamp in item: %s", item, exc_info=True)
continue

if not (self._from_date <= datetime.fromtimestamp <= self._to_date):
This compares against the datetime.fromtimestamp function itself: not a function call, but the function object. This is presumably why you defined the ts variable; that is what should be compared.
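For clarity, a small sketch of the bug and the likely intended fix (variable names here are illustrative, mirroring the diff):

```python
from datetime import datetime, timezone

from_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
to_date = datetime(2024, 12, 31, tzinfo=timezone.utc)

# Broken: this compares the function object itself, which raises a
# TypeError in Python 3 when ordered against a datetime:
#   if not (from_date <= datetime.fromtimestamp <= to_date): ...

# Fixed: call the function on the listen's epoch timestamp first,
# then compare the resulting datetime.
ts = datetime.fromtimestamp(1718000000, tz=timezone.utc)
in_range = from_date <= ts <= to_date
```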
current_app.logger.debug("Invalid Timestamp in item: %s", item, exc_info=True)
continue

if not (self._from_date <= datetime.fromtimestamp <= self._to_date):
I would suggest moving this block to the process_import_file(...) function. This is because the number of successful listens imported is reported based on what is passed to this function. If a user wants to import over a one day period that has say 13 listens, and it shows 13 successful listens imported out of like 3k, I think that would concern them for no reason.
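A sketch of the idea, under the assumption that process_import_file(...) receives the full list before counting; the function and field names here are hypothetical, not the actual ListenBrainz signatures:

```python
from datetime import datetime, timezone

def filter_by_date_range(items, from_date, to_date):
    """Drop entries outside the requested window before they are counted,
    so the reported 'N successful listens imported' total only reflects
    in-range listens rather than the whole file."""
    kept = []
    for item in items:
        ts = datetime.fromtimestamp(item["listened_at"], tz=timezone.utc)
        if from_date <= ts <= to_date:
            kept.append(item)
    return kept

items = [{"listened_at": 1718000000}, {"listened_at": 0}]
kept = filter_by_date_range(
    items,
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 12, 31, tzinfo=timezone.utc),
)
```

Filtering before the count means a one-day import of 13 listens reports "13 of 13" instead of "13 of 3000".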
additional_info: dict[str, Any] = {
    "submission_client": self.importer_name,
    "original_submission_client": "Tidal",
Perhaps music_service would be the better choice here, as Tidal is a music service provider, not a scrobbler. For reference, please refer to the additional_info table.
    },
}
additional_info: dict[str, Any] = {
Please also add duration_ms to the additional_info field, as that seems to be provided in the export.
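Taken together, the two suggestions above might look something like the sketch below. The exact field names and values are assumptions (the additional_info table referenced above is authoritative), and the helper is hypothetical:

```python
from typing import Any

def build_additional_info(item: dict, importer_name: str) -> dict[str, Any]:
    """Sketch: report Tidal as a music service rather than a scrobbler,
    and carry the track duration when the export provides one."""
    info: dict[str, Any] = {
        "submission_client": importer_name,
        "music_service_name": "Tidal",  # assumed field name; check the table
    }
    duration_ms = item.get("duration_ms")
    if duration_ms:  # only include the field when the export has a value
        info["duration_ms"] = int(duration_ms)
    return info

info = build_additional_info({"duration_ms": "215000"}, "ListenBrainz Importer")
```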
hey @shirsakm, terribly sorry, I had some local changes that I forgot to push as part of my PR. I marked it as checked because I did, in fact, have the changes locally. Please test again and let me know if there are any issues.
shirsakm left a comment
The importer does seem to be working now!
Just added a couple more notes that come to mind, nothing too major. Please add the tests whenever you can, as I don't think the PR will be complete without tests.
There are also some formatting changes and random changes to whitespace, etc. that I haven't pointed out individually, though I hope you will clean up the files after incorporating the changes.
Other than that, good work so far!
from flask import current_app
from more_itertools import chunked
from dateutil import parser as dateutil_parser
Nitpick: This import is not used.
listens = []
for item in batch:
    try:
        date_time = datetime.strptime(item["entry_date"], "%d/%m/%Y %H:%M")  # TODO: Do we use the timezone column of the streaming csv file to append info about the timezone?
If my knowledge serves me correctly, does this not take the local timezone into consideration? What I mean is: if the server that processes this is not set to UTC, won't this cause bugs by calculating the timestamp incorrectly?
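To make the concern concrete, here is a sketch of the naive-vs-aware distinction. Since the export times are stated to already be in UTC, attaching the timezone explicitly makes the epoch conversion independent of the server's configuration (the sample date string is illustrative):

```python
from datetime import datetime, timezone

raw = "13/03/2026 14:05"  # example value in the Tidal entry_date format

# strptime returns a *naive* datetime; calling .timestamp() on it would use
# the server's local timezone, which is the bug described above.
naive = datetime.strptime(raw, "%d/%m/%Y %H:%M")

# If the export times are already UTC, attach UTC explicitly so the epoch
# conversion is the same on any server.
aware = naive.replace(tzinfo=timezone.utc)
listened_at = int(aware.timestamp())
```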
@staticmethod
def _looks_like_header(line: str) -> list[str] | None:
    maybe_header = line.split(",")
Why did you remove the cleanup, and make it a simple .split() call again? I suppose it's not necessary, but it would still probably be fine to have the more robust system anyway.
self._from_date = import_task["from_date"]
self._to_date = import_task["to_date"]

with open(import_task["file_path"], mode="r", encoding="utf-8-sig") as file:
What exactly does this change do? I did try my best to read up on this encoding format, but I am afraid I am still quite clueless about what it does and how exactly it changes things. Was it specified in the README? Rather, I should ask: what was your rationale behind this change?
I think a comment about the encoding format and why it is utf-8-sig should probably be added for posterity's sake, as well.
The CSV file has a byte order mark. Using "utf-8" means the BOM is read as part of the contents of the file. As far as I understand it, "utf-8-sig" does not include the BOM in the contents. Using utf-8 previously led to the header errors you got, because the first header entry would be something like "\ufeffartist_name", thus not matching the equality check.
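The difference is easy to demonstrate on a file-like object containing the UTF-8 BOM bytes (0xEF 0xBB 0xBF):

```python
import io

# A UTF-8 file that begins with a byte order mark.
data = b"\xef\xbb\xbfartist_name,track_title\n"

# "utf-8" decodes the BOM as U+FEFF glued to the first header field;
# "utf-8-sig" strips it before the text reaches the caller.
plain = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").readline()
sig = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig").readline()

first_plain = plain.split(",")[0]  # "\ufeffartist_name" -- fails equality checks
first_sig = sig.split(",")[0]     # "artist_name"
```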
Problem
This PR adds the option to import listening history from Tidal import files, as requested in LB-1936.
Solution
The PR parses the streaming.csv file that a user can request from Tidal by emailing support, generates listens, and adds them to the database. It also updates the UI and Flask API route to handle the new import service.
AI usage
Action
On merge, please run
admin/sql/updates/2026-03-13-add-tidal-data-import-type.sql to add the streaming service type to the database. I know tests are necessary, but I was unfortunately stumped on whether it would be a fruitful task to write lines for all 27(!) entries that are added in the sample CSV file. Please let me know what I should do regarding that.
Thanks!