Contributor

@benoit74 benoit74 commented Nov 3, 2025

Fix #232

To be merged after #234 (which is simpler / narrower scope)

Changes:

  • new illustration module with the IllustrationInfo class, used by reader and writer modules
  • new signatures of reader.Archive.get_illustration_item: get "default" and get specific sizes/scales
  • new method reader.Archive.get_illustrations_infos
  • new signature of writer.Creator.add_illustration : pass detailed illustration infos instead of one fixed size

@benoit74 benoit74 self-assigned this Nov 3, 2025
@benoit74 benoit74 force-pushed the add_illustrations_api branch 5 times, most recently from 53b6486 to 4da244a Compare November 4, 2025 07:26
@benoit74 benoit74 mentioned this pull request Nov 4, 2025
@benoit74 benoit74 force-pushed the add_illustrations_api branch from 4da244a to 070b003 Compare November 6, 2025 14:09
@benoit74 benoit74 changed the base branch from main to libzim_9.4.0 November 6, 2025 14:09
@benoit74 benoit74 force-pushed the add_illustrations_api branch 3 times, most recently from 35ad235 to bfd945f Compare November 6, 2025 14:28

codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 98.52941% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 94.23%. Comparing base (0f0820a) to head (7db4684).
⚠️ Report is 1 commits behind head on libzim_9.4.0.

Files with missing lines   Patch %   Lines
libzim/libzim.pyx          98.52%    1 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##           libzim_9.4.0     #233      +/-   ##
================================================
+ Coverage         93.77%   94.23%   +0.46%     
================================================
  Files                 1        1              
  Lines               546      607      +61     
================================================
+ Hits                512      572      +60     
- Misses               34       35       +1     

@benoit74 benoit74 force-pushed the add_illustrations_api branch from bfd945f to de46d68 Compare November 6, 2025 14:34
@benoit74 benoit74 requested a review from rgaudin November 6, 2025 14:39
@benoit74 benoit74 marked this pull request as ready for review November 6, 2025 14:39
Contributor Author

benoit74 commented Nov 6, 2025

Ready for review. The wheels build is still failing on Ubuntu arm64 for "known reasons", but the rest is in place and ready to be analyzed.

Member

@rgaudin rgaudin left a comment

Thank you @benoit74,

When we designed this on libzim, it was very clear that:

  • attributes are an optional, for-the-future feature that we have no real use case for. They are still not in the spec.
  • independent width and height are important now
  • scale is important now

We thus requested to be able to add illustrations directly using width, height, scale. That's not the case with this.
If there are libzim primitives to do it, bind them; otherwise, add the necessary Python glue.

On the reader side, it's less important, but I feel like a lot is left in the hands of the libzim user.
How do I know if 48x48 has various scales? I have to loop through all the getIllustrationInfos() results…
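
To make the concern concrete, here is a minimal sketch of the loop a caller would need in order to answer that question. `IllustrationInfo` below is a local stand-in dataclass whose fields are assumed from the `IllustrationInfo(48, 48, 2)` usage quoted in this thread, not the real binding, and `scales_by_size` is a hypothetical helper name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IllustrationInfo:
    """Stand-in for the binding's type; field names are assumptions."""

    width: int
    height: int
    scale: float = 1.0


def scales_by_size(infos):
    """Group the available scales by (width, height)."""
    grouped = defaultdict(list)
    for info in infos:
        grouped[(info.width, info.height)].append(info.scale)
    return {size: sorted(scales) for size, scales in grouped.items()}


# Hypothetical result of listing all illustrations of an archive.
infos = [
    IllustrationInfo(48, 48),
    IllustrationInfo(48, 48, 2),
    IllustrationInfo(96, 96),
]
available = scales_by_size(infos)
```

With such a helper, `available[(48, 48)]` lists every scale stored for the 48x48 illustration, which is the lookup the comment above says is currently left to the caller.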

"""Get the illustration Metadata item of the archive.

Args:
size: Optional size of the illustration (for backward compatibility).
Member

If optional, the default should be mentioned

Contributor Author

It is optional in the sense that you provide either the size directly or the full info. In any case, it is not consistent with add_illustration, and I prefer having one single arg to match libzim, so let's move to this.

assert zim.has_illustration(48) is True
assert zim.has_illustration(96) is True
assert bytes(zim.get_illustration_item(size=48).content) == favicon_data
assert bytes(zim.get_illustration_item(size=96).content) == favicon_data
Member

I think it'd be wise to have different fixtures. Would save some weird bug potential

Contributor Author

Created fixtures (with a new Pillow dependency to generate them on-the-fly rather than painfully defining all of them in the code).

Contributor Author

Edit: removed the Pillow dependency; it is a pain to install on all our supported platforms.

libzim/zim.pxd Outdated
void finishZimCreation() except + nogil
void setMainPath(string mainPath)
void addIllustration(unsigned int size, string content) except + nogil
void addIllustration(const IllustrationInfo& ii, string content) nogil except +
Member

Isn't nogil supposed to be last?

Contributor Author

@benoit74 benoit74 Nov 10, 2025

Nice catch, forgot about it in some rebase probably ^^

Contributor Author

benoit74 commented Nov 7, 2025

Thank you!

Regarding addIllustration, there are "only" four variants available in libzim: https://github.com/openzim/libzim/blob/8ef511ba0de6a8180d79fbc3c7edcfec4202ae5e/include/zim/writer/creator.h#L165-L197 ; I did not expose the "contentProvider" variants because the one previously present was not exposed either. I could add the two missing ones, but it would not change anything regarding your point.

I generally second your remarks and frustrations about how "complex" things still are with the current PR, but I considered that we had agreed python-libzim should be a very thin wrapper on top of libzim, and that everything more "complex" should go into python-scraperlib. I feel like your python-glue is something which should be implemented in the python-scraperlib.

For instance even with only this PR, it is already possible to do things like below which are not "that" awful. IllustrationInfo is just a structure to hold all details about the illustration:

creator.add_illustration(IllustrationInfo(48, 48), png_data)
creator.add_illustration(IllustrationInfo(48, 48, 2), png_data)

The detailed points are all valid.

Member

rgaudin commented Nov 7, 2025

I feel like your python-glue is something which should be implemented in the python-scraperlib.

I don't think so but this is debatable of course.

scraperlib is not intended for many users. It is, as its name implies, scraper-oriented, quite large, and has many dependencies. We can do as we please there because we are the main users, and requiring explicit extra types for this is OK there. We're already enforcing a lot of stuff, because we're the ones using it.

pylibzim, on the other hand, is meant to be used by hopefully many users. It has zero runtime dependencies, is lightweight and lean.
I am a strong advocate of the pure-binding strategy, but in this case the libzim API is technically oriented, not user-oriented. I'd prefer the same thing I asked for months ago, which is exposing what we are using: width, height, scale. If it were not for those extraAttributes that were not part of the original request, we would not have this intermediate type.

It just seems silly to push this uncomfortable change to our users but let's keep this PR like this and open a separate ticket to discuss whether we want to make an exception or not.

@benoit74 benoit74 requested a review from rgaudin November 10, 2025 07:42
@benoit74 benoit74 force-pushed the add_illustrations_api branch from 839a33e to 7db4684 Compare November 10, 2025 07:57
Contributor Author

Simple changes pushed, but I'm not comfortable merging this if you are not comfortable with this PR's "design"; changing it afterwards might mean breaking the API, which would need a major release, uncomfortable changes for "users", etc.

I see no urgency in pushing these changes to the python-libzim, since we have no use-case so far, and would prefer to have a decision comfortable for everyone. And I somehow share your concerns, especially on the reader side. I'm less embarrassed by the fact that we need the new IllustrationInfo() type than you are.

Since python-libzim is anyway not used by our readers (so far, unfortunately?), maybe a simple middle ground would be:

  • add additional glue for add_illustration(width, height, scale, data), add_illustration(size, scale, data) (squared), get_illustration_item(width, height, scale) and get_illustration_item(size, scale) (squared), since they really don't provide anything besides simple glue we should be able to support "forever"
  • leave the more complex issue of how readers find their illustration untouched; it is still unclear whether we need a common solution to that or should let each reader find its way through this based on its needs (e.g. when a reader would prefer a 96x96@1 that does not exist but 48x48@2 and 126x126@1 do exist, do we return both, only the biggest resolution, ...)
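
The open question in the second point can be illustrated with a sketch of one possible selection policy. Nothing here is decided in this thread: `IllustrationInfo` is a local stand-in dataclass, `rank_candidates` is a hypothetical helper, and the sketch assumes that width × scale gives the pixel dimensions of the stored image. It prefers candidates that are at least as large as the requested pixel size, then the closest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IllustrationInfo:
    """Stand-in for the binding's type; field names are assumptions."""

    width: int
    height: int
    scale: float = 1.0


def pixel_size(info):
    """Pixel dimensions, assuming width and height are multiplied by scale."""
    return (round(info.width * info.scale), round(info.height * info.scale))


def rank_candidates(infos, width, height, scale=1.0):
    """Order candidates: large-enough ones first, then by distance to target."""
    target_w, target_h = round(width * scale), round(height * scale)

    def sort_key(info):
        w, h = pixel_size(info)
        too_small = w < target_w or h < target_h
        distance = abs(w - target_w) + abs(h - target_h)
        return (too_small, distance)

    return sorted(infos, key=sort_key)


# The example from the bullet above: 96x96@1 is wanted but does not exist.
available = [IllustrationInfo(126, 126), IllustrationInfo(48, 48, 2)]
ranked = rank_candidates(available, 96, 96)
```

Under this particular policy, 48x48@2 ranks ahead of 126x126@1; a reader with different needs could just as reasonably pick another ordering, which is why the point is left open.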

WDYT?

Member

rgaudin commented Nov 10, 2025

I'm fine with this simple glue strategy. It's dumb enough to not be a problem, as you wrote.

Regarding the more complex reader usage, there's a ticket for that on libzim. It will definitely be implemented there, but the first step was to allow the addition of multiple scales and widths.

Contributor Author

Hum, I'm afraid that in fact this simple glue strategy is not that simple, because it will break the API or introduce dirty glue code for add_illustration.

Currently the second argument to add_illustration is the content. If the content can be the second, third or fourth argument depending on the other arguments, this is either brittle (we count arguments to know where the content is) or breaking (we force the content to be a named argument).
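
A minimal sketch of the two options described above, to show where each one hurts. Both functions are hypothetical and only normalize their arguments into a `(width, height, scale, content)` tuple; neither is the binding's actual signature.

```python
def add_illustration_counting(*args):
    """Brittle option: the position of content depends on the argument count."""
    if len(args) == 2:  # (size, content)
        size, content = args
        return (size, size, 1.0, content)
    if len(args) == 3:  # (size, scale, content)
        size, scale, content = args
        return (size, size, scale, content)
    if len(args) == 4:  # (width, height, scale, content)
        width, height, scale, content = args
        return (width, height, scale, content)
    raise TypeError(f"expected 2 to 4 arguments, got {len(args)}")


def add_illustration_keyword(width, height=None, scale=1.0, *, content):
    """Breaking option: content can only be passed by name."""
    if height is None:
        height = width  # squared illustration
    return (width, height, scale, content)


by_count = add_illustration_counting(48, 2, b"png")
by_name = add_illustration_keyword(48, scale=2, content=b"png")

# Existing positional callers break with the keyword-only variant.
try:
    add_illustration_keyword(48, b"png")
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```

The counting variant keeps old two-argument calls working but gives no signature for tooling or type checkers to inspect, while the keyword-only variant is explicit at the cost of rejecting existing positional calls.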

Do you have a suggestion about how to handle this? Should we introduce different method names? I'm puzzled.

Same kind of issue with the scale in get_illustration_item(width, height, scale) and get_illustration_item(size, scale): should we have scale first, should we use named arguments, or something else?

Member

rgaudin commented Nov 11, 2025

Well, libzim 9.4.0 did break the API (the version doesn't show it because they don't respect semver), so I see no reason we wouldn't allow ourselves to.

Also, it's a binding, not a pure copy, which it can't be anyway, especially since Python doesn't support overloading natively. To me it would only make sense to have the content as the first param and the rest afterwards. It could even be a single func with width_or_squaresize, height, scale.
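
As a rough sketch of that single-function idea, assuming the parameter names from the comment above: `IllustrationInfo` is a local stand-in dataclass and the function only builds the info object next to the content, it does not call libzim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IllustrationInfo:
    """Stand-in for the binding's type; field names are assumptions."""

    width: int
    height: int
    scale: float = 1.0


def add_illustration(content, width_or_squaresize, height=None, scale=1.0):
    """Content first; a missing height means a squared illustration."""
    if height is None:
        height = width_or_squaresize
    return IllustrationInfo(width_or_squaresize, height, scale), content


square = add_illustration(b"png", 48)
scaled = add_illustration(b"png", 48, scale=2)
wide = add_illustration(b"png", 96, 48, 2)
```

In this shape, `scale` is best passed by name for the squared case, since a third positional argument would be read as `height`.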

What do you think?

Development

Successfully merging this pull request may close these issues.

Use new libzim illustration API

3 participants