[RFC] Add subrandr SRV3 subtitle renderer #16271
base: master
|
Excellent job! This works almost perfectly on the few videos I tried. Besides the few limitations you've listed,
This is actually not necessary, you can set PREFIX to any directory then point meson to the pkgconfig file. For example I installed it to |
|
Download the artifacts for this pull request: Windows |
800fc48 to
c6e4b4f
Compare
By "Nothing" I meant that we could "do nothing" not that there's no alternatives, there wouldn't be an alternatives section with alternatives if there weren't alternatives. Looking at ffmpeg-devel I don't see any patches for improving styling support, ffmpeg supported simple webvtt throwing out Also I fixed the stride hack so now the resulting |
Hi I wonder why windows is blocked by this reason. Fontconfig is available on windows too. |
Well if that's the case that would make things easier. After downloading many dlls from msys packages I even got it to compile, run, and find the config file in wine but it doesn't find any fonts. I'm hoping that this actually does work on Windows but someone on Windows would have to actually check that. In particular I have no idea what encoding fontconfig returns in FC_FILE on Windows and am currently wishfully assuming it's UTF-8. |
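One way to make the UTF-8 assumption explicit, rather than wishful, would be to validate the returned path bytes before using them. The following is only a sketch: `is_valid_utf8` is a hypothetical helper, not part of subrandr or Fontconfig, and it says nothing about what Fontconfig actually returns on Windows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the first `len` bytes of `s` form
 * well-formed UTF-8 (no overlong forms, no surrogates, max U+10FFFF). */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp, min; /* decoded code point and smallest legal value */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false; /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return false; /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

If a path fails this check, the caller at least knows the assumption does not hold for that font file instead of silently failing to open it.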
|
Implemented WebVTT snap-to-lines = false layout and Unicode line breaking. This means I'm now slightly less afraid of breaking people's WebVTT rendering. Is there anything to do on the MPV side before this is ready for review? I was thinking that it may be confusing if people using very customized subtitle options have their customization ignored in WebVTT (because sd_sbr doesn't implement them). Maybe initially we could just use subrandr for SRV3 to not cause regressions with WebVTT if subrandr is enabled, though there is still |
demux/demux_textsub.c
    {NULL}
};

static const int SUBRANDR_PROBE_SIZE = 128;
Not sure it makes much sense, but mpv generally allows configuring such parameters.
I guess it could be done but there's not really a point since the probing should identify the subtitle from the first line alone.
up to you. we tend to avoid hardcoded params where possible though
Okay I've made this configurable now. Wondering whether the demuxer should also accept an option to disable forcing itself on you and let you fallback to lavf+libass, not sure how to call it though.
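For readers unfamiliar with this kind of probing, here is a rough sketch of a size-limited probe. It is not the code from this PR: `probe_srv3` and `contains` are hypothetical names, and it assumes an SRV3 file announces itself with a `<timedtext format="3">` root element near the start of the file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Bounded substring search: true if `needle` occurs within the first
 * `len` bytes of `buf`. Does not require `buf` to be NUL-terminated. */
static bool contains(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0)
        return true;
    if (n > len)
        return false;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return true;
    }
    return false;
}

/* Hypothetical probe: only the first `probe_size` bytes are inspected,
 * so a larger probe size tolerates a longer preamble before the root
 * element. ASSUMPTION: SRV3 starts with <timedtext format="3">. */
bool probe_srv3(const char *buf, size_t len, size_t probe_size)
{
    size_t window = len < probe_size ? len : probe_size;
    return contains(buf, window, "<timedtext") &&
           contains(buf, window, "format=\"3\"");
}
```

With a typical XML declaration followed directly by the root element, a window of 128 bytes is more than enough, which matches the point that the first line alone should identify the format.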
> Wondering whether the demuxer should also accept an option to disable forcing itself
For main demuxer we have --demuxer, so if it's something you want to support, you could add --sub-demuxer.
--sub-demuxer already exists so I guess that does mostly address that although you won't be able to use other demuxers if you set it to lavf-only. Now that I think about it the best way to address my concern would probably be a more general --{,sub-,audio-}demuxer-blacklist option. I'll leave that as future work since --sub-demuxer does indeed address most of the use case.
One thing I noticed is that if --sub-demuxer=lavf is set then playing a video with ytdl_hook will unconditionally fail to load subs, even if --ytdl-raw-options-add=sub-format=vtt is set. Note that this is the case even on mpv v0.40.0 (without subrandr).
289db3e to
bc52f76
Compare
|
Bumping this, I've since improved subrandr's line-height handling and ruby positioning, also am in the process of implementing a mini web engine to match browser styling more precisely (along with improving the layout subsystem on master in the process). |
|
Sorry for the delay, I might have some time this weekend to look at this. One thing that stands out is how we streamline building and shipping a new library. I was thinking of defining dummy |
I looked into doing the It just seems like a decent amount of work and it would have to be updated whenever something linking-related is changed. I can try but I just don't know whether I want to maintain all that honestly. |
Except for a few minor nitpicks, LGTM. I have not tested it as I wouldn't know what to test with VTT, but as far as integration in mpv is concerned it is fine.
As discussed on IRC, the remaining thing is to make this somehow accessible. As mpv will likely be the main user of this, at least at the beginning, building it on CI would be good to make it available in our test builds.
|
I added subrandr to the CI builds for Linux, x86_64 mingw64, and x86_64 msys2. Currently i686 and aarch64 Windows targets don't work which I track on my side in afishhh/subrandr#31. |
|
Fixed ARM64 implibs in implib-rs upstream, so aarch64 msys2 can now be tested. |
|
Right makes sense the weird mutual exclusivity of these options could cause issues, although they do seem pretty niche in this case. I was meaning to add |
Not really. This is not even my main account ;) |
|
Along the lines of #16271 (comment), I'm thinking of maybe removing WebVTT from this PR. That would mean it no longer fixes #16456 but we could come back to that later with an opt-in solution. Sounds like the safest approach, thoughts? |
|
I think it's fine as is. We can figure out what to do about that later.
|
I consider being unable to style subtitles a regression.
A generic option to customize the demuxer or sd probe list, like for vo/gpu-api/gpu-context, will solve this while also being more flexible.
If you really agree with this, then it should be disabled by default in |
Yes and we can figure that out later, before the next release.
Well that won't get rid of
Sure why not. |
|
I think it would be better to remove WebVTT for now since I truly don't care enough about it and na-na-hi is right. |
560f2bd to
e008a3c
Compare
sry it took so long to test. it can't find freetype, but harfbuzz and fontconfig are okay with that commit. i think i should have been more specific with my first comment, eg |
|
I removed the |
subrandr is a subtitle rendering library which aims to render SRV3 (YouTube) subtitles and WebVTT subtitles accurately. Currently in mpv, WebVTT subs are rendered via ffmpeg conversion to ASS, which throws away a lot of the style and completely disregards the WebVTT non-region-cue positioning algorithm. Furthermore, if one wants to render some more complex SRV3 subtitles, one has to resort to external converters since the format is not even supported by ffmpeg. However, subrandr is able to render SRV3 subtitles natively with support for the most commonly used features. It can render ruby text without relying on font metrics during conversion, which is obviously fragile, and it can perform correct scaling using the exact calculations used by YouTube instead of making up ASS approximations. Similarly, it follows the WebVTT spec for the features of WebVTT that it supports (mostly). It's not perfect of course and there are still many things it doesn't do or does wrong, but those are things that can be incrementally improved outside of mpv.
Allows scripts to detect the presence of subrandr at runtime, useful for determining whether this mpv instance can play SRV3 subtitles.
This allows YouTube videos played directly from a URL to make use of subrandr's SRV3 support if it is allowed by an overridden `sub-format`.
|
Realized that Also just realized there are merge conflicts, I'll wait with rebasing for now since this is blocked on a release anyway. |
This PR adds support for subrandr, a subtitle rendering library I've been working on for, uh, the past 7 months.
The whole point is to render non-ASS subtitle formats correctly, without conversion, because conversion is most of the time lossy. Currently the supported formats are SRV3 which is YouTube's subtitle format and WebVTT.
Results
I have collected a few videos that use more complex SRV3 subtitles while working on subrandr, so I spent some time making three funny dwm four way comparisons between:
I believe supports ruby text via manual layout with font metrics at conversion time (I don't know whether this is actually the case?), this approach is obviously fragile with font fallback in the mix so personally I don't consider it a real solution.
Comparisons of example videos
【original anime MV】幽霊船戦【hololive/宝鐘マリン】

Hololive music videos often have ruby text and as such are decent testing material. subrandr should handle SRV3 ruby text correctly although it's not implemented for WebVTT yet.
Worst Teambuilding Exercise Ever

This video contains a lot of positioned subtitles with different types of text shadow, and at this particular moment also exercises line-wrapping, which for SRV3 should be greedy.
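To illustrate what "greedy" means here, the sketch below places each word on the current line as long as it fits and only breaks when it does not. This is a simplified illustration, not subrandr's implementation: `greedy_wrap` is a hypothetical function, and it assumes word widths have already been measured, whereas a real renderer works on shaped text and Unicode line break opportunities.

```c
#include <stddef.h>

/* Hypothetical greedy line breaker.
 * widths:      measured width of each word
 * count:       number of words
 * space:       width of the gap inserted between two words on a line
 * max_width:   available line width
 * line_starts: output array (capacity >= count); line_starts[i] is the
 *              index of the first word on line i
 * Returns the number of lines produced. */
size_t greedy_wrap(const double *widths, size_t count, double space,
                   double max_width, size_t *line_starts)
{
    size_t lines = 0;
    double used = 0.0; /* width consumed on the current line */

    for (size_t i = 0; i < count; i++) {
        /* Width the line would have if this word were appended to it. */
        double needed = (used > 0.0 ? used + space : 0.0) + widths[i];

        if (lines == 0 || needed > max_width) {
            /* Start a new line; an overlong word still gets its own line. */
            line_starts[lines++] = i;
            used = widths[i];
        } else {
            used = needed;
        }
    }
    return lines;
}
```

A greedy breaker never revisits earlier lines, so unlike balanced wrapping it can leave a short last line; matching that behavior is what makes the wrapped result agree with a renderer that also breaks greedily.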
sodapoppin checks out Northernlion's stream

This one is not that special and the ASS conversion is decently close, but the positioning is of course incorrect because it is fully in the video frame and the font size is similarly just slightly off.
For the sake of completeness, the process I used to create the comparisons
Each quadrant is either an mpv or Firefox window, run under X11 with the `dwm` window manager in master layout with `nmasters = 2`, with the windows being constructed as follows:
- Top left: Use mpv compiled with subrandr and play a downloaded copy of the video with an accompanying srv3 file. (`--sub-format srv3` in ytdl)
- Top right: Convert srv3 file to ass file via ffmpeg (`ffmpeg -i <in>.srv3 <out>.ass`) then play the video with mpv and switch to the ASS track. (requires this fork of ffmpeg)
- Bottom left: Play a downloaded copy of the video with an accompanying vtt file downloaded from YouTube.
- Bottom right: `userChrome.css`.

I could've probably made separate screenshots in fullscreen mode and then stitched them together inside a markdown table... whatever, this is the first thing I thought of and it probably looks better.
Limitations
Since the library is still in "early" stages, there are a lot of things that are not done correctly yet. This is a non-exhaustive list of the most important ones:
- No DirectWrite font provider. (no fonts will be found on Windows)
- Line-breaking is very naive and does unnecessary reshaping, lines are only broken on whitespace instead of following the Unicode line breaking algorithm. Fixed.
- Unicode bidirectional algorithm is not used. Implemented.
- Font selection is not compliant with the CSS font matching algorithm, this has been mostly implemented in a branch but is not yet finished. I have since learned that chromium does something pretty close to what I do, so it's staying.
- Subpixel glyph rendering is not implemented so positions are rounded to integer values, this looks wrong on non-HiDPI displays in lower window sizes. I plan to fix this soon.
- There are like two places where it can panic because of unimplemented things but that's easily fixable. Fixed.

The lack of a font provider means that the library will immediately return an error from `sbr_renderer_render` as soon as it tries to render text, so it's not currently usable on MacOS without fontconfig.
mpv integration unresolved issues
- I changed `ytdl-hook` to request srv3 subtitles, however this is not gated behind conditional compilation or any sort of runtime check so it breaks in builds without subrandr. Maybe a runtime property could be added that Lua can read? Solved with `subrandr-version` property.
- `dpi` of 72, this shouldn't have impact on subtitle layout with the currently supported formats, but it does impact debug UI when enabled via `SBR_DEBUG=draw_version,draw_perf,draw_layout` and may break in the future if support for CSS in WebVTT is added since one could do `::cue { font-size: 20px; }` which must be scaled by the device pixel ratio.
  I have no idea how to get dpi information in `get_bitmaps` without digging into `mpv_global` which contains a warning specifically telling you not to do that.
- Oh and I almost forgot, currently I forcefully un-align the stride of the resulting `mp_image` which could probably cause issues down the line on some platforms, so that should probably be changed. Fixed.

Building
So if you got this far and are on Linux with FreeType, HarfBuzz (with FreeType support), and Fontconfig libraries installed, here's how you build and install the library:
You also need Rust installed, the latest stable toolchain should work.
The prefix should be set to a writable path where the library should be installed, it will create the following filesystem structure there:
Then you need to make sure `<prefix>/lib/pkgconfig` is on your pkg-config path when running `meson setup` (for example via the `--pkg-config-path` meson arg).
After building the library itself, you should be able to build mpv as usual, by passing `-D subrandr=enabled` to `meson setup` you can ensure the library is correctly detected or you will get a build error.
The library itself is linked statically, with mpv inheriting the dynamic library dependencies. Static linking is never a good idea somehow.
Alternatives
Is this all necessary? Other possible approaches could be:
Naturally, after spending months working on this, I am slightly biased and believe a separate renderer is worth it because it allows iterating on other subtitle formats without having to worry about the unhinged format known as Advanced SubStation Alpha. At first I was developing ASS support in parallel to SRV3 in subrandr, but then realized how much horrible complexity ASS adds and purged it from the code base. This, in my eyes, confirmed that it is significantly simpler to have other formats handled separately.
Thanks for reading, hope you like my work :)