Skip to content

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented Aug 16, 2022

ref #13338

This is partial progress towards never surfacing image elements in audits if they are associated with a bogus image (such as: src='' being surfaced in an image audit).

This PR does two things:

1 - ImageRecords

This computed artifact now filters out image elements that are probably not a usable image resource. It uses mimeType to do this, but also has an affordance for avif/webp being sent with a borked mimeType. In the future, we should reach into blink/tracing to get more accurate information here.

A follow up PR would add a new audit that details these likely-problematic images, though that should wait for better instrumentation.

We currently don't have network records available to use in snapshot mode, so this PR does nothing for them.

2 - Using ImageRecords more

There are two audits that presently don't support snapshot mode and weren't already utilizing ImageRecords:

  • lcp-lazy-loaded
  • preload-lcp-image

Now they use ImageRecords instead of ImageElements directly.

@connorjclark connorjclark requested a review from a team as a code owner August 16, 2022 00:04
@connorjclark connorjclark requested review from adamraine and removed request for a team August 16, 2022 00:04
function mockElement(partial = {}) {
return {
src: 'https://example.com/img.png',
src: partial.src ?? 'https://example.com/img.png',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this what's happening already? Where partial.src overrides this value if provided?

@brendankenny
Copy link
Contributor

brendankenny commented Aug 17, 2022

This might be going too aggressive, though :/ About 1.6% of sites in HTTP Archive (July 2022) currently have an image LCP which would no longer be detected after this. It is catching a bunch of accidental css files (at least going by filenames), but there also just seem to be a ton of people serving images as application/images, application/png, or good old application/xml.

imageRecords.push({
...element,
mimeType: mimeType ? mimeType : URL.guessMimeType(element.src),
mimeType: networkRecord.mimeType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brendan said:

This might be going too aggressive, though :/ About 1.6% of sites in HTTP Archive (July 2022) currently have an image LCP which would no longer be detected after this. It is catching a bunch of accidental css files (at least going by filenames), but there also just seem to be a ton of people serving images as application/images, application/png, or good old application/xml.

and then I was like... "i thought we were gonna used resourceType but then see #13338 (comment) where you showed an HTML page getting described as resourceType: Image

i tried to reproduce this locally and failed. but then tried <img src="page.html">. Yeah very interesting.

These resourceTypes are defined in the blink loader and set (for non-scripts) here

I guess this type is still in the "loader" before it's determined that it's an invalid asset for the intended use.

Ah well. So you're right about resourceType not being useful for this situation.

humph.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 18, 2022

but there also just seem to be a ton of people serving images as application/images, application/png, or good old application/xml.

Can you produce a list of these faulty mime-types (or is what's listed the vast majority)? As a first pass, without blink instrumentation, we could just add all these bogus things to an allow-list.

That said, I'll dig around in Chrome and see where this instrumentation could be added.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 18, 2022

caseq informed me that the CDT frontend used to display incongruences in image resources by comparing the resourceType (which is the intent/context) with the mimeType (which is post-sniffing). There's also the raw mime type header available, so we can do both to detect known bads.

Need to test what the "post-sniff" CDP mimeType is for a valid image served as application/images–ideally it somehow detects it's all A-OK and shows the real content mimetype, otherwise we must fallback to the known-bads allowlist.

@brendankenny
Copy link
Contributor

Can you produce a list of these faulty mime-types (or is what's listed the vast majority)?

Yes, but I believe the top mimetype that fails this check is text/html :/ '' is also relatively popular.

@connorjclark
Copy link
Collaborator Author

Need to test what the "post-sniff" CDP mimeType is for a valid image served as application/images–ideally it somehow detects it's all A-OK and shows the real content mimetype, otherwise we must fallback to the known-bads allowlist.

Result (for .jpg):

  • if Content-Type header is blank, CDP mimeType is image/jpeg (sniffed), no raw header
  • if Content-Type header is application/images (or whatever), CDP mimeType matches the raw header

in both cases the image displays, and the type is Image.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 18, 2022

We could move forward with this PR but also introduce a new audit for "hey, all these resources used as images need to better identify themselves" in best practices...

If we're really talking about a small subset of misconfigured websites, that seems fine? esp. to avoid the super weird/common-ish <img src=""> that some god-awful PHP spat out (or however else these things happen)

@connorjclark
Copy link
Collaborator Author

@connorjclark
Copy link
Collaborator Author

@brendankenny thoughts on #14295 (comment) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants