Sitemaps should conform to indexers' sitemap expectations #7267

@maxkadel

Descriptive summary

When I submit the sitemap at /resourcelist to Google, I should not get XML errors.

Steps to reproduce the behavior in User Interface (UI)

  1. For a host with Google Search Console enabled, submit the sitemap at https://search.google.com/search-console/sitemaps?resource_id=sc-domain%3AYOUR_DOMAIN.whatever
  2. Check the status. It should not report any XML parsing errors.

Actual behavior (include screenshots if available)

I have seen this on applications on the 5.0-flexible branch, but I suspect it also happens on main.

The submission results in 2 invalid XML tag errors.

Image

Acceptance Criteria/Expected Behavior

  • Should conform to http://www.sitemaps.org/schemas/sitemap/0.9
  • Should not raise XML errors when submitted to Google Search Console
  • If the first two points are mutually exclusive, choose one and document it in the relevant code (most likely the ResourceListWriter)
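
As a quick way to spot candidates for the rejected tags, something like the following sketch could be run against the generated XML. It lists every element that is not one of the core sitemap 0.9 elements. This is deliberately stricter than real schema validation and is not a substitute for it. The function name, the sample document, and the `ext` namespace are all made up for illustration; this issue does not say which tags Google rejected.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Core element names defined by the sitemap 0.9 protocol
# (for both <urlset> files and <sitemapindex> files).
CORE_TAGS = {
    "urlset", "url", "loc", "lastmod", "changefreq", "priority",
    "sitemapindex", "sitemap",
}


def unexpected_tags(xml_text):
    """Return the sorted, de-duplicated tags that are not core sitemap 0.9 elements."""
    root = ET.fromstring(xml_text)
    unexpected = set()
    for element in root.iter():
        tag = element.tag
        # ElementTree spells namespaced tags as "{namespace}localname".
        if tag.startswith("{"):
            namespace, _, local = tag[1:].partition("}")
        else:
            namespace, local = "", tag
        if namespace != SITEMAP_NS or local not in CORE_TAGS:
            unexpected.add(tag)
    return sorted(unexpected)


# Hypothetical sample: one <url> entry carrying an extension element
# from a made-up namespace (an assumption, not taken from Hyrax output).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:ext="http://example.org/ext">
  <url>
    <loc>https://example.org/concern/works/1</loc>
    <ext:md hash="abc"/>
  </url>
</urlset>"""

if __name__ == "__main__":
    for tag in unexpected_tags(SAMPLE):
        print(tag)
```

Any tag this prints is only a starting point for comparison with what Google Search Console reports; an element outside the core set is not necessarily invalid under the schema.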

Rationale (for feature request only)

Good sitemaps will discourage crawling-by-facet, which is a major stress on Solr, and should increase discoverability for everyone using Hyrax.

Related work

On Hyku - samvera/hyku#2765
On Hyrax - has never been implemented, but probably should be - #59
