Description
We have set up Google crawling on demo.pygeoapi.io to study how crawlers behave on pygeoapi. The first results are in, but they puzzle me a bit.
Google generally crawls pygeoapi pages correctly: demo.pygeoapi.io results do show up, for example at https://www.google.com/search?q=site%3Ademo.pygeoapi.io. However, there are no results yet in Google Dataset Search at https://toolbox.google.com/datasetsearch/search?query=site%3Ademo.pygeoapi.io
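As far as I know, Google Dataset Search only lists pages that carry schema.org `Dataset` structured data (usually embedded as JSON-LD), so one quick check is whether the collection pages expose such markup at all. A minimal sketch using only the Python standard library; `JsonLdExtractor` and `has_dataset_markup` are hypothetical helpers, not part of pygeoapi:

```python
import json
from html.parser import HTMLParser

# Accepted spellings of the schema.org Dataset type in JSON-LD.
DATASET_TYPES = {"Dataset", "schema:Dataset",
                 "https://schema.org/Dataset", "http://schema.org/Dataset"}


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so concatenate.
        if self._in_jsonld:
            self.blocks[-1] += data


def has_dataset_markup(html_text):
    """Return True if any JSON-LD block in the page declares a schema.org Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped, not reported
        # A block may hold one object, a list of objects, or an @graph.
        if isinstance(doc, dict):
            items = doc.get("@graph", [doc])
        elif isinstance(doc, list):
            items = doc
        else:
            continue
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if DATASET_TYPES.intersection(types):
                return True
    return False
```

To use it, fetch for example https://demo.pygeoapi.io/master/collections/lakes?f=html with urllib.request and pass the decoded body to has_dataset_markup.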
A weird thing is that when I run a 'live test' (a feature of Google Search Console) on https://demo.pygeoapi.io/master/collections/lakes, I get this error: "url not available to google, blocked by robots.txt"
However, https://demo.pygeoapi.io/master/collections/lakes?f=html passes the 'live test' without problems. This makes me wonder: does the 'live test' crawler send the proper Accept header?
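One way to test the Accept header theory outside Search Console is to request the same URL with different Accept values and compare the status and Content-Type that come back. A rough sketch; the header values in ACCEPT_VARIANTS are assumptions chosen for comparison, not the headers Google's crawler actually sends, and `build_request`/`probe` are hypothetical helper names:

```python
import urllib.error
import urllib.request

LAKES_URL = "https://demo.pygeoapi.io/master/collections/lakes"

# Accept values to compare (assumed, for illustration only).
ACCEPT_VARIANTS = {
    "browser-like": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "json": "application/json",
    "no-accept-header": None,
}


def build_request(url, accept):
    """Build a GET request, adding an Accept header only if one is given."""
    req = urllib.request.Request(url, method="GET")
    if accept is not None:
        req.add_header("Accept", accept)
    return req


def probe(url, accept):
    """Return (HTTP status, Content-Type) for one Accept variant."""
    try:
        with urllib.request.urlopen(build_request(url, accept), timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status and headers worth comparing.
        return err.code, err.headers.get("Content-Type")
```

Calling probe(LAKES_URL, accept) for each entry in ACCEPT_VARIANTS, and comparing the results with a plain request to the ?f=html URL, should show whether the server answers differently depending on the Accept header.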
Another thing to improve: https://demo.pygeoapi.io/robots.txt does not return a proper robots.txt file, but instead a custom file-not-found page (with HTTP status 200!)
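A small sketch for checking the robots.txt response: `looks_like_robots_txt` is a hypothetical heuristic that flags a file-not-found page served with status 200 (a "soft 404"), and `can_crawl` uses Python's standard urllib.robotparser to evaluate whatever rules are served for a given user agent and URL. The heuristic's thresholds (text/plain content type, at least one known directive) are my assumptions, not a rule Google documents:

```python
import urllib.robotparser

# Lower-cased prefixes of common robots.txt directives.
DIRECTIVES = ("user-agent:", "disallow:", "allow:", "sitemap:")


def looks_like_robots_txt(status, content_type, body):
    """Heuristic: a real robots.txt is served as text/plain and contains directives.

    A custom 'not found' HTML page served with HTTP 200 fails this check.
    """
    if status != 200:
        return False
    if content_type and not content_type.lower().startswith("text/plain"):
        return False
    lines = [line.strip().lower() for line in body.splitlines()]
    return any(line.startswith(DIRECTIVES) for line in lines)


def can_crawl(robots_body, user_agent, url):
    """Evaluate robots.txt rules for one user agent and URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, a minimal allow-all file consisting of `User-agent: *` followed by an empty `Disallow:` line, served as text/plain, passes both checks for https://demo.pygeoapi.io/master/collections/lakes.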
Let me know if you have any ideas.