
Page cannot be crawled due to robots.txt #4

@pvgenuchten

Description


We have set up Google crawling on demo.pygeoapi.io to research crawler behaviour on pygeoapi. First results are available, but they puzzle me a bit.

Google generally crawls pygeoapi pages correctly. One can indeed find demo pygeoapi results at, for example, https://www.google.com/search?q=site%3Ademo.pygeoapi.io. However, there are no results yet at https://toolbox.google.com/datasetsearch/search?query=site%3Ademo.pygeoapi.io

A weird thing is that when doing a 'live test' (a feature of Google Search Console) on the URL https://demo.pygeoapi.io/master/collections/lakes I get this error: "url not available to google, blocked by robots.txt"


However, https://demo.pygeoapi.io/master/collections/lakes?f=html runs fine in the 'live test'. This makes me wonder: does the 'live test' crawler send the proper Accept header?
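
For what it's worth, the content negotiation on this endpoint can be checked by hand. A minimal sketch using Python `requests` (the URL is the one from this issue; the Accept values are just examples, not what Google actually sends):

```python
import requests

URL = "https://demo.pygeoapi.io/master/collections/lakes"

# Request the same resource with two different Accept headers and compare
# the status code and Content-Type the server negotiates for each.
for accept in ("*/*", "text/html"):
    resp = requests.get(URL, headers={"Accept": accept}, timeout=10)
    print(accept, resp.status_code, resp.headers.get("Content-Type"))
```

If both requests succeed with the expected representations, the blocking is more likely on the robots.txt side than in pygeoapi's content negotiation.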

Another thing to improve: https://demo.pygeoapi.io/robots.txt does not return a proper robots.txt file, but instead a custom file-not-found page (with HTTP status 200!).
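
As a quick illustration (not a fix), the current behaviour can be verified with a couple of lines; the commented lines sketch the kind of plain-text robots.txt one would expect instead, with the actual rules of course up to the maintainers:

```python
import requests

resp = requests.get("https://demo.pygeoapi.io/robots.txt", timeout=10)
# Reportedly this returns the custom not-found page with status 200
# instead of a plain-text robots.txt.
print(resp.status_code, resp.headers.get("Content-Type"))
print(resp.text[:200])

# A minimal robots.txt that explicitly allows crawling would look like:
#
#   User-agent: *
#   Allow: /
```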

Let me know if you have any ideas.
