Why do we not use `commoncrawl` indices, and then possibly build upon them?

I do not know much about search engines, so I have been reading about them, and I stumbled upon `commoncrawl`. I know that Stract uses its own crawler, but I have found the index to be smaller than I would like. I also searched for commoncrawl in the GitHub issues and found 2 issues in which people hosting Stract locally were advised to use commoncrawl's WARC files. So why does Stract not use them? Are they lacking something that I do not know of? Is there a restriction on using them, such as not being allowed in commercial projects? (I hope that is not the case, since they say `can be used by everyone` multiple times on their pages.) Or is it purely a choice based on quality or something else? Maybe the average result quality is not that good, or the data/metadata provided does not meet Stract's expectations.

Why do we not use `commoncrawl` indices, and then possibly build upon them? #263

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Why do we not use commoncrawl indices, and then possibly build upon them? #263

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Why do we not use `commoncrawl` indices, and then possibly build upon them? #263