@ephphatha

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent#library_and_net_tool_ua_strings provides a few examples, also see urllib which uses "Python-urllib/<version>".

img2dataset does not parse HTML so has no reason to pass a user-agent that indicates mozilla compatibility.
key, url = row
img_stream = None
- user_agent_string = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
+ user_agent_string = "img2dataset/1.x ("
Contributor

Any reason not to use {user_agent_token} rather than hard-coding img2dataset here?

Author

@ephphatha Apr 26, 2023

The reference to the repository was hardcoded previously if any user-agent was specified, so it seemed appropriate to use it as the base tool name with the user-provided string added in the comment section.

edit: double-checking main(), it looks like the default user-agent token is None, not "img2dataset" as I had thought. The old default UA does not identify the tool at all.

new strings are:
default UA: img2dataset/1.x (+https://github.com/rom1504/img2dataset)
user-provided UA: img2dataset/1.x (compatible; <user-provided>; +https://github.com/rom1504/img2dataset)

previous strings were:
default UA: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
user-provided UA: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0 (compatible; <user-provided>; +https://github.com/rom1504/img2dataset)
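
For illustration, the new string format described above could be assembled roughly as follows. This is only a sketch and not the code from the PR: `build_user_agent` is a hypothetical helper name, the `user_agent_token` parameter is assumed to default to None as noted in the edit, and "1.x" is the same placeholder version used in the strings above.

```python
from typing import Optional
import urllib.request


def build_user_agent(user_agent_token: Optional[str] = None) -> str:
    """Hypothetical helper: build a tool-identifying User-Agent string."""
    product = "img2dataset/1.x"  # "1.x" is a placeholder, as in the discussion
    repo = "+https://github.com/rom1504/img2dataset"
    if user_agent_token:
        # The user-provided token goes into the comment section of the UA string.
        return f"{product} (compatible; {user_agent_token}; {repo})"
    # No token given: identify only the tool and its repository.
    return f"{product} ({repo})"


# Attach the header to a request object; nothing is sent over the network here.
# The URL is an example value only.
request = urllib.request.Request(
    "https://example.com/image.jpg",
    headers={"User-Agent": build_user_agent("mybot")},
)
```

Keeping the tool name as the product token and the user-supplied value inside the comment means servers can always identify the tool, whether or not a token is passed.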
