Conversation

@SpencerIsGiddy
Contributor

As NextDNS puts it, this blocklist would “block domains that impersonate other domains by abusing the large character set made available with the arrival of Internationalized Domain Names (IDNs)”, e.g. by replacing the Latin letter "e" with the Cyrillic letter "е".

This seems to differ from typosquatting: homograph domains swap in look-alike characters, while typosquatting relies on misspellings. E.g. gôogle (homograph) vs. gooogle (typo).
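To make the distinction concrete, here is a minimal Python sketch that lists the non-ASCII characters in a label. The helper name `describe_non_ascii` and the sample labels are made up for illustration and are not part of the blocklist or any parser.

```python
import unicodedata


def describe_non_ascii(label: str) -> list:
    """Return (character, Unicode name) pairs for every non-ASCII character in label."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in label if ord(ch) > 127]


latin = "google"
homograph = "googl\u0435"  # last letter is Cyrillic U+0435, not the Latin "e"
typo = "gooogle"           # plain ASCII, just misspelled

# The homograph renders like the Latin string but is a different sequence of code points.
print(latin == homograph)
print(describe_non_ascii(homograph))
print(describe_non_ascii(typo))
```

A typosquatted name stays within ASCII, so only the homograph yields any non-ASCII characters here.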

@ignoramous
Contributor

Thanks, but I think this list won't parse: the parser expects valid DNS characters (a-z 0-9 . -). I am surprised that it works as-is with other DNS content-blockers. I'd ideally expect a hostfile to be puny-encoded. Perhaps we should ask cbuijs if any other project has needed it puny-encoded...

Alternatively, we can puny-encode all files ourselves before parsing (i.e., before inserting entries into the trie, or at the time of saving the downloaded files).
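A rough sketch of that pre-parse step in Python, assuming one hostname per line and `#` comments. The names `VALID_DNS_NAME`, `to_ace`, and `normalize_entries` are hypothetical, and the real parser's rules may differ. It relies on Python's built-in `idna` codec, which implements IDNA 2003, so results may differ from IDNA 2008 tooling for some names.

```python
import re

# Assumption: mirrors the "a-z 0-9 . -" rule mentioned above.
VALID_DNS_NAME = re.compile(r"^[a-z0-9.-]+$")


def to_ace(hostname: str) -> str:
    """Convert a hostname to its ASCII-compatible (xn--) form."""
    return hostname.strip().lower().encode("idna").decode("ascii")


def normalize_entries(lines):
    """Yield puny-encoded hostnames, skipping blanks, comments, and unusable names."""
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            ace = to_ace(entry)
        except UnicodeError:
            # Raised for empty or over-long labels and for disallowed code points.
            continue
        if VALID_DNS_NAME.match(ace):
            yield ace


for name in normalize_entries(["# homograph list", "", "example.com", "gôogle.com"]):
    print(name)
```

Entries that are already ASCII pass through unchanged, so the same step could run over every downloaded file regardless of whether it contains IDNs.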

@ignoramous ignoramous self-requested a review June 8, 2023 01:33
@cbuijs

cbuijs commented Aug 24, 2023

Sorry for the late reply, just seeing this.

DNS does support it (IDN)!

The names need to be (or are) converted to punycode (the funny-looking names with xn-- in them), so they only contain DNS characters.

Browsers and other apps that support IDN will convert the names to their ASCII-compatible encoding (ACE), i.e. punycode, to resolve them in DNS.

So from the DNS side, to block, you only need to add the xn-- version of the names.
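As an illustration of the DNS-side view, the sketch below stores only the xn-- form and matches incoming query names against it. The `blocklist` set and the `ace` and `is_blocked` helpers are hypothetical, and the conversion uses Python's built-in `idna` codec (IDNA 2003) rather than idn2.

```python
def ace(name: str) -> str:
    """Return the ASCII-compatible (xn--) form of a domain name."""
    return name.rstrip(".").lower().encode("idna").decode("ascii")


# The blocklist holds only ASCII names; IDN entries are stored in their xn-- form.
blocklist = {ace("gôogle.com")}


def is_blocked(query_name: str) -> bool:
    """Check a query name as it appears on the wire, where it is already ASCII."""
    return query_name.rstrip(".").lower() in blocklist


wire_name = ace("gôogle.com")  # what an IDN-aware client would send
print(wire_name, is_blocked(wire_name))
print("google.com", is_blocked("google.com"))
```

No Unicode handling is needed at lookup time; the conversion happens once when the list is built.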

Check the IDN2 tool to do the conversions if needed.

3 participants