

#376

@LightWind1
I added three regular expressions to match Chinese, Japanese, and Korean words.
Now it correctly tokenizes SQL such as 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'.
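
For illustration only, the snippet below sketches the kind of explicit Unicode ranges that regular expressions for CJK identifiers commonly use; the names and exact ranges here are assumptions, not necessarily the three patterns added in this PR:

import re

# Hypothetical character ranges (not taken from this PR) covering the bulk of
# Chinese, Japanese, and Korean identifier characters.
CHINESE = r'\u4e00-\u9fff'    # CJK Unified Ideographs
JAPANESE = r'\u3040-\u30ff'   # Hiragana and Katakana
KOREAN = r'\uac00-\ud7af'     # Hangul syllables

# A simple "name" rule that accepts ASCII identifiers as well as CJK ones.
NAME = re.compile(rf'[A-Za-z0-9_{CHINESE}{JAPANESE}{KOREAN}]+')

print(NAME.findall('select T2.名称 from 省份 as T2'))
# ['select', 'T2', '名称', 'from', '省份', 'as', 'T2']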

@andialbrecht self-assigned this on Mar 5, 2024
@andialbrecht (Owner)

Hi @LightWind1, can you clarify what problem your change solves?
I've had a look at how the parser sees your statement, and to me everything looks as expected:

import sqlparse
sql = 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
p = sqlparse.parse(sql)[0]
p._pprint_tree()
|- 0 DML 'select'
|- 1 Whitespace ' '
|- 2 IdentifierList 'T2.名称 ...'
|  |- 0 Identifier 'T2.名称'
|  |  |- 0 Name 'T2'
|  |  |- 1 Punctuation '.'
|  |  `- 2 Name '名称'
|  |- 1 Whitespace ' '
|  |- 2 Punctuation ','
.....and so on.....
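
One way to double-check this at the token level rather than the parse-tree level is to flatten the statement and print each leaf token's type and value, e.g.:

import sqlparse

sql = 'select T2.名称 from 省份 as T2'
# flatten() yields the leaf tokens of the parse tree in order, so the CJK
# identifiers appear with their resolved token type (Name) and value.
for token in sqlparse.parse(sql)[0].flatten():
    print(token.ttype, repr(token.value))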

