Commit db6baee

Provide mechanism for Julia syntax evolution
# Motivation

There are several corner cases in the Julia syntax that are essentially bugs or mistakes that we'd like to possibly remove, but can't due to backwards compatibility concerns.

Similarly, when adding new syntax features, there are often cases that overlap with valid (but often nonsensical) existing syntax. In the past, we've mostly made judgment calls that these are "minor changes", but as the package ecosystem grows, so does the chance of someone accidentally using these anyway, and our "minor changes" have (subjectively) resulted in more breakages recently.

Fortunately, all the recent work on making the parser replaceable, combined with the fact that JuliaSyntax already supports parsing multiple revisions of Julia syntax, provides a solution here: just let packages declare what version of the Julia syntax they are using. That way, packages would not break if we make changes to the syntax, and they can be upgraded at their own pace the next time the author of that particular package upgrades to a new Julia version.

# Core mechanism

The way this works is simple. Right now, the parser function is always looked up in `Core._parse`. With this PR, it is instead looked up as `mod._internal_julia_parse` (slightly longer name to avoid conflicting with existing bindings of the name in downstream packages), or `Core._parse` if no such binding exists. Similarly for `_lower`.

There is a macro `@Base.Experimental.set_syntax_version v"1.xx"` that will set the `_internal_julia_parse` (and in the future the `_lower` version) to one that propagates the version to the parser, so users are not expected to manipulate the binding directly.

# Versioned package loading

The loading system is extended to look at a new `syntax.julia_version` key in Project.toml (and Manifest for explicit environments). If no such key exists, it defaults to the minimum allowed version of the Julia compat. If no compat is defined, it defaults to the current Julia version. This is technically slightly less backwards compatible than defaulting this to Julia 1.13, but I think it will be less surprising in the future for the default syntax to match what is in the REPL. Most Julia packages do already define a julia compat.

Note that as a result of this, the code for parsing compat ranges moves from Pkg to Base.

# Syntax changes

This introduces two parser changes:

1. `@VERSION` (and similar macrocall forms of a macro named `VERSION`) are now special and trigger the parser to push its version information into the source location field of the macrocall. Note that because this is in the parser, this affects all macros with the name. However, there is also logic on the macrocall side that discards this again if the macro cannot accept it. This special mechanism is used by the `Base.Experimental.@VERSION` macro to let users detect the parse version.

2. The `module` syntax form gains a syntax version argument that is automatically populated with the parser's current version. This is the mechanism to propagate syntax information from the parser to the core mechanism above.

Note that these are only active if a module has opted into 1.14 syntax, so macros that process `:module` exprs will not see these changes unless and until the calling module opts into 1.14 syntax via the above-mentioned mechanisms (which is the primary advantage of this scheme).

# Final words

I should emphasize that I'm not proposing using this for any big syntax revolutions or anything. I would just like to start cleaning up a few corners of the syntax that I think are universally agreed to be bad but that we've kept for backwards compatibility. This way, by the time we get around to making a breaking revision, our entire ecosystem will have already upgraded to the new syntax.
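The lookup described under "Core mechanism" and the default described under "Versioned package loading" can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code added by this commit: `lookup_parser`, `default_syntax_version`, `VersionedDemo`, and `PlainDemo` are hypothetical names, and the compat minimum is assumed to have been parsed into a `VersionNumber` already.

```julia
# Hypothetical sketch of the parser lookup (not the commit's implementation):
# prefer a module-local `_internal_julia_parse` binding, else use `Core._parse`.
function lookup_parser(mod::Module)
    if isdefined(mod, :_internal_julia_parse)
        return getfield(mod, :_internal_julia_parse)
    end
    return Core._parse
end

# Hypothetical sketch of the default syntax version: an explicit
# `syntax.julia_version` wins, then the minimum of the julia compat,
# then the running Julia version.
function default_syntax_version(explicit::Union{Nothing,VersionNumber},
                                compat_min::Union{Nothing,VersionNumber})
    return something(explicit, compat_min, VERSION)
end

# Demo modules (assumed names). The first defines the binding by hand, which
# the commit message says users are not expected to do; it is a stand-in here.
module VersionedDemo
    _internal_julia_parse(args...) = :versioned_parser
end

module PlainDemo
end
```

With these definitions, `lookup_parser(VersionedDemo)` returns the module's own binding, while `lookup_parser(PlainDemo)` falls back to `Core._parse`.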
1 parent 4060c45 commit db6baee

52 files changed: +1180 -228 lines changed

JuliaLowering/src/ast.jl

Lines changed: 1 addition & 0 deletions
@@ -175,6 +175,7 @@ function makeleaf(ctx, srcref, k::Kind, value; kws...)
               k == K"Char" ? convert(Char, value) :
               k == K"Value" ? value :
               k == K"Bool" ? value :
+              k == K"VERSION" ? value :
               error("Unexpected leaf kind `$k`")
         makeleaf(graph, srcref, k; value=val, kws...)
     end

JuliaLowering/src/compat.jl

Lines changed: 9 additions & 3 deletions
@@ -211,6 +211,9 @@ function _insert_convert_expr(@nospecialize(e), graph::SyntaxGraph, src::SourceA
         id_inner = _insert_tree_node(graph, K"String", src; value=e)
         setchildren!(graph, st_id, [id_inner])
         return st_id, src
+    elseif e isa VersionNumber
+        st_id = _insert_tree_node(graph, K"VERSION", src, JS.set_numeric_flags(e.minor*10); value=e)
+        return st_id, src
     elseif !(e isa Expr)
         # There are other kinds we could potentially back-convert (e.g. Float),
         # but Value should work fine.
@@ -398,11 +401,14 @@ function _insert_convert_expr(@nospecialize(e), graph::SyntaxGraph, src::SourceA
             child_exprs[2] = maybe_unwrap_arg(e.args[2])
         end
     elseif e.head === :module
-        @assert nargs === 3
-        if !e.args[1]
+        @assert nargs in (3, 4)
+        has_version = !isa(e.args[1], Bool)
+        if !e.args[1+has_version]
             st_flags |= JS.BARE_MODULE_FLAG
         end
-        child_exprs = Any[e.args[2], e.args[3]]
+        child_exprs = has_version ?
+            Any[e.args[1], e.args[2+has_version], e.args[3+has_version]] :
+            Any[e.args[2+has_version], e.args[3+has_version]]
     elseif e.head === :do
         # Expr:
         # (do (call f args...) (-> (tuple lam_args...) (block ...)))
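For macro authors who process `:module` expressions, the hunk above suggests one way to accept both the existing three-argument form and the new four-argument form with a leading version. The helper below is a hedged sketch: `split_module_expr` is a hypothetical name, and the only thing taken from the diff is the `has_version = !isa(e.args[1], Bool)` test.

```julia
# Hypothetical helper: destructure a `:module` Expr in either layout.
#   old: Expr(:module, std_defs::Bool, name, body)
#   new: Expr(:module, version, std_defs::Bool, name, body)
function split_module_expr(ex::Expr)
    ex.head === :module || throw(ArgumentError("expected a :module expression"))
    length(ex.args) in (3, 4) || throw(ArgumentError("unexpected :module layout"))
    # Same discriminator as the diff: a non-Bool first argument is the version.
    has_version = !(ex.args[1] isa Bool)
    version = has_version ? ex.args[1] : nothing
    std_defs = ex.args[1 + has_version]
    name = ex.args[2 + has_version]
    body = ex.args[3 + has_version]
    return (; version, std_defs, name, body)
end

# Expressions are built by hand so the sketch does not depend on which
# form the running parser produces.
old_form = Expr(:module, true, :A, Expr(:block))
new_form = Expr(:module, v"1.14", true, :A, Expr(:block))
```

Both `split_module_expr(old_form)` and `split_module_expr(new_form)` yield the same `name` and `body`; only the `version` field differs.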

JuliaLowering/src/eval.jl

Lines changed: 17 additions & 9 deletions
@@ -78,19 +78,26 @@ function lower_step(iter, push_mod=nothing)
         push!(iter.todo, (ex, false, 1))
         return lower_step(iter)
     elseif k == K"module"
-        name = ex[1]
+        name_or_version = ex[1]
+        version = nothing
+        if kind(name_or_version) == K"VERSION"
+            version = name_or_version.value
+            name = ex[2]
+        else
+            name = name_or_version
+        end
         if kind(name) != K"Identifier"
             throw(LoweringError(name, "Expected module name"))
         end
         newmod_name = Symbol(name.name_val)
-        body = ex[2]
+        body = ex[end]
         if kind(body) != K"block"
             throw(LoweringError(body, "Expected block in module body"))
         end
         std_defs = !has_flags(ex, JuliaSyntax.BARE_MODULE_FLAG)
         loc = source_location(LineNumberNode, ex)
         push!(iter.todo, (body, true, 1))
-        return Core.svec(:begin_module, newmod_name, std_defs, loc)
+        return Core.svec(:begin_module, version, newmod_name, std_defs, loc)
     else
         # Non macro expansion parts of lowering
         ctx2, ex2 = expand_forms_2(iter.ctx, ex)
@@ -480,9 +487,9 @@ function _eval(mod, iter)
             break
         elseif type == :begin_module
             push!(modules, mod)
-            filename = something(thunk[4].file, :none)
-            mod = @ccall jl_begin_new_module(mod::Any, thunk[2]::Symbol, thunk[3]::Cint,
-                                             filename::Cstring, thunk[4].line::Cint)::Module
+            filename = something(thunk[5].file, :none)
+            mod = @ccall jl_begin_new_module(mod::Any, thunk[3]::Symbol, thunk[2]::Any, thunk[4]::Cint,
+                                             filename::Cstring, thunk[5].line::Cint)::Module
             new_mod = mod
         elseif type == :end_module
             @ccall jl_end_new_module(mod::Module)::Cvoid
@@ -510,10 +517,11 @@ function _eval(mod, iter, new_mod=nothing)
             @assert !in_new_mod
             break
         elseif type == :begin_module
-            name = thunk[2]::Symbol
-            std_defs = thunk[3]
+            version = thunk[2]
+            name = thunk[3]::Symbol
+            std_defs = thunk[4]
             result = Core.eval(mod,
-                Expr(:module, std_defs, name,
+                Expr(:module, (version === nothing ? () : (version,))..., std_defs, name,
                     Expr(:block, thunk[4], Expr(:call, m->_eval(m, iter, m), name)))
             )
         elseif type == :end_module

JuliaLowering/test/macros.jl

Lines changed: 1 addition & 0 deletions
@@ -431,6 +431,7 @@ end
         )
     end
     """) ≈ @ast_ [K"module"
+        v"1.14.0"::K"VERSION"
         "AA"::K"Identifier"
         [K"block"
         ]

JuliaSyntax/src/integration/expr.jl

Lines changed: 17 additions & 7 deletions
@@ -199,7 +199,7 @@ end
 
 function parseargs!(retexpr::Expr, loc::LineNumberNode, cursor, source, txtbuf::Vector{UInt8}, txtbuf_offset::UInt32)
     args = retexpr.args
-    firstchildhead = head(cursor)
+    firstchildhead = secondchildhead = head(cursor)
     firstchildrange::UnitRange{UInt32} = byte_range(cursor)
     itr = reverse_nontrivia_children(cursor)
     r = iterate(itr)
@@ -208,11 +208,12 @@ function parseargs!(retexpr::Expr, loc::LineNumberNode, cursor, source, txtbuf::
         r = iterate(itr, state)
         expr = node_to_expr(child, source, txtbuf, txtbuf_offset)
         @assert expr !== nothing
+        secondchildhead = firstchildhead
         firstchildhead = head(child)
         firstchildrange = byte_range(child)
         pushfirst!(args, fixup_Expr_child(head(cursor), expr, r === nothing))
     end
-    return (firstchildhead, firstchildrange)
+    return (firstchildhead, secondchildhead, firstchildrange)
 end
 
 _expr_leaf_val(node::SyntaxNode, _...) = node.val
@@ -235,6 +236,9 @@ function node_to_expr(cursor, source, txtbuf::Vector{UInt8}, txtbuf_offset::UInt
             return k == K"error" ?
                 Expr(:error) :
                 Expr(:error, "$(_token_error_descriptions[k]): `$(source[srcrange])`")
+        elseif k == K"VERSION"
+            nv = numeric_flags(flags(nodehead))
+            return VersionNumber(1, nv ÷ 10, nv % 10)
         else
             scoped_val = _expr_leaf_val(cursor, txtbuf, txtbuf_offset)
             val = @isexpr(scoped_val, :scope_layer) ? scoped_val.args[1] : scoped_val
@@ -292,10 +296,11 @@ function node_to_expr(cursor, source, txtbuf::Vector{UInt8}, txtbuf_offset::UInt
     end
 
     # Now recurse to parse all arguments
-    (firstchildhead, firstchildrange) = parseargs!(retexpr, loc, cursor, source, txtbuf, txtbuf_offset)
+    (firstchildhead, secondchildhead, firstchildrange) =
+        parseargs!(retexpr, loc, cursor, source, txtbuf, txtbuf_offset)
 
     return _node_to_expr(retexpr, loc, srcrange,
-                         firstchildhead, firstchildrange,
+                         firstchildhead, secondchildhead, firstchildrange,
                          nodehead, source)
 end
 
@@ -318,7 +323,7 @@ end
 # tree types.
 @noinline function _node_to_expr(retexpr::Expr, loc::LineNumberNode,
                                  srcrange::UnitRange{UInt32},
-                                 firstchildhead::SyntaxHead,
+                                 firstchildhead::SyntaxHead, secondchildhead::SyntaxHead,
                                  firstchildrange::UnitRange{UInt32},
                                  nodehead::SyntaxHead,
                                  source)
@@ -355,6 +360,11 @@ end
                 # Fix up for custom cmd macros like foo`x`
                 args[2] = a2.args[3]
             end
+            if kind(secondchildhead) == K"VERSION"
+                # Encode the syntax version into `loc` so that the argument order
+                # matches what ordinary macros expect.
+                loc = Core.MacroSource(loc, popat!(args, 2))
+            end
         end
         do_lambda = _extract_do_lambda!(args)
         _reorder_parameters!(args, 2)
@@ -554,8 +564,8 @@ end
             pushfirst!((args[2]::Expr).args, loc)
         end
     elseif k == K"module"
-        pushfirst!(args, !has_flags(nodehead, BARE_MODULE_FLAG))
-        pushfirst!((args[3]::Expr).args, loc)
+        insert!(args, kind(firstchildhead) == K"VERSION" ? 2 : 1, !has_flags(nodehead, BARE_MODULE_FLAG))
+        pushfirst!((args[end]::Expr).args, loc)
     elseif k == K"quote"
         if length(args) == 1
             a1 = only(args)

JuliaSyntax/src/integration/hooks.jl

Lines changed: 2 additions & 2 deletions
@@ -162,7 +162,7 @@ end
 # Debug log file for dumping parsed code
 const _debug_log = Ref{Union{Nothing,IO}}(nothing)
 
-function core_parser_hook(code, filename::String, lineno::Int, offset::Int, options::Symbol)
+function core_parser_hook(code, filename::String, lineno::Int, offset::Int, options::Symbol; syntax_version = v"1.13")
     try
         # TODO: Check that we do all this input wrangling without copying the
         # code buffer
@@ -184,7 +184,7 @@ function core_parser_hook(code, filename::String, lineno::Int, offset::Int, opti
             write(_debug_log[], code)
         end
 
-        stream = ParseStream(code, offset+1)
+        stream = ParseStream(code, offset+1; version = syntax_version)
         if options === :statement || options === :atom
             # To copy the flisp parser driver:
             # * Parsing atoms consumes leading trivia

JuliaSyntax/src/julia/kinds.jl

Lines changed: 1 addition & 0 deletions
@@ -247,6 +247,7 @@ register_kinds!(JuliaSyntax, 0, [
             "public"
             "type"
             "var"
+            "VERSION"
         "END_CONTEXTUAL_KEYWORDS"
     "END_KEYWORDS"
 

JuliaSyntax/src/julia/literal_parsing.jl

Lines changed: 3 additions & 0 deletions
@@ -408,6 +408,9 @@ function parse_julia_literal(txtbuf::Vector{UInt8}, head::SyntaxHead, srcrange)
         return had_error ? ErrorVal() : String(take!(io))
     elseif k == K"Bool"
         return txtbuf[first(srcrange)] == u8"t"
+    elseif k == K"VERSION"
+        nv = numeric_flags(head)
+        return VersionNumber(1, nv ÷ 10, nv % 10)
     end
 
     # TODO: Avoid allocating temporary String here
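The arithmetic used to carry the syntax version through the numeric flags (`minor * 10` on the way in, `VersionNumber(1, nv ÷ 10, nv % 10)` on the way out) can be tried on plain integers. The sketch below assumes nothing about `set_numeric_flags` or `numeric_flags` themselves and does not model them; `encode_syntax_version` and `decode_syntax_version` are hypothetical names.

```julia
# Hypothetical helpers mirroring only the arithmetic visible in the diff.
# The major version is implicitly 1 and only the minor version is encoded.
encode_syntax_version(v::VersionNumber) = Int(v.minor) * 10

# The last decimal digit would map to the patch field; the encoder above
# always leaves it at zero.
decode_syntax_version(nv::Integer) = VersionNumber(1, nv ÷ 10, nv % 10)

roundtrip(v::VersionNumber) = decode_syntax_version(encode_syntax_version(v))
```

Under this scheme a patch component is not preserved: a version with a non-zero patch decodes to the same minor version with patch zero.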

JuliaSyntax/src/julia/parser.jl

Lines changed: 24 additions & 6 deletions
@@ -1488,13 +1488,23 @@ function parse_unary_prefix(ps::ParseState, has_unary_prefix=false)
     end
 end
 
-function maybe_parsed_macro_name(ps, processing_macro_name, mark)
+function maybe_parsed_macro_name(ps, processing_macro_name, last_identifier_orig_kind, mark)
     if processing_macro_name
         emit(ps, mark, K"macro_name")
+        maybe_parsed_special_macro(ps, last_identifier_orig_kind)
     end
     return false
 end
 
+function maybe_parsed_special_macro(ps, last_identifier_orig_kind)
+    is_syntax_version_macro = last_identifier_orig_kind == K"VERSION"
+    if is_syntax_version_macro && ps.stream.version >= (1, 14)
+        # Encode the current parser version into an invisible token
+        bump_invisible(ps, K"VERSION",
+            set_numeric_flags(ps.stream.version[2] * 10))
+    end
+end
+
 # Parses a chain of suffixes at function call precedence, leftmost binding
 # tightest. This handles
 # * Bracketed calls like a() b[] c{}
@@ -1543,7 +1553,7 @@ function parse_call_chain(ps::ParseState, mark, is_macrocall=false)
                 # @+x y ==> (macrocall (macro_name +) x y)
                 # A.@.x ==> (macrocall (. A (macro_name .)) x)
                 processing_macro_name = maybe_parsed_macro_name(
-                    ps, processing_macro_name, mark)
+                    ps, processing_macro_name, last_identifier_orig_kind, mark)
                 let ps = with_space_sensitive(ps)
                     # Space separated macro arguments
                     # A.@foo a b ==> (macrocall (. A (macro_name foo)) a b)
@@ -1577,7 +1587,7 @@ function parse_call_chain(ps::ParseState, mark, is_macrocall=false)
             # (a=1)() ==> (call (parens (= a 1)))
             # f (a) ==> (call f (error-t) a)
             processing_macro_name = maybe_parsed_macro_name(
-                ps, processing_macro_name, mark)
+                ps, processing_macro_name, last_identifier_orig_kind, mark)
             bump_disallowed_space(ps)
             bump(ps, TRIVIA_FLAG)
             opts = parse_call_arglist(ps, K")")
@@ -1598,7 +1608,7 @@ function parse_call_chain(ps::ParseState, mark, is_macrocall=false)
             end
         elseif k == K"["
             processing_macro_name = maybe_parsed_macro_name(
-                ps, processing_macro_name, mark)
+                ps, processing_macro_name, last_identifier_orig_kind, mark)
             m = position(ps)
             # a [i] ==> (ref a (error-t) i)
             bump_disallowed_space(ps)
@@ -1666,7 +1676,7 @@ function parse_call_chain(ps::ParseState, mark, is_macrocall=false)
                 if is_macrocall
                     # Recover by pretending we do have the syntax
                     processing_macro_name = maybe_parsed_macro_name(
-                        ps, processing_macro_name, mark)
+                        ps, processing_macro_name, last_identifier_orig_kind, mark)
                     # @M.(x) ==> (macrocall (dotcall (macro_name M) (error-t) x))
                     bump_invisible(ps, K"error", TRIVIA_FLAG)
                     emit_diagnostic(ps, mark,
@@ -1720,6 +1730,7 @@ function parse_call_chain(ps::ParseState, mark, is_macrocall=false)
                 macro_atname_range = (m, position(ps))
                 is_macrocall = true
                 emit(ps, mark, K".")
+                maybe_parsed_special_macro(ps, last_identifier_orig_kind)
             elseif k == K"'"
                 # f.' => (dotcall-post f (error '))
                 bump(ps, remap_kind=K"Identifier") # bump '
@@ -1760,7 +1771,7 @@ function parse_call_chain(ps::ParseState, mark, is_macrocall=false)
             emit(ps, mark, K"call", POSTFIX_OP_FLAG)
         elseif k == K"{"
             processing_macro_name = maybe_parsed_macro_name(
-                ps, processing_macro_name, mark)
+                ps, processing_macro_name, last_identifier_orig_kind, mark)
             # Type parameter curlies and macro calls
             m = position(ps)
             # S {a} ==> (curly S (error-t) a)
@@ -2065,6 +2076,13 @@ function parse_resword(ps::ParseState)
             # module do \n end ==> (module (error do) (block))
             bump(ps, error="Invalid module name")
         else
+            if ps.stream.version >= (1, 14)
+                # Encode the parser version that parsed this module - the runtime
+                # will use this to set the same parser version for runtime `include`
+                # etc into this module.
+                bump_invisible(ps, K"VERSION",
+                    set_numeric_flags(ps.stream.version[2] * 10))
+            end
             # module $A end ==> (module ($ A) (block))
             parse_unary_prefix(ps)
         end

JuliaSyntax/src/julia/tokenize.jl

Lines changed: 6 additions & 5 deletions
@@ -1245,12 +1245,12 @@ function lex_identifier(l::Lexer, c)
     end
 end
 
-# This creates a hash for chars in [a-z] using 5 bit per char.
+# This creates a hash for chars in [A-z] using 6 bit per char.
 # Requires an additional input-length check somewhere, because
-# this only works up to ~12 chars.
+# this only works up to ~10 chars.
 @inline function simple_hash(c::Char, h::UInt64)
-    bytehash = (clamp(c - 'a' + 1, -1, 30) % UInt8) & 0x1f
-    h << 5 + bytehash
+    bytehash = (clamp(c - 'A' + 1, -1, 60) % UInt8) & 0x3f
+    h << 6 + bytehash
 end
 
 function simple_hash(str)
@@ -1305,10 +1305,11 @@ K"outer",
 K"primitive",
 K"type",
 K"var",
+K"VERSION"
 ]
 
 const _true_hash = simple_hash("true")
 const _false_hash = simple_hash("false")
-const _kw_hash = Dict(simple_hash(lowercase(string(kw))) => kw for kw in kws)
+const _kw_hash = Dict(simple_hash(string(kw)) => kw for kw in kws)
 
 end # module
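The effect of widening the keyword hash from 5 to 6 bits per character, so that an uppercase keyword such as `VERSION` no longer collides with its lowercase spelling, can be checked in isolation. In the sketch below the per-character step is copied from the added lines of the diff; the string-level wrapper `hash_keyword` is a hypothetical stand-in written for this example, not the lexer's own `simple_hash(str)`, and it applies no length cap.

```julia
# Per-character step, copied from the `+` lines of the diff above.
# In Julia `<<` binds tighter than `+`, so this is `(h << 6) + bytehash`.
@inline function simple_hash(c::Char, h::UInt64)
    bytehash = (clamp(c - 'A' + 1, -1, 60) % UInt8) & 0x3f
    h << 6 + bytehash
end

# Hypothetical wrapper for this sketch: fold the step over every character.
# Ten characters at 6 bits each use 60 of the 64 available bits.
hash_keyword(str::AbstractString) =
    foldl((h, c) -> simple_hash(c, h), str; init = UInt64(0))

# A few sample spellings, including both cases of the new keyword.
sample_words = ["var", "type", "VERSION", "version"]
sample_hashes = hash_keyword.(sample_words)
```

Because uppercase and lowercase letters now map to different 6-bit codes, the keyword table in the diff no longer lowercases its keys before hashing.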
