Add text format specification for Linking.md #258
Seems like a good idea to me. Good to see the annotation proposal being used in innovative ways like this!
Linking.md
Outdated
| `weak` | sets `WASM_SYM_BINDING_WEAK` symbol flag |
| `static` | sets `WASM_SYM_BINDING_LOCAL` symbol flag |
| `hidden` | sets `WASM_SYM_VISIBILITY_HIDDEN` symbol flag |
| `retain` | sets `WASM_SYM_NO_STRIP` symbol flag |
If we think `retain` is better terminology, then we should probably propose renaming `WASM_SYM_NO_STRIP` rather than diverging.
Yeah, that's sound. I just thought that those names are from LLVM's source, so that rename should be done in sync with LLVM.
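For readers following the keyword discussion, here is a minimal Python sketch of how the four keywords in the quoted table could be folded into a symbol flags value. The keyword names come from the table; the numeric flag values are my reading of the symbol flag list in Linking.md and should be checked against that document. The helper name `symbol_flags` is hypothetical.

```python
# Flag values as I understand them from the symbol table section of Linking.md
# (assumption: verify against the document before relying on them).
WASM_SYM_BINDING_WEAK = 0x01
WASM_SYM_BINDING_LOCAL = 0x02
WASM_SYM_VISIBILITY_HIDDEN = 0x04
WASM_SYM_NO_STRIP = 0x80

# Keyword spellings taken from the table under review.
KEYWORD_FLAGS = {
    "weak": WASM_SYM_BINDING_WEAK,
    "static": WASM_SYM_BINDING_LOCAL,
    "hidden": WASM_SYM_VISIBILITY_HIDDEN,
    "retain": WASM_SYM_NO_STRIP,
}


def symbol_flags(keywords):
    """Combine text-format keywords into a single flags bitmask (hypothetical helper)."""
    flags = 0
    for keyword in keywords:
        if keyword not in KEYWORD_FLAGS:
            raise ValueError(f"unknown symbol keyword: {keyword!r}")
        flags |= KEYWORD_FLAGS[keyword]
    return flags
```

If `retain` were renamed to match `WASM_SYM_NO_STRIP` (or the other way round), only the dictionary key or the constant name would change; the bit itself stays the same.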
For comparison I wonder if you could post a simple
I can do that for WAT, but since I don't really know the structure of
On second thought, I can just use
@sbc100 My "hello world" examples turned out to be more than a full screen of code when using
Can you just include it here as quoted text? Along with the equivalent wat for comparison?
Agreed, I'm mostly curious to see the two side by side here in the comments for comparison/discussion.
Oh, sure, then, for the source file test.cpp, when compiled with (Unfortunately, GitHub forbids me from posting
Oh nice, the wat format looks much more readable than I expected it to be.
One thing I wanted to mention here is a limitation of the text format: code section relocations that do not make sense for instruction operands cannot be expressed in it. So for example
Also the same issue exists for labels, which can only point between instructions in the code segment, or into the data area of a data segment in the data section.
Yes, it seems perfectly reasonable for an object file validator to declare such relocations as invalid based on the instructions they are part of. However, the linker itself (wasm-ld) will blindly accept such things, and likely produce invalid output as a result too. The linker explicitly does not parse the code section but instead blindly applies relocations. The same goes for relocations that don't point to a correct spot in the instruction stream, e.g. a relocation could in theory point at the
Would you be fine with me adding into the doc that such relocations are invalid for the purposes of validation, then?
Sure, that makes sense to me. Relocations that don't point to a valid spot in the instruction stream are certainly invalid. It might be worth also noting that wasm-ld does not do validation of the code section though, so bad inputs can result in bad outputs.
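To make the validation idea concrete, here is a small Python sketch of the check an object file validator could perform and that, per the discussion, wasm-ld does not. It assumes a hypothetical decoder has already produced a map from code section offsets to the relocation types that make sense for the operand starting there; the `Reloc` class, the `invalid_relocs` helper, and the map itself are illustrative only, and the relocation type names are used as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reloc:
    type: str    # relocation type name, e.g. "R_WASM_MEMORY_ADDR_LEB"
    offset: int  # offset of the patched field within the code section


def invalid_relocs(relocs, relocatable_operands):
    """Return the relocations a validator would reject.

    relocatable_operands maps an operand's start offset to the set of
    relocation types allowed for that operand (assumed to come from a
    decoder that walked the instruction stream).
    """
    rejected = []
    for reloc in relocs:
        allowed = relocatable_operands.get(reloc.offset)
        # Reject relocations that land between operands, and relocations
        # whose type does not fit the instruction operand they target.
        if allowed is None or reloc.type not in allowed:
            rejected.append(reloc)
    return rejected
```

A linker that skips this check and patches bytes at whatever offset it is given will happily overwrite opcodes, which is the "bad inputs can result in bad outputs" case mentioned above.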
Sounds reasonable, yes. We could always expand the list, but for now those are the only sections that are copied by the linker from input files into the output file, so they are the only sections for which relocations make sense. I think for GC types we may one day want to include the type section somehow, but we are a long way from that.
Added the validation rules, please take a look.
Linking.md
Outdated
| ------------ | -------------- | ------------------------------------------- |
| section | `varuint32` | the index of the target section |

Section symbols may only reference the CODE section, the DATA section, or custom sections.
Hm.. I'm not sure about this actually.
When you asked about documenting a limitation on sections I thought you were referring to the fact that relocations can only apply to certain section types.
"Which sections can have relocations within them" is a different concept to "which sections can be referred to by WASM_SYMBOL_TYPE_SECTION symbols".
I believe that WASM_SYMBOL_TYPE_SECTION symbols are only used by debug info, but my memory is a little foggy here.
Looking at the code, it actually looks like these symbols might only be valid for custom sections: https://github.com/llvm/llvm-project/blob/38372df53fd7f6c8bd8c46bf720b676e12f481d9/lld/wasm/InputFiles.cpp#L697-L705.
Which would make sense if these are only used in debug info, since all debug info is stored in custom sections.
I don't believe it can work like that, though, since R_WASM_SECTION_OFFSET_I32 relocations reference a section symbol, and for that to work as DWARF code addresses, the symbol that relocation references would have to reference the CODE section, while the relocation itself would have to target a place in the debug section.
No, actually, that can absolutely work if WASM_*_OFFSET_* relocations actually resolve to offsets from the file start, like DWARF actually expects. I do think the current spec is not very clear on this and someone from LLVM should take a look at what actually happens there and adjust https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#processing-relocations accordingly.
Updated the text format to reflect that section offset relocations may only reference custom sections for now.
As per current primary symbol rules, the relocation annotation for
I see, so there would still be a reloc entry in the binary but not in the text format. When relocation entries are implicit like this, how does the binary writer know to produce them? Would there be a special flag to the wat2wasm program that says something like
Yes, that flag already exists in
The caveat here is that currently
Do you think this is a good design (it's something I just threw together back in the day)? I wonder if "explicit relocs everywhere" might not be better? Otherwise, won't all the tools that do text-to-binary conversion need some kind of extra flag to see if implicit relocs are enabled or not? Or can we magically enable implicit relocs whenever we see any kind of linking annotation in the wat file? Could there exist a wat file with zero explicit annotations? How would the tooling know to create a linking section or not in that case?
Well, it does make the code prettier, and it does allow linking against all existing WAT modules with no source changes; I see that as a benefit.
Well, if we really do want that, I suppose we could dispatch based on whether the features section is present. But that wouldn't be reliable in any case, since tooling that isn't relocation-aware would skip unknown annotations and silently yield a simple module instead of an object file.
I may not fully be following the context here, but my hope is that
So, @sbc100 is basically asking if it would be possible at all to assemble a file from WAT without emitting relocation metadata. For disassembly it'd be easy, since it's always obvious if a file is an object file based on the presence of the relevant custom sections. But for assembly with current syntax, any valid text module can also be an object file now.
I suppose we could make a rule such as "If at least one explicit linking annotation exists in the wat file then the whole file is assumed to be relocatable, and implicit relocations will be injected/generated in all possible locations". Should we go with "at least one explicit annotation" or should we maybe have some kind of top-level annotation that expresses "this is a relocatable object file"?
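As a rough illustration of the "at least one explicit annotation" rule, the Python sketch below treats a WAT source as relocatable when it contains any `(@sym ...)` or `(@reloc ...)` annotation. It is deliberately naive: it scans raw text with a regular expression and does not skip comments or string literals, which a real assembler's parser would have to do. The function name `is_relocatable` is hypothetical.

```python
import re

# Matches the start of a linking annotation: "(@sym" or "(@reloc",
# followed by a non-identifier character such as ")" or whitespace.
LINKING_ANNOTATION = re.compile(r"\(@(sym|reloc)\b")


def is_relocatable(wat_source):
    """Naive check: does the source contain at least one linking annotation?"""
    return LINKING_ANNOTATION.search(wat_source) is not None
```

The weakness discussed in this thread is visible here: a module with zero explicit annotations is indistinguishable from a plain module, so this rule alone cannot request object file output for it.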
Something like
Looking at prior art, if I am to run
I think they would appear in the binary if and only if they appear in the source (wat). The idea of creating implicit annotations (i.e. annotations that don't exist in the wat file at all) is the tricky thing here. I'm not sure it's a good idea to go that route.
Well, I can make a description for target-features, require that to be present for the binary to be relocatable
Personally I would advocate for
I would assume, then, that a reasonable implementation of that would always generate the sections internally, but then strip them during output, unless either a flag or either
My assumption is that the presence of
By "default" do you mean "if no linking annotations are present in the file" (i.e. not creating an object file) or do you mean just silently not generating a relocation there?
I guess my actual question is how would you diagnose the following:

    (module
      (func (param) (result))
      (func (@sym) (param) (result)))

It is either
Option 1 would require either restarting the parse, or remembering all of the declaration locations, and then issuing an error for every one that had no annotations.
How about we require
Could handling func-with-
Sure, but how would you implement those diagnostics? The parser state is gone by the time you do object file validation, as well as source locations for the declarations.
That would work, too. Actually, then, I think it would be better to have an annotation describing all WAT features that are enabled for the module.

    (module (@wat-features linking code-metadata dwarf))

For object file linking specifically, another option is to say something like

    (module (@target-features +overlong-leb -multimemory))

to trigger object file generation, since those make sense only for linking, AFAIK.
@sbc100 @alexcrichton Hi, do you have any more comments? If not, I’d like to merge this soon, since my GCC codegen backend can already compile programs, but it needs the spec in order to be reviewable.
Don't we still need to resolve the issue of how the wat reader knows if it should produce a relocatable output at all?
My current preference is for a global annotation on the top-level module, but I don't see that reflected in the current PR.
I'm also still not sure about the implicit relocation for things like `call $foo`. Does that currently generate an implicit relocation? I still think it might be good to remove all implicit relocations for v1 of the spec, although the addition of the top-level annotation on the module object makes me feel a little better about it.
I'm curious if other folks have opinions about implicit relocations. @dschuff maybe? @sunfishcode @tlively? I guess it kind of matches assembly formats where `call foo` is an implicit relocation.
| Field | Type | Description |
| ------------ | -------------- | ------------------------------------------- |
| index | `varuint32` | the index of the Wasm object corresponding to the symbol, which references an import if and only if the `WASM_SYM_UNDEFINED` flag is set |
| index | `varuint32` | the index of the WebAssembly entity corresponding to the symbol, which references an import if and only if the `WASM_SYM_UNDEFINED` flag is set |
Should we use Wasm consistently? So `Wasm entity` instead of `WebAssembly entity`?
Yeah, I can do that.
## Relocations

Relocations are represented as WebAssembly annotations of the form
Same here? Should we just use Wasm?
|--------------|---------------------------------------|-------------------|
| nothing | nothing | Normal relocation |
| `pic` | `R_WASM_*_LOCREL_*`, `R_WASM_*_REL_*` | Address relative to `env.__memory_base` or `env.__table_base`, used for dynamic linking |
| `tls` | `R_WASM_*_TLS*` | Address relative to `env.__tls_base`, used for thread-local storage |
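The grouping in this table can be sketched in Python as a lookup from a binary-format relocation type name to the text-format modifier. The patterns are copied from the table; reading them as shell-style wildcards, and the helper name `modifier_for`, are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Modifier -> relocation name patterns, as listed in the table.
MODIFIER_PATTERNS = [
    ("pic", ("R_WASM_*_LOCREL_*", "R_WASM_*_REL_*")),
    ("tls", ("R_WASM_*_TLS*",)),
]


def modifier_for(reloc_name):
    """Return "pic", "tls", or None (normal relocation) for a relocation type name."""
    for modifier, patterns in MODIFIER_PATTERNS:
        if any(fnmatchcase(reloc_name, pattern) for pattern in patterns):
            return modifier
    return None
```

A binary-to-text printer could use such a lookup to decide which modifier to print, while the remainder of the name (the base type and the encoding) is inferred from context or spelled out separately.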
Is there any reason not to reflect the entire list of relocation types like they are listed in the binary format and/or in llvm: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/WasmRelocs.def
i.e. why create this new concept of a base type + a modifier that doesn't exist elsewhere yet? Why not just use `type=R_WASM_EVENT_INDEX_XX` in the text format? This would also make the format redundant since it's also part of the name of the relocation type.
Maybe this new method/format/modifier concept could be added more globally later once the initial version of the text format is added? But for v1 it seems like it would make sense to simply mirror the existing binary format enum.
This was covered extensively in #258 (comment), and @alexcrichton expressed support for it here, but in short, that way there wouldn't be an option to elide parts of the relocation annotation (i.e. defaulting and predefining wouldn't work), so all relocations would be incredibly verbose (for example, `call $foo` would become `call $foo (@reloc type=R_WASM_FUNC_INDEX_LEB)` for no reason).
I think I don't see how specifying the full relocation type (e.g. using `R_WASM_FUNC_INDEX_LEB` when a reloc is present) would prevent the whole relocation from being implicit / elided.
These seem like two orthogonal decisions, but I get that I must be missing something:
- Do we implicitly generate `reloc` entries for things like `call` instructions?
- When `reloc` is specified explicitly, do we use the existing enum, or something new/different?

I'm also not sure that reducing verbosity needs to be the highest priority since the plan is for this format to be mostly machine read and machine written, right?
> I think I don't see how specifying the full relocation type (e.g. using `R_WASM_FUNC_INDEX_LEB` when a reloc is present) would prevent the whole relocation from being implicit / elided.
>
> These seem like two orthogonal decisions, but I get that I must be missing something:
> - Do we implicitly generate `reloc` entries for things like `call` instructions?
> - When `reloc` is specified explicitly, do we use the existing enum, or something new/different?

Apart from fully elidable relocations, other types of relocations exist, like in memory `(i32.load (@reloc $mem_sym))` and in constants `(i32.const (@reloc data $mem_sym))`, where a relocation is not entirely elided but is greatly abbreviated from the complete relocation type. Apart from that, specifying the complete relocation type would expose relocation type names (which are for now an implementation detail of LLVM) to the wider text format.

> I'm also not sure that reducing verbosity needs to be the highest priority since the plan is for this format to be mostly machine read and machine written, right?

Well, it needs to be human-readable, too, since it's a text format and humans are expected to read it, like they usually read assembly, and likewise human-writable.
> Apart from that, specifying the complete relocation type would expose relocation type names (which are for now an implementation detail of LLVM) to the wider text format.

The relocation type names are not intended to be LLVM specific. The 20 relocation types, along with their full names, are listed above in this very document.
This is designed to mirror the ELF relocation types, which are defined in the ELF header and are not specific to either LLVM or GCC but are used in both places.
I think it might be a good idea to reflect this precisely in the text format, so we can avoid having two different ways to specify things.
> Apart from that, specifying the complete relocation type would expose relocation type names (which are for now an implementation detail of LLVM) to the wider text format.
>
> The relocation type names are not intended to be LLVM specific. The 20 relocation types, along with their full names, are listed above in this very document.
> This is designed to mirror the ELF relocation types, which are defined in the ELF header and are not specific to either LLVM or GCC but are used in both places.

Sure, but those names currently still don't matter much and can be changed. If we require those names there, they suddenly start to matter and can no longer be changed.

> I think it might be a good idea to reflect this precisely in the text format, so we can avoid having two different ways to specify things.

These aren't really two ways yet, since the names in relocation types don't really matter currently, so the only "stable" way for naming relocations would be in the text format. However, stabilizing those names as fused now would be harmful, since then for v2 when I am to reintroduce the composite names, this "two ways to specify the same thing" argument would become very real, dooming the format to verbose relocations forever.
> Personally I don't think `(@reloc ...)` or `(@sym ...)` should be elidable, nor should there be a top-level annotation if each relocation/symbol is individually annotated. I think it'd be ok for `(@reloc $foo)` to automatically infer the kind of relocation depending on context, but I'd personally prefer that if something else were inserted that it's the full LLVM-derived name to avoid ambiguity. Like @sbc100 I'm not personally too worried about verbosity here. The assembly for `println!("hi!")` is way more verbose than the line of code, and IMO that's just kind of how assembly goes.

I'm OK with eliding relocations, if and only if we have a top-level annotation.
I'd also be OK without a top-level annotation if we can make all relocs explicit.
@alexcrichton, can I ask, what is your objection to the top-level annotation? Don't you think it would be nice to be able to, at a glance, distinguish relocatable wat files without having to visually scan the whole wat file for a `@sym` or `@reloc`?
To me it's more of a preference of per-relocation/symbol annotations over a top-level annotation. I agree that either system would work alright, and the main concern that I can think of is printing a wasm object file (binary-to-text), where it feels more natural to print `@reloc` or `@sym` per entry as opposed to verifying that everything has a relocation or symbol and then not printing anything. For a pure text-to-binary use case I'd agree that the top-level annotation is nicer to have.
I'd naively expect that `@sym` and `@reloc` would be frequent enough in a file that it wouldn't take much of a visual scan, but I'm not personally too concerned about that.
I should also be clear that I'm happy to be overruled here. IMO text-format design is something that's worth bikeshedding but not endlessly, so I wouldn't want to hold up anything on my own behalf too much.
I’m still not sure there’s much value in either making all symbols/relocs explicit or having a top-level annotation, since neither actually guarantees that an object file is produced. So the only effect of such an annotation would be to artificially restrict files that can be relocatable. Currently, any valid Wasm module as well as any valid WAT module can be transformed into an object file and back by merely attaching or stripping the linking information. It's a nice property and I would like to keep it, especially since there is nothing stopping us from it.
If we really really want something (advisory) that says which features are intended to be used with which WAT files, then perhaps we should have a general annotation that applies to all features, not just linking.
That annotation, if decided upon, could in principle augment assembler flags same way as -DMACRO/#define MACRO works in C and C++ compilers
There's also a separate concern of how explicit relocation annotations will fit in with other metadata that isn't aware of linking, like code metadata. Currently it's easy, code metadata info needs a funcidx, so it grabs the primary symbol for that funcidx, and places a relocation of that symbol, no annotations required.
If explicit relocations are mandated, then the standardized syntax for code metadata would simply no longer work, just like every annotation-based metadata feature that isn't explicitly aware of relocations.
As I explained here, I do think that making all relocations explicit would be a very bad idea, since that would just make the text format excessively cluttered for no gain at all, since those relocations are mandatory and their properties can be trivially inferred.
This sounds reasonable to me.
This PR proposes a vendor-neutral syntax for describing relocation information in WAT.
Unlike the syntax described in WebAssembly/wabt#2649, this proposal is intended to express everything that the binary format can, including whatever the current proposed implementation in WABT does not support. Of note here are the fact that `@sym` annotations can appear multiple times per declaration, the inclusion of COMDATs and segment infos, and the inclusion of labels and addends.