Conversation
|
One or more of the following people are relevant to this code:
|
mrossinek
left a comment
There was a problem hiding this comment.
(looking just at 900d450)
This looks very nice and I especially like the comprehensive explanations, thanks for that! 👍
I have not yet reviewed the details of the VTable implementation itself, but skimming the structure of the code this makes sense to me 👍
Seeing the hard-coded list of function names though, I agree that a linter to ensure all C functions are taken care of is very imported with a growing C API going forward. Are you aware of tooling to aid with that or would you need to build something entirely from scratch for this?
From a more technical side: if I understand correctly, there is (obviously) a limit to how many additional function pointers can fit within the right segment of a VTable leaf going forward. There is a tradeoff to that and I assume it is hard (probably impossible) to predict what the right balance is here. So my question is: do you envision enforcing ABI backwards compatibility across major versions of Qiskit? Or would you allow a breaking VTable order when (e.g.) going from 2.X to 3.0? If not, I suppose that there is no way around an "unordered" table some time in the future, right?
I've largely got it written locally, and it's custom. It's not that hard at all, given the code generation we're already doing.
So each top-level You can put new functions into any space without breaking ABI, even if that means adjusting the sizes of the reserved sets. More importantly, though, you can always put more functions at the end of one of the vtables, and that's never a problem (that's generally why I used three separate API vtables not just one - to give a little more room for expansion).
Once we stablise the C API (which internally we're now talking about doing before 3.0, but exact timelines tbc), we will be committing to forwards-compatibility of pre-built extension modules (i.e. backwards compatibility of the API) within major versions. Major versions are the only time we could be breaking the slots layout (by removing one or changing its type) or significantly changing the behaviour of a C API method. That said, we aren't stabilising the C API yet, and in fact this release contains two major function breakages (off the top of my head):
|
|
Oh also:
I also have (mostly) written tooling to verify the slot layout remains feature-stable (with respect to function names) between minor versions. I'll be pushing both up into the repo and adding them to the branch-protection rules too, once I've finished them off. |
I demoed this to the internal team yesterday too, and the pattern I explained was that, let's say we have: mod circuit {
static FUNCTIONS: ExportedFunctions = ExportedFunctions::leaves(10, || {
vec![/* ... */]
});
// ...
}
mod dag { /* ... */ }
mod param { /* ... */ }
pub static FUNCTIONS_IR = ExportedFunctions::empty()
.with_child(10, &circuit::FUNCTIONS)
.with_child(20, &dag::FUNCTIONS)
.with_child(30, ¶m::FUNCTIONS);... and then we fill up mod circuit {
static FUNCTIONS: ExportedFunctions = /* ... as it was ... */;
// Add a new set of leaves:
static FUNCTIONS_2: ExportedFunctions = ExportedFunctions::leaves(50, || vec![]);
}
mod dag { /* ... */ }
mod param { /* ... */ }
pub static FUNCTIONS_IR = ExportedFunctions::empty()
.with_child(10, &circuit::FUNCTIONS)
.with_child(20, &dag::FUNCTIONS)
.with_child(30, ¶m::FUNCTIONS)
// ... and just tack it on the end.
.with_child(40, &circuit::FUNCTIONS_2); |
Very nice! 👍
Ah okay that makes sense. Thanks for the additional details 👍
Alright that makes sense 👍 I'll also take a look at the recording of yesterday's internaly meeting where you presented this in more detail. |
Pull Request Test Coverage Report for Build 22786506574Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
While the `_accelerate.abi3.so` object that ships with Qiskit already exposes all the C API symbols with public `qk_*` names, this can't safely be used by distributable compiled Python extension modules, because they cannot rely even on deferred dynamic linking to the object. Instead, we define what is effectively a set of vtables, where the actual addresses of the functions are written (at runtime) at known offsets to a base pointer. These can then be accessed without the actual involvement of a linker by knowing the base pointer of the vtable, the offset of the desired function and the expected signature. In order for user builds to be forwards compatible (or in other words, for later Qiskit builds to be _backwards_ compatible) from an ABI perspective, the offsets into the vtables must be constant between Qiskit versions. This requires them to be defined statically, without being inferred from other functions; if we try to infer based on the set of functions, there is no way to keep them the same as functions are added without defining an order. The hierarchical `VTable` machinery introduced in this commit is a trade-off between two extreme approaches: 1. use a per-function annotation to set the slot and the vtable 2. use a single global list completely defining the vtable Option 1 has the negative that it's incredibly hard to tell from local information only what the available slots are, and which slot should be next assigned. Option 2 is undesirable because it completely centralises all definitions, which will likely make it very hard to add new C API functions without constantly generating merge conflicts (which is especially important to avoid breaking the merge queue), and likely leads to the functions by sequential offset being in random order (which isn't a technical problem, but is aesthetically unsatisfying!). The hierarchical approach allows C-API additions that touch completely different modules to be independent, while still permitting some locality in slot assignments for related functions, and providing an overview of where the slots are assigned. Each `leaves` node in the hierarchy over-allocates slots for itself to allow some addition of new functions in the future. There is a trade-off between having many `PyCapsule` function pointers and spreading the data across many completely separate pointers (which leaves most room for expansion), and having very few vtables (where we have to leave intermediate gaps for expansion within the hierarchy). The names and groupings are not especially important; they're mostly aesthetic, and subsequent patches will introduce header files and other automated tools that mostly mean that users will not have to worry about the internals of the vtables themselves. This is not included in `cext` itself (despite my earlier attempts to do just that) because doing so would require _all_ use of the vtable specification to involve a complete compilation of Qiskit, likely also including linking in `libpython`. In language-binding generation, we do not need to do that; we only need the string names and the separate assistance from `cbindgen` to parse out the function signatures.
|
This is rebased and should be completely up to date with all public C API functions - I ran #15778 against the commit here and fixed it up. |
mrossinek
left a comment
There was a problem hiding this comment.
Some minor details and clarification questions, other than that this looks good to me
| pub static FUNCTIONS_CIRCUIT: ExportedFunctions = ExportedFunctions::empty() | ||
| .add_child(0, &circuit::FUNCTIONS) | ||
| .add_child(100, &dag::FUNCTIONS) | ||
| .add_child(200, ¶m::FUNCTIONS) | ||
| .add_child(250, &circuit_library::FUNCTIONS); | ||
| pub static FUNCTIONS_QI: ExportedFunctions = | ||
| ExportedFunctions::empty().add_child(0, &sparse_observable::FUNCTIONS); | ||
| pub use transpiler::FUNCTIONS as FUNCTIONS_TRANSPILE; |
There was a problem hiding this comment.
Not that it really matters, but why are you using two different patterns of construction here?
circuitandQIare defining their outer-mostExportedFunctionsstruct right here- while
transpileris deferring it to inside themod transpilercode block below
I don't see a functional need for this, and then I think it might be less confusing to follow one pattern (probably the one done by transpiler) for all top-level ExportedFunctions here.
There was a problem hiding this comment.
I think the logic in my head was that FUNCTIONS_TRANSPILE actually corresponds to one single "module" in the cext structure, whereas CIRCUIT and QI are just catch-all names and don't have corresponding modules (they come from all over the place).
So e.g. if I was doing it for CIRCUIT: which mod should I put it in? It can't be circuit because that doesn't contain dag, param or circuit_library, and I was trying to have the mod here match the structure of cext itself (within reason).
I'm not wedded to the way I did things, so if you've got a better suggestion let me know.
There was a problem hiding this comment.
Yeh fair enough 🤔 After all, the cext is flat anyways and has no notion of modules (other than what we might impose into it, given common function name prefixes). I see where you are coming from. I will gladly leave the organization of this to a decision of the actual development team 😛
raynelfss
left a comment
There was a problem hiding this comment.
I took a brief look and couldn't find anything particularly concerning. Most of my comments are docstring related and some questions about the general implementaiton. I also have a question that I didn't want to leave as a comment as I couldn't figure out where it would fit best:
- Why is
addra feature since you have established that theaddrattribute of anExportedFunctionis an integer castable pointer? And the feature is turned on by default based on theCargo.tomlupdates to thepyextcrate, which is the only place in which it is used, so why is it toggleable?
Thank you for working on this.
| if offset < self.len { | ||
| panic!("offset is less than previously reserved space; don't fill in holes"); | ||
| } |
There was a problem hiding this comment.
Just to check that I'm understanding this correctly but the offset needs to be bigger than the current length because we don't want to intrude on another child's offset space?
There was a problem hiding this comment.
This condition is actually slightly stricter than it 100% needs to be, but it's just much easier to enforce this using const fn definitions without actually instantiating the || vec![...] closures (which we can't do at compile time because of a) the function pointers and b) the heap-allocating Vec) or having to do constant looping checks to see the reserved sections. Instead, we just enforce that you always build up any one individual ExportedFunctions object linearly, then the const fn checking just reduces to "does this offset require us to go backwards?", and because you can use the general hierarchy to go out-of-order, it ends up not being hard to do for a human either.
So the enforcement is done in two ways:
- at compile time, we ensure with this slightly over-strict test that you always build forwards, marking out reserved space
- at instantiation time, we ensure that no
leavesoverrun their reserved space.
| #[cfg(feature = "addr")] | ||
| use qiskit_cext::dag::*; | ||
|
|
||
| pub static FUNCTIONS: ExportedFunctions = ExportedFunctions::leaves(100, || { |
There was a problem hiding this comment.
The value used here makes me thing should we have const values for each module's reserved leaves (or offsets?)
It doesn't seem like it would be something we'd want to change but if we did it might be better to have the associated value be a const that we could change without worrying about updating the whole structure.
That said, I do recognize it might not be something worth doing right now since we are unlikely to change these values often (or at least I'd like to think that's the case.)
There was a problem hiding this comment.
You can actually increase or decrease the reserved space without causing API stability trouble, because the actual slot assignment stems from the offsets, not the reservations. You obviously can't reduce a reservation to be smaller than the number of functions you store (error at instantiation time) nor can you increase it so that it encroaches on a later offset, but within that gap you can change it around.
The whole reservation integer thing isn't actually necessary for correctness at all, but I put it in because it means that the approximate length of a reservation can be specified locally to the leaves themselves, rather than a million miles away when the entire hierarchy is added with an offset. It's way harder to get the offsets right when you don't have this sort of cross-checking like this.
For the same locality reasons, I don't want to make these const values that are moved somewhere else - the whole point of them is to be local so you can see, when adding more leaves, how much space you've got left, and whether it's worth doing something different with the hierarchy. They don't need to agree precisely with the offsets (and in fact they don't in general, especially in TRANSPILER where things are much more fragmented).
Co-authored-by: Raynel Sanchez <87539502+raynelfss@users.noreply.github.com> Co-authored-by: Max Rossmannek <rmax@ethz.ch>
Co-authored-by: Raynel Sanchez <87539502+raynelfss@users.noreply.github.com>
This becomes particularly important in the next PR. When we actually build the The When we're not using the |
mrossinek
left a comment
There was a problem hiding this comment.
This is good to go from my perspective, given that the only changes since my last review were related to my PR and the additional comments by @raynelfss 👍
mtreinish
left a comment
There was a problem hiding this comment.
Overall this looks great to me. I think the structure you've outlined for building up the vtable makes a ton of sense, with a lot of thought to ABI forward compatibility which is often the trap for these kind of data structures. There should be enough flexibility in the structures to let us grow the vtable as we build out the C API but do it in a controlled manner that enables compatibility when users upgrade. The only thing I don't have a clear picture in my head about is how the header generation works and is integrated into the python build system. But that's probably the next PR in your series.
Personally, I probably would have just gone for a flat structure for simplicity as a starting point despite the limitations with that approach you outlined in the PR summary and probably tried to work around those limitations with out of banding tooling and lint rules. Then we'd be cursing that decision in subsequent months/years. So I do appreciate having a more sane structure here around building out the vtables.
I did have one inline comment about development documentation. While there is a ton of good inline code documentation and comments explaining the rationale and usage of the raw types. Explaining how to put all the pieces together is a bit sparse. Most developers will not be interacting with the internals here, but will be just using it to extend the existing vtables with new functions. Not necessarily for this PR, but I think having stronger guidelines and patterns around how to structure the vtables in lib.rs would be good. Having things more prescriptive and using simple patterns/rules will make it easier for people to work with it. There is a good amount of flexibility in how to structure things, but I think because we expect the C API to grow a lot in the near future it'd be good to have a dev guide or something that explains how to add a new C API feature and covers the whole path including how one should structure or extend a vtable.
If you want to add any docs in this PR feel free to do so and I'll reapprove quickly, otherwise you can just enqueue this to merge and we can open a tracking issue to add a dev guide or something.
| .filter_map(move |funcs| { | ||
| funcs | ||
| .as_ref() | ||
| .map(move |(inner, funcs)| funcs.exports(offset + inner)) |
There was a problem hiding this comment.
Hah, I vaguely remember talking to you about this when you first wrote this. But I really do appreciate a recursive iterator chain.
There was a problem hiding this comment.
this can be made a fair amount neater when Rust stabilises gen functions haha - it's crying out for a little yield from.
| #[cfg(feature = "addr")] | ||
| use qiskit_cext::circuit::*; | ||
|
|
||
| pub static FUNCTIONS: ExportedFunctions = ExportedFunctions::leaves(100, || { |
There was a problem hiding this comment.
Just to test the model here a bit as a hypothetical, what happens if we are going to cross 100 functions in the circuit module? Do you add a child with the extra functions or increase the array size? Both seem to be viable options here for expanding the number of functions in a table in a future release. I'm assuming the intent behind the children is to have it mirror the structure of the library and indicate the hierarchy in the code. If that's the case it might be worth calling this kind of thing more explicitly in development documentation somewhere. There are a lot of good comments inline, about the structure of the code and data structures being outlined here but the only guidance on this you have is "use your common sense" lol.
I think it's a fine reservation size for this, and we're unlikely to overshoot it in practice since there are like 183 methods on the python circuit class which include all the gate methods which we won't have here.
There was a problem hiding this comment.
Do you add a child with the extra functions or increase the array size?
Since children get flattened after the reservation of the leaves, it doesn't give you an escape valve for the size. The actual limit is set by the offsets into the containing ExportedFunctions - you can increase or decrease the reservation size provided you don't overflow the space available in the parent before the next offset or contract the reservation so the leaf/child functions don't fit any more.
The pattern I've been explaining is that we'd add a separated ExportedFunctions child to the parent, and we'd just end up in the situation where there'd be two separate blocks of circuit functions. There's a rough example in #15761 (comment), which I'll pull into proper dev documentation when I write it.
I'm assuming the intent behind the children is to have it mirror the structure of the library and indicate the hierarchy in the code.
The structure of the hierarchy in the ExportedFunctions isn't really so important because it's not user facing. It's more just so that we can arrange the definitions of the functions ourselves in modules for ease of searching.
It mostly stems because my ideal situation is that all the slots are defined locally in cext itself near to the true function definitions, which are clearly split by modules. I couldn't achieve that without causing a complete compilation against cext during the pyext build script (and needing to link libpython) at the time, hence the separate cext-vtable crate, but actually now: given how I do the addr feature in this crate, I'm actually wondering if I can do the same thing within the qiskit-cext crate itself (i.e. not depend on anything and not compile in anything unless the impl feature is set or something), and get rid of cext-vtable.
| export_fn!(pbc::qk_pauli_product_rotation_clear), | ||
| export_fn!(pbc::qk_pauli_product_measurement_clear), |
There was a problem hiding this comment.
Not related to this PR, but how/why did these end up in the circuit library? I would have expected them to live with the instruction definitions which I assumed was in the circuit module.
There was a problem hiding this comment.
I don't know - I couldn't find them at first either. We can move them into circuit if we prefer?
cc: @Cryoris
Yeah, it's all in the next PR. The summary version is:
I played with a flat structure very briefly, and then almost immediately got annoyed with it when I had to change it / rearrange it. That said, that was also before I had any of the linting tools written, which would have eased the burden, and once the main structure was actually in place we wouldn't have had to chop/change so much, so it wasn't a 100% fair test. I'm not at all convinced that the structure I've landed on here is the best possible one, but I've had to do enough chopping and changing of it now that I'm relatively comfortable that it doesn't get in the way too much, and caught several of my errors at compile time.
I generally agree with this - I feel like we can probably do it in module-level documentation of |
|
I've opened #15829 to track that I need to write dev docs on the structure of Thanks everybody for the reviews! |
I think module level documentation will work fine, although I don't know how many people actually bother with local rustdoc (I know that I rarely do). But, the other thought I had as I was writing that paragraph is maybe we should have a markdown file on contributing to the c api which will complement our current contributing guide and give more concrete guidance on the full lifecycle of adding or modifying the C API in some way. This also gives us a place to document ABI stability concerns when we get to that point. |
|
Also fine by me - I can put more of it |
While Qiskitgh-15778 is not yet merged, we don't have CI enforcement that the vtable slots are always completely up-to-date. Several PRs adding new C API functions merged between the merge-base of Qiskitgh-15761[^1] and `main` at its time of merge (and subsequently). This PR uses the machinery in Qiskitgh-15778 to bring the slots completely up-to-date. [^1]: 888c877: Add C API vtable to `pyext`
While gh-15778 is not yet merged, we don't have CI enforcement that the vtable slots are always completely up-to-date. Several PRs adding new C API functions merged between the merge-base of gh-15761[^1] and `main` at its time of merge (and subsequently). This PR uses the machinery in gh-15778 to bring the slots completely up-to-date. [^1]: 888c877: Add C API vtable to `pyext`
While the
_accelerate.abi3.soobject that ships with Qiskit already exposes all the C API symbols with publicqk_*names, this can't safely be used by distributable compiled Python extension modules, because they cannot rely even on deferred dynamic linking to the object. Instead, we define what is effectively a set of vtables, where the actual addresses of the functions are written (at runtime) at known offsets to a base pointer. These can then be accessed without the actual involvement of a linker by knowing the base pointer of the vtable, the offset of the desired function and the expected signature.In order for user builds to be forwards compatible (or in other words, for later Qiskit builds to be backwards compatible) from an ABI perspective, the offsets into the vtables must be constant between Qiskit versions. This requires them to be defined statically, without being inferred from other functions; if we try to infer based on the set of functions, there is no way to keep them the same as functions are added without defining an order. The hierarchical
VTablemachinery introduced in this commit is a trade-off between two extreme approaches:Option 1 has the negative that it's incredibly hard to tell from local information only what the available slots are, and which slot should be next assigned. Option 2 is undesirable because it completely centralises all definitions, which will likely make it very hard to add new C API functions without constantly generating merge conflicts (which is especially important to avoid breaking the merge queue), and likely leads to the functions by sequential offset being in random order (which isn't a technical problem, but is aesthetically unsatisfying!).
The hierarchical approach allows C-API additions that touch completely different modules to be independent, while still permitting some locality in slot assignments for related functions, and providing an overview of where the slots are assigned.
Each
leavesnode in the hierarchy over-allocates slots for itself to allow some addition of new functions in the future. There is a trade-off between having manyPyCapsulefunction pointers and spreading the data across many completely separate pointers (which leaves most room for expansion), and having very few vtables (where we have to leave intermediate gaps for expansion within the hierarchy). The names and groupings are not especially important; they're mostly aesthetic, and subsequent patches will introduce header files and other automated tools that mostly mean that users will not have to worry about the internals of the vtables themselves.This is not included in
cextitself (despite my earlier attempts to do just that) because doing so would require all use of the vtable specification to involve a complete compilation of Qiskit, likely also including linking inlibpython. In language-binding generation, we do not need to do that; we only need the string names and the separate assistance fromcbindgento parse out the function signatures.This commit doesn't actually provide much of a way to use the vtable yet - that comes in a follow-up.
I believe that this commit is (right now) not fully up-to-date with all the additions to the C API since I started writing it. I will bring it completely up-to-date before merge. I also intend to provide (in separate PRs) some tooling that builds on this to provide two "lints":
Depends on:
qiskit.capimodule #15711TargetandDAGCircuit#15337