Add SparseObservable.to_matrix, migrate from SparsePauliOp #15022
aaryav-3 wants to merge 62 commits into Qiskit:main from
Conversation
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the following people are relevant to this code:
ShellyGarion
left a comment
@aaryav-3 - thanks for your contribution to Qiskit.
Any idea why the CI checks are not running? Maybe there is a conflict with the main branch?
In addition, is this code migrated from other parts of Qiskit? I didn't see any deletions.
Since the aim of this PR is to optimize the current code, could you perhaps add some performance benchmarks?
```rust
        y_count += 1;
    }
    BitTerm::Z => z_bits |= 1u32 << qubit,
    _ => panic!("Expected only Pauli terms in precomputation"),
```
Why panic here rather than raise an error?
@ShellyGarion thank you so much for your comments. I have removed the panic and now propagate an error in the latest push. As for code deletion and benchmarking: I mocked up some local tests to verify the speedup while writing the code. How would you suggest I do the same in our codebase? Should I write a new test case that checks runtime, or is there somewhere I could report my results? Additionally, if I remove deprecated code, I would like to understand how to still retain a comparative performance benchmark. It would be a great help if you could let me know the process for this.
Thanks @aaryav-3.
You can report your performance benchmark results as a comment in this PR.
It seems that the Miri and Rust CI checks are failing; could you please handle it?
Hi Shelly, thank you for running the tests and bringing this to my notice. The failing Rust tests are fixed in my recent push; for the Miri failures I still need to get a clearer picture. Besides this,
in benchmarking, dense matrix conversions show comparable runtimes, but for operators of 20 or more qubits the .to_matrix(sparse=True) method in my feature branch is ~2x slower than main due to operational overhead, hiding the advantage of the BitTerm mapping. The slowdown stems from my naive port of the impl_to_matrix_sparse! macro from sparse_pauli_op.rs.
In SparseObservable, each BitTerm can expand into a superposition of multiple Paulis, so the deterministic one-to-one row-column mapping used in SparsePauliOp no longer holds. Further, functions like sparse_observable_to_matrix_serial_64 repeatedly recompute and sort row indices for every row (see order.sort_unstable_by; for SparsePauliOp.to_matrix(sparse=True) this previously showed almost linear-time behavior thanks to deterministic XOR permutations), which is another bottleneck. I'm optimizing this by introducing Gray-code indexing, caching precomputed parities and index flips for related terms, and replacing per-row comparison sorts with in-place radix sorting of CSR rows (to restore linear time). These changes should bring performance back near the main branch; I'll push them soon, and I'd love to know what you think of them. Thank you very much for your patience!
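To make the one-to-many expansion above concrete, here is a hedged Python sketch (the table name `PROJECTOR_AS_PAULIS` and helper `expand_term` are illustrative, not from the PR): each projector bit term is an equal-weight combination of two Paulis, so a term with k projectors expands into 2^k Pauli strings, which is exactly why SparsePauliOp's one-entry-per-row CSR mapping no longer applies.

```python
# Illustrative only: projector bit terms written as Pauli combinations.
# |0><0| = (I + Z)/2,  |1><1| = (I - Z)/2,
# |+><+| = (I + X)/2,  |-><-| = (I - X)/2,
# |r><r| = (I + Y)/2,  |l><l| = (I - Y)/2.
PROJECTOR_AS_PAULIS = {
    "0": [("I", 0.5), ("Z", 0.5)],
    "1": [("I", 0.5), ("Z", -0.5)],
    "+": [("I", 0.5), ("X", 0.5)],
    "-": [("I", 0.5), ("X", -0.5)],
    "r": [("I", 0.5), ("Y", 0.5)],
    "l": [("I", 0.5), ("Y", -0.5)],
}

def expand_term(label):
    """Expand a bit-term label into a list of (pauli_label, coeff) pairs."""
    terms = [("", 1.0)]
    for ch in label:
        # Plain Paulis pass through unchanged; projectors branch into two.
        options = PROJECTOR_AS_PAULIS.get(ch, [(ch, 1.0)])
        terms = [(p + q, c * d) for p, c in terms for q, d in options]
    return terms

# A single projector doubles the number of Pauli strings:
print(expand_term("X0"))  # [('XI', 0.5), ('XZ', 0.5)]
```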
It seems that the Rust tests under Miri are still failing; could you look into it?
yup, working on it
Pull Request Test Coverage Report for Build 23049574652
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls
…_observable_tomatrix
ShellyGarion
left a comment
LGTM. @Cryoris - do you have any further comments?
…bservable_tomatrix
…v-3/qiskit into sparse_observable_tomatrix
ShellyGarion
left a comment
Thanks for this nice addition. Perhaps it's also worth adding a test that shows it's done efficiently, based for example on this comment
```rust
// Pauli Y
static L_Y_OUT0: [LocalEntry; 1] = [LocalEntry(1, Complex64::new(0.0, -1.0))]; // -i
static L_Y_OUT1: [LocalEntry; 1] = [LocalEntry(0, Complex64::new(0.0, 1.0))]; // +i
```
The use of static is very good, but there are some duplications here:
L_X_OUT0 is the same as L_ONE_OUT1,
L_X_OUT1 is the same as L_Z_OUT0 and L_ZERO_OUT0,
etc.
Is there a simpler way to define them only once?
See e.g.
https://github.com/Qiskit/qiskit/blob/main/crates/circuit/src/util.rs
https://github.com/Qiskit/qiskit/blob/main/crates/circuit/src/gate_matrix.rs
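One way the deduplication suggested above could look, sketched in Python rather than Rust (all names here are illustrative, not from the PR): define each distinct row action once and let every bit term reference the shared constant, so duplicated tables like L_X_OUT1 / L_Z_OUT0 collapse into one object.

```python
# Hedged sketch: each table lists the nonzero entries of one matrix row
# as (column, value) pairs, defined once and shared between operators.
ENTRY_COL0_P1 = ((0, 1 + 0j),)   # shared by X row 1, Z row 0, |0><0| row 0, ...
ENTRY_COL1_P1 = ((1, 1 + 0j),)   # shared by X row 0, |1><1| row 1, ...
ENTRY_COL1_M1 = ((1, -1 + 0j),)  # Z row 1
ENTRY_COL1_MI = ((1, -1j),)      # Y row 0
ENTRY_COL0_PI = ((0, 1j),)       # Y row 1
EMPTY = ()                       # rows annihilated by a projector

# Local 2x2 operators as (row0_entries, row1_entries):
LOCAL = {
    "X": (ENTRY_COL1_P1, ENTRY_COL0_P1),
    "Y": (ENTRY_COL1_MI, ENTRY_COL0_PI),
    "Z": (ENTRY_COL0_P1, ENTRY_COL1_M1),
    "0": (ENTRY_COL0_P1, EMPTY),   # |0><0| projector
    "1": (EMPTY, ENTRY_COL1_P1),   # |1><1| projector
}

# The previously duplicated rows are now literally the same object:
assert LOCAL["X"][1] is LOCAL["Z"][0] is LOCAL["0"][0]
```

In Rust the same effect could be reached by having several `static` names refer to one shared table, in the spirit of the `gate_matrix.rs` constants linked above.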
Moving to 2.4 since we need more time to review the new projector handling
```python
    floating-point associativity effects."""
    # Using powers-of-two coefficients to make floating-point arithmetic associative so we can
    # do bit-for-bit assertions. Choose labels that have at least few overlapping locations.
    labels = ["XZIXYX", "YIIYXY", "ZZZIIZ", "IIIIII"]
```
In test_to_matrix_parallel_vs_serial, the test checks that the serial and parallel implementations of to_matrix produce exactly the same matrix (no element differs at all).
Is this exact match something you intend to guarantee for all observables, or is it mainly a sanity check that both paths are doing the same kind of work?
yup, this is an exact match that I intend to guarantee for all observables. The resulting matrix with or without parallelization should be the same. Thank you for your review!
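The powers-of-two trick the test comment mentions can be demonstrated directly: sums of distinct coefficients of the form 2**-k are exactly representable in binary floating point, so every summation order produces bit-identical results, which is what makes an exact serial-vs-parallel comparison feasible. A small self-contained sketch:

```python
import random

# Coefficients that are distinct powers of two: any addition order is exact.
coeffs = [2.0 ** -k for k in range(8)]
sums = set()
for _ in range(100):
    random.shuffle(coeffs)
    total = 0.0
    for c in coeffs:
        total += c
    sums.add(total)
# Every ordering produced the same bits, so the set has one element.
assert len(sums) == 1

# Contrast with generic decimal coefficients, where order changes the bits:
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```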
alexanderivrii
left a comment
Thanks, @aaryav-3, for working on this. I realize that this was a fairly complex piece of development, and I appreciate the effort that you have put into this.
It is also a bit challenging to review this. If I understand correctly, there is no paper that describes this extension for matrix construction from SparsePauliOp to SparseObservable. Is there somewhere a description/documentation of this extension?
I would definitely appreciate some general description of the idea, and better documentation of the new structures and functions that you introduced. For example, what are L_PLUS_OUT0 (and others), what are different fields in TermDecomp, etc.?
If I recall correctly, the code for SparsePauliOp modified the matrix in-place, is this also done here?
```rust
let local = local_for_bitterm(bt).unwrap(); // safe: all BitTerms have Local2x2
qubits.push(q);
locals.push(local);
mask |= 1u64 << q;
```
Doesn't this panic when q > 64?
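The concern here is that a shift by 64 or more overflows a u64. One possible guard, sketched in Python (where ints are unbounded, so the 64-bit limit is made explicit; the function name is hypothetical): reject out-of-range qubit indices with a clear error instead of relying on shift semantics.

```python
U64_BITS = 64

def set_mask_bit(mask: int, q: int) -> int:
    """Set bit q of a 64-bit mask, rejecting indices that don't fit."""
    if not 0 <= q < U64_BITS:
        raise ValueError(f"qubit index {q} does not fit in a u64 mask")
    return mask | (1 << q)

mask = 0
for q in (0, 3, 63):
    mask = set_mask_bit(mask, q)
assert mask == (1 << 0) | (1 << 3) | (1 << 63)

raised = False
try:
    set_mask_bit(mask, 64)  # q == 64 must be reported, not silently wrapped
except ValueError:
    raised = True
assert raised
```

In Rust the analogous check could use `checked_shl` or an explicit bound test before the shift.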
```rust
// Compact compiled form of a SparseObservable term
#[derive(Clone, Debug)]
struct TermDecomp {
    qubits: Vec<u32>,
    mask: u64,
    dest_offsets: Vec<u64>,
    locals: Vec<&'static Local2x2>,
    coeff: Complex64,
}
```
I can't immediately figure out what's going on: is there a reference to a paper/text that describes this extension to sparse observables?
Could you please add docstring comments explaining the fields in the structure (e.g. what are mask, dest_offset, etc.)?
added the comments in my latest commit. Thank you!
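For readers skimming the thread, here is a Python mirror of the `TermDecomp` struct with my reading of each field spelled out; the interpretations are inferred from the diff, not taken from PR documentation, so treat them as hedged.

```python
from dataclasses import dataclass

@dataclass
class TermDecomp:
    """Hedged Python mirror of the Rust TermDecomp struct; field meanings
    below are my interpretation of the diff, not authoritative docs."""
    qubits: list        # indices of qubits the term acts on non-trivially
    mask: int           # bitmask with bit q set for each q in `qubits`
    dest_offsets: list  # per-qubit column-index flips applied when a local
                        # 2x2 operator moves amplitude between basis states
    locals_: list       # references to static 2x2 tables, one per qubit
                        # (named locals_ here to avoid Python's builtin)
    coeff: complex      # overall coefficient of the term

# e.g. a term acting on qubits 1 and 4:
term = TermDecomp(qubits=[1, 4], mask=(1 << 1) | (1 << 4),
                  dest_offsets=[1 << 1, 1 << 4], locals_=[None, None],
                  coeff=0.5 + 0j)
assert term.mask == 0b10010
```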
```rust
/// Done for runtime optimization before matrix construction.
/// This deduplicates equivalent Pauli terms within the SparseObservable,
/// ensuring each unique (X, Z) mask pair appears only once.
fn combine_duplicates(&mut self) {
```
Can we use SparseObservable.canonicalize instead?
You are right; thank you so much for pointing out this API. It indeed fits the current implementation, and a new function is not needed. I have addressed this in my latest commit.
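The merging behavior under discussion can be sketched in a few lines of Python (the function name and term encoding here are illustrative): terms with the same key are merged by summing coefficients, and terms that cancel to zero are dropped, which is also what canonicalization provides.

```python
def combine_duplicates(terms):
    """terms: list of (key, coeff) pairs, where key identifies a term
    (e.g. its bit-term string and qubit indices).  Duplicates are merged
    by summing coefficients; exact zeros are dropped."""
    merged = {}
    for key, coeff in terms:
        merged[key] = merged.get(key, 0j) + coeff
    # Python dicts preserve insertion order, so first-seen order is kept.
    return [(k, c) for k, c in merged.items() if c != 0]

terms = [
    (("XZ", (0, 2)), 1.0 + 0j),
    (("XZ", (0, 2)), 0.5 + 0j),   # duplicate of the first term
    (("Y", (1,)), 2.0 + 0j),
]
assert combine_duplicates(terms) == [
    (("XZ", (0, 2)), 1.5 + 0j),
    (("Y", (1,)), 2.0 + 0j),
]
```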
```rust
///
/// :meth:`to_matrix` Convert the observable to a dense NumPy array or a sparse
/// CSR matrix.
```
Should this be `to_matrix_sparse` and `to_matrix_dense` rather than `to_matrix`?
```rust
// ---------------------------------------------------------------------------
// PauliSparseObservable; CSR matrix conversion (Pure Pauli case)
// ---------------------------------------------------------------------------
```
Why do we need to consider the pure Pauli case separately - is this a performance optimization or something else?
Yup, it is a performance optimization retained from the legacy implementation.
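The reason a pure-Pauli fast path pays off: an n-qubit Pauli string has exactly one nonzero per matrix row, at column = row XOR x_mask, so the CSR structure can be written down directly from the bit masks with no term expansion. A hedged Python sketch of that row rule (function name illustrative; derivation uses Y = -i·Z·X):

```python
def pauli_matrix_row(row, x_mask, z_mask):
    """The single nonzero (column, value) of one row of the Pauli whose
    X and Z bit masks are given (a qubit with both bits set is a Y)."""
    y_count = bin(x_mask & z_mask).count("1")
    col = row ^ x_mask
    # Global factor (-i) per Y, and a sign from the Z bits of this row.
    value = (-1j) ** y_count * (-1) ** bin(row & z_mask).count("1")
    return col, value

# Two-qubit Pauli "ZX" (Z on qubit 1, X on qubit 0):
# x_mask = 0b01, z_mask = 0b10.  Row 2 (|10>) has its entry at column 3.
assert pauli_matrix_row(2, 0b01, 0b10) == (3, -1)
```

Projector bit terms break this because a row can have zero or two destinations, which is why the general path needs the local-table machinery discussed below.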
```rust
// ---------------------------------------------------------------------------
// Local 2x2 operator tables for BitTerms (Paulis + projectors)
// ---------------------------------------------------------------------------
```
Can you describe the intuition behind these local tables?
I have added comments documenting their role in the latest commit. I do believe that with a major refactor like this we need to document it exhaustively. Thank you for bringing this up!
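As a companion to the documentation being discussed, here is one way to convey the intuition behind the local tables, sketched in Python with illustrative names: the matrix of a term is a tensor product of 2x2 operators, so the nonzeros of full-matrix row r are products of the per-qubit row entries selected by the bits of r, and a projector's empty row annihilates the whole row.

```python
# Illustrative local tables: (row0_entries, row1_entries), each a list of
# (column_bit, value) pairs for the nonzeros of that 2x2 row.
LOCAL = {
    "I": ([(0, 1 + 0j)], [(1, 1 + 0j)]),
    "X": ([(1, 1 + 0j)], [(0, 1 + 0j)]),
    "Z": ([(0, 1 + 0j)], [(1, -1 + 0j)]),
    "0": ([(0, 1 + 0j)], []),  # |0><0| projector: row 1 is empty
}

def row_entries(label, row):
    """Nonzeros of row `row` of the tensor-product operator `label`
    (rightmost character acts on qubit 0)."""
    entries = [(0, 1 + 0j)]
    for q, ch in enumerate(reversed(label)):
        bit = (row >> q) & 1
        local = LOCAL[ch][bit]
        entries = [(col | (cbit << q), val * v)
                   for col, val in entries for cbit, v in local]
    return entries

# "Z0" on two qubits: the projector kills every row whose qubit-0 bit is 1.
assert row_entries("Z0", 1) == []
assert row_entries("Z0", 2) == [(2, -1 + 0j)]
```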
Summary
This PR fixes GitHub issue #13389 by introducing the to_matrix method on SparseObservable, omitting the usage of PauliList and instead using extended-alphabet Pauli projectors for time and space efficiency.
Checklist
Summary of Changes
Migrate and generalize to_matrix from SparsePauliOp to SparseObservable:
- to_matrix added to SparseObservable: the Rust implementation for matrix conversion now resides in SparseObservable, making it a core feature of that class. Functionality is attained via an intermediate representation called :class:`MatrixComputationData`, logically equivalent to :class:`ZXPaulis` but fine-tuned for BitTerms.
- as_paulis is currently used for non-Paulis; pure Paulis are handled via direct bit manipulation.

Benchmark Performance Report:
Sparse Matrix - Parallel (sparse=True, force_serial=False):
Sparse Matrix - Serial (sparse=True, force_serial=True):
Dense Matrix Performance:
The feature branch shows not only comparable results but a large speedup in all cases. The gain is especially visible in serial sparse matrix conversion, which is the expected performance boost from the BitTerm representation. The parallel case also benefits from the speedup; however, the added conversion and rayon chunking overhead makes up a larger share of the smaller runtimes.