[WIP] Compress sparse observable#14073

Open
alexanderivrii wants to merge 13 commits into Qiskit:main from alexanderivrii:compress-sparse-op

Conversation

@alexanderivrii (Member) commented Mar 23, 2025

Summary

This is a preliminary implementation of the following idea (which came up in a discussion with @Cryoris, @jakelishman and others): in certain cases we can "compress" (i.e., reduce the number of terms in) a SparseObservable by combining terms. For example, the term 1.5 * "X+IZ" can be combined with the term -1.5 * "X+ZZ" to produce 3.0 * "X+1Z". For another example, the term 1.5 * "X+IZ" can be combined with the term -1.5 * "X-IZ" to produce 1.5 * "XXIZ" (using '+' - '-' = 'X'). While finding the optimal way to combine terms is intractable, a simple greedy strategy is definitely possible.
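The first worked example can be checked numerically with dense matrices; this is a quick sketch independent of the PR's Rust implementation, using '+' for the projector |+><+| and '1' for |1><1|:

```python
# Numerical check of: 1.5 * "X+IZ" - 1.5 * "X+ZZ" == 3.0 * "X+1Z".
# The label characters map to dense 2x2 matrices; the ordering convention
# does not affect the identity as long as it is used consistently.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
plus = 0.5 * (I + X)  # '+' bit term: |+><+|
one = 0.5 * (I - Z)   # '1' bit term: |1><1|

def kron(*ops):
    """Kronecker product of a sequence of operators."""
    out = np.array([[1.0 + 0.0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

lhs = 1.5 * kron(X, plus, I, Z) - 1.5 * kron(X, plus, Z, Z)
rhs = 3.0 * kron(X, plus, one, Z)
assert np.allclose(lhs, rhs)
```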

This can be used, for example, when synthesizing a Pauli evolution circuit for a SparseObservable: in some cases the number of two-qubit gates for the compressed observable is smaller than for the original one. Surprisingly, this shows value on the HamLib benchmark set.

Details and comments

The PR is based on top of #14067 (which improves the default synthesis of SparseObservables coming from HamLib benchmarks). Update: #14067 is now merged.

Currently this contains the following reductions:

'I' + 'X' = 2 * '+'
'I' - 'X' = 2 * '-'
'I' + 'Z' = 2 * '0'
'I' - 'Z' = 2 * '1'
'I' + 'Y' = 2 * 'r'
'I' - 'Y' = 2 * 'l'

'+' + '-' = 1 * 'I'
'+' - '-' = 1 * 'X'
'0' + '1' = 1 * 'I'
'0' - '1' = 1 * 'Z'
'r' + 'l' = 1 * 'I'
'r' - 'l' = 1 * 'Y'
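All twelve reductions can be verified numerically; below is a quick sketch with dense 2x2 matrices, where '+', '-', '0', '1', 'r', 'l' denote the projectors onto the +1/-1 eigenstates of X, Z, and Y respectively:

```python
# Verify the reduction table above with dense matrices.
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# Projector bit terms, e.g. '+' = |+><+| = (I + X) / 2.
P = {
    "+": 0.5 * (I + X), "-": 0.5 * (I - X),
    "0": 0.5 * (I + Z), "1": 0.5 * (I - Z),
    "r": 0.5 * (I + Y), "l": 0.5 * (I - Y),
}

reductions = [
    (I + X, 2 * P["+"]), (I - X, 2 * P["-"]),
    (I + Z, 2 * P["0"]), (I - Z, 2 * P["1"]),
    (I + Y, 2 * P["r"]), (I - Y, 2 * P["l"]),
    (P["+"] + P["-"], I), (P["+"] - P["-"], X),
    (P["0"] + P["1"], I), (P["0"] - P["1"], Z),
    (P["r"] + P["l"], I), (P["r"] - P["l"], Y),
]
for lhs, rhs in reductions:
    assert np.allclose(lhs, rhs)
```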

Any other suggestions are highly welcome!

Experiments on HamLib benchmarks

The following Excel sheet contains the experiments for the 100 HamLib benchmarks that we use in BenchPress, transpiled to the ["cx", "u"] basis and considering optimization_level=0 and optimization_level=2:

hamlib_compressed.xlsx

For 39 out of the 100 benchmarks, at least one reduction is possible. If we run transpile with optimization_level=0, then compressing the SparseObservable is always beneficial: in 29 out of 39 cases the number of CX gates is reduced (and in no case is it increased).

However, compressing terms in a SparseObservable may negatively affect the gate cancellations possible with optimization_level=2, so unfortunately compression is not always beneficial. Yet in 16 out of 39 cases the compression does help with optimization_level=2 as well, with at least 3 cases where the reduction is substantial:

  • for test 8, the number of CX gates is reduced from 58 to 52
  • for test 17, the number of CX gates is reduced from 6388 to 5411
  • for test 27, the number of CX gates is reduced from 2470 to 2100

@alexanderivrii alexanderivrii requested a review from a team as a code owner March 23, 2025 15:21
@qiskit-bot (Collaborator):

One or more of the following people are relevant to this code:

  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia

@coveralls commented Mar 23, 2025

Pull Request Test Coverage Report for Build 14111407194

Details

  • 137 of 166 (82.53%) changed or added relevant lines in 1 file are covered.
  • 36 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.04%) to 88.05%

Changes missing coverage:
  • crates/accelerate/src/sparse_observable/mod.rs — 137 of 166 changed/added lines covered (82.53%)

Files with coverage reduction:
  • crates/qasm2/src/lex.rs — 6 new missed lines (91.48%)
  • crates/qasm2/src/parse.rs — 30 new missed lines (95.76%)

Totals:
  • Change from base Build 14109208575: -0.04%
  • Covered Lines: 72867
  • Relevant Lines: 82756

💛 - Coveralls

@alexanderivrii alexanderivrii added this to the 2.1.0 milestone Mar 24, 2025
@Cryoris (Collaborator) left a comment:

The approach LGTM, I left some comments below. It would be nice to see some tests on e.g. things like II + IZ + ZI + ZZ --> |0><0| |0><0| 🙂 (also then reno and tests)
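As a numerical sanity check of this suggested test case (a sketch with dense matrices; note the combined coefficient works out to 4, since II + IZ + ZI + ZZ = (I+Z) ⊗ (I+Z)):

```python
# Verify: II + IZ + ZI + ZZ == 4 * |0><0| ⊗ |0><0|.
import numpy as np

I = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
p0 = 0.5 * (I + Z)  # |0><0|, the '0' bit term

lhs = np.kron(I, I) + np.kron(I, Z) + np.kron(Z, I) + np.kron(Z, Z)
rhs = 4 * np.kron(p0, p0)
assert np.allclose(lhs, rhs)
```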

(term1, term2)
};

if t1.num_qubits != t2.num_qubits {
Collaborator:

Minor, but this check and the same_sign check below could be moved before the sorting by number of bit terms, since these could be fast exit paths where we don't have to order the terms.

Member Author:

I have rewritten the code, making sure to take fast exit paths into account.

}

// check that the coefficients are equal or negative of each other (within the specified tolerance)
let same_sign = if (t1.coeff - t2.coeff).norm_sqr() <= tol * tol {
Collaborator:

Is there a reason not just to use abs < tol?

Member Author:

This is how it was already used in pub fn canonicalize and I wanted to keep this consistent, though I agree this should not make a difference.
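For what it's worth, the two formulations agree for a nonnegative tolerance; norm_sqr(z) <= tol*tol is the same predicate as abs(z) <= tol, just without the square root. A quick check (a sketch in Python, not the PR's Rust code):

```python
# For a few sample complex values, the squared-norm comparison and the
# absolute-value comparison give the same result.
tol = 1e-8
for z in [1e-9 + 1e-9j, 5e-9 - 5e-9j, 2e-8 + 0j, 3e-8j]:
    assert (abs(z) <= tol) == (z.real**2 + z.imag**2 <= tol * tol)
```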

/// Keeps the original ordering of terms as much as possible.
pub fn compress(&self, tol: f64) -> SparseObservable {
let mut terms: Vec<SparseTerm> = self.iter().map(|t| t.to_term()).collect();
let dummy_term =
Collaborator:

Do we need this dummy term or could we just create an empty vector and push new elements as we iterate?

Member Author:

I rewrote the main loop based on this and other suggestions, see 9520ccb. Now, instead of keeping a Vec<BitTerm> and modifying it in place, we create a new observable on each iteration.

Comment on lines +879 to +882
out.coeffs.push(term.coeff);
out.bit_terms.extend_from_slice(&term.bit_terms);
out.indices.extend_from_slice(&term.indices);
out.boundaries.push(out.indices.len());
Collaborator:

You could just use

Suggested change
out.coeffs.push(term.coeff);
out.bit_terms.extend_from_slice(&term.bit_terms);
out.indices.extend_from_slice(&term.indices);
out.boundaries.push(out.indices.len());
out.add_term(&term.view());

I think, instead of doing this manually 🙂

Member Author (@alexanderivrii, Mar 27, 2025):

Here, too, this is how it was already done in the existing pub fn canonicalize (but this is much nicer indeed).

Member Author:

Done!

///
/// Keeps the original ordering of terms as much as possible.
pub fn compress(&self, tol: f64) -> SparseObservable {
let mut terms: Vec<SparseTerm> = self.iter().map(|t| t.to_term()).collect();
Collaborator:

I assume there were some lifetime issues when using SparseTermView instead of SparseTerm? If not, it might be nicer to pass &SparseTermViews to avoid cloning the underlying data.

Member Author:

Great suggestion, done in 9520ccb.

Comment on lines +773 to +777
let mut new_bits = t1.bit_terms.to_vec();
new_bits[mismatch_pos] = BitTerm::X;
let new_indices = t1.indices.to_vec();
let new_coeff = t1.coeff * Complex64::new(0.5, 0.0);
(new_bits, new_indices, new_coeff)
Collaborator:

It would be nice to put this logic into a closure that can be called to avoid the duplication, something like

let proj_to_pauli = |pauli: BitTerm, negative_sign: bool, pos: usize, term: &SparseTerm| {
    let mut new_bits = term.bit_terms.to_vec();
    new_bits[pos] = pauli;
    let new_indices = term.indices.to_vec();
    let new_coeff = term.coeff * Complex64::new(if negative_sign { 0.5 } else { -0.5 }, 0.0);
    (new_bits, new_indices, new_coeff)
};

Member Author:

I have completely rewritten this bit of code.

Ok(simplified.into())
}

/// Greedily combine the terms in the observable.
Collaborator:

Maybe we should say something about runtime here. Did you find this to be slow in practice?

Member Author:

I added a comment about runtime: the worst-case complexity is $O(terms^3 \cdot qubits)$; however, in practice it is significantly lower: on the HamLib benchmarks we never need more than 2-3 total iterations. In addition, thanks to the fast exit conditions (when trying to combine two sparse terms) and the sparsity of the indices, we usually need only a small number of operations either to merge two sparse terms or to decide that they cannot be merged. Thus I would speculate that in practice the runtime is $O(terms^2)$. The algorithm is indeed fast on the HamLib benchmarks. If the runtime does become a bottleneck, we could limit the total number of iterations by a small constant.
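For illustration, the greedy loop can be sketched in Python over dense label strings. Every name below is hypothetical and only one reduction ('I' + 'Z' = 2 * '0') is modeled; the actual implementation works on sparse bit terms in Rust:

```python
# Toy greedy compressor over dense Pauli-label strings, showing the quadratic
# pairwise scan per pass and the fast exit checks mentioned above.
# All names here are hypothetical sketches, not the PR's API.

def try_combine(c1, l1, c2, l2, tol=1e-10):
    """Return (coeff, label) if the two terms merge, else None."""
    if len(l1) != len(l2) or abs(c1 - c2) > tol:  # fast exits
        return None
    diff = [i for i in range(len(l1)) if l1[i] != l2[i]]
    if len(diff) != 1:
        return None
    i = diff[0]
    if {l1[i], l2[i]} == {"I", "Z"}:
        # c * I + c * Z = 2c * |0><0| at the mismatch position.
        return 2 * c1, l1[:i] + "0" + l1[i + 1:]
    return None

def compress(terms):
    """Repeat pairwise passes until no pair of terms can be merged."""
    changed = True
    while changed:
        changed = False
        for a in range(len(terms)):
            for b in range(a + 1, len(terms)):
                merged = try_combine(*terms[a], *terms[b])
                if merged is not None:
                    terms = terms[:a] + [merged] + terms[a + 1:b] + terms[b + 1:]
                    changed = True
                    break
            if changed:
                break
    return terms

print(compress([(1.5, "XIZ"), (1.5, "XZZ")]))  # → [(3.0, 'X0Z')]
```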

@eliarbel eliarbel modified the milestones: 2.1.0, 2.2.0 May 29, 2025
@raynelfss raynelfss added the Changelog: Added Add an "Added" entry in the GitHub Release changelog. label Aug 13, 2025
@mtreinish mtreinish modified the milestones: 2.2.0, 2.3.0 Aug 18, 2025
@github-project-automation github-project-automation bot moved this to Ready in Qiskit 2.3 Oct 7, 2025
@alexanderivrii alexanderivrii modified the milestones: 2.3.0, 2.4.0 Nov 19, 2025
@jakelishman jakelishman removed this from Qiskit 2.3 Nov 24, 2025
@alexanderivrii alexanderivrii modified the milestones: 2.4.0, 2.5.0 Feb 12, 2026