[WIP] Speedup CommutativeOptimization pass #15504
alexanderivrii wants to merge 1 commit into Qiskit:main
Conversation
/// Computes a `u64` mask for a given node's qubits and clbits.
///
/// If the circuit has both qubits and clbits, the mask has
/// 32 low bits for qubits and 32 high bits for clbits.
/// When the circuit has no clbits, all of the 64 bits are used
/// for qubits.
Hmm... could we get away with using a smaller u32 for the masks here, since they represent qargs/cargs?
Also, what happens if the circuit has more than 64 bits total, i.e. if any of the indices exceeds 64? I'm assuming the value would overflow, making your mask an invalid representation.
A general question out of curiosity, would it make sense to add the first instruction’s qargs/cargs to a set, then check whether any of the second instruction’s qargs/cargs are already in it? Or would that be too inefficient?
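For concreteness, the set-based idea could look something like this (an illustrative sketch only, not code from the PR; `disjoint_via_set` is a hypothetical name):

```rust
use std::collections::HashSet;

// Hypothetical sketch: collect the first node's qubit/clbit indices into a
// HashSet, then probe it with the second node's indices.
fn disjoint_via_set(first: &[u32], second: &[u32]) -> bool {
    let seen: HashSet<u32> = first.iter().copied().collect();
    // Disjoint iff no index of the second node was already seen.
    second.iter().all(|q| !seen.contains(q))
}

fn main() {
    // CX(1, 10) and CX(42, 43) act on disjoint qubits.
    assert!(disjoint_via_set(&[1, 10], &[42, 43]));
    // CX(1, 10) and CX(10, 11) share qubit 10.
    assert!(!disjoint_via_set(&[1, 10], &[10, 11]));
    println!("ok");
}
```

This is exact (no false negatives), but each check pays for hashing and a heap allocation, which is the efficiency concern raised above.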
@raynelfss, we map the index of a qubit to index % 64, so there is no overflow (implemented as index & 63, since this is a bit faster than taking the modulus). Such masks are quite common in SAT solving (for fun, see e.g. Section 4.2 in https://cca.informatik.uni-freiburg.de/papers/EenBiere-SAT05.pdf).
Reducing the size of the mask leads to more false negatives: let's say the circuit has no classical bits; then CX(1, 10) is found disjoint from CX(42, 43) if a mask of size 64 is used, but not if a mask of size 32 is used. Experimentally, 64 is better on the benchmarks I tried.
If the circuit has both qubits and clbits, I am only using 32 bits for each. Possibly I could give more bit-width to qubits (e.g. 48 bits to qubits and 16 bits to clbits), but I have not tried that.
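The masking scheme described above can be sketched roughly as follows (a hypothetical sketch; `node_mask` and its signature are illustrative, not the PR's actual code):

```rust
// Illustrative sketch of the mask computation described above.
// With clbits present, qubit i sets bit i & 31 in the low half and clbit j
// sets bit 32 + (j & 31) in the high half; without clbits, all 64 bits go
// to qubits via i & 63 (the cheap stand-in for i % 64).
fn node_mask(qubits: &[usize], clbits: &[usize], circuit_has_clbits: bool) -> u64 {
    let mut mask = 0u64;
    if circuit_has_clbits {
        for &q in qubits {
            mask |= 1u64 << (q & 31); // low 32 bits for qubits
        }
        for &c in clbits {
            mask |= 1u64 << (32 + (c & 31)); // high 32 bits for clbits
        }
    } else {
        for &q in qubits {
            mask |= 1u64 << (q & 63); // full 64 bits for qubits
        }
    }
    mask
}

fn main() {
    // No clbits: CX(1, 10) vs CX(42, 43) are correctly found disjoint with
    // 64-bit masks; with 32-bit masks, 42 & 31 == 10 would collide with
    // qubit 10 and produce a false negative.
    let a = node_mask(&[1, 10], &[], false);
    let b = node_mask(&[42, 43], &[], false);
    assert_eq!(a & b, 0);
    println!("disjoint: {}", a & b == 0);
}
```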
@Shobhit21287, I actually tried similar things: (1) storing for each node the sorted list of its qubits, which enables checking whether two sorted lists are disjoint in time linear in the sum of their lengths. This did not help, since in my examples most of the gates had very few qubits/clbits. (2) Storing a fixedbitset per node (that was absolutely horrible on large benchmarks).
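Alternative (1), the linear-time walk over two sorted index lists, can be sketched like so (an illustrative sketch, not the PR's code; `sorted_disjoint` is a hypothetical name):

```rust
use std::cmp::Ordering;

// Illustrative sketch: a merge-style walk over two sorted index lists,
// disjoint in O(len(a) + len(b)) comparisons.
fn sorted_disjoint(a: &[u32], b: &[u32]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => return false, // shared qubit/clbit index
        }
    }
    true
}

fn main() {
    assert!(sorted_disjoint(&[1, 10], &[42, 43]));
    assert!(!sorted_disjoint(&[1, 10], &[10, 11]));
    println!("ok");
}
```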
At some point @jakelishman mentioned that by cleverly iterating over the DAG it might be possible to avoid considering gates with disjoint supports altogether, but I don't see how.
We also have some code in commutation_analysis.rs, which however works only for 1- and 2-qubit gates, and requires computing additional (often redundant) information.
Summary
By experimenting with various larger circuits with 100-2000 qubits (including QFT, QAOA, Trotterized Hamiltonian evolution, etc.) in an effort to replace `CommutativeCancellation` by `CommutativeOptimization` in the transpiler pipeline (see #15464), I saw that `CommutativeOptimization` is about 2x faster than `CommutativeCancellation`, and yet is still considerably slow. There seem to be two main bottlenecks in `CommutativeOptimization`: (1) the commutation checker can be quite slow, and (2) the pass does a humongous number of trivial commutation checks where the gates have disjoint supports. This is something that we were aware of but so far did not find a good way to address.
The current PR attempts to improve the runtime by computing a `u64` mask for each node's qubits/clbits; in particular, when the two masks are disjoint (that is, the bit-and `mask1 & mask2 == 0`), the nodes necessarily have disjoint supports and hence trivially commute. This catches the majority (though not all) of the trivial commutation checks and improves the runtimes by about 2x-4x, and the improvement gets larger on larger circuits. Here is some quick profiling data on my laptop:
I am keeping this WIP for now to see if someone can suggest a better approach that does not require introducing masks.