Implement parallel processing for window functions.
PostgreSQL's parallel processing cannot handle window functions. In
contrast, our distributed environment enables parallel execution of
window functions across multiple processes on multiple segments.
For example:
sum(a) over(partition by b order by c)
The window function can be processed by redistributing data
based on column b to ensure all rows with the same b value are processed
by the same worker, significantly improving efficiency.
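The redistribution idea above can be sketched in a few lines of Python. This is only an illustration, not the planner or executor code: the row layout `(a, b, c)`, the helper names, and the use of `zlib.crc32` as the hash are all assumptions made for the example. It shows that once every row with the same `b` lands on the same worker, each worker can evaluate `sum(a) over(partition by b order by c)` on its own and the combined result matches serial evaluation.

```python
from itertools import groupby
from zlib import crc32


def window_sum(rows):
    """Serial sum(a) OVER (PARTITION BY b ORDER BY c) over (a, b, c) rows.

    Uses the SQL default frame (RANGE UNBOUNDED PRECEDING .. CURRENT ROW),
    so rows that tie on c are peers and share the same running total.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        running = 0
        for _, peers in groupby(partition, key=lambda r: r[2]):
            peers = list(peers)
            running += sum(a for a, _, _ in peers)
            out.extend((a, b, c, running) for a, b, c in peers)
    return out


def redistribute(rows, n_workers):
    """Hash-redistribute rows on column b (hypothetical hash: crc32)."""
    buckets = [[] for _ in range(n_workers)]
    for row in rows:
        buckets[crc32(row[1].encode()) % n_workers].append(row)
    return buckets


def parallel_window_sum(rows, n_workers=4):
    """Each bucket stands in for one worker and is processed independently."""
    result = []
    for bucket in redistribute(rows, n_workers):
        result.extend(window_sum(bucket))
    return result


rows = [(10, "x", 1), (20, "y", 1), (30, "x", 2),
        (40, "z", 1), (50, "y", 2), (60, "x", 2)]
assert sorted(parallel_window_sum(rows)) == sorted(window_sum(rows))
```

Because a partition never spans two buckets, no worker needs rows held by another worker, which is why a single Redistribute Motion on the partition key is enough.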
Even without a PARTITION BY clause, we can still enable parallelism by
allowing partial paths for window functions and their subpaths, so that
the underlying tables are scanned and filtered in parallel.
Window functions containing CASE WHEN expressions are excluded, because
they complicate parallelization and make it difficult to guarantee
correct data ordering.
Example non-parallel execution plan:
SELECT sum(salary) OVER w, rank() OVER w FROM empsalary WINDOW w AS
(PARTITION BY depname ORDER BY salary DESC);
                  QUERY PLAN
----------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3)
   -> WindowAgg
        Partition By: depname
        Order By: salary
        -> Sort
             Sort Key: depname, salary DESC
             -> Seq Scan on empsalary
Example parallel execution plan (4 parallel workers):
SELECT sum(salary) OVER w, rank() OVER w FROM empsalary WINDOW w AS
(PARTITION BY depname ORDER BY salary DESC);
                             QUERY PLAN
---------------------------------------------------------------------
 Gather Motion 12:1 (slice1; segments: 12)
   -> WindowAgg
        Partition By: depname
        Order By: salary
        -> Sort
             Sort Key: depname, salary DESC
             -> Redistribute Motion 12:12 (slice2; segments: 12)
                  Hash Key: depname
                  Hash Module: 3
                  -> Parallel Seq Scan on empsalary
In complex queries containing window functions, parallel processing may
sometimes be inhibited due to cost considerations or other constraints.
However, our approach still provides valuable parallelization
opportunities for window function subpaths, delivering measurable query
efficiency improvements. We have observed significant performance gains
in TPC-DS benchmarks through this partial parallelization capability.
TPC-DS query timings with parallel execution plans (50 GB, AOCS tables, 4 workers):
| Query | Before(ms) | After(ms) | Saved(ms) | Gain | Plan Change |
|-------|-----------:|----------:|----------:|------:|-----------------|
| q12 | 10,439.08 | 4,613.52 | 5,825.56 | 55.8% | serial→parallel |
| q20 | 21,487.08 | 8,723.74 | 12,763.34 | 59.4% | serial→parallel |
| q44 | 33,816.75 | 22,515.03 | 11,301.72 | 33.4% | better parallel |
| q49 | 60,039.45 | 28,603.51 | 31,435.95 | 52.4% | serial→parallel |
| q98 | 40,114.21 | 17,052.78 | 23,061.43 | 57.5% | serial→parallel |
Changes:
- Enabled parallel plans for q12/q20/q49/q98 (previously serial)
- Improved the existing parallel plan for q44
- Average gain: 52% (best: q20 at 59.4%, saving 12.8 s)
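As a cross-check of the summary figures, the Gain column appears to be computed as (Before - After) / Before. The short Python sketch below recomputes it from the timings in the table; the formula is inferred from the table values rather than stated in the original, and the variable names are illustrative.

```python
# (before_ms, after_ms) copied from the TPC-DS table above
timings = {
    "q12": (10439.08, 4613.52),
    "q20": (21487.08, 8723.74),
    "q44": (33816.75, 22515.03),
    "q49": (60039.45, 28603.51),
    "q98": (40114.21, 17052.78),
}


def gain_percent(before_ms, after_ms):
    """Relative time saved, as a percentage of the original runtime."""
    return (before_ms - after_ms) / before_ms * 100


gains = {q: gain_percent(b, a) for q, (b, a) in timings.items()}
average_gain = sum(gains.values()) / len(gains)
best_query = max(gains, key=gains.get)

for query, gain in gains.items():
    print(f"{query}: {gain:.1f}%")
print(f"average: {average_gain:.1f}%, best: {best_query}")
```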
Authored-by: Zhang Mingli [email protected]