Conversation

allenshen13
Member

This PR introduces a new configuration parameter no_random_duplicates that prevents duplicate query selection during random execution until all queries have been executed once.

Changes:

  • Added NoRandomDuplicates field to the Stage struct with JSON tag 'no_random_duplicates'
  • Implemented bag-based selection algorithm in runRandomlyWithoutDuplicates() method:
    - Queries are randomly selected and removed from a "bag" to prevent duplicates within each round
    - When the bag is exhausted (all queries executed once), it automatically refills for subsequent rounds
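A minimal sketch of how the new field could be decoded from a stage file. Only the `NoRandomDuplicates` field name and its `no_random_duplicates` JSON tag come from this PR; the `stageSketch` type, the `parseStage` helper, and the Go types of the other fields are assumptions made for illustration and may differ from the real Stage struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stageSketch is a hypothetical, trimmed-down stand-in for the Stage struct.
// The JSON keys match the usage example in this PR; the Go field types are
// assumptions and may not match the actual PBench definitions.
type stageSketch struct {
	RandomExecution      bool     `json:"random_execution"`
	RandomlyExecuteUntil string   `json:"randomly_execute_until"`
	NoRandomDuplicates   bool     `json:"no_random_duplicates"`
	QueryFiles           []string `json:"query_files"`
}

// parseStage (hypothetical helper) decodes a stage definition from JSON.
// A missing no_random_duplicates key leaves the field at its zero value, false.
func parseStage(data []byte) (stageSketch, error) {
	var s stageSketch
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	s, err := parseStage([]byte(`{"random_execution": true, "no_random_duplicates": true}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("no_random_duplicates =", s.NoRandomDuplicates)
}
```

With a plain `bool`, an omitted key and an explicit `false` are indistinguishable; a pointer type would be needed if stage merging has to tell them apart.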

Usage:
{
  "random_execution": true,
  "randomly_execute_until": "10",
  "no_random_duplicates": true,
  "query_files": ["query1.sql", "query2.sql", "query3.sql"]
}

@minhancao minhancao self-requested a review August 7, 2025 19:19
minhancao
minhancao previously approved these changes Aug 7, 2025
@allenshen13
Member Author

Testing Procedure + Results

Test 1: https://presto.ibm.prestodb.dev/pbench-run-details/1530

  • Description: 5 streams each executing 5 queries from a pool of 5 complex queries
  • What happened: Each stream executed each query exactly once with no duplicates. Overall, each query was executed 5 times ✅

Test 2: https://presto.ibm.prestodb.dev/pbench-run-details/1529

  • Description: 1 stream executing 50 queries from a pool of 25 intermediate queries
  • What happened: Each of the 25 queries was executed twice. Neither the first 25 nor the last 25 executed queries contained any duplicates. ✅

Change MergeWith function to support NoRandomDuplicates
@allenshen13 allenshen13 force-pushed the concurrency_testing_add_features branch from 4ec3af7 to 2c5ddb4 Compare August 14, 2025 18:54
Contributor

@minhancao minhancao left a comment

Please add documentation to https://github.com/prestodb/pbench/wiki/Parameters for your new flag.

@allenshen13
Member Author

allenshen13 commented Aug 15, 2025

no_random_duplicates Documentation (Sent to Steve Burnett for editing)

Format

"no_random_duplicates": Boolean

Definition

When no_random_duplicates is set to false (default), PBench allows the same query to be selected multiple times during random execution, potentially causing immediate repetition.

When no_random_duplicates is set to true, PBench implements a "bag-based" selection algorithm that prevents query repetition within execution rounds. Each query is guaranteed to be executed exactly once per "round" before any query can be repeated.

This feature works by:

  1. Creating a "bag" containing all available queries
  2. Randomly selecting queries from the bag
  3. Removing selected queries from the bag to prevent immediate repetition
  4. Refilling the bag when empty (all queries executed once) to start a new round
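The four steps above can be sketched roughly as follows. This is an illustrative sketch, not the code of `runRandomlyWithoutDuplicates()`: the `queryBag` type, `newQueryBag`, `next`, and the explicit seed are all hypothetical names and choices, and the swap-with-last removal is just one way to take an element out of the bag.

```go
package main

import (
	"fmt"
	"math/rand"
)

// queryBag (hypothetical) hands out query indices in random order and never
// repeats an index until every index has been returned once in the round.
type queryBag struct {
	size int
	bag  []int
	rng  *rand.Rand
}

func newQueryBag(size int, seed int64) *queryBag {
	return &queryBag{size: size, rng: rand.New(rand.NewSource(seed))}
}

// next returns a random index not yet used in the current round.
// The second result is false when there are no queries to choose from.
func (b *queryBag) next() (int, bool) {
	if b.size <= 0 {
		return 0, false
	}
	if len(b.bag) == 0 {
		// Steps 1 and 4: fill (or refill) the bag with every query index.
		b.bag = make([]int, b.size)
		for i := range b.bag {
			b.bag[i] = i
		}
	}
	// Step 2: pick a random position in the bag.
	pos := b.rng.Intn(len(b.bag))
	picked := b.bag[pos]
	// Step 3: remove the pick by moving the last element into its slot.
	last := len(b.bag) - 1
	b.bag[pos] = b.bag[last]
	b.bag = b.bag[:last]
	return picked, true
}

func main() {
	b := newQueryBag(3, 1)
	for i := 0; i < 6; i++ {
		idx, _ := b.next()
		fmt.Printf("execution %d -> query index %d\n", i+1, idx)
	}
}
```

Because the bag refills independently for each round, the last query of one round can still be the first query of the next; only repetition within a round is ruled out.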

no_random_duplicates can only be used when random_execution is set to true.
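One way to express this constraint is a small check before the stage runs. This is a sketch under assumptions: `validateRandomFlags` and the error text are hypothetical, and the documentation above does not say whether PBench rejects such a configuration or simply ignores the flag.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRandomDuplicates is a hypothetical error for an invalid flag combination.
var errNoRandomDuplicates = errors.New("no_random_duplicates requires random_execution to be true")

// validateRandomFlags (hypothetical) reports an error when no_random_duplicates
// is enabled without random_execution. Rejecting the config is an assumption;
// the real tool may handle this case differently.
func validateRandomFlags(randomExecution, noRandomDuplicates bool) error {
	if noRandomDuplicates && !randomExecution {
		return errNoRandomDuplicates
	}
	return nil
}

func main() {
	if err := validateRandomFlags(false, true); err != nil {
		fmt.Println("invalid stage:", err)
	}
}
```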

Use Cases

This feature is particularly useful for:

  • Performance benchmarking requiring fair query distribution
  • Cache behavior analysis (prevents artificial cache warming)
  • Comprehensive testing scenarios where all queries should be tested evenly
  • Situations where predictable query execution patterns are needed

}

// Initialize the bag with all available query indices
var bag []int
Collaborator

Why not use a map?
