Conversation

mengluy0125 (Contributor)

Summary: We add block_size heuristics for autotune, since it can hit a ValueError for tensors with large shapes.

Differential Revision: D82742888

@meta-cla bot added the CLA Signed label on Sep 18, 2025
@facebook-github-bot

@mengluy0125 has exported this pull request. If you are a Meta employee, you can view the originating diff in D82742888.

mengluy0125 added a commit to mengluy0125/helion that referenced this pull request Sep 18, 2025
…1048576) (pytorch#625)

Summary:

We add block_size heuristics for autotune, since it can hit a ValueError for tensors with large shapes.

Differential Revision: D82742888
@facebook-github-bot

@mengluy0125 has exported this pull request. If you are a Meta employee, you can view the originating diff in D82742888.

@jansel (Contributor) left a comment

IMO capping the size on a per-dimension basis isn't ideal: if you have multiple dimensions, you need the product of the block sizes to stay below the limit, so the cap is not independent per dimension.

Maybe we could improve the heuristics in:

def shrink_config(
    self, flat_config: FlatConfig, max_elements_per_thread: int
) -> None:
    """
    Fully random configs tend to run out of resources and take a long time to compile.
    Here we shrink the config to a reasonable size.

    Args:
        flat_config: config to mutate in place
        max_elements_per_thread: maximum number of elements per thread
    """
    num_threads = warps_to_threads(cast("int", flat_config[self.num_warps_index]))
    max_elements = max_elements_per_thread * num_threads
    while self.block_numel(flat_config) > max_elements:
        changes = 0
        for i in self.block_size_indices:
            val = flat_config[i]
            assert isinstance(val, int)
            threshold = max(self.flat_spec[i].get_minimum(), self.min_block_size)
            if val // 2 >= threshold:
                flat_config[i] = val // 2
                changes += 1
        if changes == 0:
            break

to address this problem?
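
For illustration, a minimal, self-contained sketch of the concern (the per-dimension cap and block sizes below are hypothetical; only the 1048576 limit comes from the commit title): each dimension can sit under its own cap while the product of the block sizes, which is what block_numel measures, still exceeds Triton's tensor limit.

# Hypothetical numbers: each block size respects a per-dimension cap,
# but their product still exceeds the 1048576-element Triton limit.
TRITON_MAX_TENSOR_NUMEL = 1048576  # value per the commit title
PER_DIM_CAP = 65536                # made-up per-dimension cap

block_sizes = [2048, 1024]
assert all(b <= PER_DIM_CAP for b in block_sizes)

block_numel = 1
for b in block_sizes:
    block_numel *= b

print(block_numel)                             # 2097152
print(block_numel <= TRITON_MAX_TENSOR_NUMEL)  # False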

@mengluy0125 (Contributor, Author)

Oh, that makes sense. Let me do the heuristics improvement inside that function.
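
As a toy, self-contained sketch of that direction (the numbers and the standalone helper below are made up; only the halving-loop shape and the 1048576 cap come from this thread): shrink the block sizes until their product fits under min(per-thread budget, Triton limit).

TRITON_MAX_TENSOR_NUMEL = 1048576  # per the commit title
MIN_BLOCK_SIZE = 16                # hypothetical floor, standing in for get_minimum()

def shrink(block_sizes, max_elements_per_thread, num_threads):
    # Cap the element budget by both the per-thread heuristic and Triton's hard limit.
    max_elements = min(max_elements_per_thread * num_threads, TRITON_MAX_TENSOR_NUMEL)
    sizes = list(block_sizes)

    def numel():
        n = 1
        for s in sizes:
            n *= s
        return n

    # Same shape as shrink_config's loop: halve every dimension that can still shrink.
    while numel() > max_elements:
        changed = False
        for i, s in enumerate(sizes):
            if s // 2 >= MIN_BLOCK_SIZE:
                sizes[i] = s // 2
                changed = True
        if not changed:
            break
    return sizes

print(shrink([4096, 1024], max_elements_per_thread=8192, num_threads=256))  # [2048, 512]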

…1048576) (pytorch#625)

Summary:

We add block_size heuristics for autotune, since it can hit a ValueError for tensors with large shapes.

Differential Revision: D82742888
@facebook-github-bot

@mengluy0125 has exported this pull request. If you are a Meta employee, you can view the originating diff in D82742888.

Comment on lines +105 to +108
# Respect Triton's maximum tensor element limit
triton_limit = TRITON_MAX_TENSOR_NUMEL
theoretical_max_elements = max_elements_per_thread * num_threads
max_elements = min(theoretical_max_elements, triton_limit)
Contributor

So I think this is overcounting, the exact same problem I had on #485:

self.block_numel(flat_config) > 2 ** 20

doesn't actually mean that the Triton code we emit will end up with a tensor of numel 2 ** 20.
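
One hypothetical way to picture the overcounting, using a generic matmul-style tiling rather than anything Helion-specific: the config's block_numel is the product of all block sizes, but the emitted kernel may only ever materialize 2-D tiles.

# Made-up matmul-style tiling: three block sizes in the config,
# but each tensor the kernel materializes is only a 2-D tile.
BLOCK_M, BLOCK_N, BLOCK_K = 128, 128, 64

config_block_numel = BLOCK_M * BLOCK_N * BLOCK_K  # 1048576, what the shrink heuristic sees
largest_emitted_tile = max(
    BLOCK_M * BLOCK_K,  # tile of the first operand
    BLOCK_K * BLOCK_N,  # tile of the second operand
    BLOCK_M * BLOCK_N,  # accumulator tile
)

print(config_block_numel)    # 1048576
print(largest_emitted_tile)  # 16384, far below the limit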

Contributor

I agree, though maybe this is better than what we have now.

@yf225 (Contributor) commented Sep 19, 2025

Merging it to fix benchmark CI issues.

@yf225 merged commit 3d5b641 into pytorch:main on Sep 19, 2025
15 checks passed