- 
                Notifications
    You must be signed in to change notification settings 
- Fork 127
[tstate] inline byte to string conversions #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Sam Batschelet <[email protected]>
Signed-off-by: Sam Batschelet <[email protected]>
fe7f852    to
    3002fd4      
    Compare
  
    Signed-off-by: Sam Batschelet <[email protected]>
Signed-off-by: Sam Batschelet <[email protected]>
| // checkScope returns whether [k] is in ts.readScope. | ||
| func (ts *TState) checkScope(_ context.Context, k []byte) bool { | ||
| for _, s := range ts.scope { | ||
| // TODO: benchmark and see if creating map is worth overhead | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this very much depends on the size of the set. since lookup is O(n) its reasonable to have a map if we might have 8+ keys. I added the change for ref but we can revert the commit if we feel the set size would remain similar to tokenvm (4).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the lookup is much faster for map it does increase the alloc in FetchAndSetScope by a fair amount. As this set grows into possibly hundreds the time gained on lookup will be probably overshadowed by memory bloat/cpu time.
before
goos: linux
goarch: amd64
pkg: github.com/ava-labs/hypersdk/tstate
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkFetchAndSetScope/fetch_and_set_scope_4_keys_with_length_65-12         	  894080	      1408 ns/op	     800 B/op	       7 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_8_keys_with_length_65-12         	  687463	      2001 ns/op	    1473 B/op	      11 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_16_keys_with_length_65-12        	  372249	      3031 ns/op	    2833 B/op	      19 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_32_keys_with_length_65-12        	  243076	      5262 ns/op	    5424 B/op	      35 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_64_keys_with_length_65-12        	  110935	     14345 ns/op	   11416 B/op	      68 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_128_keys_with_length_65-12       	   42150	     26776 ns/op	   22680 B/op	     132 allocs/op
after
goos: linux
goarch: amd64
pkg: github.com/ava-labs/hypersdk/tstate
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkFetchAndSetScope/fetch_and_set_scope_4_keys_with_length_65-12         	  716866	      1802 ns/op	    1392 B/op	      14 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_8_keys_with_length_65-12         	  404772	      2923 ns/op	    2529 B/op	      22 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_16_keys_with_length_65-12        	  282474	      4318 ns/op	    4824 B/op	      38 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_32_keys_with_length_65-12        	  157590	      7921 ns/op	    9284 B/op	      70 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_64_keys_with_length_65-12        	   83341	     15155 ns/op	   19376 B/op	     136 allocs/op
BenchmarkFetchAndSetScope/fetch_and_set_scope_128_keys_with_length_65-12       	   42186	     40291 ns/op	   38448 B/op	     264 allocs/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverting for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the cost of lookup as the set increases.
BenchmarkCheckScope/set_scope_4_keys_with_length_65-12         	55952859	        21.66 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckScope/set_scope_8_keys_with_length_65-12         	30695019	        39.45 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckScope/set_scope_16_keys_with_length_65-12        	14245750	        84.93 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckScope/set_scope_32_keys_with_length_65-12        	 7016282	       160.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckScope/set_scope_64_keys_with_length_65-12        	 3868280	       286.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkCheckScope/set_scope_128_keys_with_length_65-12       	 2166471	       547.1 ns/op	       0 B/op	       0 allocs/op
8f89b1a    to
    f687764      
    Compare
  
    Signed-off-by: Sam Batschelet <[email protected]>
Signed-off-by: Sam Batschelet <[email protected]>
| We should do the same for: https://github.com/ava-labs/hypersdk/blob/main/chain/processor.go | 
Signed-off-by: Sam Batschelet <[email protected]>
| This PR has become stale because it has been open for 30 days with no activity. Adding the  | 
I explored a few alternative approaches and generally found this PR does provide some small gains that could be considered as an incremental performance improvement. During research I found that using a 64bit hash such as
xxhash64for the map key was a large performance improvement this is a the approach taken with for examplefast cache. But since we also need to handle hash collisions the complexity reduced performance until it was not worth the effort at small scale.This approach is documented here https://pthevenet.com/posts/programming/go/bytesliceindexedmaps/#optimization local results from the benchmark confirm.
before
after
benchvset-after.cpu.txt
[1] https://github.com/VictoriaMetrics/fastcache