Conversation

@satvshr
Collaborator

@satvshr satvshr commented Sep 30, 2025

Stacks on #114
closes #164 and closes #190

Adds a tutorial notebook for benchmarking.

@satvshr satvshr changed the title from "[ENH] Notebook for Benchmarking" to "[ENH] making AptaTrans sklearn-like and notebook for Benchmarking" Nov 6, 2025
@satvshr satvshr marked this pull request as draft November 6, 2025 12:57
@satvshr
Collaborator Author

satvshr commented Nov 9, 2025

@NennoMP for whenever you have time, given I think I have done most of what I could do:

  1. Check out predict, as it doesn't use Trainer.test; at the moment it uses MCTS' recommend.
  2. If you run the current benchmarking notebook you will get this error: RuntimeError: The size of tensor a (0) must match the size of tensor b (128) at non-singleton dimension 1, which is something to look into.
  3. Update the tests to match the present pipeline implementation, if you find the current work satisfactory.

@NennoMP
Collaborator

NennoMP commented Nov 9, 2025

@NennoMP for whenever you have time, given I think I have done most of what I could do:

  1. Check out predict, as it doesn't use Trainer.test; at the moment it uses MCTS' recommend.
  2. If you run the current benchmarking notebook you will get this error: RuntimeError: The size of tensor a (0) must match the size of tensor b (128) at non-singleton dimension 1, which is something to look into.
  3. Update the tests to match the present pipeline implementation, if you find the current work satisfactory.

I'll give it a look this week.

Where is bug (2.) occurring specifically? Can you provide the traceback? The AptaTrans notebook is working on my side, so the problem could be in the benchmark class and/or the benchmark notebook.

@satvshr
Collaborator Author

satvshr commented Nov 9, 2025

Where is bug (2.) occurring specifically? Can you provide the traceback? The AptaTrans notebook is working on my side, so the problem could be in the benchmark class and/or the benchmark notebook.

I do not think I implemented AptaTrans correctly 😅 It is failing before benchmarking, in this cell of the benchmarking notebook:

# specify the target protein sequence here
target_protein = (
    "STEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAM"
    "RDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTR"
    "QGVDDAFYTLVREIRKHKEKMSK"
)

pipeline = AptaTransPipeline(
    device=device,
    model=model,
    prot_words=prot_words,
    depth=1,  # depth of the search (i.e., length of generated candidates)
    n_iterations=1,  # higher is better but slower, suggested: 1000
)
candidates = pipeline.recommend(
    target=target_protein,
    n_candidates=1,  # number of candidates to generate
    verbose=True,
)

with this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 15
      2 target_protein = (
      3     "STEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAM"
      4     "RDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTR"
      5     "QGVDDAFYTLVREIRKHKEKMSK"
      6 )
      8 pipeline = AptaTransPipeline(
      9     device=device,
     10     model=model,
   (...)     13     n_iterations=1,  # higher is better but slower, suggested: 1000
     14 )
---> 15 candidates = pipeline.recommend(
     16     target=target_protein,
     17     n_candidates=1,  # number of candidates to generate
     18     verbose=True,
     19 )

File c:\Users\satvm\miniconda3\envs\pyaptamer-latest\Lib\site-packages\torch\utils\_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)
...
---> 85 out = x + self.pe[:, : x.shape[1], :]
     86 if self.dropout:
     87     out = self.dropout(out)

RuntimeError: The size of tensor a (0) must match the size of tensor b (128) at non-singleton dimension 1
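
One possible reading of this message, sketched below in plain Python rather than with torch: the check re-implements the usual right-aligned broadcasting rule and reports the first mismatching dimension. The shapes are assumptions, not values taken from the run. If `self.pe` has shape (1, max_len, 128), then the slice `self.pe[:, : x.shape[1], :]` can only have size 128 at dimension 1 when `x.shape[1]` is 128, which would mean `x` arrives as a 2-D tensor of shape (0, 128), i.e. an empty sequence with no batch dimension. `broadcast_shape` is a hypothetical helper written for this illustration.

```python
from typing import Tuple


def broadcast_shape(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the broadcast shape of `a` and `b`, or raise on a mismatch.

    Illustrative re-implementation of right-aligned broadcasting; it is not
    torch code, it only mirrors the wording of the error message.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both shapes have `ndim` entries.
    a_pad = (1,) * (ndim - len(a)) + tuple(a)
    b_pad = (1,) * (ndim - len(b)) + tuple(b)
    out = [0] * ndim
    # Compare sizes from the last dimension to the first.
    for i in range(ndim - 1, -1, -1):
        size_a, size_b = a_pad[i], b_pad[i]
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(
                f"The size of tensor a ({size_a}) must match the size of "
                f"tensor b ({size_b}) at non-singleton dimension {i}"
            )
        out[i] = size_b if size_a == 1 else size_a
    return tuple(out)


# Assumed shapes: x is (0, 128), so the slice of pe is (1, 128, 128).
try:
    broadcast_shape((0, 128), (1, 128, 128))
except ValueError as err:
    message = str(err)

# A batched, non-empty input of shape (1, 5, 128) would broadcast fine.
ok_shape = broadcast_shape((1, 5, 128), (1, 5, 128))
```

If that reading is right, the empty candidate sequence is what needs fixing, rather than the positional encoding layer itself.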

@satvshr
Collaborator Author

satvshr commented Nov 13, 2025

I think we can close #166 with this too, @NennoMP?

@NennoMP NennoMP marked this pull request as ready for review November 18, 2025 16:33
@NennoMP NennoMP marked this pull request as draft November 18, 2025 16:34
@NennoMP
Collaborator

NennoMP commented Nov 18, 2025

  1. Check out predict, as it doesn't use Trainer.test; at the moment it uses MCTS' recommend.

This method was intended as a quick, easy way to evaluate a single (candidate, target protein) pair in string format. You could keep it and simply rename it to evaluate(...). Then, add a new method predict(...) where you put the Trainer.test(...) logic (you can get this from the notebook). As I mentioned, the pipeline for AptaTrans wasn't designed with the same purpose as the pipeline for AptaNet.
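
A rough sketch of how that split could look, under assumptions: `PipelineSketch`, `score_fn`, `threshold`, and `toy_score` are hypothetical names made up for illustration and are not the actual AptaTransPipeline API. In the real class, predict(...) would wrap the Trainer.test(...) logic from the notebook; here a toy scoring function stands in for the model.

```python
from typing import Callable, List, Sequence


class PipelineSketch:
    """Hypothetical stand-in for the pipeline; not the real class."""

    def __init__(self, score_fn: Callable[[str, str], float], threshold: float = 0.5):
        # score_fn is an assumed callable: (aptamer, protein) -> score in [0, 1].
        self.score_fn = score_fn
        self.threshold = threshold

    def evaluate(self, candidate: str, target: str) -> float:
        """Score one (candidate, target protein) pair given as strings."""
        return self.score_fn(candidate, target)

    def predict(self, candidates: Sequence[str], targets: Sequence[str]) -> List[int]:
        """Return one binary label per pair, in the sklearn-like batch style."""
        if len(candidates) != len(targets):
            raise ValueError("candidates and targets must have the same length")
        return [
            int(self.evaluate(c, t) >= self.threshold)
            for c, t in zip(candidates, targets)
        ]


def toy_score(aptamer: str, protein: str) -> float:
    # Toy score for the demo only: fraction of G in the aptamer.
    return aptamer.count("G") / max(len(aptamer), 1)


pipeline = PipelineSketch(score_fn=toy_score)
single_score = pipeline.evaluate("GGAU", "MK")
labels = pipeline.predict(["GGAU", "AAAA"], ["MK", "MK"])
```

Keeping evaluate(...) for single string pairs and predict(...) for batches would let the benchmark call predict(...) the same way it would for AptaNet, while the quick string-based check stays available.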

  2. If you run the current benchmarking notebook you will get this error: RuntimeError: The size of tensor a (0) must match the size of tensor b (128) at non-singleton dimension 1, which is something to look into.

I just ran this and you are right. The issue comes from setting the MCTS tree search depth to 1 (i.e., depth=1). Specifically, it seems the positional encodings in AptaTrans break for depth <= 2. I will try to look into this separately; for now, just use depth >= 3.

This isn't really an issue though. I think you would never want an aptamer candidate as short as 1 or 2.

File \pyaptamer\pyaptamer\aptatrans\layers\_encoder.py:85, in PositionalEncoding.forward(self, x)
     68 """Forward pass.
     69 
     70 Parameters
   (...)     79     positional encodings applied.
     80 """
     81 assert x.shape[1] <= self.max_len, (
     82     f"Input sequence length {x.shape[1]} exceeds maximum length {self.max_len}."
     83 )
---> 85 out = x + self.pe[:, : x.shape[1], :]
     86 if self.dropout:
     87     out = self.dropout(out)

RuntimeError: The size of tensor a (0) must match the size of tensor b (128) at non-singleton dimension 1
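
Until the root cause is understood, an early check on depth would at least turn the tensor-shape error into a readable one. The sketch below is only an assumption about how such a guard might look: `check_depth` and `MIN_DEPTH` are hypothetical names, and the bound of 3 simply restates the workaround above rather than a verified limit of the model.

```python
MIN_DEPTH = 3  # assumption: restates the "use depth >= 3" workaround


def check_depth(depth: int, min_depth: int = MIN_DEPTH) -> int:
    """Validate the search depth before any tensors are built."""
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(depth, int) or isinstance(depth, bool):
        raise TypeError(f"depth must be an int, got {type(depth).__name__}")
    if depth < min_depth:
        raise ValueError(
            f"depth={depth} is too small; use depth >= {min_depth} "
            "(shorter candidates currently break the positional encoding)"
        )
    return depth


validated = check_depth(3)
```

Calling something like this at the start of the pipeline constructor or of recommend(...) would fail fast for depth=1 or depth=2.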

By the way, the branch is a few commits behind main.

@satvshr
Collaborator Author

satvshr commented Nov 18, 2025

Oh, what I implied last meeting was that you could take over on this 😅 The points I made were the things still left to do, for you to address. If you cannot pick this up, let me know and I will try to have a look at it; I felt it would be a much easier task for you than for me, hence the suggestion.

Edit: @NennoMP I just realised we are yet to make AptaTrans regression-friendly! (maybe part of another PR though)


Development

Successfully merging this pull request may close these issues.

[ENH] Changes to AptaTransPipeline to make it sklearn-like
[ENH] Notebook for Benchmarking

4 participants