Commits
264 commits
04ab599
Made suggested changes
satvshr Jul 11, 2025
dc78e44
Removed lint. from pyproject, will push it as a separate PR
satvshr Jul 11, 2025
c347988
Refactored code
satvshr Jul 11, 2025
d9537f4
Added pandas as a dependancy
satvshr Jul 11, 2025
1c46c55
Renamed parent folder name to put it in the same level as AptaNet
satvshr Jul 11, 2025
a716872
Merge remote-tracking branch 'origin/main' into issue13
satvshr Jul 13, 2025
7781441
Refactored code and made architecture flexible
satvshr Jul 14, 2025
e762cc8
Edited docstrings and directory structure
satvshr Jul 14, 2025
e844d4f
Merge branch 'main' into issue28
satvshr Jul 14, 2025
f9392ef
weird rename experiment
satvshr Jul 14, 2025
beb45ec
weird rename experiment pt. 2
satvshr Jul 14, 2025
d603d07
Made requested changes
satvshr Jul 14, 2025
6ecf576
Made requested changes
satvshr Jul 15, 2025
b91c511
Made requested changes
satvshr Jul 15, 2025
b2428b0
chore: dummy commit to retrigger CI
satvshr Jul 15, 2025
2982954
Added missing init file to utils
satvshr Jul 15, 2025
0b5b388
Made requested changes
satvshr Jul 16, 2025
d24c4d7
Merge branch 'main' into issue28
satvshr Jul 16, 2025
0cd72b7
Added requested changes
satvshr Jul 16, 2025
fabc7b4
Added requested changes
satvshr Jul 16, 2025
32633d3
Added info about prop groups in class docstring
satvshr Jul 16, 2025
2b08363
Merge branch 'main' into issue13
satvshr Jul 17, 2025
f502fed
Merge branch 'issue28' into issue13
satvshr Jul 17, 2025
056c08e
Merged issue28 to issue13
satvshr Jul 17, 2025
6136c39
Removed init method description
satvshr Jul 17, 2025
ae7d1fe
Removed init method description
satvshr Jul 17, 2025
f339a7b
Added tests and bug fixes
satvshr Jul 17, 2025
651d066
Added torch as a dependency
satvshr Jul 17, 2025
7991cc8
Added torch as a dependency
satvshr Jul 17, 2025
8f0f0ae
Added sklearn as a dependency
satvshr Jul 17, 2025
60ec8da
fixed test so that protein_seq length is above 30
satvshr Jul 17, 2025
88c0122
editing changes
satvshr Jul 17, 2025
b7a7349
Made requested changes
satvshr Jul 18, 2025
c14c0bb
Made requested changes
satvshr Jul 18, 2025
d1075a7
Added .vscode to .gitignore
satvshr Jul 22, 2025
839c3e5
Merge branch 'issue28' into issue13
satvshr Jul 22, 2025
19a9e98
Added metadata
satvshr Jul 22, 2025
945addc
Merge branch 'issue28' into issue13
satvshr Jul 22, 2025
3c1fa3a
Added architectural changes
satvshr Jul 23, 2025
6e6836e
Added pdb to string helper function
satvshr Jul 23, 2025
73773af
Made requested changes
satvshr Jul 24, 2025
e2449e0
removed deleted files
satvshr Jul 24, 2025
212d54b
Added architecture to support loaders along with a loader for PFOA
satvshr Jul 28, 2025
8a29c09
Merge branch 'main' into issue55
satvshr Jul 28, 2025
3b09533
Added tests
satvshr Jul 28, 2025
29e8ed9
Merge branch 'main' into issue9
satvshr Jul 28, 2025
9d9738f
Added tests
satvshr Jul 28, 2025
210f09c
Renamed test file
satvshr Jul 28, 2025
6fb1db5
Made requested changes
satvshr Jul 28, 2025
c57e66a
Merge branch 'issue9' into issue55
satvshr Jul 28, 2025
8c00454
Merge branch 'main' into issue13
satvshr Jul 29, 2025
2dbf5d5
Stop tracking .vscode folder
satvshr Jul 29, 2025
8767f11
Added docstrings
satvshr Jul 29, 2025
32d7673
Fixed bugs to ensure pytorch compatibility by type-casting to float32
satvshr Jul 29, 2025
428bf68
Saving work
satvshr Jul 29, 2025
f94cd6a
Added an improved test to test the pipeline instead
satvshr Jul 29, 2025
1ac72fc
Renamed function
satvshr Jul 29, 2025
8278c74
Merge branch 'issue9' into issue54
satvshr Jul 29, 2025
0d16e13
Stop tracking .vscode folder
satvshr Jul 29, 2025
a38f70b
Stop tracking .vscode folder
satvshr Jul 29, 2025
40ba9d3
Added 3eiy and its loader, standardized loader tests
satvshr Jul 30, 2025
fad2f97
Added 3eiy and its loader, standardized loader tests
satvshr Jul 30, 2025
4a17dac
Adds 1ghn instead of 3eiy
satvshr Jul 30, 2025
cace4bd
Adds 1ghn instead of 3eiy
satvshr Jul 30, 2025
3301a4c
Renamed ghn to gnh
satvshr Jul 30, 2025
f61fdbc
Merge branch 'issue65' into issue55
satvshr Jul 30, 2025
28de2a3
Used 1gnh in tests instead of pfoa since pfoa is not a protein
satvshr Jul 30, 2025
17855b3
Merge branch 'issue55' into issue54
satvshr Jul 30, 2025
c6ccb23
Moved utility function from pipeline to utils
satvshr Jul 30, 2025
8304148
Merge branch 'issue13' into issue54
satvshr Jul 30, 2025
e9711c0
Added pytho ndpendency because of skorch
satvshr Jul 31, 2025
bf20e3c
Merge branch 'issue13' into issue54
satvshr Jul 31, 2025
0955226
Added notebook to examples directory
satvshr Jul 31, 2025
e6e3c9b
Added first draft of notebook
satvshr Jul 31, 2025
7273082
Logging error
satvshr Jul 31, 2025
ad8db59
Fixed bug during fit
satvshr Jul 31, 2025
90ffc7e
Merge branch 'main' into issue54
satvshr Jul 31, 2025
fa869d5
Added requested changes
satvshr Aug 3, 2025
38ec2ca
Merge branch 'main' into issue13
fkiraly Aug 3, 2025
e0d0af3
Update pyproject.toml
fkiraly Aug 3, 2025
5a10d65
Update pyproject.toml
fkiraly Aug 3, 2025
0b04ce8
Merge branch 'main' into issue13
satvshr Aug 3, 2025
64fd01b
Changed workflow file to stop testing for python 3.13
satvshr Aug 3, 2025
eeef5d6
Added skorch as a dependency
satvshr Aug 3, 2025
144b405
bug fix
satvshr Aug 3, 2025
ae9a814
Removed 3.13 as a non dependency
satvshr Aug 3, 2025
c6fe37e
Merged main
satvshr Aug 3, 2025
e9846bc
Added init file
satvshr Aug 3, 2025
b80f145
Added skip test
satvshr Aug 4, 2025
eef91ff
Added test
satvshr Aug 4, 2025
bbdaf27
Merged with issue13
satvshr Aug 4, 2025
01090c5
Merge with main
satvshr Aug 4, 2025
6f04f11
Added init file
satvshr Aug 4, 2025
63226a5
Removed loader folder
satvshr Aug 4, 2025
9a74b71
fixed bug
satvshr Aug 4, 2025
4266183
Merged with issue55
satvshr Aug 4, 2025
ac6851c
Removed unecessary classes
satvshr Aug 4, 2025
3adb75a
Fixed some bugs and renames
satvshr Aug 4, 2025
2aeea11
Changed pipeline to a class.
satvshr Aug 5, 2025
912ec72
Updated to add optimizer
satvshr Aug 5, 2025
e29380a
Fixed pipeline bug
satvshr Aug 6, 2025
36f750d
docstring save commit
satvshr Aug 9, 2025
badfafe
Adding failures
satvshr Aug 9, 2025
66a1620
Changed file names and added 2 classes
satvshr Aug 9, 2025
98327d8
Trying to make tests pass
satvshr Aug 9, 2025
f264228
Tests test
satvshr Aug 9, 2025
4a3a622
Update _feature_classifier.py
satvshr Aug 9, 2025
11eef33
Update _feature_classifier.py
satvshr Aug 10, 2025
1b3e178
Update _feature_classifier.py
satvshr Aug 10, 2025
28434c9
Update _feature_classifier.py
satvshr Aug 10, 2025
67ce5bb
Added docstrings back
satvshr Aug 10, 2025
1988408
Spacing for lists
satvshr Aug 10, 2025
1cd198b
Merge branch 'issue13' into issue54
satvshr Aug 11, 2025
cc64583
Update _feature_classifier.py
satvshr Aug 11, 2025
355718e
Merge branch 'issue13' into issue54
satvshr Aug 11, 2025
9f94386
Update aptanet_tutorial.ipynb
satvshr Aug 11, 2025
08297c0
Update aptanet_tutorial.ipynb
satvshr Aug 11, 2025
bc8355f
Update aptanet_tutorial.ipynb
satvshr Aug 11, 2025
5882b24
Update aptanet_tutorial.ipynb
satvshr Aug 11, 2025
06a44a4
Merge branch 'main' into issue13
satvshr Aug 12, 2025
183c48b
Made requested and architectural changes
satvshr Aug 12, 2025
4a8f9dd
Merge branch 'issue13' into issue54
satvshr Aug 12, 2025
492e054
Merge branch 'main' into issue54
fkiraly Aug 14, 2025
e5b1ddd
Update pyproject.toml
satvshr Aug 14, 2025
84eefa1
Delete one_gnh.py
satvshr Aug 15, 2025
1c78a4b
Update pyproject.toml
satvshr Aug 17, 2025
ea608f4
Merge branch 'main' into issue54
satvshr Aug 19, 2025
d9a5df9
Update aptanet_tutorial.ipynb
satvshr Aug 19, 2025
55dcb86
Added workflows
satvshr Aug 20, 2025
96e6f54
Merge branch 'main' into issue109
satvshr Aug 22, 2025
d7da265
Merge branch 'main' into issue109
satvshr Aug 22, 2025
bce4cd1
Getting aptanet ready
satvshr Aug 23, 2025
21901d7
Add aptamer experiment for AptaNet
NennoMP Aug 26, 2025
624acad
Make AptaNetClassifier compliant with sklearn estimator interface, im…
NennoMP Aug 28, 2025
4273dcc
Remove file generated by tests
NennoMP Aug 28, 2025
de7fbc0
Minor change to pipeline
NennoMP Aug 31, 2025
83c072f
Remove folders generated by tests, updated docstring with example
NennoMP Aug 31, 2025
98a3a1f
Initial setup
satvshr Sep 1, 2025
57a113f
Added metaclass to ensre no new public methods, made some other progr…
satvshr Sep 3, 2025
3d1f930
Merge branch 'main' into issue109
satvshr Sep 3, 2025
1f400b0
Progress on Preprocessors, AptaNetPreprocessor seems completed
satvshr Sep 3, 2025
8b46667
Continued improving on benchmarking and preprocessing
satvshr Sep 4, 2025
b91eaad
bug fixing and improvements
satvshr Sep 4, 2025
3f808d5
Revert file renaming, minor changes to docstrings
NennoMP Sep 4, 2025
5e3f4d6
seems to be working, very slow though
satvshr Sep 4, 2025
0e79b7f
Update _aptanet_utils.py
satvshr Sep 6, 2025
179fefc
Merge branch 'main' into feature/97-aptanet-mcts-compatible
NennoMP Sep 8, 2025
f11853d
merge with main
satvshr Sep 9, 2025
8d875e2
reset main
satvshr Sep 9, 2025
effb111
reset pt2
satvshr Sep 9, 2025
2dcc61f
AptaNet bug fix
satvshr Sep 9, 2025
15029ad
Update test_aptanet.py
satvshr Sep 9, 2025
2fe0e9d
Update _base.py
satvshr Sep 10, 2025
e62a32b
docstring changes
satvshr Sep 11, 2025
2c196be
Merge branch 'main' into feature/97-aptanet-mcts-compatible
NennoMP Sep 11, 2025
aef8589
Refactor aptamer eval tests, minor improvements to docstrings
NennoMP Sep 11, 2025
b3abadb
Run pre-commit
NennoMP Sep 11, 2025
4ace402
Update
NennoMP Sep 11, 2025
45094f6
Update _base.py
satvshr Sep 12, 2025
e38770e
Update _base.py
satvshr Sep 15, 2025
aa9677b
Update _base.py
satvshr Sep 15, 2025
b5724dc
Merge branch 'main' into issue109
satvshr Sep 15, 2025
7e25cd4
Added sklearn-like csv loader
satvshr Sep 16, 2025
0466310
Making requested changes
satvshr Sep 16, 2025
efb6fcf
cleaned up code
satvshr Sep 17, 2025
022d748
Update _base.py
satvshr Sep 17, 2025
63ddaf6
Update test_csv_loader.py
satvshr Sep 17, 2025
d27a495
Merge branch 'main' into issue150
satvshr Sep 17, 2025
c982090
Update aptanet_tutorial.ipynb
satvshr Sep 17, 2025
2c06084
Update _base.py
satvshr Sep 21, 2025
7027d47
Update _base.py
satvshr Sep 21, 2025
ec96bba
cleaning code remove tag checks
satvshr Sep 21, 2025
d6111a9
Update _base.py
satvshr Sep 21, 2025
2e2d71f
Test suite added and bugs fixed
satvshr Sep 21, 2025
90af1ee
arg name fixing
satvshr Sep 21, 2025
7adff54
save
satvshr Sep 21, 2025
971ae29
Update _csv_loader.py
satvshr Sep 22, 2025
fe19150
Update test_csv_loader.py
satvshr Sep 22, 2025
9162d0c
Added both loaders and tests
satvshr Sep 23, 2025
b9c7d6c
Update pyproject.toml
satvshr Sep 23, 2025
24b6e90
Merge branch 'issue149' into issue150
satvshr Sep 23, 2025
3939af5
merge aptaeval branch
satvshr Sep 23, 2025
207cb92
Update aptanet_tutorial.ipynb
satvshr Sep 24, 2025
835cfe8
Made requested changes
satvshr Sep 25, 2025
d0997dd
Update test_aptatrans.py
satvshr Sep 25, 2025
13f89d3
Resolve merge conflicts
NennoMP Sep 25, 2025
d02bc2e
Merge branch 'issue149' into issue150
satvshr Sep 25, 2025
f6d02e8
Change experiments' return type from tensor to numpy float
NennoMP Sep 25, 2025
54f414b
Fixed circular import and renamed files
satvshr Sep 26, 2025
47c0767
Update test_aptatrans.py
satvshr Sep 27, 2025
51ccc41
Update _csv_loader.py
satvshr Sep 27, 2025
4dd92bd
Update pyproject.toml
satvshr Sep 27, 2025
974a2fa
Minor docstrings and test fixes
NennoMP Sep 28, 2025
eb7c257
Fix minor typo in docstring
NennoMP Sep 29, 2025
9ec9e13
Fix typo in experiment docstrings
NennoMP Sep 29, 2025
bb80a81
Merge branch 'main' into issue109
satvshr Sep 29, 2025
18833f0
fixed docstring and example
satvshr Sep 29, 2025
15aa1cf
Merge branch 'issue149' into issue150
satvshr Sep 30, 2025
b1708a2
Merge branch 'feature/97-aptanet-mcts-compatible' into issue150
satvshr Sep 30, 2025
1467a62
Update aptanet_tutorial.ipynb
satvshr Sep 30, 2025
5e2a9fa
Merge branch 'issue109' into issue150
satvshr Sep 30, 2025
84a2754
Update _aptanet_utils.py
satvshr Sep 30, 2025
afc4010
Made requested changes
satvshr Oct 2, 2025
601c381
Merge branch 'issue149' into issue150
satvshr Oct 6, 2025
fe53b18
Merge branch 'issue109' into issue150
satvshr Oct 6, 2025
f10f1ee
merge main
satvshr Oct 6, 2025
08a8640
Merge with main
satvshr Oct 17, 2025
deb04af
merge main
satvshr Oct 17, 2025
166aa1a
Merge branch 'issue149' into issue150
satvshr Oct 17, 2025
86605d4
Removed benchmarking dependency
satvshr Oct 17, 2025
435f093
Update aptanet_tutorial.ipynb
satvshr Oct 17, 2025
6624bcd
Update aptanet_tutorial.ipynb
satvshr Oct 17, 2025
e9cc6a9
Update aptanet_tutorial.ipynb
satvshr Oct 18, 2025
3ac6bdf
Update _hf_to_dataset.py
satvshr Oct 18, 2025
e3df365
Update aptanet_tutorial.ipynb
satvshr Oct 21, 2025
e8757b4
Update test.yml
satvshr Oct 21, 2025
e154f3e
dataloader
fkiraly Oct 25, 2025
ff19456
lint
fkiraly Oct 25, 2025
b0df3af
renames
fkiraly Oct 25, 2025
b80a9fc
handle letter codes
fkiraly Oct 25, 2025
42e51fa
Update _aa_str_to_letter.py
fkiraly Oct 25, 2025
1f59916
Update test_loader.py
fkiraly Oct 25, 2025
3cab1b2
pfoa 1gnh
fkiraly Oct 25, 2025
dd784b4
loaders
fkiraly Oct 25, 2025
f0893ea
Update loader.py
fkiraly Oct 25, 2025
9e9cb80
Update test_loaders_mol.py
fkiraly Oct 25, 2025
1036315
Update loader.py
fkiraly Oct 25, 2025
38f1784
Update loader.py
fkiraly Oct 25, 2025
8713bc0
Merge main
satvshr Oct 26, 2025
bbc01ee
Merge #155
satvshr Oct 26, 2025
5d31c9c
Merge branch 'dataloader' into datasets-loaded
fkiraly Oct 26, 2025
433af0d
Merge branch 'main' into issue149
fkiraly Oct 26, 2025
4bb2278
trafos
fkiraly Oct 26, 2025
2d2a959
Merge branch 'main' into trafos
fkiraly Oct 26, 2025
5fc3b8d
lint
fkiraly Oct 26, 2025
521f47d
Update _base.py
fkiraly Oct 26, 2025
913791b
Update _greedy.py
fkiraly Oct 26, 2025
c24d6ac
Update _greedy.py
fkiraly Oct 26, 2025
fcf680a
Update __init__.py
fkiraly Oct 26, 2025
e03438c
Merge branch 'main' into issue149
satvshr Nov 1, 2025
2df3862
Merge branch 'trafos' into issue149
satvshr Nov 1, 2025
a0b89ad
Added AnyToAAseq converter
satvshr Nov 1, 2025
54d7337
Update _any_to_aaseq.py
satvshr Nov 2, 2025
1dcb7f9
Merge branch 'main' into issue150
satvshr Nov 13, 2025
cad9201
merge issue149
satvshr Nov 13, 2025
c35a1c5
Updated notebook
satvshr Nov 13, 2025
e7c9c65
Update _hf_to_dataset.py
satvshr Nov 13, 2025
bd7da66
merge main
satvshr Nov 28, 2025
5789bf5
remove dependency on 155
satvshr Nov 28, 2025
c79eda8
Notebook updates and aptatrans dataset loader
satvshr Nov 28, 2025
796 changes: 570 additions & 226 deletions examples/aptanet_tutorial.ipynb

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions pyaptamer/datasets/__init__.py
@@ -6,6 +6,10 @@
)
from pyaptamer.datasets._loaders._csv_loader import load_csv_dataset
from pyaptamer.datasets._loaders._hf_loader import load_hf_dataset
from pyaptamer.datasets._loaders._li2014 import (
    load_test_li2014,
    load_train_li2014,
)
from pyaptamer.datasets._loaders._one_gnh import load_1gnh, load_1gnh_structure
from pyaptamer.datasets._loaders._online_databank import load_from_rcsb
from pyaptamer.datasets._loaders._pfoa import load_pfoa, load_pfoa_structure
@@ -21,4 +25,6 @@
    "load_1gnh_structure",
    "load_from_rcsb",
    "load_csv_dataset",
    "load_train_li2014",
    "load_test_li2014",
]
6 changes: 6 additions & 0 deletions pyaptamer/datasets/_loaders/__init__.py
@@ -6,6 +6,10 @@
)
from pyaptamer.datasets._loaders._csv_loader import load_csv_dataset
from pyaptamer.datasets._loaders._hf_loader import load_hf_dataset
from pyaptamer.datasets._loaders._li2014 import (
    load_test_li2014,
    load_train_li2014,
)
from pyaptamer.datasets._loaders._one_gnh import load_1gnh_structure
from pyaptamer.datasets._loaders._pfoa import load_pfoa_structure

@@ -19,4 +23,6 @@
    "load_pfoa_structure",
    "load_1gnh",
    "load_1gnh_structure",
    "load_train_li2014",
    "load_test_li2014",
]
55 changes: 55 additions & 0 deletions pyaptamer/datasets/_loaders/_li2014.py
@@ -0,0 +1,55 @@
__author__ = "satvshr"
__all__ = ["load_train_li2014", "load_test_li2014"]
import os

import pandas as pd


def load_train_li2014():

Review comment (Contributor): I would make a single `load_li2014(split=None)`, where the default loads the concatenation of both splits and `"train"` or `"test"` selects one of them.

    """
    Load the Li 2014 training dataset.

Review comment (Contributor): Please add a description of the dataset here: what the columns are, what they mean, what values they can take, etc.

    Returns
    -------
    X : pandas.DataFrame
        Feature matrix.
    y : pandas.Series

Review comment (Contributor): I would make both `pd.DataFrame` for consistency, even if `y` has only one column.

        Labels/target.
    """
    # Path relative to this file
    path = os.path.abspath(
        os.path.join(os.path.dirname(__file__), "..", "data", "train_li2014.csv")
    )

    df = pd.read_csv(path)

    # Basic assumption: last column is the label
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    return X, y


def load_test_li2014():
    """
    Load the Li 2014 test dataset.

    Returns
    -------
    X : pandas.DataFrame
        Feature matrix.
    y : pandas.Series
        Labels/target.
    """
    # Path relative to this file
    path = os.path.abspath(
        os.path.join(os.path.dirname(__file__), "..", "data", "test_li2014.csv")
    )

    df = pd.read_csv(path)

    # Basic assumption: last column is the label
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    return X, y
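
Editorial sketch (not part of the PR): the review comments on this file ask for a single loader with a `split` argument and for `y` to be returned as a DataFrame. One way this might look is shown below. The name `load_li2014` comes from the reviewer's suggestion; the `data_dir` parameter and the `_SPLIT_FILES` mapping are assumptions added only so the sketch can run outside the package, and the "last column is the label" rule is carried over from the code in the diff.

```python
import os

import pandas as pd

# Hypothetical mapping from split name to file name (file names taken from the diff).
_SPLIT_FILES = {"train": "train_li2014.csv", "test": "test_li2014.csv"}


def load_li2014(split=None, data_dir=None):
    """Load the Li 2014 data as (X, y), both pandas DataFrames.

    split : None, "train" or "test"; None concatenates train and test.
    data_dir : directory holding the CSV files (assumed parameter, not in the PR).
    """
    if split is not None and split not in _SPLIT_FILES:
        raise ValueError(f"split must be None, 'train' or 'test', got {split!r}")
    if data_dir is None:
        # Same relative layout as the loaders in the diff.
        here = os.path.dirname(os.path.abspath(__file__))
        data_dir = os.path.join(here, "..", "data")

    names = [split] if split is not None else ["train", "test"]
    frames = [pd.read_csv(os.path.join(data_dir, _SPLIT_FILES[n])) for n in names]
    df = pd.concat(frames, ignore_index=True)

    # Same assumption as the PR: the last column is the label.
    X = df.iloc[:, :-1]
    y = df.iloc[:, [-1]]  # a list indexer keeps y as a one-column DataFrame
    return X, y
```

Selecting the label with `df.iloc[:, [-1]]` rather than `df.iloc[:, -1]` is what keeps `y` two-dimensional; a test for such a loader would then check `isinstance(y, pd.DataFrame)` instead of `pd.Series`.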
22 changes: 22 additions & 0 deletions pyaptamer/datasets/tests/test_csv_loader.py
@@ -0,0 +1,22 @@
__author__ = "satvshr"

import pandas as pd

from pyaptamer.datasets._loaders._csv_loader import load_csv_dataset

DATASET_NAME = "train_li2014"
TARGET_COL = "label"


def test_load_csv_returns_df():
    """
    When return_X_y=False the loader should return the full DataFrame containing the
    target column.
    """
    df = load_csv_dataset(DATASET_NAME)

    assert isinstance(df, pd.DataFrame), "Returned object should be a pandas DataFrame"
    assert TARGET_COL in df.columns, (
        f"DataFrame must contain the target column '{TARGET_COL}'"
    )
    assert df.shape[0] > 0, "DataFrame should not be empty"
30 changes: 30 additions & 0 deletions pyaptamer/datasets/tests/test_li2014.py
@@ -0,0 +1,30 @@
__author__ = "satvshr"

import pandas as pd
import pytest

from pyaptamer.datasets._loaders._li2014 import (
    load_test_li2014,
    load_train_li2014,
)


@pytest.mark.parametrize(
    "loader",
    [load_train_li2014, load_test_li2014],
)
def test_loader_li2014(loader):
    """
    The loader should return a tuple (X, y) where:
    - X is a DataFrame
    - y is a Series
    - they have matching lengths
    """
    X, y = loader()

    assert isinstance(X, pd.DataFrame), "X should be a pandas DataFrame"
    assert isinstance(y, pd.Series), "y should be a pandas Series"

    assert len(X) == len(y), "X and y must have the same number of rows"
    assert X.shape[0] > 0, "X should not be empty"
    assert y.shape[0] > 0, "y should not be empty"
3 changes: 1 addition & 2 deletions pyaptamer/trafos/base/_base.py
@@ -62,8 +62,7 @@ def _fit(self, X, y=None):
         Returns self.
         """
         raise ValueError(
-            "abstract method _fit called, "
-            "this should be implemented in the subclass"
+            "abstract method _fit called, this should be implemented in the subclass"
         )

     def transform(self, X):
20 changes: 12 additions & 8 deletions pyaptamer/utils/_aptanet_utils.py
@@ -4,6 +4,7 @@
from itertools import product

import numpy as np
import pandas as pd

from pyaptamer.pseaac import AptaNetPSeAAC

@@ -59,20 +60,18 @@ def generate_kmer_vecs(aptamer_sequence, k=4):
 def pairs_to_features(X, k=4):

Review comment (Contributor): I don't think this needs changing; this should be a transformer anyway.

Reply (Collaborator, Author): It was changed so that AptaNetPipeline can take a DataFrame as input, which is something you had requested.

     """
     Convert a list of (aptamer_sequence, protein_sequence) pairs into feature vectors.
+    Also supports a pandas DataFrame with 'aptamer' and 'protein' columns.

     This function generates feature vectors for each (aptamer, protein) pair using:

-
     - k-mer representation of the aptamer sequence
     - Pseudo amino acid composition (PSeAAC) representation of the protein sequence

-
     Parameters
     ----------
-    X : list of tuple of str
-        A list where each element is a tuple `(aptamer_sequence, protein_sequence)`.
-        `aptamer_sequence` should be a string of nucleotides, and `protein_sequence`
-        should be a string of amino acids.
+    X : list of tuple of str or pandas.DataFrame
+        A list where each element is a tuple `(aptamer_sequence, protein_sequence)`,
+        or a DataFrame containing 'aptamer' and 'protein' columns.

     k : int, optional
         The k-mer size used to generate the k-mer vector from the aptamer sequence.
@@ -85,9 +84,14 @@ def pairs_to_features(X, k=4):
         for a given (aptamer, protein) pair.
     """
     pseaac = AptaNetPSeAAC()
-
     feats = []
-    for aptamer_seq, protein_seq in X:
+
+    if isinstance(X, pd.DataFrame):
+        pairs = zip(X["aptamer"], X["protein"], strict=False)
+    else:
+        pairs = X
+
+    for aptamer_seq, protein_seq in pairs:
         kmer = generate_kmer_vecs(aptamer_seq, k=k)
         pseaac_vec = np.asarray(pseaac.transform(protein_seq))
         feats.append(np.concatenate([kmer, pseaac_vec]))
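
Editorial sketch (not part of the PR): the branch added in this hunk normalizes two input forms, a list of tuples and a DataFrame with 'aptamer' and 'protein' columns, into one stream of pairs. The standalone helper below illustrates that idea in isolation. The name `iter_pairs` and the missing-column check are assumptions for illustration; the feature extraction itself (k-mer and PSeAAC vectors) is deliberately left out.

```python
import pandas as pd


def iter_pairs(X):
    """Return an iterator of (aptamer, protein) pairs.

    Hypothetical helper mirroring the isinstance branch in pairs_to_features.
    Accepts a list of 2-tuples or a DataFrame with 'aptamer' and 'protein' columns.
    """
    if isinstance(X, pd.DataFrame):
        missing = {"aptamer", "protein"} - set(X.columns)
        if missing:
            raise ValueError(f"DataFrame is missing columns: {sorted(missing)}")
        # Columns of one DataFrame always have equal length, so plain zip suffices.
        return zip(X["aptamer"], X["protein"])
    return iter(X)


pairs_list = [("ACGU", "MKV"), ("GGCA", "LLT")]
pairs_df = pd.DataFrame(pairs_list, columns=["aptamer", "protein"])

# Both input forms should produce the same sequence of pairs.
same = list(iter_pairs(pairs_list)) == list(iter_pairs(pairs_df))
```

The `strict=` keyword of `zip` used in the diff exists only on Python 3.10 and later, and `strict=False` is the default behaviour, so the sketch omits it. Failing early on missing columns is a design choice of this sketch, not something the PR does.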