
@ahmedselim2017

Hi,

I was working on some AlphaFold predictions and needed a tool to filter by B-factor, then saw #163. So I have written a script called pdb_selb that filters atoms by their B-factor values.

However, since the < and > signs can be interpreted by the shell as redirection operators, I have added a separate option that selects the comparison operator, instead of passing the operator and the threshold value together in a single option.
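As an illustration of that design, here is a minimal sketch of how named operators could stand in for the `<` and `>` symbols. The operator names (`lt`, `ge`, ...) and the functions `bfactor_of` and `select_atoms` are hypothetical and are not taken from the actual pdb_selb code; the only thing assumed from the PDB format is that the B-factor occupies columns 61-66 of ATOM/HETATM records.

```python
import operator

# Hypothetical operator names; the option values used by pdb_selb may differ.
OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "eq": operator.eq,
}


def bfactor_of(line):
    """Return the B-factor stored in columns 61-66 of a coordinate record."""
    return float(line[60:66])


def select_atoms(lines, op_name, threshold):
    """Yield coordinate records whose B-factor passes the comparison.

    Records other than ATOM/HETATM are passed through unchanged.
    """
    compare = OPERATORS[op_name]
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            if compare(bfactor_of(line), threshold):
                yield line
        else:
            yield line
```

Because the operator is spelled as a word, a call such as `select_atoms(lines, "ge", 70.0)` needs no shell quoting when the same values come from the command line.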

I have also added tests and documentation for pdb_selb. As this is my first time contributing to this project, please let me know if you have any feedback or ideas for improvement!

@ahmedselim2017 ahmedselim2017 changed the title Pdb selb Added pdb_selb to filter by B-factor values Apr 9, 2024
Member

@joaomcteixeira joaomcteixeira left a comment

Hi @ahmedselim2017 !

Thanks for your contribution. I like the way you presented the PR; everything follows the pdb-tools strategy and architecture. I left some comments that I think you should address.

@amjjbonvin @JoaoRodrigues The new tool here follows the pdb-tools "one script one job" paradigm and is something we don't have yet. Personally, I like it. Once the comments are addressed I agree with merging it. It should be [FEATURE].

Cheers,

@amjjbonvin
Member

PS: One more question/comment: the selection should act on a residue basis, meaning that full residues should be kept or removed, not just a few atoms per residue. This is not an issue for pLDDT, but B-factors are atom-specific.

@ahmedselim2017
Author

I have implemented a new option called filtering_mode that selects whether the mean, minimum, or maximum B-factor of a residue is used to filter residues. For the mean, I used Python's statistics.fmean instead of sum()/len() for better precision.
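A minimal sketch of residue-level filtering along these lines is shown below. It is an illustration under stated assumptions, not the code from this PR: the mode names, `residue_key`, and `filter_residues` are hypothetical, the input is assumed to contain only ATOM/HETATM records, and the records of each residue are assumed to be consecutive.

```python
from itertools import groupby
from statistics import fmean

# Hypothetical mode names for the mean/minimum/maximum choices.
MODES = {"mean": fmean, "min": min, "max": max}


def residue_key(line):
    """Chain ID (column 22) plus residue number and insertion code (columns 23-27)."""
    return line[21], line[22:27]


def filter_residues(atom_lines, mode, keep):
    """Yield all records of a residue when its summarised B-factor passes `keep`.

    `keep` is any callable taking the summarised value and returning a bool.
    Whole residues are kept or dropped, never individual atoms.
    """
    summarise = MODES[mode]
    for _, group in groupby(atom_lines, key=residue_key):
        records = list(group)
        value = summarise([float(record[60:66]) for record in records])
        if keep(value):
            yield from records
```

Note that `itertools.groupby` only merges adjacent items with the same key, so a residue whose records are split across the file would be treated as two separate residues.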

While testing, I noticed that the code fails if the records of a residue are not consecutive in the PDB file. The current code assumes that they are: it groups consecutive records with the same chain and residue ID and filters each group as one residue.

We could mitigate this issue by sorting the PDB file with pdb_sort before filtering. Alternatively, since that would also re-sort files that are already sorted, we could keep a list of the chain and residue IDs that have already been processed and, if a later record reuses one of those IDs, raise an error and direct the user to sort their PDB file with pdb_sort.
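The second option could look roughly like the sketch below. The function name `check_residues_contiguous` and the error wording are hypothetical, and the input is assumed to contain only ATOM/HETATM records.

```python
def check_residues_contiguous(atom_lines):
    """Raise ValueError if a residue reappears after another residue has started."""
    seen = set()
    previous = None
    for line in atom_lines:
        # Chain ID (column 22) plus residue number and insertion code (columns 23-27).
        key = (line[21], line[22:27])
        if key != previous:
            if key in seen:
                raise ValueError(
                    f"records for chain {key[0]!r}, residue {key[1].strip()} "
                    "are not consecutive; sort the file first, "
                    "for example with pdb_sort"
                )
            seen.add(key)
            previous = key
```

This makes a single pass over the records and leaves already sorted files untouched, at the cost of remembering one key per residue.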

@github-actions

github-actions bot commented Oct 2, 2025

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Oct 2, 2025
@github-actions

This PR was closed because it has been stalled for 15 days with no activity.

@github-actions github-actions bot closed this Oct 17, 2025