Improve `pdb_merge` to include `TER`s and `END` lines #150

joaomcteixeira · 2022-11-04T12:54:24Z

Closes #149

amjjbonvin

But are END lines first filtered out? There should not be multiple END statements

joaomcteixeira · 2022-11-08T08:00:15Z

That's right. However, I feel we are adding too much to the purpose of merge. All that should be done by tidy. Maybe if we exchange the -strict option for tidy we don't need to touch the merge. What do you think? Because the merge was really designed to be a concatenator

amjjbonvin · 2022-11-08T08:05:25Z

Yes - but why does it remove the END statements then? Other tools don't do that (e.g. pdb_chain).

joaomcteixeira · 2022-11-08T08:38:08Z

The current version of pdb_merge does not remove any lines from the input files. It's a straight concatenation.

pdb-tools/pdbtools/pdb_merge.py

Lines 84 to 87 in 2a070bb

    
    for fhandle in flist:
        for line in fhandle:
            yield line
        fhandle.close()
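To make the consequence concrete, here is a minimal sketch of that same loop run on two made-up inputs. The `concatenate` name and the two tiny PDB snippets are assumptions for illustration only; they are not taken from the repository. It shows that each input's END record survives, so the merged output carries more than one.

```python
import io


def concatenate(handles):
    """Yield every line from every handle, unchanged (same shape as the loop above)."""
    for fhandle in handles:
        for line in fhandle:
            yield line
        fhandle.close()


# Two tiny, made-up inputs, each ending with its own END record.
pdb_a = io.StringIO("ATOM      1  N   ALA A   1\nEND\n")
pdb_b = io.StringIO("ATOM      1  N   GLY B   1\nEND\n")

merged = list(concatenate([pdb_a, pdb_b]))

# Count the END records that made it into the merged output.
n_end = sum(1 for line in merged if line.startswith("END"))
print(n_end)
```

Both END records end up in the output, and the atom serial `1` appears twice, which is why a clean-up step is needed after a plain concatenation.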

amjjbonvin · 2022-11-08T10:40:35Z

So a simple cat command effectively. Meaning pdb_tidy should always be run to correct things. A very different behaviour than `pdb_mkensemble` for example.

joaomcteixeira · 2022-11-08T11:35:05Z

Yes. mkensemble has the clear purpose of making a correct ensemble of structures. While merge assumes the user knows what he/she is doing when concatenating the files. Likely the user cleaned the PDBs before using merge.

amjjbonvin · 2022-11-08T11:36:49Z

Hmmm... assumptions... this is asking for trouble. Is this made clear to the users?

joaomcteixeira · 2022-11-24T10:18:49Z

Good point. Let's first clarify that to users before changing the behavior of pdb_merge.

joaomcteixeira · 2023-04-03T10:18:29Z

I was reviewing this now, and I still agree we should not touch pdb_merge for this purpose and should instead pipe the results to pdb_tidy, as the logic is not that straightforward. It is stated in the docs:

The contents are not sorted and no lines are deleted (e.g. END, TER
statements) so we recommend piping the results through `pdb_tidy.py`.

amjjbonvin · 2023-04-03T10:20:23Z

How will you explain to users that pdb_merge requires pdb_tidy to make a correct PDB while pdb_mkensemble does not?

joaomcteixeira · 2023-04-03T22:13:26Z

You are right. I am addressing this. It's not as straightforward as adding an END because TER was also not accounted for. If TERs exist in the output, it is because they were already present in the input. I am working on it.

joaomcteixeira · 2023-04-06T08:48:43Z

When doing pdb_merge, should atom numbers be renumbered starting from 1? This would change the original atom numbers but avoid repeated numbers.
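For reference, a minimal sketch of what renumbering from 1 could look like. The `renumber_atoms` helper is hypothetical and not the code in this PR. It assumes the standard fixed-width PDB layout, where the atom serial occupies columns 7-11 of ATOM/HETATM records, and it does not handle serials above 99999 or the serial field on TER records.

```python
def renumber_atoms(lines, start=1):
    """Yield lines with ATOM/HETATM serials (columns 7-11) renumbered from `start`."""
    serial = start
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # Right-justify the new serial in its 5-character field.
            line = line[:6] + str(serial).rjust(5) + line[11:]
            serial += 1
        yield line


# Made-up input: two concatenated fragments with repeated serial numbers.
lines = [
    "ATOM      7  N   ALA A   1\n",
    "ATOM      8  CA  ALA A   1\n",
    "ATOM      7  N   GLY B   1\n",
]
renumbered = list(renumber_atoms(lines))
```

The repeated serial `7` becomes `1` and `3`, so the numbers are unique again, at the cost of losing the original numbering.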

amjjbonvin · 2023-04-06T09:20:03Z

Probably good indeed

joaomcteixeira · 2023-04-11T14:28:55Z

Hi @amjjbonvin

I have addressed this issue, plus some other details. Now, pdb_merge operates as follows:

- The merged PDB file will represent a single MODEL.
- Comment lines in input PDBs will be ignored (REMARK, ...).
- Atom numbers are restarted from 1.
- CONECT lines are yielded at the end. CONECT numbers are updated to the new atom numbers.
- Missing TER and END statements are placed accordingly. Original TER and END statements are maintained.
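A rough sketch of how the renumbering and CONECT remapping described above could fit together. This is not the implementation in this PR: `merge_sketch` is a hypothetical name, the inputs are made up, TER insertion is left out (existing TER lines are only passed through), and it assumes each file lists its CONECT records after the atoms they refer to.

```python
def merge_sketch(files):
    """Merge lists of PDB lines into one list of lines (illustrative only)."""
    body, conect = [], []
    serial = 0
    for lines in files:
        old_to_new = {}  # per-file map: original serial -> new serial
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith(("ATOM", "HETATM")):
                serial += 1
                old_to_new[int(line[6:11])] = serial
                body.append(line[:6] + str(serial).rjust(5) + line[11:])
            elif line.startswith("TER"):
                body.append(line)
            elif line.startswith("CONECT"):
                # CONECT serials sit in 5-character fields starting at column 7.
                fields = [line[i:i + 5] for i in range(6, len(line), 5)]
                new = [old_to_new[int(f)] for f in fields if f.strip()]
                conect.append("CONECT" + "".join(str(n).rjust(5) for n in new))
            # Everything else (REMARK, MODEL, END, ...) is dropped in this sketch.
    # CONECT records go last, followed by a single END.
    return body + conect + ["END"]


# Made-up inputs: both files reuse serials 5 and 6.
file_a = [
    "REMARK made-up input A\n",
    "HETATM    5  C1  LIG A   1\n",
    "HETATM    6  C2  LIG A   1\n",
    "CONECT    5    6\n",
    "END\n",
]
file_b = [
    "HETATM    5  O1  LIG B   2\n",
    "HETATM    6  O2  LIG B   2\n",
    "CONECT    5    6\n",
    "END\n",
]
merged = merge_sketch([file_a, file_b])
```

Keeping the serial map per file matters here: both inputs use serials 5 and 6, and a single shared map would point the first file's CONECT record at the second file's atoms.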

Inside tests/data/ there are three PDBs, dummy_merge_A/B/C.pdb that you can use to test. Test also with others you may know, and let me know.

github-actions · 2025-10-03T02:05:40Z

This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 15 days.

add END line to the end of merge

df967d6

joaomcteixeira requested review from JoaoRodrigues and amjjbonvin November 4, 2022 12:54

joaomcteixeira self-assigned this Nov 4, 2022

amjjbonvin approved these changes Nov 4, 2022

joaomcteixeira marked this pull request as draft April 3, 2023 22:13

joaomcteixeira changed the title ~~add END line as final line in pdb_merge~~ Improve pdb_merge to include TERs and END lines Apr 3, 2023

joaomcteixeira added 3 commits April 4, 2023 00:18

some draft

0ede978

add test pdbs

6661ed4

new draft

822ce2b

almost finalized draft

4e636c6

release candidate

82ef8ce

joaomcteixeira marked this pull request as ready for review April 11, 2023 14:14

joaomcteixeira added 3 commits April 11, 2023 16:14

Merge branch 'master' into i149

67e5c8c

lint

0a7db3d

compatibility with 2.7

6056b50

address OSes linesep

da69f86

joaomcteixeira requested a review from amjjbonvin April 11, 2023 14:35

joaomcteixeira added bug enhancement labels Apr 11, 2023

github-actions bot added the Stale label Oct 3, 2025
