-
Notifications
You must be signed in to change notification settings - Fork 49
New dseq manu bjorn rebased #484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…mplement_table, _complement_table, to_watson_table, to_crick_table, to_N, to_5tail_table, to_3tail_table, to_full_sequence, bp_dict
2. new Dseq.__init__ w same arguments as before, but data is now stored in Bio.Seq.Seq._data 3. altered Dseq.quick classmethod 4. watson, crick and ovhg are methods decorated with @Property 5. New method to_blunt_string with returns a the string of the watson strand if the underlying Dseq object was blunt. 6. Old __getitem__ replaced 7. New __repr__ method 8. new looped method 9. new __add__ method
… imports at the top. Some tests involved strands that did not anneal prefectly, these have been corrected.
…ytestrings 2. user method that removes U and leaves an empty site. 3. cast_to_ds_right, cast_to_ds_left methods, these are *not* fill_in methods as they do not rely on a polymerase. 4. New melt method, useful for USER cloning etc.. 5. reimplemented apply_cut method
… utils. This should fix U in primers
…XME indicating a large change in behaviour.
…e x and y has meaning in the new Dseq implementation. (line 1074) 2. The expected result in test_pcr_assembly_uracil should be AUUAggccggTTOO. 3. Removed numbers at start and end of some sequenses. This could be discussed. 4. Four instances of FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result.
…he check for internal splits in init
fuction dsbreaks is called from pydna.alphabet in __init__ simplified code overall, fuction get_parts from pydna.alphabet used in several places simpler looped method using get_parts and __add__ improved error message from __add__
This will take care of bytes.find over the origin. Leads to simpler Dseq.find implementation. Added a __bytes__ dunder method to Dseq. Previously, bytes(Dseq) called __bytes__ on the Bio.Seq.Seq returning basically the ._data property. This is different from str(Dseq) that returns the watson strand of an imaginary blunt version of the Dseq. With this, bytes and str returns the same thing ecxept for encoding. Explain the added "79.0" in mw method. Added that Dseq.find is case sensitive. I think this is the least surprising functionality. Debatable. simplified the docstring for Dseq.find method removed commented out code reimplemented __contains__ in terms of Dseq.find Highlighted that __eq__ is case insensitive. Debatable. Added docstring to shifted method simplified the looped by testing if self+self gives a TypeError. This puts the logics in the __add__
Changed assert str(bbfeat).upper() in bbseq --> assert str(bbfeat).upper() in str(bbseq).upper() changed since Dseq.__contains__ is now case sensitive since it now relies on Dseq.find
More intiutive implementation of Dseq.__add__ Renamed _fill_in_five_prime and _fill_in_three_prime to the less confusing _fill_in_left and _fill_in_right to better reflect what the methods do. Simpler implementation of mung method. Deleted out obsolete cas9 method. watson_ovhg renamed left_ovhg, watson_ovhg remains an alias Renames _fill_in_five_prime --> _fill_in_left _fill_in_three_prime --> _fill_in_right __mul__ implemented in terms of .join nibble_to_blunt New methods used in new Assembly2 code nibble_five_prime_left nibble_five_prime_right nibble_three_prime_left nibble_three_prime_right
964 Changed how the slice is expressed. This solves an edge case when Sticky3 is zero. 1624 1688 1752 1804 calculate overhang in a different way. This solves removing nucleotides from sequences that already has overhang. 2503 apply_cut_new method 2730 apply_cut
|
@cursor review this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a major refactoring of pydna's DNA sequence representation system, implementing a new "dsIUPAC" alphabet (dscode) to better handle double-stranded DNA with overhangs, single-stranded regions, and USER enzyme treatment. The changes enable new molecular cloning techniques like USER cloning while maintaining backward compatibility.
Key changes:
- New alphabet system with dscode symbols representing base pairs and single-stranded regions
- Refactored Dseq class with improved internal representation and new methods for DNA manipulation
- Enhanced support for sticky ends, melting, and enzymatic treatments (USER, T4, mung bean nuclease)
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 27 comments.
Show a summary per file
| File | Description |
|---|---|
| src/pydna/alphabet.py | New module defining dscode alphabet with base pair dictionaries and translation tables |
| src/pydna/dseq.py | Major refactoring of Dseq class with new internal representation and manipulation methods |
| src/pydna/utils.py | Added anneal_from_left function and updated complement logic |
| src/pydna/assembly2.py | Updated assembly logic to use new Dseq methods (cast_to_ds_, exo1_) |
| src/pydna/amplify.py | Improved primer annealing detection using new alphabet system |
| src/pydna/dseqrecord.py | Updated looped() method to handle features properly with sticky ends |
| tests/test_new.py | New test file for dscode representations |
| tests/test_USERcloning.py | Complete rewrite for USER enzyme cloning |
| tests/test_module_dseq.py | Extensive test updates for new Dseq behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def user_assembly_design(): | ||
| """ | ||
| Work in progress |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is marked as 'Work in progress' in its docstring. If it's not ready for production, it should either be completed, removed, or marked with a decorator like @pytest.mark.skip to prevent accidental use.
| assembly_plan = [ | ||
| (1, 2, loc_end, loc_start), | ||
| ] | ||
| # FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result. |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple FIXME comments indicate that assertions are failing sanity checks but producing expected results. This suggests either the sanity checks need adjustment or there's an underlying issue that should be resolved before merging.
| data.append(basepair_dict[w, c]) | ||
| except KeyError as err: | ||
| print(f"Base mismatch in representation {err}") | ||
| raise ValueError() |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message 'Base mismatch in representation' is printed but then a ValueError is raised with no message. The ValueError should include a descriptive message, such as f'Base mismatch in representation: {err}' to help debugging.
| raise ValueError() | |
| raise ValueError(f"Base mismatch in representation: {err}") |
| assert Dseq("G A C ", "C T G"[::-1], 0) == s # TODO: should probably give an error | ||
|
|
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment suggests this test case should probably raise an error but currently passes. This indicates incomplete validation logic that should be addressed.
| assert Dseq("G A C ", "C T G"[::-1], 0) == s # TODO: should probably give an error | |
| with pytest.raises(ValueError): | |
| Dseq.from_representation( | |
| """ | |
| Dseq(-6) | |
| G A C | |
| C T G | |
| """) |
| # if seq.ovhg < 0: | ||
| # new_crick = new_crick + reverse_complement(seq.watson[: -seq.ovhg]) | ||
| # # Crick 5' overhang | ||
| # elif seq.ovhg > 0: | ||
| # new_watson = reverse_complement(seq.crick[-seq.ovhg :]) + new_watson | ||
| # if _Dseq(new_watson, new_crick, 0) != seq.cast_to_ds_left(): |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
|
|
||
| import pytest | ||
| import IPython | ||
| import sys |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'sys' is not used.
| import sys |
| import pytest | ||
| import IPython | ||
| import sys | ||
| import copy |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'copy' is not used.
| import copy |
| import sys | ||
| import copy | ||
| import warnings | ||
| import glob |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'glob' is not used.
| import glob |
| from Bio.Restriction import ( | ||
| Acc65I, | ||
| ApaI, | ||
| BamHI, | ||
| BglII, | ||
| BsaI, | ||
| Bsp120I, | ||
| Bsu36I, | ||
| BstAPI, | ||
| EcoRI, | ||
| EcoRV, | ||
| KpnI, | ||
| MaeII, | ||
| NlaIV, | ||
| NdeI, | ||
| NotI, | ||
| PacI, | ||
| PstI, | ||
| SmaI, | ||
| RestrictionBatch, | ||
| ) |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'MaeII' is not used.
Import of 'PacI' is not used.
| #!/usr/bin/env python3 | ||
| # -*- coding: utf-8 -*- | ||
|
|
||
| import pytest |
Copilot
AI
Nov 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'pytest' is not used.
| import pytest |
|
Hi @BjornFJohansson, below my review of the PR with the things that I think should be fixed before merging. I have also added some changes myself in a PR to this branch (#489). Feel free to cherry-pick the ones you agree with or just merge it.
|
Same as #483, but rebased. Please continue developing on this one @BjornFJohansson
Hi @BjornFJohansson look at the last commit where I fixed the
loopedfunction, now it passes the tests. I created the draft PR so we can discuss here.I like the changes to assembly2, I think they make things clearer, and the overriding of the PCR assembly function makes a lot of sense.
I wonder if this bit from assembly2 could be turned into a function (strands_anneal or something), or some way to test for reverse-complementarily: