print("Node 7's parent is", parent_of_7, "and children are", children_of_7, "in the first tree")
```

(sec_terminology_individuals_and_populations)=

### Individuals and populations

… homozygous for "T", Bob is homozygous for "G", and Cat is heterozygous "T/G".
In other words, the ancestral state and the details of any mutations at that site,
when coupled with the tree topology at the site {attr}`~Site.position`, are sufficient to
define the allelic state possessed by each sample.
See the description of {attr}`~Mutation.parent` for how tskit handles multiple mutations along
a path in a tree.

Note that even though the genome is 1000 base pairs long, the tree sequence only contains
a single site, because we usually only bother defining *variable* sites in a tree
… that genomic location). It is perfectly possible to have a site with no mutation
(or silent mutations) --- i.e. a "monomorphic" site --- but such sites are not normally
used in further analysis.
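The sketch below illustrates how a sample's allele follows from the ancestral state, the mutations and the tree topology. It uses a hypothetical toy representation rather than the tskit API: a plain dictionary mapping each child node to its parent, with `-1` for the root. The node ids, the `allele_of`/`genotypes` helpers and the one-mutation-per-node restriction are assumptions made only for this example. Walking up from a sample, the first mutation met is the most recent one on that path, so it determines the allele. A site with no mutations leaves every sample with the ancestral state.

```python
# Toy local tree as a child -> parent mapping; -1 marks the root.
# Node ids are hypothetical: samples 0 and 1 share parent 4, and
# node 5 is the root above node 4 and sample 2.
PARENT = {0: 4, 1: 4, 2: 5, 4: 5, 5: -1}
SAMPLES = [0, 1, 2]


def allele_of(sample, ancestral_state, mutations, parent=PARENT):
    """Return the allele carried by ``sample`` at a single site.

    ``mutations`` maps a node id to the derived state of a mutation on the
    branch above that node (toy simplification: at most one per node).
    Walking up from the sample, the first mutation met is the most recent
    one on its path, so it determines the allele; if no mutation is met,
    the sample carries the ancestral state.
    """
    node = sample
    while node != -1:
        if node in mutations:
            return mutations[node]
        node = parent[node]
    return ancestral_state


def genotypes(ancestral_state, mutations):
    """Alleles of all toy samples at one site, in SAMPLES order."""
    return [allele_of(s, ancestral_state, mutations) for s in SAMPLES]


print(genotypes("T", {4: "G"}))          # ['G', 'G', 'T']
# A back mutation above sample 1 lies below the node-4 mutation on the
# same path (that mutation is its parent), so it wins for sample 1 only.
print(genotypes("T", {4: "G", 1: "T"}))  # ['G', 'T', 'T']
# No mutations: a monomorphic site, so every sample is ancestral.
print(genotypes("T", {}))                # ['T', 'T', 'T']
```

In tskit itself you would not walk the tree by hand. {meth}`~TreeSequence.variants` and {meth}`~TreeSequence.genotype_matrix` decode the genotypes for you.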

(sec_terminology_provenance)=

### Provenance

… call to msprime that produced it, and the second the call to …
provenance entries are sufficient to exactly recreate the tree sequence, but this
is not always possible.

(sec_concepts)=

## Concepts

… with 3 or more children in a particular tree (these are known as *polytomies*).

### Tree changes, ancestral recombinations, and SPRs

The process of recombination usually results in trees along a genome where adjacent
trees differ by only a few "tree edit" or subtree-prune-and-regraft (SPR) operations.
The result is a tree sequence in which very few edges
{ref}`change from tree to tree<fig_what_is_edge_diffs>`.
This is the underlying reason that `tskit` is so
efficient, and is well illustrated in the example tree sequence above.
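The sketch below makes the SPR idea concrete on a hypothetical toy tree, stored as a child-to-parent dictionary rather than in tskit's data structures. The `spr` helper (an illustrative name) does three things. It prunes a subtree, splices out the parent node left with a single child, and regrafts the subtree onto another branch through a new node. Comparing the edge sets before and after shows how few edges actually change.

```python
# Toy tree ((0,1)4,(2,3)5)6 as a child -> parent mapping; -1 marks the root.
# Node ids and helper names are hypothetical, for illustration only.
TREE = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: -1}


def edges(parent):
    """Return the (parent, child) edges of a tree given as child -> parent."""
    return {(p, c) for c, p in parent.items() if p != -1}


def spr(parent, subtree, target, new_node):
    """Prune ``subtree`` and regraft it onto the branch above ``target``.

    Assumes the subtree's current parent has exactly two children, that
    ``target`` is neither that parent nor inside the pruned subtree, and
    that ``new_node`` is an unused id for the node created on the target branch.
    """
    tree = dict(parent)
    old_parent = tree[subtree]
    # The sibling left behind by the prune.
    sibling = next(c for c, p in tree.items() if p == old_parent and c != subtree)
    # Pruning leaves old_parent with a single child, so splice it out.
    tree[sibling] = tree[old_parent]
    del tree[old_parent]
    # Regraft: split the branch above target with new_node.
    tree[new_node] = tree[target]
    tree[target] = new_node
    tree[subtree] = new_node
    return tree


new_tree = spr(TREE, subtree=1, target=2, new_node=7)
removed = edges(TREE) - edges(new_tree)
added = edges(new_tree) - edges(TREE)
print(sorted(removed))  # [(4, 0), (4, 1), (5, 2), (6, 4)]
print(sorted(added))    # [(5, 7), (6, 0), (7, 1), (7, 2)]
```

In this toy version, a single prune-and-regraft removes at most four edges and adds at most four, however many leaves the tree has. In tskit, {meth}`~TreeSequence.edge_diffs` reports the edges removed and inserted as you move from one tree to the next.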

In this (simulated) tree sequence, each tree differs from the next by a single SPR.
The subtree defined by node 7 in the first tree has been pruned (away from node 11) and
regrafted onto the branch between 0 and 9, to create the second tree.
The second and third trees have the same topology,
but differ because their ultimate coalescence happened in a different ancestor
(easy to spot in a simulation, but hard to detect in real data). This is also
caused by a single SPR: looking at the second tree, either the subtree below node 8 or
the subtree below node 9 must have been pruned and regrafted higher up on the same
lineage to create the third tree. Because this is a fully {ref}`simplified<sec_simplification>`
… positions (an "infinite sites" model of breakpoints), then the number of trees in …
sequence equals the number of ancestral recombination events plus one. If recombinations
can occur at the same physical position (e.g. if the genome is treated as a set of
discrete integer positions, as in the simulation that created this tree sequence), then
moving from one tree to the next in a tree sequence might require multiple SPRs if
there are multiple, overlaid ancestral recombination events.
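A little arithmetic shows how breakpoints relate to the number of trees. The sketch below is a toy model, not a tskit function, and `num_trees` is a hypothetical helper name. It assumes that every ancestral recombination event produces a visible change of tree at its breakpoint. Events that land on the same position then share a single tree boundary, so the number of trees is the number of *distinct* breakpoint positions plus one.

```python
from collections import Counter


def num_trees(breakpoints, sequence_length):
    """Number of trees implied by recombination breakpoints (toy model).

    Assumes each breakpoint changes the tree; breakpoints outside
    (0, sequence_length) are ignored, and breakpoints at the same position
    collapse into one tree boundary.
    """
    boundaries = {x for x in breakpoints if 0 < x < sequence_length}
    return len(boundaries) + 1


# Continuous positions: each event has its own position ("infinite sites").
continuous = [12.5, 400.1, 731.9]
# Discrete integer positions: two events happen to hit position 400.
discrete = [12, 400, 400, 731]

print(num_trees(continuous, 1000))  # 4 trees from 3 events
print(num_trees(discrete, 1000))    # 4 trees from 4 events
# Events overlaid at each boundary; crossing position 400 may need two SPRs.
print(Counter(discrete))
```

For a real tree sequence, the count is available directly as {attr}`~TreeSequence.num_trees`.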

(sec_concepts_args)=

::::{margin}
:::{note}
There is a subtle distinction between common ancestry and coalescence. In particular, all coalescent nodes are common ancestor events, but not all common ancestor events in an ARG result in coalescence in all local trees.
:::
::::

The term Ancestral Recombination Graph (ARG) is commonly used to describe a genetic
genealogy. In particular, many (but not all) authors use it to mean a genetic
genealogy in which details of the position and potentially the timing of all
recombination and common ancestor events are explicitly stored. For clarity
… which omits these extra nodes. This is for two main reasons:

2. The number of recombination and non-coalescing common ancestor events in the genealogy
quickly grows to dominate the total number of nodes in the tree sequence,
without actually contributing to the mutations inherited by the samples.
In other words, these nodes are redundant for storing genomic data.
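The redundancy of non-coalescing nodes can be seen in a single local tree. The sketch below uses a hypothetical child-to-parent dictionary as a toy stand-in for a tree and splices out every node that has only one child. It illustrates the idea only. It is not tskit's {meth}`~TreeSequence.simplify` algorithm, which works across all the trees of a tree sequence at once. The helper name `remove_unary` and the node ids are assumptions for this example.

```python
def remove_unary(parent):
    """Splice out nodes with exactly one child in a local tree (toy model).

    ``parent`` maps every node, including the root, to its parent (-1 for
    the root). Each unary node is removed and its only child is attached
    directly to the unary node's parent.
    """
    tree = dict(parent)
    while True:
        # Count the children of every node.
        num_children = {}
        for p in tree.values():
            if p != -1:
                num_children[p] = num_children.get(p, 0) + 1
        unary = [node for node, k in num_children.items() if k == 1]
        if not unary:
            return tree
        node = unary[0]
        child = next(c for c, p in tree.items() if p == node)
        tree[child] = tree[node]
        del tree[node]


# Hypothetical tree: node 3 is a non-coalescing ancestor above sample 0, and
# node 8 is a single-child ancestor above node 6, the MRCA of samples 0, 1, 2.
full = {0: 3, 3: 5, 1: 5, 5: 6, 2: 6, 6: 8, 8: -1}
print(remove_unary(full))  # {0: 5, 1: 5, 5: 6, 2: 6, 6: -1}
```

Suppose a mutation sits on the branch above node 0 or above node 3 in the original tree. After splicing, it lies on the merged branch between 5 and 0 and is still inherited by sample 0 alone. This is the sense in which such nodes add nothing to the genomic data.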

Therefore, compared to a full ARG, you can think of a simplified tree sequence as
storing the trees *created by* recombination events, rather than attempting to record the
… way to put it:

> … whereas a [simplified] tree sequence encodes the outcome of those events"