Joint probabilities for pair emission?

### The basic situation

We have a per-site mutation frequency, `f` (the fraction of observed sequences that have a mutation at the site), and we want to fill in the 4x4 table of pair emission probabilities in the HMM.

#### Independent emissions

The simplest way to do this is to make each pair emission probability a product of two factors (one for each sequence), where each factor is `1-f` if the sequence is germline at that site, or `f/3` if it is not (i.e. `f` split evenly among the three non-germline bases).
- virtues: simple, and it appears empirically to be approximately correct, i.e. log Bayes factors of related (unrelated) sequences are greater (less) than zero.
- but: it ignores the fact that related sequences have correlated mutations
- the table looks like this (symmetric entries omitted for clarity; the three `mutated` rows and columns are the three non-germline bases)

|  | germline | mutated | mutated | mutated |
| --- | --- | --- | --- | --- |
| germline | (1-f)(1-f) | f(1-f)/3 | f(1-f)/3 | f(1-f)/3 |
| mutated |  | f*f / 9 | f*f / 9 | f*f / 9 |
| mutated |  |  | f*f / 9 | f*f / 9 |
| mutated |  |  |  | f*f / 9 |
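
For concreteness, the table above can be generated with a few lines of Python. This is only a minimal sketch, not the code used in the HMM: the function name `independent_pair_emission` and the convention that base index 0 is the germline base are assumptions made for illustration.

```python
def independent_pair_emission(f, n_bases=4, germline=0):
    """Return the full (symmetric) pair emission table as nested lists.

    f        -- per-site mutation frequency
    germline -- index of the germline base (hypothetical convention: 0)
    """
    def single(base):
        # Per-sequence factor: 1-f for germline, f/3 for each mutated base.
        return 1.0 - f if base == germline else f / (n_bases - 1)

    # Independence: each entry is just the product of the two factors.
    return [[single(x) * single(y) for y in range(n_bases)]
            for x in range(n_bases)]


table = independent_pair_emission(0.1)
total = sum(sum(row) for row in table)  # should be 1 up to rounding
```

Because each per-sequence factor sums to one over the four bases, the 16 entries of the full table sum to one as well, which is a cheap sanity check on any alternative table.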

#### Joint emissions

So all we need to do to implement joint emission is fill in the entries in the matrix so that they take into account that two sequences mutated to the same base are more likely to be clonally related. Except I haven't worked out a good way to do this. Everything I've tried requires assumptions about branch lengths and tree topology that are not always true, so empirically the results are not that great.
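
To illustrate why branch-length assumptions creep in, here is a toy sketch of one such model. It is not a proposed solution and not something from the existing code: it assumes a single hypothetical shared ancestor per pair, which has mutated away from germline with probability `a`, after which each sequence independently mutates away from the ancestor with probability `b`. Both `a` and `b` are made-up parameters standing in for branch lengths, and base index 0 is assumed to be germline.

```python
def shared_ancestor_pair_emission(a, b, n_bases=4, germline=0):
    """Toy joint pair emission table under an assumed two-leaf tree.

    a -- probability the (hypothetical) shared ancestor is mutated
    b -- probability each sequence mutates away from the ancestor
    """
    # Distribution over the ancestor's base.
    prior = [1.0 - a if k == germline else a / (n_bases - 1)
             for k in range(n_bases)]

    def step(child, parent):
        # Probability of observing `child` given the ancestor's base.
        return 1.0 - b if child == parent else b / (n_bases - 1)

    # Sum over the unobserved ancestor base.
    return [[sum(prior[k] * step(x, k) * step(y, k) for k in range(n_bases))
             for y in range(n_bases)]
            for x in range(n_bases)]


joint = shared_ancestor_pair_emission(a=0.05, b=0.05)
```

With `a = 0` this reduces to the independent table with `f = b`, and for `a > 0` the same-base entries are at least as large as the different-base entries. The catch is exactly the one described above: the table depends on how the observed `f` is split between `a` and `b`, which the per-site frequency alone does not determine.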

Erick and I talked about this a few months ago. If memory serves, we got as far as him being convinced that it is actually non-trivial, but we didn't work out how to do it.

This is quite related to #8.

