
Commit 9de45b7

jkbonfield authored and daviesrob committed
Further speed up of VCF parser (formats).
The if / else-if checks are now a switch statement, with the cases reordered slightly. Also, the fmt[j].max_m style of access is now f->max_m, with f incremented along with j.

The impact on gcc builds is minor (maybe 1%), but for clang this was an 8-9% speed improvement on a 1000 Genomes multi-sample VCF.

Example timings for clang:

Previous commit
        23333.77 msec task-clock       #    1.000 CPUs utilized
     83667512727      cycles           #    3.586 GHz
    199145555089      instructions     #    2.38  insn per cycle
     43099743981      branches         # 1847.097 M/sec
       665687093      branch-misses    #    1.54% of all branches

This commit
     75967289857      cycles           #    3.585 GHz
    195076309580      instructions     #    2.57  insn per cycle
     43265084488      branches         # 2041.736 M/sec
       640008186      branch-misses    #    1.48% of all branches
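
As a rough cross-check of the figures above, the relative change in each counter can be computed directly from the quoted perf output. The helper names below (`pct_reduction`, `insn_per_cycle`) are illustrative only and are not part of htslib or perf; the constants are copied from the timings in this commit message.

```c
#include <assert.h>

/* Counter values copied from the perf output quoted in the commit message. */
static const double CYCLES_BEFORE = 83667512727.0;
static const double CYCLES_AFTER  = 75967289857.0;
static const double INSN_BEFORE   = 199145555089.0;
static const double INSN_AFTER    = 195076309580.0;

/* Percentage by which `after` is lower than `before` (hypothetical helper). */
double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Instructions retired per cycle (hypothetical helper). */
double insn_per_cycle(double instructions, double cycles)
{
    assert(cycles > 0.0);
    return instructions / cycles;
}
```

With these inputs, `pct_reduction(CYCLES_BEFORE, CYCLES_AFTER)` comes out at roughly 9.2% and `pct_reduction(INSN_BEFORE, INSN_AFTER)` at roughly 2.0%. Most of the gain therefore appears to come from a higher instruction rate (2.38 to 2.57 insn per cycle) rather than from executing fewer instructions.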
1 parent b1acab6 commit 9de45b7


1 file changed: +27 -18 lines changed


vcf.c

Lines changed: 27 additions & 18 deletions
@@ -2288,33 +2288,42 @@ static int vcf_parse_format(kstring_t *s, const bcf_hdr_t *h, bcf1_t *v, char *p
 
         // collect fmt stats: max vector size, length, number of alleles
         j = 0; // j-th format field
-        for (;;)
-        {
-            if ( *r == '\t' ) *r = 0;
-            if ( *r == ':' || !*r ) // end of field or end of sample
-            {
-                if (fmt[j].max_m < m) fmt[j].max_m = m;
-                if (fmt[j].max_l < l) fmt[j].max_l = l;
-                if (fmt[j].is_gt && fmt[j].max_g < g) fmt[j].max_g = g;
+        fmt_aux_t *f = fmt;
+        for (;;) {
+            switch (*r) {
+            case ',':
+                m++;
+                break;
+
+            case '|':
+            case '/':
+                if (f->is_gt) g++;
+                break;
+
+            case '\t':
+                *r = 0; // fall through
+
+            case '\0':
+            case ':':
+                if (f->max_m < m) f->max_m = m;
+                if (f->max_l < l) f->max_l = l;
+                if (f->is_gt && f->max_g < g) f->max_g = g;
                 l = 0, m = g = 1;
-                if ( *r==':' )
-                {
-                    j++;
-                    if ( j>=v->n_fmt )
-                    {
+                if ( *r==':' ) {
+                    j++; f++;
+                    if ( j>=v->n_fmt ) {
                         hts_log_error("Incorrect number of FORMAT fields at %s:%"PRIhts_pos"",
-                            h->id[BCF_DT_CTG][v->rid].key, v->pos+1);
+                                      h->id[BCF_DT_CTG][v->rid].key, v->pos+1);
                         v->errcode |= BCF_ERR_NCOLS;
                         return -1;
                     }
-                }
-                else break;
+                } else goto end_for;
+                break;
             }
-            else if ( *r== ',' ) m++;
-            else if ( fmt[j].is_gt && (*r == '|' || *r == '/') ) g++;
             if ( r>=end ) break;
             r++; l++;
         }
+    end_for:
         v->n_sample++;
         if ( v->n_sample == bcf_hdr_nsamples(h) ) break;
         r++;
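
The shape of the new loop can be tried outside htslib with a minimal, self-contained sketch. Everything here is an assumption made for illustration: `fmt_stats_t` is a cut-down stand-in for `fmt_aux_t` that keeps only the four fields used in the hunk, and `scan_sample` is a hypothetical helper that scans one NUL-terminated sample string. It leaves out the tab handling, the `end` bound check and the htslib error reporting, and unlike the hunk it does not count the ':' separator towards the length of the next field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for htslib's fmt_aux_t (only the fields in the hunk). */
typedef struct {
    int is_gt;   /* non-zero if this FORMAT field is GT */
    int max_m;   /* largest number of comma-separated values seen */
    int max_l;   /* longest field length seen, in bytes */
    int max_g;   /* largest number of alleles seen (GT fields only) */
} fmt_stats_t;

/*
 * Scan one sample column (NUL-terminated, ':'-separated fields) and update
 * the per-field maxima. Returns 0 on success, or -1 if the sample has more
 * fields than n_fmt.
 */
int scan_sample(const char *r, fmt_stats_t *fmt, int n_fmt)
{
    assert(r != NULL && fmt != NULL && n_fmt > 0);

    fmt_stats_t *f = fmt;          /* walks in step with j, as in the commit */
    int j = 0, l = 0, m = 1, g = 1;

    for (;;) {
        switch (*r) {
        case ',':                  /* another value in a vector field */
            m++;
            break;

        case '|':
        case '/':                  /* allele separator, only relevant for GT */
            if (f->is_gt) g++;
            break;

        case '\0':
        case ':':                  /* end of sample or end of field */
            if (f->max_m < m) f->max_m = m;
            if (f->max_l < l) f->max_l = l;
            if (f->is_gt && f->max_g < g) f->max_g = g;
            if (*r == '\0') return 0;
            if (++j >= n_fmt) return -1;   /* too many fields */
            f++;
            l = 0; m = g = 1;
            r++;
            continue;              /* skip the l++ below: ':' is not counted */

        default:
            break;
        }
        r++; l++;
    }
}
```

For example, with three fields where only the first is GT, scanning `"0/1:1,2,3:xy"` leaves `max_g == 2` and `max_l == 3` on the first field and `max_m == 3` on the second. Putting the common characters in a switch lets the compiler choose its own dispatch strategy, which may be part of why clang benefits more than gcc here, though the commit message does not say so.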
