Missing null check in ngram_model_read for NGRAM_ARPA #432

@hgarrereyn

ngram_model_read reads n-gram data from a file and supports several file types.

Callers can either specify the type directly or pass NGRAM_AUTO to have it auto-detected.

In every branch except the explicit NGRAM_ARPA one, the resulting model is checked for null, and ngram_model_read returns null if the read failed.

As a result, when NGRAM_ARPA is used with a malformed file, the null model is passed along and a crash occurs later in ngram_model_read, inside ngram_model_apply_weights.
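
To make the missing check concrete, here is a minimal, self-contained sketch of the pattern. It does not use the pocketsphinx sources: Model, FileType, read_arpa_mock, apply_weights_mock and read_model_checked are all hypothetical stand-ins invented for illustration, and the mock reader takes the file contents as a string rather than a path. The only point is that the ARPA branch returns early on a null model before the weights are applied.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT pocketsphinx types.
struct Model { float lw; };
enum FileType { TYPE_AUTO, TYPE_ARPA };

// Mock ARPA reader: treats empty input as malformed and returns nullptr.
static Model* read_arpa_mock(const std::string& contents) {
    if (contents.empty()) return nullptr;
    return new Model{1.0f};
}

// Mock of the weight-application step: dereferences the model,
// so it must never be called with nullptr.
static void apply_weights_mock(Model* m, float lw) { m->lw = lw; }

// Dispatcher with the null check in the ARPA branch.
Model* read_model_checked(const std::string& contents, FileType type, float lw) {
    Model* m = nullptr;
    switch (type) {
    case TYPE_ARPA:
        m = read_arpa_mock(contents);
        if (m == nullptr) return nullptr;  // the kind of check this issue reports as missing
        break;
    default:
        return nullptr;  // other types are omitted from this sketch
    }
    apply_weights_mock(m, lw);  // only reached with a valid model
    return m;
}
```

With a check of this kind, a caller forcing the ARPA type on an empty file would get a null result instead of a crash, which is the expected behavior described in the testcase below.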

Tested on commit f6e44c6e.

(found via automated fuzzing)

The following testcase demonstrates the issue:
testcase.cpp

#include <cstdio>
extern "C" {
#include "/fuzz/install/include/pocketsphinx.h"
}
int main() {
    // Create an empty (invalid) language-model file
    const char* path = "/tmp/empty.lm";
    if (FILE* f = fopen(path, "wb")) fclose(f);

    // Minimal config and logmath
    ps_config_t* cfg = ps_config_init(NULL);
    if (!cfg) return 0;
    logmath_t* lmath = logmath_init(1.0001, 0, 1);
    if (!lmath) return 0;

    // Forcing ARPA type on an invalid file triggers the crash inside ngram_model_read
    // Expected behavior: return NULL. Actual behavior: NULL deref inside apply_weights
    ngram_model_t* m = ngram_model_read(cfg, path, NGRAM_ARPA, lmath);
    (void)m; // crash happens inside the call above
    return 0;
}

crash report

==12==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000058 (pc 0x55a0a3690b78 bp 0x000000000004 sp 0x7ffeac2d49d0 T0)
==12==The signal is caused by a READ memory access.
==12==Hint: address points to the zero page.
    #0 0x55a0a3690b78 in ngram_model_apply_weights /fuzz/src/src/lm/ngram_model.c:367:21
    #1 0x55a0a3690b78 in ngram_model_read /fuzz/src/src/lm/ngram_model.c:176:9
    #2 0x55a0a369053c in main /fuzz/workspace/test.cpp:18:24
    #3 0x7f997ccfad8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #4 0x7f997ccfae3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #5 0x55a0a35b53b4 in _start (/fuzz/workspace/test+0x383b4) (BuildId: 482c4ce1513550ce1fba6ec53ff4e8b84fef4d93)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /fuzz/src/src/lm/ngram_model.c:367:21 in ngram_model_apply_weights
