Closed
Description
ngram_model_read is used to read ngram data from a file and supports several different file types.
Users can either specify the type directly or pass NGRAM_AUTO to auto-detect it.
In every branch except NGRAM_ARPA, the resulting model is checked for NULL, and ngram_model_read returns NULL if the read failed.
The NGRAM_ARPA branch skips that check, so when NGRAM_ARPA is used with a malformed file, the NULL model is passed on and a crash occurs later in ngram_model_read, inside ngram_model_apply_weights.
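The pattern described above can be illustrated with a small self-contained sketch. It is not the PocketSphinx source: `FileType`, `Model`, `read_arpa`, `read_bin`, `apply_weights` and `read_model` are hypothetical stand-ins, and the `well_formed` flag simply simulates whether a reader succeeds. The point it shows is one shared NULL check placed after the switch, so that no branch can reach the weight-application step with a NULL model.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; none of these are PocketSphinx names.
enum class FileType { Auto, Arpa, Bin };

struct Model {
    float lw = 1.0f;  // language weight
};

// Simulated readers: each returns nullptr when the input is malformed.
static Model* read_arpa(bool well_formed) { return well_formed ? new Model() : nullptr; }
static Model* read_bin(bool well_formed) { return well_formed ? new Model() : nullptr; }

// Dereferences the model, so it must never be called with nullptr.
static void apply_weights(Model* m, float lw) {
    assert(m != nullptr);
    m->lw = lw;
}

// Every branch falls through to a single NULL check before the dereference.
Model* read_model(FileType type, bool well_formed, float lw) {
    Model* m = nullptr;
    switch (type) {
    case FileType::Auto:
        // Try the binary reader first, then fall back to ARPA.
        m = read_bin(well_formed);
        if (m == nullptr) m = read_arpa(well_formed);
        break;
    case FileType::Arpa:
        m = read_arpa(well_formed);
        break;
    case FileType::Bin:
        m = read_bin(well_formed);
        break;
    }
    if (m == nullptr) return nullptr;  // shared guard: covers the ARPA branch too
    apply_weights(m, lw);
    return m;
}
```

With this shape, `read_model(FileType::Arpa, false, 2.0f)` returns a null pointer instead of dereferencing one, which matches the expected behavior stated in the testcase.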
Tested on f6e44c6e
(found via automated fuzzing)
The following testcase demonstrates the issue:
testcase.cpp
#include <cstdio>
extern "C" {
#include "/fuzz/install/include/pocketsphinx.h"
}
int main() {
    // Create an empty (invalid) language-model file
    const char* path = "/tmp/empty.lm";
    if (FILE* f = fopen(path, "wb")) fclose(f);
    // Minimal config and logmath
    ps_config_t* cfg = ps_config_init(NULL);
    if (!cfg) return 0;
    logmath_t* lmath = logmath_init(1.0001, 0, 1);
    if (!lmath) return 0;
    // Forcing ARPA type on an invalid file triggers the crash inside ngram_model_read
    // Expected behavior: return NULL. Actual behavior: NULL deref inside apply_weights
    ngram_model_t* m = ngram_model_read(cfg, path, NGRAM_ARPA, lmath);
    (void)m; // crash happens inside the call above
    return 0;
}
crash report
==12==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000058 (pc 0x55a0a3690b78 bp 0x000000000004 sp 0x7ffeac2d49d0 T0)
==12==The signal is caused by a READ memory access.
==12==Hint: address points to the zero page.
#0 0x55a0a3690b78 in ngram_model_apply_weights /fuzz/src/src/lm/ngram_model.c:367:21
#1 0x55a0a3690b78 in ngram_model_read /fuzz/src/src/lm/ngram_model.c:176:9
#2 0x55a0a369053c in main /fuzz/workspace/test.cpp:18:24
#3 0x7f997ccfad8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#4 0x7f997ccfae3f in __libc_start_main csu/../csu/libc-start.c:392:3
#5 0x55a0a35b53b4 in _start (/fuzz/workspace/test+0x383b4) (BuildId: 482c4ce1513550ce1fba6ec53ff4e8b84fef4d93)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /fuzz/src/src/lm/ngram_model.c:367:21 in ngram_model_apply_weights
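Until ngram_model_read checks the ARPA result itself, a caller could screen out obviously invalid files, such as the empty one in the testcase, before forcing NGRAM_ARPA. The sketch below is one possible heuristic and not part of PocketSphinx: `looks_like_arpa` is a hypothetical helper that only looks for a line beginning with the `\data\` section marker that ARPA files contain. It is an assumption that this catches the cases of interest. A file can pass this check and still be malformed further on, so it reduces exposure to the crash rather than replacing the missing NULL check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper, not a PocketSphinx function.
// Returns true if some line of the file starts with the ARPA "\data\" marker.
// Returns false for unreadable or empty files.
bool looks_like_arpa(const char* path) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return false;

    char buf[1024];
    bool at_line_start = true;  // false while reading the tail of an over-long line
    bool found = false;
    while (std::fgets(buf, sizeof buf, f) != nullptr) {
        if (at_line_start && std::strncmp(buf, "\\data\\", 6) == 0) {
            found = true;
            break;
        }
        // The next chunk starts a new line only if this chunk ended with '\n'.
        std::size_t n = std::strlen(buf);
        at_line_start = (n > 0 && buf[n - 1] == '\n');
    }
    std::fclose(f);
    return found;
}
```

A caller would invoke ngram_model_read with NGRAM_ARPA only when `looks_like_arpa(path)` returns true, and treat a false result the same way as a NULL model.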