-
|
Does nlohmann::json support true incremental parsing, where JSON tokens may span arbitrary chunk boundaries, e.g. a loop of gzread() (64 KB chunks) or gzgets() → parse_chunk(buffer) → repeat? Use case: memory-efficiently parsing large gzip-compressed files, e.g. a ~35 MB .json.gz (via multiple chunked gzread() calls) and a ~25 MB .csv.gz (via multiple gzgets() calls): //...
#include <zlib.h>
#include <nlohmann/json.hpp>
using json = nlohmann::json;
//....
json j;
gzFile f = gzopen("large.json.gz", "r");
char buf[64*1024];
int n;
while ((n = gzread(f, buf, sizeof(buf))) > 0) {
    j.feed_chunk(buf, n); // hypothetical API: resume parser state across mid-number/mid-string splits
}
gzclose(f);
//...The current >> operator and sax_parse() require the complete input upfront. Discussion #3411 confirms streaming multiple complete objects, but what about a single large JSON document with tokens split across chunks (like RapidJSON's Reader::Parse())? I just tested this feature of the RapidJSON library and it works well, but it would be nice if nlohmann::json also had this very useful feature. (I don't mean reformatting the data into JSONL/NDJSON format, of course; rather keeping the original JSON format + compression.) Is this already possible via an (undocumented) API, or is it on the TODO list? Thanks. |
-
|
I think you would need to create an |
-
|
I thought I had solved it by using an old library named "gzstream" from 2001 (!): by creating a class of my own derived from std::istream and passing that to nlohmann::json. BUT json gives a parse error :-(
igzstream ifs("file.gz");
json j;
#if 0
ifs >> j;
#else
j = json::parse(ifs);
#endif
std::cout << j.dump(2) << std::endl; |
-
|
The last 3 postings above, including the useful hint by @gregmarr, solve this case satisfactorily, so we can close this question. |
As said, with this solution one can keep all files gzip-compressed, even JSON files, and still process (parse and import) them as usual. This saves a lot of disk space, since most such files (file.json.gz, file.csv.gz, file.txt.gz, etc.) have a high compression ratio.
While researching this problem I also learned that the Boost library supports even more such compression algorithms (bzip2, etc.).