-
|
Does nlohmann::json support true incremental parsing, where JSON tokens may span arbitrary chunk boundaries, e.g. a loop of gzread() (64 KB chunks) or gzgets() → parse_chunk(buffer) → repeat? Use case: memory-efficiently parsing large gzip-compressed files, e.g. a ~35 MB .json.gz (via multiple chunked gzread() calls) and a ~25 MB .csv.gz (via multiple gzgets() calls): //...
#include <zlib.h>
#include <nlohmann/json.hpp>
using json = nlohmann::json;
//....
json j;
gzFile f = gzopen("large.json.gz", "r");
char buf[64*1024];
int n;
while ((n = gzread(f, buf, sizeof(buf))) > 0) {
    j.feed_chunk(buf, n); // hypothetical API: resume parser state across mid-number/mid-string splits
}
gzclose(f);
//...The current >> operator and sax_parse() require the complete input upfront. Discussion #3411 confirms streaming multiple complete objects, but what about a single large JSON document with tokens split across chunks (like RapidJSON's Reader::Parse())? I just tested this feature of the RapidJSON library and it works well, but it would be nice if nlohmann::json also had this very useful feature. (I don't mean reformatting the data into JSONL/NDJSON format, of course; rather keeping the original JSON format + compression.) Is this already possible via an (undocumented) API, or is it on the TODO list? Thanks. |
-
|
I think you would need to create an |
-
|
I thought I had solved it by using an old library named "gzstream" from 2001 (!): by creating a class of my own derived from std::istream and passing that to nlohmann::json. BUT json gives a parse error :-(
igzstream ifs("file.gz");
json j;
#if 0
ifs >> j;
#else
j = json::parse(ifs);
#endif
std::cout << j.dump(2) << std::endl; |
-
|
The last 3 postings above, including the useful hint by @gregmarr, solve this case satisfactorily, so we can close this question. |
As said, with this solution one can keep all files gzip-compressed, even JSON files, and still process (parse and import) them as usual. This saves a lot of disk space, since most such files (file.json.gz, file.csv.gz, file.txt.gz, etc.) have a high compression ratio.
While researching this problem I also learned that the Boost library supports even more such compression algorithms (bzip2, etc.).