A custom writer for Pandoc to extract documents' metadata as JSON.
@jgm showed a simpler way to achieve the same goal here.
Unless you absolutely need the automatic conversion of numbers, just follow that simpler way.
To test it, change directory to ./test and run:

    pandoc -f json -t ../src/json_metadata.lua test.json

You'll get this output (prettified here):
    {
      "author": [
        "Author One",
        "Author Two"
      ],
      "average": "4.2",
      "flags": {
        "checked": true,
        "published": false
      },
      "meta1": "A string value",
      "meta2": "Inlines with an italic",
      "revision": "3",
      "title": "A document with metadata\n\n(for tests only)\n"
    }

(See ./test.native or test.md for the contents of the test document.)
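If you want to consume the metadata from another program, a minimal Python sketch could look like the following. It only wraps the command line shown above; the function names `build_command` and `read_metadata` are hypothetical, the default writer path assumes you run it from ./test, and pandoc is assumed to be on your PATH.

```python
import json
import subprocess


def build_command(source, writer="../src/json_metadata.lua", variables=None):
    """Build the pandoc argument list for the custom writer.

    `variables` is an optional dict turned into pandoc -V name=value options.
    """
    cmd = ["pandoc", "-f", "json", "-t", writer]
    for name, value in (variables or {}).items():
        cmd += ["-V", f"{name}={value}"]
    cmd.append(source)
    return cmd


def read_metadata(source, **kwargs):
    """Run pandoc and parse the writer's JSON output into Python objects."""
    result = subprocess.run(
        build_command(source, **kwargs),
        check=True,           # raise if pandoc exits with an error
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return json.loads(result.stdout)
```

Called from ./test, `read_metadata("test.json")["author"]` should give the two authors listed in the output above.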
MetaInlines and MetaBlocks metadata can be formatted.
The default behavior of json_metadata.lua is to convert them to plain text.
To keep their formatting, set the format variable:

    pandoc -f json -t ../src/json_metadata.lua -V format=html test.json

to get this:
    {
      "author": [
        "Author One",
        "Author Two"
      ],
      "average": "4.2",
      "flags": {
        "checked": true,
        "published": false
      },
      "meta1": "A string value",
      "meta2": "Inlines with an <em>italic</em>",
      "revision": "3",
      "title": "<p>A document with metadata</p>\n<p>(<em>for tests only</em>)</p>"
    }

Currently, the script allows only the plain (default), markdown, html and native formats,
but you can add any other format supported by pandoc.write
by changing the ALLOWED_FORMATS table in the first lines of the script.
There's no MetaValue that represents numbers, so metadata with numeric values
are represented by a MetaString or a MetaInlines.
If you set the numbers variable, the script detects integers and numbers with decimals
and writes them as JSON numbers. This command:

    pandoc -f json -t ../src/json_metadata.lua -V format=plain -V numbers=true test.json

results in:
    {
      "author": [
        "Author One",
        "Author Two"
      ],
      "average": 4.2,
      "flags": {
        "checked": true,
        "published": false
      },
      "meta1": "A string value",
      "meta2": "Inlines with an italic",
      "revision": 3,
      "title": "A document with metadata\n\n(for tests only)\n"
    }

As you can see, the average (MetaString) and revision (MetaInlines) fields are numbers
in the resulting JSON.
Number detection is activated by any value of the numbers variable
except false and 0.
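The rule described above can be illustrated with a short Python sketch. This is not the script's Lua code: the names `numbers_enabled`, `detect_number` and `convert` are hypothetical, and the regular expression for what counts as a number is an assumption, so the real script may accept or reject some edge cases differently.

```python
import re

# Assumed pattern: optional sign, digits, optional decimal part.
_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?", re.ASCII)


def numbers_enabled(value):
    """Return True unless the variable is unset, "false" or "0"."""
    return value is not None and value not in ("false", "0")


def detect_number(text):
    """Return an int or float when text looks numeric, else text unchanged."""
    stripped = text.strip()
    if not _NUMBER.fullmatch(stripped):
        return text
    return float(stripped) if "." in stripped else int(stripped)


def convert(meta, numbers=None):
    """Recursively apply number detection to the strings in parsed metadata."""
    if not numbers_enabled(numbers):
        return meta
    if isinstance(meta, dict):
        return {key: convert(value, numbers) for key, value in meta.items()}
    if isinstance(meta, list):
        return [convert(value, numbers) for value in meta]
    if isinstance(meta, str):
        return detect_number(meta)
    return meta  # booleans and other values pass through untouched
```

With `numbers="true"`, a mapping such as `{"average": "4.2", "revision": "3"}` becomes `{"average": 4.2, "revision": 3}`, while non-numeric strings and booleans are left as they are.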