massifrg/pandoc-extract-json-meta

pandoc-extract-json-meta

A custom writer for Pandoc to extract documents' metadata as JSON.

@jgm showed a simpler way to achieve the same goal here.

Unless you absolutely need the automatic conversion of numbers, just follow that simpler way.

Testing this project

To test it, change to the ./test directory and run:

pandoc -f json -t ../src/json_metadata.lua test.json

and you'll get this output (here it's prettified):

{
  "author": [
    "Author One",
    "Author Two"
  ],
  "average": "4.2",
  "flags": {
    "checked": true,
    "published": false
  },
  "meta1": "A string value",
  "meta2": "Inlines with an italic",
  "revision": "3",
  "title": "A document with metadata\n\n(for tests only)\n"
}

To see the contents of the test document, look at ./test.native or test.md.

Keeping the styles of MetaInlines and MetaBlocks

MetaInlines and MetaBlocks metadata can be formatted.

The default behavior of json_metadata.lua is to convert them to plain text.

To keep their formatting instead, set the format variable:

pandoc -f json -t ../src/json_metadata.lua -V format=html test.json

to get this:

{
  "author": [
    "Author One",
    "Author Two"
  ],
  "average": "4.2",
  "flags": {
    "checked": true,
    "published": false
  },
  "meta1": "A string value",
  "meta2": "Inlines with an <em>italic</em>",
  "revision": "3",
  "title": "<p>A document with metadata</p>\n<p>(<em>for tests only</em>)</p>"
}

Currently, the script allows only the plain (default), markdown, html and native formats, but you can add any other format supported by pandoc.write by changing the ALLOWED_FORMATS table near the top of the script.
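
If you need the metadata inside another program, the commands shown above can be wrapped in a small helper. The following Python sketch is only an illustration and is not part of this project: the names build_command and extract_metadata are hypothetical, and it assumes that pandoc is on your PATH and that the paths are relative to ./test, as in the examples.

```python
import json
import subprocess


def build_command(writer, source, variables=None):
    """Build the pandoc argument list used in the examples of this README.

    `variables` is an optional dict that is turned into `-V key=value` pairs.
    """
    cmd = ["pandoc", "-f", "json", "-t", writer]
    for key, value in (variables or {}).items():
        cmd += ["-V", f"{key}={value}"]
    cmd.append(source)
    return cmd


def extract_metadata(writer, source, variables=None):
    """Run pandoc with the custom writer and parse its JSON output."""
    result = subprocess.run(
        build_command(writer, source, variables),
        check=True,           # raise CalledProcessError if pandoc fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return json.loads(result.stdout)
```

Run from ./test, a call such as extract_metadata("../src/json_metadata.lua", "test.json", {"format": "html"}) should return the HTML-formatted metadata shown above as a Python dict.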

Detecting numbers

There's no MetaValue that represents numbers, so metadata with numeric values are represented as a MetaString or a MetaInlines.

Setting the numbers variable turns on the detection of integers and of numbers with decimals:

pandoc -f json -t ../src/json_metadata.lua -V format=plain -V numbers=true test.json

results in:

{
  "author": [
    "Author One",
    "Author Two"
  ],
  "average": 4.2,
  "flags": {
    "checked": true,
    "published": false
  },
  "meta1": "A string value",
  "meta2": "Inlines with an italic",
  "revision": 3,
  "title": "A document with metadata\n\n(for tests only)\n"
}

As you can see, the average (MetaString) and revision (MetaInlines) fields are numbers in the resulting JSON.

Number detection is activated by any value of the numbers variable except false and 0.
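
The rule above can be illustrated with a short Python sketch. It is not a translation of json_metadata.lua: the function names are hypothetical, and the pattern used to recognize a number (an optionally signed integer or decimal) is an assumption, so the script itself may accept or reject other forms.

```python
import re

# Assumption: a number is an optionally signed integer or decimal,
# such as "3" or "4.2". The real script may use a different rule.
NUMBER_RE = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def numbers_enabled(value):
    """Detection is on for any value of the variable except "false" and "0"."""
    if value is None:  # variable not set at all
        return False
    return value not in ("false", "0")


def convert(text, numbers=None):
    """Return an int or float for numeric text when detection is on."""
    stripped = text.strip()
    if numbers_enabled(numbers) and NUMBER_RE.fullmatch(stripped):
        # Keep integers as int so that "3" becomes 3 rather than 3.0.
        return float(stripped) if "." in stripped else int(stripped)
    return text
```

With this sketch, convert("4.2", "true") gives a float and convert("3", "true") gives an integer, while convert("3", "0") leaves the string unchanged.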
