26 changes: 26 additions & 0 deletions best_practices.md
@@ -0,0 +1,26 @@
# Enumerations

It is _*highly*_ recommended that DFDL schema authors avoid whitespace within the definition of
symbolic enumeration constants. Underscores should be used instead of spaces.

## Enumerations and Validity

The above example also illustrates an optional technique which segregates the enumeration values
into some that are schema-valid, and some that are not, based on an additional pattern facet which
constrains the valid symbolic values to names that begin with "OK_"[^checkConstraints].

[^checkConstraints]: This particular MIL-STD-2045 schema
(as of this writing, 2025-10-27) enforces the pattern facet at parse time, so it
is a parse error if the selected symbolic enumeration does not start with `"OK_"`, meaning
only numeric values 2, 4, 5, 6, and 7 are considered well-formed. However,
this enforcement is not required and is not considered best practice.

[TBD]:
Bug? In MIL-STD-2045, this string type has a dfdl:checkConstraints(.) assert on it, so these
facets are enforced and cause a parse error if the data does not adhere to them.
This is probably a mistake in the schema, which should at least have a DFDL variable to
control whether this dfdl:assert can fail.
Best practice is to NOT use dfdl:checkConstraints(.), so as to
cleanly separate the concepts of well-formed and valid data.
8 changes: 8 additions & 0 deletions site/_data/footer.yml
@@ -0,0 +1,8 @@
org_name: "The Apache Software Foundation"
org_url: "https://www.apache.org"
license_name: "Apache License, Version 2.0"
copyright_year: "2025"
license_url: "https://www.apache.org/licenses/LICENSE-2.0"
trademark_text: "Apache, Apache Daffodil, Daffodil, and the Apache Daffodil logo are trademarks of The Apache Software Foundation."
daffodil_logo: ../assets/themes/apache/img/apache-daffodil-logo.png
asf_logo: ../assets/themes/apache/img/asf_logo_wide.png
4 changes: 4 additions & 0 deletions site/_includes/themes/apache/default.html
@@ -25,6 +25,10 @@
{% include themes/apache/_navigation.html %}

<div class="container">
{% if page.pdf %}
{% assign filename_without_extension = page.path | split: '.' | first %}
<div><i>This page is available as a <a href="../pdf/{{ filename_without_extension }}.pdf" target="_blank">downloadable PDF</a></i></div>
{% endif %}
{{ content }}
<footer>
{% include themes/apache/footer.html %}
18 changes: 8 additions & 10 deletions site/_includes/themes/apache/footer.html
@@ -1,16 +1,14 @@
<footer class="site-footer">
<div class="wrapper">
<div class="footer-col-wrapper" style="font-size: .85em;">
<div class="footer-col-wrapper" style="font-size:.85em;">
<hr>
<div>
<div style="text-align: center;">
Copyright &copy; 2025 <a href="https://www.apache.org">The Apache Software Foundation</a>.
Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.
<br>
Apache, Apache Daffodil, Daffodil, and the Apache Daffodil logo
are trademarks of The Apache Software Foundation.
</div>
<div style="text-align:center;">
Copyright &copy; {{ site.data.footer.copyright_year }}
<a href="{{ site.data.footer.org_url }}">{{ site.data.footer.org_name }}</a>.
Licensed under the
<a href="{{ site.data.footer.license_url }}">{{ site.data.footer.license_name }}</a>.
<br>
{{ site.data.footer.trademark_text }}
</div>
</div>
</div>
56 changes: 56 additions & 0 deletions site/_pandoc/Makefile
@@ -0,0 +1,56 @@
# ==========================================================
# Pandoc PDF generator for Jekyll site
# Scans Markdown files with "pdf: true" in YAML front matter
# and produces PDFs in the site's ./pdf/ directory
# ==========================================================
SHELL := /bin/bash

# --- Configuration ---
SITE_ROOT := ..

PANDOC := pandoc

# Output directory for generated PDFs (at site root)
PDF_OUTDIR := $(SITE_ROOT)/pdf

DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml

# This is coarse: we take any Markdown file containing a line that is exactly "pdf: true",
# regardless of where that line appears in the file, but we skip directories that begin with "_".
MD_CANDIDATES=$(shell grep -Rlx --include='*.md' --exclude-dir='_*' 'pdf: true' $(SITE_ROOT))

# --- Files to build ---
PDF_SRCS := $(MD_CANDIDATES)
PDFS := $(patsubst $(SITE_ROOT)/%.md,$(PDF_OUTDIR)/%.pdf,$(PDF_SRCS))

# --- Default target ---
all: $(PDFS)
	@echo "Generated $(words $(PDFS)) PDF(s) in $(PDF_OUTDIR)"

# --- Rule to ensure output dirs exist and build each PDF ---
$(PDF_OUTDIR)/%.pdf: $(SITE_ROOT)/%.md | make-dirs
	@mkdir -p $(dir $@)
	@echo "📄 Building $@"
	$(PANDOC) --defaults=$(DEFAULTS) \
		-L only.lua \
		-f markdown+markdown_in_html_blocks \
		--metadata-file=$(SITE_ROOT)/_data/footer.yml \
		-V numberoffset=0 \
		--to pdf -o $@ $<

# --- Create directories as needed ---
.PHONY: make-dirs
make-dirs:
	@mkdir -p $(PDF_OUTDIR)

# --- Cleanup ---
.PHONY: clean
clean:
	@echo "Removing generated PDFs"
	@rm -rf $(PDF_OUTDIR)

# --- Debugging helpers ---
.PHONY: list
list:
	@echo "Markdown files with pdf:true:"
	@for f in $(PDF_SRCS); do echo " - $$f"; done
133 changes: 133 additions & 0 deletions site/_pandoc/README.md
@@ -0,0 +1,133 @@
---
layout: page
title: Pandoc + Jekyll Integration
pdf: false
---
# Pandoc + Jekyll Integration

This directory contains tools for generating **PDF versions** of selected Jekyll pages while keeping the same Markdown files usable by Jekyll for the website.

The goal is to have **one Markdown source** that:
- renders cleanly in the Jekyll site (for HTML),
- and can also be converted into a polished PDF using **Pandoc + LaTeX**.

---

## Directory Layout

```
_pandoc/
├── README.md ← this file
├── Makefile ← builds all PDFs
├── template_basic.latex ← custom LaTeX template used by pandoc
├── only.lua ← Pandoc Lua filter handling the only-jekyll and only-pandoc classes
└── ../pdf/ ← generated PDFs appear here
```

At the root of the Jekyll site:

```
_data/footer.yml ← footer used by pandoc for PDF pages
assets/... ← added logos used by PDF footer/header
pdf/ ← PDFs are created here
```

---

## How It Works

### 1. Mark pages that should have PDFs

Any Markdown file (".md" extension) can be tagged with:

```yaml
---
title: Example Page
layout: page
pdf: true
---
```

The Markdown dialect used on such a page must stay within the subset common to both Jekyll and Pandoc.

The Makefile scans the Markdown files and automatically detects those intended to become PDFs.
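
A minimal shell sketch of that scan, assuming GNU `grep`. The function name `list_pdf_sources` is hypothetical and not part of this directory's tooling. It lists Markdown files containing a line that is exactly `pdf: true` and skips directories whose names begin with `_`. Being a whole-line match, it does not verify that the line actually sits inside the front matter.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print every Markdown file
# under the given root that contains a line reading exactly "pdf: true".
# Assumes GNU grep for --include and --exclude-dir.
list_pdf_sources() {
  local root=${1:-.}
  # -R recurse, -l print only file names, -x match whole lines only
  grep -Rlx --include='*.md' --exclude-dir='_*' 'pdf: true' "$root"
}

# Example call from inside _pandoc/ (site root is ".."):
#   list_pdf_sources ..
```

A page with `pdf: false`, or one that lives under a directory such as `_pandoc/`, is not listed.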

---

### 2. The Makefile

The `_pandoc/Makefile` automates the whole process.

It:

1. Scans the site for Markdown files with `pdf: true`.
2. Invokes Pandoc with the configured LaTeX template to produce a PDF.

The resulting PDFs go into:

```
site/pdf
```
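
To illustrate how a source page maps to its output file, here is a small sketch of the same path rewrite in plain shell. The function name `pdf_path_for` is hypothetical; it only assumes the convention described above, where a page at `<site root>/<path>.md` becomes `<site root>/pdf/<path>.pdf`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the site root and a Markdown source path under it,
# print the path of the PDF that would be generated for that page.
pdf_path_for() {
  local site_root=$1 src=$2
  local rel=${src#"$site_root"/}   # strip the leading "<site root>/" prefix
  printf '%s\n' "$site_root/pdf/${rel%.md}.pdf"   # swap .md for .pdf under pdf/
}

# Example, as seen from inside _pandoc/ where the site root is "..":
#   pdf_path_for .. ../about.md
```

Subdirectories are preserved, so a page nested below the site root ends up in the matching subdirectory of `pdf/`.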

---

## Example Commands

From inside the `_pandoc/` directory:

### Build all PDFs
```bash
make
```

### Clean all generated PDFs
```bash
make clean
```

### List all Markdown files with `pdf: true`
```bash
make list
```

### Force rebuild of one PDF
```bash
make -B ../pdf/about.pdf
```

---

## Recommended Workflow

1. Write Markdown pages normally for your Jekyll site.
2. When you also want a PDF version, add `pdf: true` to front matter.
3. From `_pandoc/`, run:
```bash
make
```
4. Find the generated PDFs in `pdf/`.

---

**Maintainer Notes**

- `_pandoc/Makefile` assumes it’s run from `_pandoc/`, with site root as `..`
- Pandoc, XeLaTeX, and GNU `grep` must be available on your `PATH`
- Tested with pandoc v3.1.3
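
The `PATH` requirement can be checked before running `make`. The sketch below is one way to do it; `missing_tools` is a hypothetical helper, and the tool names in the example call are assumptions (`xelatex` is taken from the `pdf-engine` setting in `basic.yaml`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that is NOT found on PATH,
# one per line. Prints nothing when every tool is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check a tool list of your choosing before building.
#   missing_tools pandoc xelatex
```

An empty result means every listed tool was found.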

Note that this manual build process is temporary. Eventually the site rebuild process will
generate the PDFs automatically, at the same time as the Jekyll processing.

---

## Pandoc Tools Installation

These tools run on Linux.

On Ubuntu, install the following packages:

    sudo apt install pandoc texlive-latex-base texlive-latex-recommended \
        texlive-fonts-recommended texlive-xetex texlive-latex-extra

On other Linux distributions the install command will differ, but the equivalent packages must
be installed.
6 changes: 6 additions & 0 deletions site/_pandoc/basic.yaml
@@ -0,0 +1,6 @@
pdf-engine: xelatex
template: template_basic.tex
toc: true
number-sections: true
from: markdown+pipe_tables+autolink_bare_uris+bracketed_spans+markdown_attribute+raw_tex
listings: true
60 changes: 60 additions & 0 deletions site/_pandoc/only.lua
@@ -0,0 +1,60 @@
-- only.lua: drop .only-jekyll, keep contents of .only-pandoc
-- Handles both native Div/Span nodes and raw HTML <div> wrappers.

local List = require 'pandoc.List'

local function has_class(classes, cls)
return classes and List.includes(classes, cls)
end

-- Native block divs (Pandoc recognized <div class="..."> as Div)
function Div(el)
if has_class(el.classes, 'only-jekyll') then
return {} -- drop entirely
elseif has_class(el.classes, 'only-pandoc') then
return el.content -- unwrap: keep inner blocks
end
end

-- Native inline spans
function Span(el)
if has_class(el.classes, 'only-jekyll') then
return {}
elseif has_class(el.classes, 'only-pandoc') then
return el.content
end
end

-- Fallback for raw HTML wrappers when Pandoc didn’t turn them into Divs.
function Pandoc(doc)
local out = List()
local mode = nil -- nil | 'drop' | 'keep'

local function is_open_of(txt, klass)
-- match <div ... class="... klass ...">
return txt:match('<div[^>]-class=[\'"][^\'"]-' .. klass .. '[^\'"]-[\'"]')
end

for _, blk in ipairs(doc.blocks) do
if blk.t == 'RawBlock' and blk.format:match('html') then
local t = blk.text
if is_open_of(t, 'only%-jekyll') then
mode = 'drop' -- drop wrapper and its inner content
elseif is_open_of(t, 'only%-pandoc') then
mode = 'keep' -- drop wrapper, keep inner content
elseif t:match('</div>') and mode ~= nil then
mode = nil
else
if not mode or mode == 'keep' then out:insert(blk) end
end
else
-- Non-HTML blocks pass through unless we are inside a dropped wrapper.
if mode ~= 'drop' then
out:insert(blk)
end
end
end

return pandoc.Pandoc(out, doc.meta)
end