Commit f57cd63

imp:print:beancount output: more robust account/commodity encoding
Unsupported chars are now hex-encoded, not just converted to dashes. This helps keep account and commodity names unique, especially with the equity conversion account names generated by --infer-equity when using currency symbols. (Those could also be converted to ISO 4217 codes, in theory, but for now we just hex encode them, which is easier to make robust.) Also, Beancount commodity symbols are no longer enclosed in hledger-style double quotes.
1 parent cbdbe0a commit f57cd63
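
The uniqueness argument in the commit message can be illustrated with a minimal, self-contained sketch. It is not the hledger implementation: `isSupported`, `dashEncode` and `hexEncode` are hypothetical names, and `isSupported` only approximates the characters Beancount accepts. It assumes, as the test expectations in this commit suggest, that the hex digits are the character's Unicode code point in lowercase.

```haskell
import Data.Char (isAlphaNum, isAscii, isSpace)
import Text.Printf (printf)

-- Hypothetical stand-in for hledger's real character check;
-- the real isBeancountAccountChar may accept a different set.
isSupported :: Char -> Bool
isSupported c = (isAscii c && isAlphaNum c) || c == '-'

-- Old behaviour (sketch): every unsupported character collapses to a dash,
-- so distinct names can end up identical.
dashEncode :: String -> String
dashEncode = map (\c -> if isSupported c then c else '-')

-- New behaviour (sketch): spaces become dashes, any other unsupported
-- character becomes 'C' followed by its code point in lowercase hex.
hexEncode :: String -> String
hexEncode = concatMap enc
  where
    enc c
      | isSupported c = [c]
      | isSpace c     = "-"
      | otherwise     = printf "C%x" (fromEnum c)
```

With these definitions, `dashEncode "$-€"` and `dashEncode "€-$"` both give `"---"`, while `hexEncode` gives `"C24-C20ac"` and `"C20ac-C24"` respectively, so the two names stay distinct.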

3 files changed: +81 -41 lines changed

hledger-lib/Hledger/Write/Beancount.hs

Lines changed: 23 additions & 20 deletions
@@ -26,6 +26,7 @@ import qualified Data.Text.Lazy as TL
 import qualified Data.Text.Lazy.Builder as TB
 import Safe (maximumBound)
 import Text.DocLayout (realLength)
+import Text.Printf
 import Text.Tabular.AsciiWide hiding (render)
 
 import Hledger.Utils
@@ -109,7 +110,7 @@ postingAsLinesBeancount elideamount acctwidth amtwidth p =
       | elideamount = [mempty]
       | otherwise = showMixedAmountLinesB displayopts a'
         where
-          displayopts = defaultFmt{ displayZeroCommodity=True, displayForceDecimalMark=True }
+          displayopts = defaultFmt{ displayZeroCommodity=True, displayForceDecimalMark=True, displayQuotes=False }
           a' = mapMixedAmount amountToBeancount $ pamount p
     thisamtwidth = maximumBound 0 $ map wbWidth shownAmounts
 
@@ -137,12 +138,12 @@ type BeancountAccountName = AccountName
 type BeancountAccountNameComponent = AccountName
 
 -- | Convert a hledger account name to a valid Beancount account name.
--- It replaces non-supported characters with a dash, it prepends the letter B
--- to any part which doesn't begin with a letter or number, and it capitalises each part.
--- It's possible this could generate the same beancount name for distinct hledger account names.
+-- It replaces spaces with dashes and other non-supported characters with C<HEXBYTES>;
+-- prepends the letter A- to any part which doesn't begin with a letter or number;
+-- and capitalises each part.
 -- It also checks that the first part is one of the required english
 -- account names Assets, Liabilities, Equity, Income, or Expenses, and if not
--- it raises an informative error suggesting --alias.
+-- raises an informative error.
 -- Ref: https://beancount.github.io/docs/beancount_language_syntax.html#accounts
 accountNameToBeancount :: AccountName -> BeancountAccountName
 accountNameToBeancount a =
@@ -174,16 +175,19 @@ accountNameComponentToBeancount acctpart =
     Nothing -> ""
     Just (c,cs) ->
       textCapitalise $
-        T.map (\d -> if isBeancountAccountChar d then d else '-') $ T.cons c cs
+        T.concatMap (\d -> if isBeancountAccountChar d then (T.singleton d) else T.pack $ charToBeancount d) $ T.cons c cs
   where
     prependStartCharIfNeeded t =
       case T.uncons t of
         Just (c,_) | not $ isBeancountAccountStartChar c -> T.cons beancountAccountDummyStartChar t
         _ -> t
 
--- | Dummy valid starting character to prepend to Beancount account name parts if needed (B).
+-- | Dummy valid starting character to prepend to Beancount account name parts if needed (A).
 beancountAccountDummyStartChar :: Char
-beancountAccountDummyStartChar = 'B'
+beancountAccountDummyStartChar = 'A'
+
+charToBeancount :: Char -> String
+charToBeancount c = if isSpace c then "-" else printf "C%x" c
 
 -- XXX these probably allow too much unicode:
 
@@ -222,25 +226,24 @@ type BeancountCommoditySymbol = CommoditySymbol
 -- That is: 2-24 uppercase letters / digits / apostrophe / period / underscore / dash,
 -- starting with a letter, and ending with a letter or digit.
 -- Ref: https://beancount.github.io/docs/beancount_language_syntax.html#commodities-currencies
--- So this: removes any enclosing double quotes,
--- replaces some common currency symbols with currency codes,
+-- So this:
+-- replaces common currency symbols with their ISO 4217 currency codes,
 -- capitalises all letters,
--- replaces any invalid characters with a dash (-),
--- prepends a B if the first character is not a letter,
--- and appends a B if the last character is not a letter or digit.
--- It's possible this could generate unreadable commodity names,
--- or the same beancount name for distinct hledger commodity names.
+-- replaces spaces with dashes and other invalid characters with C<HEXBYTES>,
+-- prepends a C if the first character is not a letter,
+-- appends a C if the last character is not a letter or digit,
+-- and disables hledger's enclosing double quotes.
 --
 -- >>> commodityToBeancount ""
--- "B"
+-- "C"
 -- >>> commodityToBeancount "$"
 -- "USD"
 -- >>> commodityToBeancount "Usd"
 -- "USD"
 -- >>> commodityToBeancount "\"a1\""
 -- "A1"
 -- >>> commodityToBeancount "\"A 1!\""
--- "A-1-B"
+-- "A-1C21"
 --
 commodityToBeancount :: CommoditySymbol -> BeancountCommoditySymbol
 commodityToBeancount com =
@@ -251,16 +254,16 @@ commodityToBeancount com =
     Nothing ->
       com'
       & T.toUpper
-      & T.map (\d -> if isBeancountCommodityChar d then d else '-')
+      & T.concatMap (\d -> if isBeancountCommodityChar d then T.singleton d else T.pack $ charToBeancount d)
       & fixstart
       & fixend
   where
     fixstart bcom = case T.uncons bcom of
       Just (c,_) | isBeancountCommodityStartChar c -> bcom
-      _ -> "B" <> bcom
+      _ -> "C" <> bcom
     fixend bcom = case T.unsnoc bcom of
       Just (_,c) | isBeancountCommodityEndChar c -> bcom
-      _ -> bcom <> "B"
+      _ -> bcom <> "C"
 
 -- | Is this a valid character in the middle of a Beancount commodity name (a capital letter, digit, or '._-) ?
 isBeancountCommodityChar :: Char -> Bool
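
The commodity pipeline in the hunks above (uppercase, encode, fix the start, fix the end) can be approximated on plain `String`s. This is a rough sketch rather than the code in this commit: `encodeCommodity` and `isMiddleChar` are hypothetical names, the quote stripping is a simplification, and the currency-symbol lookup is omitted, so `"$"` would come out as `"C24"` here instead of `"USD"`.

```haskell
import Data.Char (isAsciiUpper, isDigit, isSpace, toUpper)
import Text.Printf (printf)

-- Assumed set of characters allowed inside a Beancount commodity name:
-- uppercase ASCII letters, digits, and the punctuation ' . _ -
isMiddleChar :: Char -> Bool
isMiddleChar c = isAsciiUpper c || isDigit c || c `elem` "'._-"

-- Hypothetical String-based approximation of commodityToBeancount,
-- without the currency symbol to ISO 4217 code lookup.
encodeCommodity :: String -> String
encodeCommodity = fixEnd . fixStart . concatMap enc . map toUpper . stripQuotes
  where
    -- Drop one pair of enclosing double quotes, if present.
    stripQuotes s = case s of
      ('"' : rest) | not (null rest), last rest == '"' -> init rest
      _ -> s
    -- Spaces become dashes; other invalid characters become C<hex code point>.
    enc c
      | isMiddleChar c = [c]
      | isSpace c      = "-"
      | otherwise      = printf "C%x" (fromEnum c)
    -- The name must start with a letter; otherwise prepend a C.
    fixStart s = case s of
      (c : _) | isAsciiUpper c -> s
      _ -> 'C' : s
    -- The name must end with a letter or digit; otherwise append a C.
    fixEnd s
      | not (null s) && (isAsciiUpper (last s) || isDigit (last s)) = s
      | otherwise = s ++ "C"
```

Under these assumptions the sketch agrees with the updated doctests for the cases that do not involve a currency symbol: `encodeCommodity ""` gives `"C"`, `encodeCommodity "\"a1\""` gives `"A1"`, and `encodeCommodity "\"A 1!\""` gives `"A-1C21"`.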

hledger/hledger.m4.md

Lines changed: 12 additions & 21 deletions
@@ -834,38 +834,29 @@ hledger will try to adjust your data to suit Beancount.
 If you plan to export often, you may want to follow Beancount's conventions in your hledger data,
 to ease conversion. Eg use Beancount-friendly account names, currency codes instead of currency symbols,
 and avoid virtual postings, redundant cost notation, etc.
-
-Here are more details, included here for now
+Here are more details
 (see also "hledger and Beancount" <https://hledger.org/beancount.html>).
 
 #### Beancount account names
 
-hledger will try adjust your account names, if needed, to
-[Beancount account names](https://beancount.github.io/docs/beancount_language_syntax.html#accounts),
-by capitalising, replacing unsupported characters with `-`, and
-prepending `B` to parts which don't begin with a letter or digit.
-(It's possible for this to convert distinct hledger account names to the same beancount name.
-Eg, hledger's automatic equity conversion accounts can have currency symbols in their name,
-so `equity:conversion:$-€` becomes `equity:conversion:B---`.)
-
-In addition, you must ensure that the top level account names are `Assets`, `Liabilities`, `Equity`, `Income`, and `Expenses`,
-which Beancount requires.
+hledger will adjust your account names when needed, to make valid
+[Beancount account names](https://beancount.github.io/docs/beancount_language_syntax.html#accounts)
+(capitalising, replacing spaces with `-`, replacing other unsupported characters with `C<HEXBYTES>`,
+and prepending `A` to account name parts which don't begin with a letter or digit).
+However, you must ensure that all top level account names are one of the five required by Beancount:
+`Assets`, `Liabilities`, `Equity`, `Income`, or `Expenses`.
 If yours are named differently, you can use [account aliases](#alias-directive),
 usually in the form of `--alias` options, possibly stored in a [config file](#config-file).
 (An example: [hledger2beancount.conf](https://github.com/simonmichael/hledger/blob/master/examples/hledger2beancount.conf))
 
 #### Beancount commodity names
 
-hledger will adjust your commodity names, if needed, to
+hledger will adjust commodity names when needed, to make valid
 [Beancount commodity/currency names](https://beancount.github.io/docs/beancount_language_syntax.html#commodities-currencies),
-which must be 2-24 uppercase letters, digits, or `'`, `.`, `_`, `-`,
-beginning with a letter and ending with a letter or digit.
-hledger will convert known currency symbols to [ISO 4217 currency codes](https://en.wikipedia.org/wiki/ISO_4217#Active_codes).
-Otherwise, it will capitalise letters,
-replace unsupported characters with a dash (-),
-and prepend/append a "B" when needed.
-(It's possible for this to generate unreadable commodity names,
-or to convert distinct hledger commodity names to the same beancount name.)
+(which must be 2-24 uppercase letters, digits, or `'`, `.`, `_`, `-`, beginning with a letter and ending with a letter or digit).
+hledger will convert known currency symbols to [ISO 4217 currency codes](https://en.wikipedia.org/wiki/ISO_4217#Active_codes),
+capitalise letters, replace spaces with `-`, replace other unsupported characters with `C<HEXBYTES>`,
+and prepend/append a "C" when needed.
 
 #### Beancount virtual postings
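
The account name rules documented above (encode each part, capitalise it, prepend `A` if needed, then require one of the five top-level names) might look roughly like this on plain `String`s. Everything here is an illustrative assumption: `toBeancountAccount`, `encodeComponent`, `splitOnColon` and `isAccountChar` are hypothetical names, the character check is only an approximation, and the error text is made up apart from the "bad top-level account" phrase that the test file in this commit matches on.

```haskell
import Data.Char (isAlphaNum, isAscii, isSpace, toUpper)
import Data.List (intercalate)
import Text.Printf (printf)

-- Hypothetical approximation of the characters Beancount accepts in account names.
isAccountChar :: Char -> Bool
isAccountChar c = (isAscii c && isAlphaNum c) || c == '-'

-- Encode one colon-separated part: hex-encode, capitalise, then make sure
-- it starts with a letter or digit by prepending 'A' if necessary.
encodeComponent :: String -> String
encodeComponent = prependIfNeeded . capitalise . concatMap enc
  where
    enc c
      | isAccountChar c = [c]
      | isSpace c       = "-"
      | otherwise       = printf "C%x" (fromEnum c)
    capitalise (c : cs) = toUpper c : cs
    capitalise []       = []
    prependIfNeeded s@(c : _) | not (isAscii c && isAlphaNum c) = 'A' : s
    prependIfNeeded s = s

-- Split an account name on ':' (always returns at least one part).
splitOnColon :: String -> [String]
splitOnColon s = case break (== ':') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitOnColon rest

-- Convert a whole account name, or report an unsupported top-level account.
toBeancountAccount :: String -> Either String String
toBeancountAccount name =
  case map encodeComponent (splitOnColon name) of
    parts@(top : _)
      | top `elem` ["Assets", "Liabilities", "Equity", "Income", "Expenses"] ->
          Right (intercalate ":" parts)
    _ -> Left ("bad top-level account: " ++ name)
```

For example, `toBeancountAccount "equity:$-€:$"` yields `Right "Equity:C24-C20ac:C24"`, matching the expected output in the new test file, while `toBeancountAccount "other"` yields a `Left` value. In real use the fix for the latter is an `--alias` option, as the documentation above describes.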

hledger/test/print/beancount.test

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# * print command's beancount output format
+
+# ** 1. Unrecognised top level account names are rejected.
+<
+2000-01-01
+    other  0 ABC
+
+$ hledger -f- print -O beancount
+>2 /bad top-level account/
+>=1
+
+# ** 2. Otherwise, accounts are encoded to suit beancount, and open directives are added.
+<
+2000-01-01
+    assets:a  0 ABC
+    equity:$-€:$  0 USD
+
+$ hledger -f- print -O beancount
+2000-01-01 open Assets:A
+2000-01-01 open Equity:C24-C20ac:C24
+
+2000-01-01 *
+    Assets:A  0 ABC
+    Equity:C24-C20ac:C24  0 USD
+
+>=
+
+# ** 3. Commodity symbols are converted to ISO 4217 codes, or encoded, to suit beancount.
+<
+2000-01-01
+    assets  $0
+    assets  0
+    assets  0!
+    assets  0 "size 2 pencils"
+
+$ hledger -f- print -O beancount
+2000-01-01 open Assets
+
+2000-01-01 *
+    Assets  0 USD
+    Assets  0 C
+    Assets  0 C21
+    Assets  0 SIZE-2-PENCILS
+
+>=
+
