Description
I am running the following test program with Text-BibTeX at git head (commit 9ed7d26):
use strict;
use utf8;
use warnings;
use feature 'say';
use Data::Dumper;
use Text::BibTeX::Name;
my $name = Text::BibTeX::Name->new({binmode => 'utf-8', normalization => 'NFD'}, 'al-Ṣāliḥ, ʿAbdallāh');
binmode STDOUT, ':encoding(UTF-8)';
say $name->part('last');
Sometimes it turns some of the characters into Unicode replacement characters (U+FFFD), and sometimes it correctly prints the original name. I am just repeating the same command over and over, changing nothing between runs.
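To confirm that a run really emits U+FFFD rather than some other byte sequence that the terminal merely renders as a replacement glyph, the output bytes can be inspected directly. The following is a minimal sketch; count_fffd is a hypothetical helper name, not part of Text-BibTeX, and it assumes the output is UTF-8, where U+FFFD is encoded as the bytes EF BF BD.

```shell
# count_fffd: read stdin and print how many times the UTF-8 encoding
# of U+FFFD (bytes EF BF BD) occurs in it.
# Hypothetical helper for illustration; not part of Text-BibTeX.
count_fffd() {
    # Dump every byte as two hex digits separated by spaces, join the
    # dump into one line, and count byte-aligned "ef bf bd" sequences.
    od -An -v -tx1 |
        tr '\n' ' ' |
        tr -s ' ' |
        grep -o 'ef bf bd' |
        wc -l |
        tr -d ' '
}
```

Piping the test program's output through count_fffd would then print 0 for a clean run and a positive count for a corrupted one.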
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-Ṣāliḥ
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-Ṣāliḥ
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-�āli�
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-Ṣāliḥ
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-�āliḥ
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-�āliḥ
ucl tmp $ LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-�āliḥ
If I add LANG=C it fails every time (at least for a few dozen tries):
ucl tmp $ LANG=C LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
al-�āli�
(full environment at bottom).
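Since the failure is intermittent, it may help to run the reproducer many times and tally the distinct outputs instead of eyeballing individual runs. The sketch below assumes a POSIX shell; tally is a hypothetical helper name, and the commented invocation simply reuses the paths from the runs above.

```shell
# tally: run a command N times and count each distinct output line.
# Usage: tally N command [args...]
# Hypothetical helper for illustration.
tally() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" 2>&1
        i=$((i + 1))
    done |
        LC_ALL=C sort |     # byte-wise ordering, so differently corrupted
        LC_ALL=C uniq -c |  # outputs are never treated as equal
        LC_ALL=C sort -rn
}

# Example (assumes the same paths as in the runs above):
# tally 50 env LD_LIBRARY_PATH=$HOME/perl5/lib perl -I /home/falsifian/co/Text-BibTeX/lib ./f
```

Passing the environment through env rather than as a prefix assignment on the function call avoids shell-specific differences in how such assignments are exported to child processes.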
(ETA: The reason I'm specifically blaming this on the constructor in the issue title is that I did a bit of debugging with perl -d and it looks like the problem has already occurred by the time the constructor returns.)
ucl tmp $ uname -a
NetBSD ucl.h.falsifian.org 10.1 NetBSD 10.1 (GENERIC) #0: Mon Dec 16 13:08:11 UTC 2024 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
A couple of downstream failures that may be related:
- biber isn't working for me unless I pass --decodecharsset=null. Specifically, it puts a FFFD instead of ö into the name Br{\"o}nnimann in the .bbl file (and a couple of similar substitutions), which causes LaTeX to fail.
- The biber test suite has many failures. Many tests fail consistently if LANG=C is in the environment (biber's ./Build test does this), and I've also seen failure(s) without LANG=C.
I tried digging a little deeper and suspect there's something going on with bt_split_name in btparse. At least, when I call bt_split_name("al-Ṣāliḥ, ʿAbdallāh", "filename", -1, -1) I similarly see Unicode replacement characters in the returned structure. I stopped digging at that point. Hope this report is useful.
Env vars:
ucl tmp $ env
_=/usr/bin/env
XTERM_VERSION=XTerm(372)
NO_COLOR=yes
LESSCLOSE=
PATH=/home/falsifian/perl5/bin:/home/falsifian/bin:/home/falsifian/w/unix_util/bin:/home/falsifian/sw/got/bin:/home/falsifian/sw/mu/usr/local/bin:/home/falsifian/.cabal/bin:/home/falsifian/.local/bin:/home/falsifian/var/go/bin:/home/falsifian/perl5/bin:/home/falsifian/bin:/home/falsifian/w/unix_util/bin:/home/falsifian/sw/got/bin:/home/falsifian/sw/mu/usr/local/bin:/home/falsifian/.cabal/bin:/home/falsifian/.local/bin:/home/falsifian/var/go/bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/games:/usr/pkg/sbin:/usr/pkg/bin:/usr/local/sbin:/usr/local/bin:/usr/X11R7/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
HISTFILE=/home/falsifian/.sh_history
PERL_LOCAL_LIB_ROOT=/home/falsifian/perl5:/home/falsifian/perl5
EDITOR=/home/falsifian/bin/my_editor
SHELL=/bin/ksh
BLOCKSIZE=K
USER=falsifian
LESS=-R-i
GOPATH=/home/falsifian/var/go
WINDOWPATH=5
LC_COLLATE=C
XTERM_SHELL=/usr/bin/tmux
TERM=screen
XMODIFIERS=@im=fcitx
LANG=en_CA.UTF-8
PERL_MM_OPT=INSTALL_BASE=/home/falsifian/perl5
LOGNAME=falsifian
ENV=/home/falsifian/.shrc
FZF_DEFAULT_OPTS=--highlight-line --no-color
BEANCOUNT_USER=_falsifian_bc
PERL_MB_OPT=--install_base "/home/falsifian/perl5"
TERM_PROGRAM=tmux
TMUX=/tmp/tmux-1000/default,10246,370
LC_TIME=C
GTK_IM_MODULE=fcitx
QT_IM_MODULE=fcitx
HOME=/home/falsifian
DISPLAY=:0
HISTSIZE=10000
TMUX_PANE=%375
PWD=/home/falsifian/tmp
PERL5LIB=/home/falsifian/perl5/lib/perl5:/home/falsifian/perl5/lib/perl5
LESSOPEN=
EMAIL=falsifian@falsifian.org
WINDOWID=27262988
XTERM_LOCALE=en_CA.UTF-8
PAGER=less
TERM_PROGRAM_VERSION=3.2a