Description
Summary
I'm using HDDM on Windows via Anaconda/conda, and I ran into several installation and runtime issues before I could get hddm==0.9.8 working:
- A small typo in setup.py (pandas and patsy concatenated in setup_requires) causes an InvalidRequirement error.
- PyMC2 / Fortran toolchain requirements are quite fragile on Windows (Fortran compiler + BLAS/LAPACK).
- There are encoding-related problems when building PyMC on a Traditional Chinese Windows locale (cp950).
- At runtime, parallel sampling using joblib's default loky backend crashes on my setup with an internal _winapi error; switching to the threading backend fixes it.
I'm sharing this as a “field report” and to suggest that documenting a known-good Windows/conda setup and these gotchas might help other users.
Environment
- OS: Windows 11 Home 64-bit (Traditional Chinese locale, code page cp950)
- Conda / Anaconda: conda 25.7.0 (Anaconda3)
- Python (HDDM env): 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 05:35:01) [MSC v.1916 64 bit (AMD64)] (conda env name: hddm098)
- pip (in hddm098): pip 24.0 (python -m pip --version)
- HDDM: 0.9.8 (pip install hddm==0.9.8)
- Not using LAN models: no PyTorch; only classical HDDM / HDDMStimCoding
1. setup.py typo in setup_requires
In setup.py, the setup_requires list currently contains:
setup_requires=['numpy >=1.20.0, < 1.23.0', 'scipy >= 1.6.3, < 1.7.0', 'cython >= 0.29.0, < 1.0.0', 'pandas >= 1.0.0, < 1.5.0' 'patsy', 'seaborn == 0.11.0', 'statsmodels >= 0.12.0, < 0.13.0', 'tqdm >= 4.1.0', 'scikit-learn == 0.24', 'cloudpickle >= 2.0.0', 'kabuki >= 0.6.0', 'PyMC >= 2.3.3, < 3.0.0', 'arviz == 0.12', 'ssm-simulators == 0.3.2'],
Because of the missing comma after the pandas requirement, pip sees an invalid requirement:
pandas >= 1.0.0, < 1.5.0patsy
and raises:
InvalidRequirement: Expected end or semicolon (after version specifier)
pandas >= 1.0.0, < 1.5.0patsy
~~~~~~~~~~~~~~~~~^
Fix: add the comma between the two strings. After this change, the InvalidRequirement error disappeared.
(I have opened a small PR that just adds this comma.)
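The reason a single missing comma produces this error is that Python joins adjacent string literals at compile time. A minimal illustration, using a shortened version of the list (the variable names broken and fixed are just for this example):

```python
# Adjacent string literals are concatenated by the compiler, so a missing
# comma silently merges two list entries into one.
broken = [
    "pandas >= 1.0.0, < 1.5.0"  # <- no comma here
    "patsy",
    "seaborn == 0.11.0",
]
fixed = [
    "pandas >= 1.0.0, < 1.5.0",
    "patsy",
    "seaborn == 0.11.0",
]

print(len(broken), broken[0])  # 2 pandas >= 1.0.0, < 1.5.0patsy
print(len(fixed), fixed[1])    # 3 patsy
```

No syntax error is raised for the broken list, which is why the problem only shows up later when pip tries to parse the merged requirement string.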
2. PyMC2 / Fortran toolchain and encoding issues on Windows 11 (cp950)
When installing HDDM 0.9.8 in a fresh conda env (python=3.7.12), pip pulls in PyMC 2.x, which is built from source. On my Windows 11 machine this led to:
- No Fortran compiler found (g77, gfortran, ifort, etc.).
- Messages from numpy.distutils about missing BLAS/LAPACK.
- A UnicodeDecodeError when f2py parses pymc\flib.f under the cp950 locale:
UnicodeDecodeError: 'cp950' codec can't decode byte 0xce in position 5497: illegal multibyte sequence
Reading fortran codes...
Reading file 'pymc\flib.f' (format:fix,strict)
This seems to be a combination of:
- Old Fortran sources in PyMC2,
- f2py reading the file with the locale's default encoding (cp950), which doesn't match the file's actual encoding.
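To check this kind of mismatch on a given file, it can help to list the non-ASCII bytes and see which encodings can decode them. The sketch below uses only the standard library. The helper names and the sample bytes are hypothetical, since the actual content of flib.f around the failing offset is not shown above.

```python
from typing import Dict, List, Tuple


def non_ascii_bytes(raw: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for every byte outside the ASCII range."""
    return [(i, b) for i, b in enumerate(raw) if b >= 0x80]


def decodable_with(raw: bytes, encodings: List[str]) -> Dict[str, bool]:
    """Report which candidate encodings decode the bytes without error."""
    result = {}
    for enc in encodings:
        try:
            raw.decode(enc)
            result[enc] = True
        except UnicodeDecodeError:
            result[enc] = False
    return result


# Hypothetical stand-in for a Fortran comment line with one stray high byte;
# this is NOT the real content of flib.f.
sample = b"C     comment \xce\n"

print(non_ascii_bytes(sample))
print(decodable_with(sample, ["cp950", "latin-1"]))
```

To inspect a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the same helpers. This only diagnoses the mismatch; it does not by itself make the f2py build succeed.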
I eventually managed to create a working environment by carefully pinning versions and using a conda-based stack, but from a user perspective this is pretty opaque if you just want to run HDDM models and not debug Fortran builds.
3. Runtime issue: joblib + Windows 11 + Python 3.7 (_winapi.SYNCHRONIZE)
After installation, I ran hierarchical HDDM/HDDMStimCoding models with multiple chains using joblib.Parallel. With the default backend (loky) on this Windows 11 / Python 3.7 setup, I saw errors like:
AttributeError: module '_winapi' has no attribute 'SYNCHRONIZE'
...
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
This happened as soon as the workers were spawned, before the model did heavy computation. Switching to the threading backend fixed it:
from joblib import Parallel, delayed

models = Parallel(
    n_jobs=n_jobs,
    backend="threading",
)(
    delayed(run_chain)(data, MODEL_DEPENDENCIES, i)
    for i in range(N_CHAINS)
)
With backend="threading" I was able to run 4 chains in parallel and then load the models back into a Jupyter notebook for diagnostics and plotting.
This isn’t necessarily an HDDM bug, but it may be worth mentioning in the docs/examples for Windows users who parallelize chains.
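If the same script needs to run on several platforms, one option is to choose the backend by platform instead of hard-coding it. This is a sketch of that idea, not something HDDM provides: choose_backend and run_chains are hypothetical helpers, and run_chain stands for whatever user-supplied function fits one chain.

```python
import sys


def choose_backend(platform: str = sys.platform) -> str:
    """Pick a joblib backend name: threads on Windows, loky elsewhere."""
    return "threading" if platform.startswith("win") else "loky"


def run_chains(run_chain, n_chains: int, n_jobs: int):
    """Call run_chain(i) for each chain index using the chosen backend.

    run_chain is a hypothetical user-supplied callable. joblib is imported
    lazily so that choose_backend stays usable without it installed.
    """
    from joblib import Parallel, delayed

    return Parallel(n_jobs=n_jobs, backend=choose_backend())(
        delayed(run_chain)(i) for i in range(n_chains)
    )


print(choose_backend("win32"))  # threading
print(choose_backend("linux"))  # loky
```

Treating every Windows machine this way is an assumption based on my one setup; loky may well work on other Windows/Python combinations. Threads also share the GIL, so the speedup depends on how much of the sampling time is spent in compiled code that releases it.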
4. Suggestions
Totally understand that HDDM 0.9.8 is built on an older stack (PyMC2 + Fortran, etc.), so some friction is expected. Based on this experience, a few things that might help other users:
- Fix the setup.py comma between pandas and patsy in setup_requires (tiny but important; already addressed in a PR).
- Provide a tested conda environment recipe for Windows 11, e.g.:
  - Recommended Python version (3.7 vs 3.8, etc.),
  - Pinned NumPy/SciPy/Pandas versions known to work,
  - Notes on Fortran and BLAS/LAPACK expectations.
- Mention the encoding/locale caveat:
  - On non-English Windows locales (e.g., Traditional Chinese, cp950), the PyMC2 Fortran build can hit Unicode errors in pymc\flib.f.
- Suggest backend="threading" on Windows for examples that use joblib.Parallel for multiple chains, or at least mention that loky may fail on some Windows/Python 3.7 setups.
Thanks for maintaining HDDM. It is very useful for real experimental data, even with these setup hurdles.
Text edited with the help of ChatGPT.