Difficult HDDM 0.9.8 installation on Windows (conda, PyMC2, encoding, joblib)

## Summary

I'm using HDDM on Windows via Anaconda/conda, and I ran into several installation and runtime issues before I could get `hddm==0.9.8` working:

1. A small typo in `setup.py` (`pandas` and `patsy` concatenated in `setup_requires`) causes an `InvalidRequirement` error.
2. PyMC2 / Fortran toolchain requirements are quite fragile on Windows (Fortran compiler + BLAS/LAPACK).
3. There are encoding-related problems when building PyMC on a Traditional Chinese Windows locale (`cp950`).
4. At runtime, parallel sampling using `joblib`'s default `loky` backend crashes on my setup with an internal `_winapi` error; switching to the `threading` backend fixes it.

I'm sharing this as a “field report” and to suggest that documenting a known-good Windows/conda setup and these gotchas might help other users.

---

## Environment

- **OS**: Windows 11 Home 64-bit (Traditional Chinese locale, code page `cp950`)
- **Conda / Anaconda**: `conda 25.7.0` (Anaconda3)
- **Python (HDDM env)**: `3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 05:35:01) [MSC v.1916 64 bit (AMD64)]`  
  (conda env name: `hddm098`)
- **pip (in hddm098)**: `pip 24.0` (`python -m pip --version`)
- **HDDM**: 0.9.8 (`pip install hddm==0.9.8`)
- **Not using LAN models**: no PyTorch; only classical HDDM / HDDMStimCoding

---

## 1. `setup.py` typo in `setup_requires`

In `setup.py`, the `setup_requires` list currently contains:

```{python}
setup_requires=['numpy >=1.20.0, < 1.23.0', 'scipy >= 1.6.3, < 1.7.0', 'cython >= 0.29.0, < 1.0.0', 'pandas >= 1.0.0, < 1.5.0' 'patsy', 'seaborn == 0.11.0', 'statsmodels >= 0.12.0, < 0.13.0', 'tqdm >= 4.1.0', 'scikit-learn == 0.24', 'cloudpickle >= 2.0.0', 'kabuki >= 0.6.0', 'PyMC >= 2.3.3, < 3.0.0', 'arviz == 0.12', 'ssm-simulators == 0.3.2'],
```

Because of the missing comma after the pandas requirement, `pip` sees an invalid requirement:

```
pandas >= 1.0.0, < 1.5.0patsy
```

and raises:

> InvalidRequirement: Expected end or semicolon (after version specifier)
>     pandas >= 1.0.0, < 1.5.0patsy
>                ~~~~~~~~~~~~~~~~~^

Fix: add the missing comma between the two strings. After this change, the `InvalidRequirement` error disappeared.

(I have opened a small PR that just adds this comma.)
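For anyone wondering why a missing comma produces a single malformed requirement rather than a syntax error: Python joins adjacent string literals at compile time. A minimal sketch, using a shortened version of the list above (the variable names `broken` and `fixed` are only for illustration):

```python
# Adjacent string literals are concatenated by Python itself, so a missing
# comma silently merges two list items into one string.
broken = ['pandas >= 1.0.0, < 1.5.0' 'patsy', 'seaborn == 0.11.0']
fixed = ['pandas >= 1.0.0, < 1.5.0', 'patsy', 'seaborn == 0.11.0']

print(len(broken), repr(broken[0]))  # 2 'pandas >= 1.0.0, < 1.5.0patsy'
print(len(fixed), fixed[:2])         # 3 ['pandas >= 1.0.0, < 1.5.0', 'patsy']
```

The merged string is exactly what `pip` rejects in the error above.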

---

## 2. PyMC2 / Fortran toolchain and encoding issues on Windows 11 (cp950)

When installing HDDM 0.9.8 in a fresh conda env (`python=3.7.12`), `pip` pulls in PyMC 2.x, which is built from source. On my Windows 11 machine this led to:

- No Fortran compiler found (`g77`, `gfortran`, `ifort`, etc.).
- Messages from `numpy.distutils` about missing BLAS/LAPACK.
- A `UnicodeDecodeError` when `f2py` parses `pymc\flib.f` under the `cp950` locale:

> UnicodeDecodeError: 'cp950' codec can't decode byte 0xce in position 5497: illegal multibyte sequence
> Reading fortran codes...
>         Reading file 'pymc\\flib.f' (format:fix,strict)

This seems to be a combination of:

- Old Fortran sources in PyMC2,
- `f2py` reading the file with a default encoding that doesn't match the Windows `cp950` code page.
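The decoding failure can be reproduced on any platform without building PyMC. This is only a sketch of the mechanism: the bytes below are made up, and I have not checked what actually sits at position 5497 of `flib.f`. It assumes the file contains a lone `0xce` byte (for example a Latin-1 accented letter in a comment), which `cp950` treats as the start of a two-byte character.

```python
# Hypothetical Fortran comment line containing a single 0xce byte.
raw = b"c     author: \xce (legacy 8-bit character)\n"

try:
    raw.decode("cp950")
    cp950_error = None
except UnicodeDecodeError as exc:
    # 0xce is a lead byte in cp950; the following space is not a valid
    # trail byte, so the codec reports an illegal multibyte sequence.
    cp950_error = exc

print(cp950_error)

# A single-byte codec such as latin-1 maps every byte to a character,
# so the same bytes decode without error.
text = raw.decode("latin-1")
print(repr(text))
```

This suggests the file would need to be read with an explicit single-byte encoding (or with `errors="replace"`) to be locale-independent, but I did not test a patch along those lines.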

I eventually managed to create a working environment by carefully pinning versions and using a conda-based stack, but from a user perspective this is pretty opaque if you just want to run HDDM models and not debug Fortran builds.
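A quick preflight check along these lines would have told me up front what the build was going to complain about. It is a rough sketch, not part of HDDM or PyMC: the function name `preflight` is hypothetical, and it only looks for the compiler names mentioned in the build log on `PATH` and reports the encoding Python will use by default when opening text files.

```python
import locale
import shutil

# Compiler executables named in the numpy.distutils messages above.
FORTRAN_COMPILERS = ("gfortran", "g77", "ifort")


def preflight():
    """Report Fortran compilers on PATH and the default text encoding."""
    compilers = {name: shutil.which(name) for name in FORTRAN_COMPILERS}
    return {
        "fortran_compilers": compilers,
        "preferred_encoding": locale.getpreferredencoding(False),
    }


report = preflight()
for name, path in report["fortran_compilers"].items():
    print(f"{name}: {path or 'not found'}")
print("preferred encoding:", report["preferred_encoding"])
```

On my machine this would have shown no compiler found and `cp950` as the preferred encoding, matching the two failures described above.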

---

## 3. Runtime issue: joblib + Windows 11 + Python 3.7 (`_winapi.SYNCHRONIZE`)

After installation, I ran hierarchical HDDM/HDDMStimCoding models with multiple chains using `joblib.Parallel`. With the default backend (`loky`) on this Windows 11 / Python 3.7 setup, I saw errors like:

> AttributeError: module '_winapi' has no attribute 'SYNCHRONIZE'
> ...
> joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

This happened as soon as the workers were spawned, before the model did heavy computation. Switching to the threading backend fixed it:

```{python}
from joblib import Parallel, delayed

models = Parallel(
    n_jobs=n_jobs,
    backend="threading",
)(
    delayed(run_chain)(data, MODEL_DEPENDENCIES, i)
    for i in range(N_CHAINS)
)
```

With `backend="threading"` I was able to run 4 chains in parallel and then load the models back into a Jupyter notebook for diagnostics and plotting.

This isn’t necessarily an HDDM bug, but it may be worth mentioning in the docs/examples for Windows users who parallelize chains.
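If an example script is meant to run on several platforms, the backend could be chosen at runtime instead of hard-coded. A minimal sketch, assuming the failure is specific to Windows (I only tested Windows 11 with Python 3.7); the helper name `pick_joblib_backend` is hypothetical, while `"threading"` and `"loky"` are joblib's own backend names:

```python
import sys


def pick_joblib_backend(platform_name=sys.platform):
    """Return a joblib backend name: threads on Windows, loky elsewhere."""
    # sys.platform is "win32" on both 32-bit and 64-bit Windows.
    if platform_name == "win32":
        return "threading"
    return "loky"


print(pick_joblib_backend())
```

The result can then be passed as `Parallel(n_jobs=n_jobs, backend=pick_joblib_backend())`. Note that threads share the GIL, so how much speed-up this gives depends on how much of the sampling time is spent in compiled code that releases it.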

---

## 4. Suggestions

Totally understand that HDDM 0.9.8 is built on an older stack (PyMC2 + Fortran, etc.), so some friction is expected. Based on this experience, a few things that might help other users:

1. Fix the missing comma between `pandas` and `patsy` in `setup_requires` in `setup.py` (tiny but important; already addressed in a PR).
2. Provide a tested conda environment recipe for Windows 11, e.g.:
   - Recommended Python version (3.7 vs 3.8, etc.),
   - Pinned NumPy/SciPy/Pandas versions known to work,
   - Notes on Fortran and BLAS/LAPACK expectations.
3. Mention the encoding/locale caveat:
   - On non-English Windows locales (e.g., Traditional Chinese, `cp950`), the PyMC2 Fortran build can hit Unicode errors in `pymc\flib.f`.
4. Suggest `backend="threading"` on Windows for examples that use `joblib.Parallel` for multiple chains, or at least mention that `loky` may fail on some Windows / Python 3.7 setups.

Thanks for maintaining HDDM. It is very useful for real experimental data, even with these setup hurdles.

*Text edited with the help of ChatGPT.*
