Hi everyone,
I’m encountering a segmentation fault when using nnsight in a multiprocessing setup. The crash happens deep inside the C extension _c.py_mount and terminates with:
_c.py_mount: Fatal Python error: Segmentation fault
The error occurs when evaluating a model using joblib.Parallel (loky backend) and multiple GPUs. Each worker calls an evaluate_checkpoint function that internally uses nnsight for tracing/interventions.
Environment:
- Python 3.12.11
- PyTorch (GPU build)
- nnsight (latest release from PyPI)
- CUDA 12.x
What I’ve tried:
- Reinstalling
nnsightand PyTorch - Setting multiprocessing start method to
spawn - Ensuring CUDA drivers and PyTorch versions match
The segmentation fault still occurs randomly during execution.
Has anyone seen similar behavior with _c.py_mount or nnsight under Python 3.12 or in multiprocessing contexts?
Any guidance or workarounds would be appreciated!