NNsight 0.5 Prerelease: Feedback Requested

NNsight 0.5 Prerelease Now Available

We’re excited to announce the prerelease of NNsight 0.5, packed with new features that you’ve been requesting.

What’s New

  • Enhanced intermediate value access during forward passes
  • Flexible code execution within NNsight’s tracing context
  • Improved debugging capabilities for smoother development

Get Started

Ready to explore NNsight 0.5? Our comprehensive Colab notebook includes installation instructions and feature overviews: Try NNsight 0.5 →

Share Your Feedback

Your input shapes NNsight’s development. We’d love to hear from you; comment on this thread with feedback, bugs, and any other issues that you’d like to report!

Live Feedback Session

Join us for a real-time discussion about NNsight 0.5:

Prerelease Remote Model Logistics

NDIF remote models will not support NNsight 0.5 until it is the official default release. Stay tuned for a July announcement of the official NNsight 0.5 release, complete with NDIF remote model integration!


Thank you for being part of the NDIF community. Your feedback has been incredibly helpful for building this new update!

Good that there is an error message for this. I think it’d be great if it could redirect to a tutorial explaining that execution order matters.

It would be good to have an example with retain_graph=True for grad, e.g.:

import torch as th
from nnsight import LanguageModel

model = LanguageModel("gpt2")  # assumed model; any GPT-2 style model with .transformer.h works

# Now in NNsight 0.5
with model.trace("Hello World"):
    l1_mlp = model.transformer.h[0].mlp.output
    x = model.lm_head.output

    # new: .backward() syntax
    with x.sum().backward(retain_graph=True):
        # access .grad within the backward context
        grad = l1_mlp.grad.save()
    with (x.sum() * 2).backward():
        grad2 = l1_mlp.grad.save()

print(grad)
assert th.allclose(grad * 2, grad2)

I still think this is a terrible default:

The problem is that for skipping an LLM layer it makes sense; however, for an MLP it doesn’t make sense if the block is implemented like this:
layer_out = mlp_input + mlp(mlp_input)
Here I’d expect “skip” to skip with 0, not with mlp_input.
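A minimal plain-PyTorch sketch of why the two behaviours differ for such a residual block (this is just the arithmetic, not the NNsight skip API):

import torch
import torch.nn as nn

# Toy residual MLP block: layer_out = mlp_input + mlp(mlp_input)
mlp = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))
mlp_input = torch.randn(1, 8)

normal_out = mlp_input + mlp(mlp_input)                    # normal execution
skip_with_input = mlp_input + mlp_input                    # current default: the MLP "output" becomes its input, doubling the stream
skip_with_zero = mlp_input + torch.zeros_like(mlp_input)   # what I'd expect: the MLP contributes nothing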

I expect this default to cause some confusion for newcomers.

What’s the use case of skip without an argument, actually? It doesn’t even work on transformer layers.

Editing this message instead of replying, as I can’t reply 4 times in a row.

The cache error message when accessing a module that wasn’t cached could use some love:


@clement_dumas just changed site permissions to allow 20 replies per topic instead of 3 (let me know if you hit the limit again!)


Unsure if having a default variant name is a good thing. It also seems like little benefit for a feature that could cause weird, hard-to-debug bugs (e.g. running .export_edits in 2 projects using the same env).

I’m also unsure if it’s good to store it in the global cache rather than in a local folder, for the same reason. I’m probably fine with forcing users to name the edit but keeping a default save to the HF cache. But one way this setup could go wrong is people working on the same project sharing an HF cache and overwriting each other’s edits (this is how I work with my collaborators, so it could be just us).

Archiving some feedback from the Discord server, where I asked:

Am I being dumb or does 0.5.0.dev0 not have nnsight.list() anymore? I just upgraded from 0.4.5 and now when I do import nnsight; nnsight.list() I get AttributeError: module 'nnsight' has no attribute 'list'

According to Adam, nnsight 0.5 doesn’t have nnsight.list() anymore! He said:

You can just use regular python code inside of the tracing context, so simply call list()

Jaden also said that future versions will indicate nnsight.list() is deprecated.
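For the record, a minimal sketch of what I understood that to mean (the GPT-2 model and loop bound here are just assumed for illustration):

from nnsight import LanguageModel

model = LanguageModel("gpt2")

hidden_states = []  # a plain Python list replaces nnsight.list()
with model.trace("Hello"):
    for i in range(12):  # GPT-2 small has 12 blocks
        hidden_states.append(model.transformer.h[i].output[0].save())

print(len(hidden_states))  # 12 saved hidden-state tensors after the trace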


Archiving my Discord feedback:

I’m trying the new 0.5.0.dev0 and when loading the model and trying to look at the number of layers:

self.model = LanguageModel(hf_model_id, device_map=device, dispatch=True, max_memory={0: model_gpu_memory_budget})
self.num_layers = len(self.model.model.layers)

I’m getting the error TypeError: object of type 'Envoy' has no len().
In the previous version, calling len() worked.
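A possible workaround in the meantime (assuming the wrapper still forwards the underlying HF config; num_hidden_layers is the standard transformers field for the decoder layer count):

# Read the layer count from the HF config instead of calling len() on the Envoy
# (continuing the snippet above).
self.num_layers = self.model.config.num_hidden_layers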


Now that nnsight enforces the execution order, is there any reason for not calling tracer.stop by default at the end of the trace? Is this already the default?

Hmm, maybe have it as a flag on the .trace() call? I’m thinking that having model execution interrupted early by default could make the new caching functionality ambiguous, because later activations would not be cached.

Or, have it interrupt by default if no cache has been registered?
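For reference, a sketch of the pattern under discussion (assuming the 0.4-style tracer.stop() carries over to 0.5, with GPT-2 as a stand-in model):

from nnsight import LanguageModel

model = LanguageModel("gpt2")

with model.trace("Hello") as tracer:
    hidden = model.transformer.h[3].output[0].save()
    tracer.stop()  # nothing after block 3 runs, so later activations are never produced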

I’m trying to run the new version with vLLM, but it appears there’s an error when using the previously suggested version of vLLM, 0.6.4.post1. In the provided notebook, I’m running:

!pip install nnsight==0.5.0.dev
!pip install vllm==0.6.4.post1
!pip install triton==3.1.0

But using from nnsight.modeling.vllm import VLLM I receive the following error:

ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipython-input-4-2319256531.py in <cell line: 0>()
----> 1 from nnsight.modeling.vllm import VLLM

/usr/local/lib/python3.11/dist-packages/nnsight/modeling/vllm/__init__.py in <module>
----> 1 from .vllm import VLLM

/usr/local/lib/python3.11/dist-packages/nnsight/modeling/vllm/vllm.py in <module>
      5 from ...util import WrapperModule
      6 from ..mixins import RemoteableMixin
----> 7 from .workers.GPUWorker import NNsightGPUWorker
      8 from .sampling import NNsightSamplingParams
      9 from dataclasses import fields

/usr/local/lib/python3.11/dist-packages/nnsight/modeling/vllm/workers/GPUWorker.py in <module>
      1 from vllm.worker.worker import Worker
      2 
----> 3 from ..model_runners.GPUModelRunner import NNsightGPUModelRunner
      4 
      5 

/usr/local/lib/python3.11/dist-packages/nnsight/modeling/vllm/model_runners/GPUModelRunner.py in <module>
      7 import torch
      8 import torch.distributed
----> 9 from vllm.distributed.kv_transfer import get_kv_transfer_group
     10 
     11 from vllm.distributed import (get_pp_group,

ModuleNotFoundError: No module named 'vllm.distributed.kv_transfer'

vLLM is not supported yet with nnsight==0.5.x :frowning:

If you don’t have a good way to automatically wait for the cache to be filled before exiting, then indeed it makes sense not to do it.

I think it’s less confusing to do tracer.stop than having a trace kwarg.

But also, do we really want to deprecate module.output.stop() then? It sounds like the clean way to stop properly when using a cache.

Hi, the library is great, and the new release allows for a lot of things. Thank you for the great work!

I stumbled upon what seems to be a bug: when I execute the following lines with version 0.4.8 it goes smoothly, but with 0.5.0.dev2 I obtain the error shown below.

Do you have any idea of the cause? It happens with other BERT IDs too, or when a BERT instance is given directly.

from nnsight import LanguageModel

model = LanguageModel('bert-base-uncased')
AttributeError: BertAttention(
  (self): BertSdpaSelfAttention(
    (query): Linear(in_features=768, out_features=768, bias=True)
    (key): Linear(in_features=768, out_features=768, bias=True)
    (value): Linear(in_features=768, out_features=768, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (output): BertSelfOutput(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
) has no attribute _handle_overloaded_mount

Update

I did some more tests:

from nnsight import LanguageModel
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
)

ALL_MODEL_LOADERS = {
    "hf-internal-testing/tiny-random-albert": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-bart": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-bert": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-DebertaV2Model": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-distilbert": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-ElectraModel": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-roberta": AutoModelForSequenceClassification,
    "hf-internal-testing/tiny-random-t5": AutoModelForSeq2SeqLM,
    "hf-internal-testing/tiny-random-gpt2": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-gpt_neo": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-gptj": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-CodeGenForCausalLM": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-FalconModel": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-Gemma3ForCausalLM": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-LlamaForCausalLM": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-MistralForCausalLM": AutoModelForCausalLM,
    "hf-internal-testing/tiny-random-Starcoder2ForCausalLM": AutoModelForCausalLM,
}

for id, autoclass in ALL_MODEL_LOADERS.items():
    try:
        LanguageModel(id, automodel=autoclass)
    except AttributeError:
        print(f"Failed {id.split('-')[-1]}")
        continue

Failed bert
Failed DebertaV2Model
Failed ElectraModel
Failed roberta


What’s the state of NNsight’s automatic device handling in 0.5? Was this something specific to proxies that we should now handle ourselves, or is it supposed to still automatically move tensors to the right device?

E.g. this worked with 0.4 but doesn’t with 0.5 (to be clear, I don’t think it’s a problem, but it would be good to know that this no longer works):

from nnsight import LanguageModel
import torch as th
model = LanguageModel("Maykeye/TinyLLama-v0", device_map="cuda")

v = th.randn([1, 5, 64], device="cpu") * 0.01

with model.trace("Hello, world!"):
    print(model.model.layers[0].output[0].shape)
    model.model.layers[0].output[0][:] = v + model.model.layers[0].output[0]
    out = model.lm_head.output.save()
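If manual device management is indeed required now, the obvious adjustment (a sketch, assuming this is the intended 0.5 behaviour) is to put the tensor on the model’s device yourself:

v = th.randn([1, 5, 64], device="cuda") * 0.01  # allocate on the model's device directly
# or: v = v.to("cuda") before the trace

with model.trace("Hello, world!"):
    model.model.layers[0].output[0][:] = v + model.model.layers[0].output[0]
    out = model.lm_head.output.save()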

It seems that when:

  • a variable is defined outside of the tracing context,
  • and a value is assigned to it inside the context,
  • and the assigned value includes computations based on the initial one,

then an UnboundLocalError is raised.

from nnsight import LanguageModel

a = 1
with LanguageModel("hf-internal-testing/tiny-random-gpt2").trace("test"):
    a = a
---------------------------------------------------------------------------
NNsightException                          Traceback (most recent call last)
Cell In[12], line 4
      1 from nnsight import LanguageModel
      3 a = None
----> 4 with LanguageModel("hf-internal-testing/tiny-random-gpt2").trace("test"):
      5     a = a or None

File ~/interpreto/.venv/lib/python3.10/site-packages/nnsight/intervention/tracing/base.py:387, in Tracer.__exit__(self, exc_type, exc_val, exc_tb)
    383 # Suppress the ExitTracingException but let other exceptions propagate
    384 if exc_type is ExitTracingException:
    385 
    386     # Execute the traced code using the configured backend
--> 387     self.backend(self)
    389     return True

File ~/interpreto/.venv/lib/python3.10/site-packages/nnsight/intervention/backends/execution.py:24, in ExecutionBackend.__call__(self, tracer)
     21     tracer.execute(fn)
     22 except Exception as e:
---> 24     raise wrap_exception(e, tracer.info) from None
     25 finally:
     26     Globals.exit()

NNsightException: 

Traceback (most recent call last):
  File "/tmp/ipykernel_3155989/2757696273.py", line 5, in <module>
    a = a or None

UnboundLocalError: local variable 'a' referenced before assignment

Oh good point… Yeah this won’t work anymore. This will work remotely though. No need to manage the device there.


I was looking into the limits of .source. It seems we can’t get into torch.nn.functional.
I think that’s totally fair, since accessing module source is already super helpful, but if it’s easy to implement it might be worth it?
For context, I was trying to access attn_weight in scaled_dot_product_attention: pytorch/torch/onnx/symbolic_opset14.py at 1586521461c8dc642735466fc143b7d366a858d0 · pytorch/pytorch · GitHub

from nnsight import LanguageModel

model = LanguageModel("Maykeye/TinyLLama-v0")
with model.trace(["hello", "hello the fox is jumping"]):
    print(model.model.layers[0].self_attn.source.attention_interface_0.source)
    print(model.model.layers[0].self_attn.source.attention_interface_0.source.torch_nn_functional_scaled_dot_product_attention_0.source)

I can probably recompute it with an SDPA module and access it from there, or just force users to use the eager implementation.

Error message:

---------------------------------------------------------------------------
NNsightException                          Traceback (most recent call last)
Cell In[21], line 4
      1 from nnsight import LanguageModel
      3 model = LanguageModel("Maykeye/TinyLLama-v0")
----> 4 with model.trace(["hello", "hello the fox is jumping"]):
      5     print(model.model.layers[0].self_attn.source.attention_interface_0.source)
      6     print(model.model.layers[0].self_attn.source.attention_interface_0.source.torch_nn_functional_scaled_dot_product_attention_0.source)

File ~/.venv/lib/python3.10/site-packages/nnsight/intervention/tracing/base.py:387, in Tracer.__exit__(self, exc_type, exc_val, exc_tb)
    383 # Suppress the ExitTracingException but let other exceptions propagate
    384 if exc_type is ExitTracingException:
    385 
    386     # Execute the traced code using the configured backend
--> 387     self.backend(self)
    389     return True

File ~/.venv/lib/python3.10/site-packages/nnsight/intervention/backends/execution.py:24, in ExecutionBackend.__call__(self, tracer)
     21     tracer.execute(fn)
     22 except Exception as e:
---> 24     raise wrap_exception(e, tracer.info) from None
     25 finally:
     26     Globals.exit()

NNsightException: 

Traceback (most recent call last):
  File "/tmp/ipykernel_30098/2327001449.py", line 6, in <module>
    print(model.model.layers[0].self_attn.source.attention_interface_0.source.torch_nn_functional_scaled_dot_product_attention_0.source)
  File "/root/.venv/lib/python3.10/site-packages/nnsight/intervention/envoy.py", line 1183, in source
    source, line_numbers, fn = inject(fn, wrap, self.name)
  File "/root/.venv/lib/python3.10/site-packages/nnsight/intervention/inject.py", line 60, in convert
    source = textwrap.dedent(inspect.getsource(fn))
  File "/usr/lib/python3.10/inspect.py", line 1139, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python3.10/inspect.py", line 1121, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python3.10/inspect.py", line 940, in findsource
    file = getsourcefile(object)
  File "/usr/lib/python3.10/inspect.py", line 817, in getsourcefile
    filename = getfile(object)
  File "/root/.venv/lib/python3.10/site-packages/torch/package/package_importer.py", line 725, in _patched_getfile
    return _orig_getfile(object)
  File "/usr/lib/python3.10/inspect.py", line 797, in getfile
    raise TypeError('module, class, method, function, traceback, frame, or '

TypeError: module, class, method, function, traceback, frame, or code object was expected, got builtin_function_or_method
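For what it’s worth, a sketch of the eager-implementation route mentioned above. attn_implementation="eager" is the standard transformers loading kwarg; whether LanguageModel forwards it and whether the eager attention function then shows up under .source are assumptions on my part:

from nnsight import LanguageModel

# Load with eager attention so attention weights are computed in plain PyTorch
# instead of inside the fused scaled_dot_product_attention kernel.
model = LanguageModel("Maykeye/TinyLLama-v0", attn_implementation="eager")

with model.trace(["hello", "hello the fox is jumping"]):
    print(model.model.layers[0].self_attn.source.attention_interface_0.source)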

I’m trying to access the source of the forward of a model but can’t figure out how to do it. Depending on which module I look at, I get different errors:

from nnsight import LanguageModel
model = LanguageModel("gpt2")
with model.trace("hello"):
    print(model.source)  # has no attribute source
with model.trace("hello"):
    print(model.transformer.source)  # has no attribute source

with model.trace("hello"):
    print(model.transformer.h.source)
Traceback (most recent call last):

  File ~/.venv/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3579 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[20], line 1
    with model.trace("hello"):

  File ~/.venv/lib/python3.10/site-packages/nnsight/intervention/tracing/base.py:387 in __exit__
    self.backend(self)

  File ~/.venv/lib/python3.10/site-packages/nnsight/intervention/backends/execution.py:24 in __call__
    raise wrap_exception(e, tracer.info) from None

  File <nnsight>:15
    )
    ^
NNsightException: f-string expression part cannot include a backslash

The first two don’t have .source, and the last one fails in a weird way.

@clement_dumas good catch with the source failing for those modules.

For the first two instances, it’s because the forward method is decorated by this @auto_docstring call (transformers/src/transformers/models/gpt2/modeling_gpt2.py at 260846efadb9b03472427a46c30ba8f717d182c4 · huggingface/transformers · GitHub). I’ll look into it and push a fix for it.
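Presumably the fix has to look through the decorator’s wrapper before grabbing source, something along these lines (a sketch of the general Python mechanism, not the actual patch):

import inspect

def undecorated_source(fn):
    # functools.wraps-based decorators record the original function under
    # __wrapped__; inspect.unwrap follows that chain back to the real forward.
    return inspect.getsource(inspect.unwrap(fn))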

For h (an nn.ModuleList), I’m not sure yet what is happening there.

EDIT:

I pushed a fix for this Issue.

In regard to model.transformer.h.source, it turns out that nn.ModuleList does not implement a forward method.
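Since nn.ModuleList defines no forward of its own, the practical alternative is presumably to take .source from one of its child blocks instead (a sketch, assuming the block’s plain forward is inspectable):

from nnsight import LanguageModel

model = LanguageModel("gpt2")
with model.trace("hello"):
    # h itself has no forward to inspect; look at an individual GPT2Block instead
    print(model.transformer.h[0].source)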