From my investigation, the cleanup is successful on the nnsight side. However, if gradients are required, memory usage will be high, so you need to empty the cache after tracing.
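To illustrate, here is a minimal sketch of that cleanup, assuming local execution with gradients enabled (the model name, prompt, and loss are placeholders, not from this thread):

import gc
import torch
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")  # placeholder model

with model.trace("Hello world"):
    logits = model.output.logits.save()
    logits.sum().backward()  # gradients are what inflate memory usage

# after the trace has exited, release references and reclaim cached GPU memory
del logits
gc.collect()
torch.cuda.empty_cache()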
@AdamBelfki
Do you mean we can do this?
with model.session(remote=True):
    with model.trace(...):
        # do something
        variable = ...  # create a variable
        del variable
        torch.cuda.empty_cache()
        gc.collect()
Not quite. My point about emptying the cache only applies to local execution, where you would want to call it between iterations if you are running out of memory from defining tracers or sessions in a loop.
torch.cuda.empty_cache() is not a traceable operation at the moment, so calling it inside a Tracer context won't work.
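For the local loop case, a minimal sketch of the pattern described above (the model, prompts, and module path are placeholders; the key point is that the cleanup calls run outside the Tracer context, between iterations):

import gc
import torch
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")  # placeholder model
prompts = ["The sky is", "Water is"]                               # placeholder prompts

for prompt in prompts:
    with model.trace(prompt):
        hidden = model.transformer.h[0].output.save()  # placeholder module path
    # outside the tracer these calls run eagerly, so clearing the cache here works
    del hidden
    gc.collect()
    torch.cuda.empty_cache()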
NDIF does already empty its cache and clean up its memory, but that happens between user requests, not between model executions within the same request.