Distributed Alignment Search Tutorial Question

Hi everyone,
I am Rahul Chowdhury. I am trying to train a rotation matrix for DAS on Llama 8B using the tutorial, but I am running out of memory. I tried using accelerate for multi-GPU training, but it still runs out of memory even with a batch size of 1. I have a total capacity of around 90 GB across two GPUs. I was wondering whether the tutorial is optimized, and whether the gradient flows back through the base model's layers as well.

Hi Rahul,

Welcome, and apologies for the late reply! This tutorial is indeed memory-intensive, and it is not optimized (it hasn't been worked on in quite some time, as far as I know). I ran into the same issues on a 48 GB GPU recently when trying it out.

We might be able to clean it up, as we are currently doing some tutorial-related work, but if you find anything in the short term that helps, please let us know 🙂
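In the meantime, one thing worth sanity-checking is that only the rotation matrix is trainable. Note that even with the base model frozen, gradients still have to flow back through the layers above the intervention site to reach the rotation, so activations for those layers are kept for the backward pass, which costs memory. Here is a minimal PyTorch sketch of that setup (toy linear layers and names are illustrative, not the tutorial's actual Llama code):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

torch.manual_seed(0)

# Toy stand-ins for the layers below/above the intervention site
# (illustrative only, not the tutorial's actual modules).
lower = nn.Linear(16, 16)
upper = nn.Linear(16, 16)

# Freeze the base model so no parameter gradients are stored for it.
for p in list(lower.parameters()) + list(upper.parameters()):
    p.requires_grad_(False)

# The trainable DAS rotation: an orthogonally parametrized square map.
rotation = orthogonal(nn.Linear(16, 16, bias=False))

x = torch.randn(4, 16)
h = lower(x)             # frozen; h does not yet require grad
h = rotation(h)          # trainable; activations are saved from here on
out = upper(h)           # frozen, but backprop still passes through it
loss = out.pow(2).mean()
loss.backward()

# Gradients reach the rotation (flowing back through `upper`),
# while the frozen base layers accumulate none.
rot_grads = [p.grad for p in rotation.parameters() if p.requires_grad]
frozen_grads = [p.grad for p in list(lower.parameters()) + list(upper.parameters())]
```

If the frozen layers show up with non-`None` gradients in your run, that would point to the base model not actually being frozen, which would explain a large chunk of the memory use.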
