I am incorporating some custom functions into nanoGPT. Performing a linear combination works fine:
import torch

def static_combination(batch):
    # running combination: each position becomes a weighted mix of itself and the previous position
    sequence_length, dimension = batch.size()
    for sequence in range(1, sequence_length):
        scalar = 1.0 / (sequence + 1)
        batch[sequence] = torch.add(batch[sequence] * scalar, batch[sequence - 1], alpha=(1 - scalar))
    return batch
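For reference, the same pattern can be exercised outside nanoGPT roughly like this (the (8, 128) shape, the variable names, and the x * 1.0 non-leaf copy are placeholders of mine, not nanoGPT code):

import torch

x = torch.randn(8, 128, requires_grad=True)
h = static_combination(x * 1.0)  # x * 1.0 gives a non-leaf copy, so the in-place writes are allowed
h.sum().backward()               # completes without error, matching the behaviour described above
print(x.grad.shape)              # torch.Size([8, 128])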
Changing the operation to a multiplication yields a RuntimeError:
# multiply the token with the previous
def token_multiplication(batch):
    sequence_length, token_dimension = batch.size()
    for token_number in range(1, sequence_length):
        batch[token_number] = torch.mul(batch[token_number], batch[token_number - 1])
    return batch
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 128]], which is output 0 of AsStridedBackward0, is at version 63; expected version 62 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
With torch.autograd.set_detect_anomaly(True):
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 128]], which is output 0 of AsStridedBackward0, is at version 63; expected version 62 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later.
As far as I know, retain_graph=True isn't used.
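The failure also reproduces outside nanoGPT; a stripped-down sketch (same placeholder shape and non-leaf copy as above) hits the same kind of error during backward:

import torch

x = torch.randn(8, 128, requires_grad=True)
h = token_multiplication(x * 1.0)  # forward pass runs fine
h.sum().backward()                 # RuntimeError: ... modified by an inplace operation (version mismatch)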
The environment is:
- transformers version: 4.40.2
- Platform: Linux-6.1.85+-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu121 (False)
- Tensorflow version (GPU?): 2.15.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.8.3 (cpu)
- Jax version: 0.4.26
- JaxLib version: 0.4.26
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No