I found that the position ids are in [prefix_len, prefix_len + seq_len) in modeling_gpt2.py:
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
Is it OK to just make the position ids in [0, seq_len)? I have not found any use of position embeddings for the prefix matrix.
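
For reference, here is a minimal sketch of what I mean, assuming the HuggingFace transformers GPT2Model API: the prefix is supplied through past_key_values, and explicit position_ids in [0, seq_len) are passed instead of the default [past_length, past_length + seq_len). The zero-filled prefix tensors and prefix_len are just placeholders for illustration, not the real prefix matrix.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

model = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prefix_len = 5
head_dim = model.config.n_embd // model.config.n_head

# Placeholder prefix matrix: one (key, value) pair per layer,
# each of shape (batch, num_heads, prefix_len, head_dim).
past_key_values = tuple(
    (
        torch.zeros(1, model.config.n_head, prefix_len, head_dim),
        torch.zeros(1, model.config.n_head, prefix_len, head_dim),
    )
    for _ in range(model.config.n_layer)
)

inputs = tokenizer("Hello world", return_tensors="pt")
seq_len = inputs["input_ids"].shape[-1]

# Position ids in [0, seq_len), ignoring the prefix length,
# instead of the default torch.arange(past_length, seq_len + past_length).
position_ids = torch.arange(0, seq_len, dtype=torch.long).unsqueeze(0)

outputs = model(
    input_ids=inputs["input_ids"],
    past_key_values=past_key_values,
    position_ids=position_ids,
)
```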