Sunday, October 15, 2023

"The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results."

I was playing with Hugging Face transformers and kept getting the warning quoted above. I finally found a solution in a StackOverflow answer, credited at the end:

To fix this, first add the following code after loading the pre-trained tokenizer:

if tokenizer.pad_token is None:
  tokenizer.pad_token = tokenizer.eos_token

Then pass the pad token id to the generate method like this:

gen_ids = model.generate(**encodings, pad_token_id=tokenizer.pad_token_id, max_new_tokens=200)

In short, there are two additions you need to make:
  1. When initializing your tokenizer, set:
    tokenizer.pad_token = tokenizer.eos_token
  2. When using the model to generate an output, pass the following as a parameter to model.generate:
    pad_token_id=tokenizer.pad_token_id
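Putting the two changes together, here is a minimal end-to-end sketch. It assumes GPT-2 as the model (chosen only for illustration; any causal LM that ships without a pad token behaves the same way) and a made-up prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Change 1: GPT-2 defines no pad token, so reuse the EOS token for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenizing with padding enabled also produces the attention_mask
# the warning asks for; **encodings passes it through to generate.
encodings = tokenizer(["Hello, world"], return_tensors="pt", padding=True)

# Change 2: pass pad_token_id explicitly so generate stops warning.
gen_ids = model.generate(
    **encodings,
    pad_token_id=tokenizer.pad_token_id,
    max_new_tokens=20,
)
print(tokenizer.decode(gen_ids[0], skip_special_tokens=True))
```

With both changes in place, the warning no longer appears and generation results are reliable even for batched, padded inputs.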
Thank you to user Shital Shah on StackOverflow for the answer.
