
ValueError: All stacks are empty, so the only token accepted is EOS(2), but got 0 #37

Open · Saibo-creator opened this issue Apr 19, 2024 · 3 comments


@Saibo-creator (Collaborator)

Reproduce

import transformers
import transformers_cfg
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor
if __name__ == "__main__":
    print('transformers version', transformers.__version__)
    print('transformers_cfg version', transformers_cfg)
    # Load model and tokenizer
    llama_tokenizer = AutoTokenizer.from_pretrained("saibo/llama-1B")
    llama_tokenizer.pad_token = llama_tokenizer.eos_token
    llama_model = AutoModelForCausalLM.from_pretrained("saibo/llama-1B")
    # Load json grammar
    with open("examples/grammars/json.ebnf", "r") as file:
        grammar_str = file.read()
    grammar = IncrementalGrammarConstraint(grammar_str, "root", llama_tokenizer)
    grammar_processor = GrammarConstrainedLogitsProcessor(grammar)
    # Generate
    prefix1 = "This is a valid json string for http request:"
    prefix2 = "This is a valid json string for shopping cart:"
    input_ids = llama_tokenizer([prefix1, prefix2], add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"]
    output = llama_model.generate(
        input_ids,
        do_sample=False,
        max_length=50,
        num_beams=1,
        logits_processor=[grammar_processor],
        repetition_penalty=1.0,
        num_return_sequences=1,
    )
    # decode output
    generations = llama_tokenizer.batch_decode(output, skip_special_tokens=True)
    print(generations)

Context

saibo/llama-1B is a randomly initialized model used for debugging purposes. Although it is not a trained LLM, the grammar constraint should still force it to produce structured output, but instead generation fails with the error above.

@Saibo-creator (Collaborator, Author)

The problem seems to stem from the default padding configuration of the Llama Tokenizer, which is set to "left" padding instead of the more common "right" padding used by most large language model (LLM) tokenizers.

from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side
#'right'

A straightforward solution is to adjust the padding side of the llama tokenizer by adding the line llama_tokenizer.padding_side = "right".
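Applied to the reproduction script above, the change is a one-liner (a minimal sketch; everything else stays exactly as in the original snippet):

llama_tokenizer = AutoTokenizer.from_pretrained("saibo/llama-1B")
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"  # override the default "left" padding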

However, it's not yet clear which specific part of the code is affected by this setting; I plan to dig into this further. For now, the fix above is effective, and the issue seems to affect only Llama models.

Note: The LLAMA-3 model already defaults the padding side to "right".

@Saibo-creator (Collaborator, Author)

On the other hand, left padding seems to be the right way to go for decoder-only generation; otherwise we lose performance:
https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side
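If left padding is kept, the pattern in that tutorial also passes the attention mask to generate so the padded positions are masked out. A minimal sketch reusing the variables from the reproduction script above (whether this alone is enough for the grammar processor is untested):

encoded = llama_tokenizer([prefix1, prefix2], add_special_tokens=False,
                          return_tensors="pt", padding=True)
output = llama_model.generate(
    input_ids=encoded["input_ids"],
    attention_mask=encoded["attention_mask"],  # mask out the left-padded positions
    do_sample=False,
    max_length=50,
    logits_processor=[grammar_processor],
)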

We can discuss this further.
@Yuxing0610

@xkasberg commented Jun 3, 2024

I am also running into this issue.

Padding on the left is the right way to go for batch processing of inputs (sending multiple sequences at a time), since each input in the batch needs to be the same length. If we pad on the right, a number of <|eot_id|> tokens follow the assistant message, and the model will not generate anything.
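To make the difference concrete, here is a small sketch comparing the two padding sides, reusing the tokenizer from the original script (the exact pad token ids depend on the tokenizer, so the printed tensors are illustrative only):

prompts = ["Hi", "This is a much longer prompt"]

llama_tokenizer.padding_side = "right"
print(llama_tokenizer(prompts, return_tensors="pt", padding=True)["input_ids"])
# pad/eos tokens end up AFTER the shorter prompt, so generation starts
# from a position the model already treats as finished

llama_tokenizer.padding_side = "left"
print(llama_tokenizer(prompts, return_tensors="pt", padding=True)["input_ids"])
# pad tokens are prepended, so every prompt ends right where generation begins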
