# "No next state found for the current state" error

## Bug first encountered

While working on examples for `outlines` after upgrading `outlines-core` to 0.2.11, I ran into an error when calling the `guide.advance` method.

Code used when first encountering the bug (current `main` branch of `outlines` or v1.2.0):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import outlines

model_name = "erwanf/gpt2-mini"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = outlines.from_transformers(
    model,
    tokenizer,
)

response = model("Hello there", list[str], max_new_tokens=100)
print(response)
```

Stacktrace:

```
  File "/Users/robin/outlines/.idea/b.py", line 26, in <module>
    gen()
  File "/Users/robin/outlines/.idea/b.py", line 15, in gen
    response = model("Hello there", list[str], max_new_tokens=100)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/outlines/models/base.py", line 122, in __call__
    return Generator(self, output_type, backend)(model_input, **inference_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/outlines/generator.py", line 297, in __call__
    return self.model.generate(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/outlines/models/transformers.py", line 307, in generate
    generated_ids = self._generate_output_seq(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/outlines/models/transformers.py", line 356, in _generate_output_seq
    output_ids = self.model.generate(
                 ^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/.venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2465, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/Users/robin/outlines/.venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 3450, in _sample
    next_token_scores = logits_processor(input_ids, next_token_logits)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/.venv/lib/python3.11/site-packages/transformers/generation/logits_process.py", line 88, in __call__
    scores = processor(input_ids, scores)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/outlines/processors/base_logits_processor.py", line 123, in __call__
    processed_logits = self.process_logits(input_ids, logits)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/robin/outlines/outlines/backends/outlines_core.py", line 168, in process_logits
    self._guides[i].advance(
ValueError: No next state found for the current state: 448 with token ID: 0
```

The input IDs up to the crash were: `[15496, 612, 58, 8973, 198, 198, 1, 11, 0]`
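For context, the `list[str]` output type is compiled to the regex `\[("[^"]*")(,\ ("[^"]*"))*\]` (the one used in the reproduction below). Between two items, that pattern only accepts a comma followed by a literal space. The quick check below uses Python's `re` module, which only supports full matches rather than partial ones, so it illustrates the constraint on complete strings; `matches` is just an illustrative helper name.

```python
import re

# Regex corresponding to `list[str]`, copied from the reproduction in this issue.
LIST_OF_STR = r'\[("[^"]*")(,\ ("[^"]*"))*\]'


def matches(text: str) -> bool:
    """Return True if `text` fully matches the list-of-strings regex."""
    return re.fullmatch(LIST_OF_STR, text) is not None


print(matches('["a"]'))       # True: a single item
print(matches('["a", "b"]'))  # True: comma followed by a space
print(matches('["a","b"]'))   # False: ", " is required between items
print(matches('["a",!"b"]'))  # False: only a space may follow the comma
```

In other words, once a guide is in the state reached right after `,`, the only valid continuations are tokens whose text starts with a space.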

## Bug reproduction without using `outlines`

To narrow down the origin of the bug and make sure it isn't caused by the way `outlines` uses `outlines-core`, here is another reproduction that calls `outlines-core` directly, with some extra output to help understand the bug:

```python
from typing import Dict
from transformers import AutoTokenizer
from outlines_core import Index, Vocabulary

tokenizer = AutoTokenizer.from_pretrained("erwanf/gpt2-mini")

vocabulary = tokenizer.get_vocab()
eos_token_id = tokenizer.eos_token_id
eos_token = tokenizer.eos_token

def create_outlines_core_vocabulary(
    vocab: Dict[str, int], eos_token_id: int, eos_token: str
) -> Vocabulary:
    formatted_vocab = {}
    for token, token_id in vocab.items():
        formatted_vocab[token] = [token_id]
    formatted_vocab.pop(eos_token)
    return Vocabulary(eos_token_id, formatted_vocab)

vocabulary = create_outlines_core_vocabulary(vocabulary, eos_token_id, eos_token)

# this is the regex corresponding to the output type above
index = Index(r'\[("[^"]*")(,\ ("[^"]*"))*\]', vocabulary)

print(index.get_initial_state()) # 416
print(index.get_final_states()) # {480}

transitions = index.get_transitions()

print(transitions[1792]) # {11: 448, 60: 480}
# e.g. generating token 11 from state 1792 leads to state 448

print(transitions[448])
# Traceback (most recent call last):
# File "/home/robinpicard/outlines/.idea/error_outlines_core.py", line 48, in <module>
#    print(transitions[448])
#          ~~~~~~~~~~~^^^^^
# KeyError: 448

# 448 should have an entry, since it is the target state of a transition shown above

# token 0 = "!"
# token 1 = '"'
# token 11 = ","
# token 60 = ']'
```
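To check whether 448 is the only such state, a small helper can scan the whole transition table for target states that are neither final nor have any outgoing transitions. This is a sketch: `find_dead_states` is a hypothetical helper (not part of `outlines-core`), and it assumes `get_transitions()` returns a plain `state -> {token_id: next_state}` dict, as the printed output above suggests.

```python
from typing import Dict, Set

# Assumed shape of `index.get_transitions()`: state -> {token_id: next_state}
Transitions = Dict[int, Dict[int, int]]


def find_dead_states(transitions: Transitions, final_states: Set[int]) -> Set[int]:
    """Return target states that are not final and have no outgoing transitions.

    Hypothetical debugging helper, not part of outlines-core.
    """
    # Every state that some transition points to.
    targets = {nxt for edges in transitions.values() for nxt in edges.values()}
    # A reachable, non-final state without its own entry is a dead end.
    return {s for s in targets if s not in transitions and s not in final_states}


# Toy table containing only the entries printed in this issue.
toy_transitions = {1792: {11: 448, 60: 480}}
print(find_dead_states(toy_transitions, {480}))  # {448}
```

Against the real index this would be called as `find_dead_states(index.get_transitions(), index.get_final_states())`, assuming `get_final_states()` returns a set as its printed output `{480}` suggests.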

Or could the issue come from how the `Vocabulary` is created?
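One way to probe the `Vocabulary` question, assuming `erwanf/gpt2-mini` uses the standard GPT-2 byte-level BPE tokenizer: in that scheme, `tokenizer.get_vocab()` returns tokens in their byte-encoded form, where a leading space is stored as `Ġ` and a newline as `Ċ`. Since the regex requires a literal space after `,`, keys passed through unchanged might contain no token that starts with a space. The sketch below reimplements GPT-2's byte-to-character table to decode such keys; `to_text` and the sample tokens are illustrative, not taken from `outlines-core`.

```python
from typing import Dict, List


def bytes_to_unicode() -> Dict[int, str]:
    """Byte -> printable character table used by GPT-2's byte-level BPE."""
    # Printable bytes map to themselves...
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # ...and every other byte (space, newline, ...) is shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_DECODER = {char: byte for byte, char in bytes_to_unicode().items()}


def to_text(token: str) -> str:
    """Hypothetical helper: turn a byte-level vocab entry back into plain text."""
    raw = bytes(BYTE_DECODER[char] for char in token)
    return raw.decode("utf-8", errors="replace")


# Illustrative entries in the form returned by `tokenizer.get_vocab()`
raw_tokens: List[str] = ["Ġthere", 'Ġ"', ",", "!", "Ċ"]

print([to_text(t) for t in raw_tokens])                     # [' there', ' "', ',', '!', '\n']
print(any(t.startswith(" ") for t in raw_tokens))           # False
print(any(to_text(t).startswith(" ") for t in raw_tokens))  # True
```

Running the same `any(...)` checks on the real `tokenizer.get_vocab()` keys would show whether the `Vocabulary` built in the reproduction contains any token that starts with a literal space.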