vllm.beam_search ¶
BeamSearchOutput dataclass ¶
The output of beam search. It contains the list of the best beam search sequences. The length of the list is equal to the beam width.
Source code in vllm/beam_search.py
BeamSearchSequence dataclass ¶
A sequence for beam search. It keeps track of the tokens and the log probability of the sequence. The text field is optional and will only be filled when the sequence is about to be returned to the user.
Source code in vllm/beam_search.py
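The two dataclasses above can be sketched as plain Python. Field names beyond what the descriptions state (`tokens`, `cumulative_logprob`, `text`, `sequences`) are assumptions for illustration, not a verbatim copy of vLLM's definitions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BeamSearchSequence:
    # Token ids generated so far, plus the running log probability
    # of the sequence (assumed field names).
    tokens: list[int]
    cumulative_logprob: float = 0.0
    # Per the docs, filled in only when the sequence is about to be
    # returned to the user.
    text: Optional[str] = None


@dataclass
class BeamSearchOutput:
    # One entry per beam; len(sequences) equals the beam width.
    sequences: list[BeamSearchSequence]
```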
_build_encoder_decoder_inputs ¶
_build_encoder_decoder_inputs(
prompt: EncoderDecoderInputs,
) -> EncoderDecoderInputs
Rebuild the encoder-decoder inputs with the current beam search sequence's tokens.
FIXME (alex) - the encoder multimodal cache is not properly wired up yet, which means that currently we are running the encoder on every new beam because num_computed_tokens is 0 on each new request. This will be fixed once the cache is correctly implemented.
Source code in vllm/beam_search.py
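A minimal sketch of the rebuild step, under the assumption that `EncoderDecoderInputs` pairs an encoder prompt with a decoder prompt holding token ids. The `TypedDict` stand-ins and the explicit `beam_tokens` parameter are simplifications for illustration; the real helper reads the beam's tokens from its own state:

```python
from typing import TypedDict


class TokensPrompt(TypedDict):
    # Simplified stand-in for vLLM's token-based prompt type.
    prompt_token_ids: list[int]


class EncoderDecoderInputs(TypedDict):
    # Simplified stand-in: encoder and decoder halves of the input.
    encoder: TokensPrompt
    decoder: TokensPrompt


def build_encoder_decoder_inputs(
    prompt: EncoderDecoderInputs,
    beam_tokens: list[int],
) -> EncoderDecoderInputs:
    # The encoder prompt is reused as-is; only the decoder side is
    # rebuilt with the current beam search sequence's tokens.
    return {
        "encoder": prompt["encoder"],
        "decoder": {"prompt_token_ids": beam_tokens},
    }
```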
get_beam_search_score ¶
get_beam_search_score(
tokens: list[int],
cumulative_logprob: float,
eos_token_id: int,
length_penalty: float = 1.0,
) -> float
Calculate the beam search score with length penalty.
Adapted from
https://github.com/huggingface/transformers/blob/ccb92be23def445f2afdea94c31286f84b89eb5b/src/transformers/generation/beam_search.py#L938
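A sketch matching the documented signature, following the length-penalty scheme in the linked Transformers code: the cumulative log probability is divided by the sequence length raised to `length_penalty`, with a trailing EOS token excluded from the length so finished and unfinished beams are penalized comparably. The EOS handling is a reading of the adapted code, not a verbatim copy:

```python
def get_beam_search_score(
    tokens: list[int],
    cumulative_logprob: float,
    eos_token_id: int,
    length_penalty: float = 1.0,
) -> float:
    # Exclude a trailing EOS token from the effective length
    # (assumed behavior, mirroring the HF beam-search scorer).
    seq_len = len(tokens)
    if tokens and tokens[-1] == eos_token_id:
        seq_len -= 1
    # length_penalty > 1.0 favors longer sequences; < 1.0 favors shorter.
    return cumulative_logprob / (seq_len**length_penalty)
```

With `length_penalty=1.0` this reduces to the average log probability per generated token.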