vllm.multimodal.media.audio
AudioEmbeddingMediaIO
Configuration values can be supplied by the user through either --media-io-kwargs or the runtime API field "media_io_kwargs", so they must be validated and errors handled properly.
Source code in vllm/multimodal/media/audio.py
AudioMediaIO
Bases: MediaIO[tuple[NDArray, float]]
Configuration values can be supplied by the user through either --media-io-kwargs or the runtime API field "media_io_kwargs", so they must be validated and errors handled properly.
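As a rough illustration of the validation this note asks for, the sketch below parses a user-supplied JSON string and rejects unknown or mistyped options. It assumes the value is a JSON object keyed by modality; the function name `parse_audio_io_kwargs` and the option names `target_sr` and `mono` are hypothetical and are not taken from vLLM.

```python
import json
from typing import Any

# Hypothetical schema: these option names are illustrative only and are
# NOT the options accepted by vLLM's AudioMediaIO.
ALLOWED_AUDIO_KWARGS: dict[str, type] = {"target_sr": int, "mono": bool}


def parse_audio_io_kwargs(raw: str) -> dict[str, Any]:
    """Parse a JSON string and return the validated "audio" options."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"media_io_kwargs is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("media_io_kwargs must be a JSON object")

    audio_kwargs = parsed.get("audio", {})
    if not isinstance(audio_kwargs, dict):
        raise ValueError('"audio" must map to a JSON object')

    for key, value in audio_kwargs.items():
        expected = ALLOWED_AUDIO_KWARGS.get(key)
        if expected is None:
            raise ValueError(f"unknown audio option: {key!r}")
        # bool is a subclass of int, so reject it explicitly for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            raise ValueError(f"{key!r} must be of type {expected.__name__}")
    return audio_kwargs
```

Rejecting unknown keys up front, rather than forwarding them blindly, turns a typo in a user-supplied option into an immediate and readable error.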
extract_audio_from_video_bytes
Extract the audio track from raw video bytes using PyAV.
PyAV wraps FFmpeg's C libraries and runs them in-process. No subprocess is spawned, which is critical to avoid crashing CUDA-active vLLM worker processes.
The returned waveform is at the native sample rate of the video's audio stream. Resampling to a model-specific rate is left to the downstream AudioResampler in the parsing pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | bytes | Raw video file bytes (e.g. from an mp4 file). | required |
Returns:
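| Type | Description |
|---|---|
| NDArray | The audio waveform, returned as the first element of a tuple. |
| float | The native sample rate of the waveform, returned as the second element of the tuple. |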
is_video
Check whether the fetched bytes are video data.
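One common way to make such a check is to sniff container magic bytes. The sketch below is a heuristic under that assumption and is not necessarily how is_video is implemented; the name `looks_like_video` and the list of audio-only brands are illustrative. Note that Matroska/WebM and ISO base media files can also be audio-only, so a match means only "video-capable container".

```python
# ISO base media "major brands" conventionally used for audio-only files.
# Illustrative, not exhaustive.
_AUDIO_ONLY_BRANDS = {b"M4A ", b"M4B "}


def looks_like_video(data: bytes) -> bool:
    """Heuristically detect a video container from its leading bytes."""
    # MP4/MOV family: a box size, then b"ftyp", then the major brand.
    if len(data) >= 12 and data[4:8] == b"ftyp":
        return data[8:12] not in _AUDIO_ONLY_BRANDS
    # Matroska and WebM begin with the EBML header ID.
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return True
    # AVI is a RIFF file whose form type is b"AVI " (WAV uses b"WAVE").
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"AVI ":
        return True
    return False
```

A header check like this is cheap and avoids a full decode, at the cost of false positives for audio-only files stored in video-capable containers.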