vllm.entrypoints.openai.server_utils ¶
AuthenticationMiddleware ¶
Pure ASGI middleware that authenticates each request by checking whether the Authorization Bearer token exists and equals any of "{api_key}".
Notes¶
There are two cases in which authentication is skipped:

1. The HTTP method is OPTIONS.
2. The request path doesn't start with /v1 (e.g. /health).
Source code in vllm/entrypoints/openai/server_utils.py
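To illustrate the behavior described above, here is a minimal sketch of a pure ASGI authentication middleware. The class name `SimpleAuthMiddleware` and its exact logic are assumptions for illustration, not vLLM's actual implementation; it shows the two skip cases (OPTIONS requests and non-/v1 paths) and the Bearer-token check.

```python
class SimpleAuthMiddleware:
    """Illustrative pure ASGI middleware (not vLLM's exact class)."""

    def __init__(self, app, api_keys: set[str]):
        self.app = app
        self.api_keys = api_keys

    async def __call__(self, scope, receive, send):
        # Skip authentication for OPTIONS requests and for paths
        # outside /v1 (e.g. /health), as described above.
        if (scope["type"] != "http"
                or scope["method"] == "OPTIONS"
                or not scope["path"].startswith("/v1")):
            return await self.app(scope, receive, send)
        headers = dict(scope.get("headers", []))
        token = headers.get(b"authorization", b"").decode()
        if token.removeprefix("Bearer ") in self.api_keys:
            return await self.app(scope, receive, send)
        # Reject without invoking the wrapped app.
        await send({"type": "http.response.start", "status": 401,
                    "headers": [(b"content-type", b"text/plain")]})
        await send({"type": "http.response.body", "body": b"Unauthorized"})
```

Because it is pure ASGI (no framework base class), the middleware can short-circuit the request without building a framework request object.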
SSEDecoder ¶
Robust Server-Sent Events decoder for streaming responses.
Source code in vllm/entrypoints/openai/server_utils.py
decode_chunk ¶
Decode a chunk of SSE data and return parsed events.
Source code in vllm/entrypoints/openai/server_utils.py
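As a sketch of what "robust SSE decoding" involves, the following illustrative decoder (not vLLM's actual `SSEDecoder`) buffers partial lines across chunk boundaries and returns the parsed JSON payload of each complete `data:` event, skipping the `[DONE]` sentinel:

```python
import json

class SimpleSSEDecoder:
    """Illustrative SSE decoder: buffers partial lines across chunks
    and yields the JSON payload of each complete ``data:`` line."""

    def __init__(self):
        self._buffer = ""

    def decode_chunk(self, chunk: bytes) -> list[dict]:
        self._buffer += chunk.decode("utf-8")
        events = []
        # Lines end with "\n"; keep any trailing partial line buffered
        # so an event split across two chunks is decoded correctly.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            line = line.strip()
            if line.startswith("data:"):
                payload = line[len("data:"):].strip()
                if payload and payload != "[DONE]":
                    events.append(json.loads(payload))
        return events
```

The buffering step is what makes the decoder robust: network chunks need not align with SSE event boundaries.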
extract_content ¶
XRequestIdMiddleware ¶
Middleware that sets the X-Request-Id header for each response to a random uuid4 (hex) value if the header isn't already present in the request; otherwise, it uses the provided request id.
Source code in vllm/entrypoints/openai/server_utils.py
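The behavior above can be sketched as a pure ASGI middleware; the class name `RequestIdMiddleware` and the details are illustrative, not vLLM's exact code. It reuses an incoming `X-Request-Id` if present, otherwise mints a uuid4 hex, and stamps it onto the response headers:

```python
import uuid

class RequestIdMiddleware:
    """Illustrative pure ASGI version of the middleware described above."""

    HEADER = b"x-request-id"

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)
        # Reuse the caller's request id if present, else mint a uuid4 hex.
        incoming = dict(scope.get("headers", [])).get(self.HEADER)
        request_id = incoming or uuid.uuid4().hex.encode()

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                if not any(k.lower() == self.HEADER for k, _ in headers):
                    headers.append((self.HEADER, request_id))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)
```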
_extract_content_from_chunk ¶
Extract content from a streaming response chunk.
Source code in vllm/entrypoints/openai/server_utils.py
_log_non_streaming_response ¶
_log_non_streaming_response(response_body: list) -> None
Log non-streaming response.
Source code in vllm/entrypoints/openai/server_utils.py
_log_streaming_response ¶
_log_streaming_response(response, response_body: list) -> None
Log streaming response with robust SSE parsing.
Source code in vllm/entrypoints/openai/server_utils.py
engine_error_handler async ¶
engine_error_handler(
req: Request, exc: EngineDeadError | EngineGenerateError
)
VLLM V1 AsyncLLM catches exceptions and returns only two types: EngineGenerateError and EngineDeadError.
EngineGenerateError is raised by the per request generate() method. This error could be request specific (and therefore recoverable - e.g. if there is an error in input processing).
EngineDeadError is raised by the background output_handler method. This error is global and therefore not recoverable.
We register these @app.exception_handlers to return nice responses to the end user if they occur and shut down if needed. See https://fastapi.tiangolo.com/tutorial/handling-errors/ for more details on how exception handlers work.
If an exception is encountered in a StreamingResponse generator, the exception is not raised, since we have already sent a 200 status. Rather, we send an error message as the next chunk. Since the exception is not raised, the server will not automatically shut down. Instead, we use the watchdog background task to check for the errored state.
Source code in vllm/entrypoints/openai/server_utils.py
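The registration pattern described above can be sketched in plain Python. This is a minimal stand-in for what FastAPI's `@app.exception_handler` does; the error classes, registry, and `dispatch` helper here are hypothetical, shown only to make the two-error-type dispatch concrete:

```python
class EngineGenerateError(Exception):
    """Stand-in: per-request failure, potentially recoverable."""

class EngineDeadError(Exception):
    """Stand-in: global engine failure, not recoverable."""

# Minimal handler registry mimicking @app.exception_handler.
_handlers = {}

def exception_handler(exc_type):
    def register(fn):
        _handlers[exc_type] = fn
        return fn
    return register

@exception_handler(EngineGenerateError)
@exception_handler(EngineDeadError)
def engine_error_handler(exc):
    # Map both engine errors to a clean 500-style payload for the client;
    # a watchdog task (not shown) would handle shutdown on EngineDeadError.
    return {"status": 500, "error": type(exc).__name__}

def dispatch(exc):
    """Look up and run the registered handler, as a framework would."""
    handler = _handlers.get(type(exc))
    return handler(exc) if handler else None
```

Stacking both decorators on one function mirrors registering the same handler for both exception types.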
get_uvicorn_log_config ¶
Get the uvicorn log config based on the provided arguments.
Priority:

1. If log_config_file is specified, use it.
2. If disable_access_log_for_endpoints is specified, create a config with the access log filter.
3. Otherwise, return None (use uvicorn defaults).
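The priority order above can be sketched as follows. The argument names follow the description, but the returned config shape is an assumption for illustration and not vLLM's actual dictionary:

```python
import json

def get_uvicorn_log_config(log_config_file=None,
                           disable_access_log_for_endpoints=None):
    # 1. An explicit log config file wins outright.
    if log_config_file is not None:
        with open(log_config_file) as f:
            return json.load(f)
    # 2. Otherwise, build a config recording which endpoints the
    #    access-log filter should screen out (illustrative shape).
    if disable_access_log_for_endpoints:
        return {
            "version": 1,
            "disable_existing_loggers": False,
            "skipped_endpoints": list(disable_access_log_for_endpoints),
        }
    # 3. Fall back to uvicorn's defaults.
    return None
```

The ordering matters: an explicit config file should override the generated filter config, and returning None lets uvicorn apply its own defaults.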