vllm.entrypoints.openai.api_server
build_and_serve async
build_and_serve(
engine_client: EngineClient,
listen_address: str,
sock: socket,
args: Namespace,
**uvicorn_kwargs,
) -> Task
Build FastAPI app, initialize state, and start serving.
Returns the shutdown task for the caller to await.
Source code in vllm/entrypoints/openai/api_server.py
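The "returns the shutdown task for the caller to await" shape can be sketched with plain asyncio. This is a minimal illustration of the pattern, not vLLM's actual implementation; `serve_forever` and `build_and_serve_sketch` are hypothetical stand-ins for the uvicorn serve loop and the real `build_and_serve`.

```python
import asyncio

async def serve_forever(stop: asyncio.Event) -> str:
    # Stand-in for uvicorn's serve loop: run until asked to stop.
    await stop.wait()
    return "server stopped"

async def build_and_serve_sketch() -> tuple[asyncio.Task, asyncio.Event]:
    # "Build" the app, start serving in the background, and hand the
    # shutdown task back so the caller decides when to await it.
    stop = asyncio.Event()
    shutdown_task = asyncio.create_task(serve_forever(stop))
    return shutdown_task, stop

async def main() -> str:
    task, stop = await build_and_serve_sketch()
    # The caller keeps control: it can do other work, then trigger
    # shutdown and await the task to drain the server cleanly.
    stop.set()
    return await task

result = asyncio.run(main())
```

Returning the task rather than awaiting it inside the builder lets the caller compose server shutdown with its own lifecycle (signals, other workers).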
build_and_serve_renderer async
build_and_serve_renderer(
vllm_config: VllmConfig,
listen_address: str,
sock: socket,
args: Namespace,
**uvicorn_kwargs,
) -> Task
Build FastAPI app for a CPU-only render server, initialize state, and start serving.
Returns the shutdown task for the caller to await.
Source code in vllm/entrypoints/openai/api_server.py
build_async_engine_client_from_engine_args async
build_async_engine_client_from_engine_args(
engine_args: AsyncEngineArgs,
*,
usage_context: UsageContext = OPENAI_API_SERVER,
disable_frontend_multiprocessing: bool = False,
client_config: dict[str, Any] | None = None,
) -> AsyncIterator[EngineClient]
Create an EngineClient, either:
- in-process, using the AsyncLLMEngine directly
- multiprocess, using AsyncLLMEngine RPC
Returns the client, or None if creation failed.
Source code in vllm/entrypoints/openai/api_server.py
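The `AsyncIterator[EngineClient]` return type signals that this is used as an async context manager: the client is yielded for the caller's lifetime and torn down on exit. A minimal sketch of that shape, with `FakeClient` as a hypothetical stand-in for an EngineClient:

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

class FakeClient:
    # Hypothetical stand-in for an EngineClient, for illustration only.
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

@asynccontextmanager
async def build_client() -> AsyncIterator[FakeClient]:
    # Yield the client for the caller's lifetime, then tear it down,
    # mirroring the AsyncIterator[EngineClient] shape above.
    client = FakeClient()
    try:
        yield client
    finally:
        client.close()

async def main() -> bool:
    async with build_client() as client:
        in_ctx_open = not client.closed  # usable inside the context
    return in_ctx_open and client.closed  # torn down on exit

ok = asyncio.run(main())
```

The context-manager shape guarantees teardown (including the multiprocess RPC case) even if the server body raises.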
init_render_app_state async
init_render_app_state(
vllm_config: VllmConfig, state: State, args: Namespace
) -> None
Initialise FastAPI app state for a CPU-only render server.
Unlike init_app_state, this function does not require an EngineClient; it bootstraps the preprocessing pipeline (renderer, io_processor, input_processor) directly from the VllmConfig.
Source code in vllm/entrypoints/openai/api_server.py
run_server async
Run a single-worker API server.
Source code in vllm/entrypoints/openai/api_server.py
run_server_worker async
Run a single API server worker.
Source code in vllm/entrypoints/openai/api_server.py
setup_server
Validate API server args, set up the signal handler, and create a socket ready to serve.
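The "socket ready to serve" step can be sketched with the stdlib: bind and listen before the heavyweight engine starts, so the OS queues incoming connections in the meantime. Names and the SIGTERM handling below are illustrative assumptions, not vLLM's actual code.

```python
import signal
import socket

def setup_server_sketch(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    # Register a handler so SIGTERM follows the same clean-shutdown
    # path as Ctrl-C (an assumed strategy for this sketch).
    def _handle_term(signum, frame):
        raise KeyboardInterrupt

    signal.signal(signal.SIGTERM, _handle_term)

    # Create the listening socket up front; the kernel backlog holds
    # early connections while the rest of the server initializes.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0: let the OS pick a free port
    sock.listen()
    return sock

listener = setup_server_sketch()
bound_port = listener.getsockname()[1]
listener.close()
```

Creating the socket before the engine also lets the server report a definitive listen address (as in build_and_serve's listen_address/sock parameters) instead of racing engine startup against the bind.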