switchboard

High-throughput LLM inference server with OpenAI-compatible API. Serves Llama, Mistral, Qwen, and 30+ model families with PagedAttention. pip install vllm.

Skills: 3
Auth: None
Streaming: Yes
Push: No

Skills

OpenAI-Compatible Serving

Serve any HuggingFace model as an OpenAI-compatible API endpoint with full streaming and function calling.
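As a sketch of what OpenAI compatibility means in practice, the snippet below builds the standard Chat Completions request a client would POST to a locally running server. The model name is an assumption for illustration; port 8000 is vLLM's default, and any off-the-shelf OpenAI client produces this same payload shape.

```python
import json
from urllib import request

# Standard OpenAI Chat Completions payload. A vLLM server accepts
# this shape unchanged at /v1/chat/completions.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # assumed model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,  # token-by-token streaming, as with the OpenAI API
}

def build_request(base_url: str = "http://localhost:8000/v1") -> request.Request:
    """Build the POST a client would send to the chat completions endpoint."""
    return request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request()
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```

Because the wire format matches OpenAI's, existing OpenAI SDKs work against a vLLM endpoint by pointing their base URL at the local server.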

PagedAttention Engine

Handle thousands of concurrent requests via the PagedAttention KV cache, with up to 24x the throughput of naive HuggingFace Transformers inference.
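A toy sketch of the idea behind PagedAttention (illustrative only, not vLLM's implementation): the KV cache is carved into fixed-size blocks, and each sequence keeps a block table mapping its logical blocks to physical ones, so memory is allocated on demand as the sequence grows instead of being reserved for the maximum length.

```python
# Toy paged KV-cache allocator. Memory grows with actual sequence
# length, and finished sequences return their blocks to the pool,
# which is what lets many requests share one GPU's cache.
BLOCK_SIZE = 16  # tokens per block (16 is vLLM's default block size)

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))           # physical block pool
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical blocks

    def append_token(self, seq_id: int, pos: int) -> int:
        """Return the physical block holding token `pos`, allocating on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE == len(table):  # crossed a block boundary
            table.append(self.free.pop())    # grab one free block
        return table[pos // BLOCK_SIZE]

    def release(self, seq_id: int) -> None:
        """Free a finished sequence's blocks for reuse by other requests."""
        self.free.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=64)
for pos in range(40):                # a 40-token sequence...
    cache.append_token(seq_id=0, pos=pos)
print(len(cache.block_tables[0]))    # 3 blocks used, not a max-length reservation
```

The throughput win comes from this on-demand allocation: with near-zero wasted cache memory, far more sequences fit in GPU memory at once and can be batched together.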

Multi-Model Support

Deploy 30+ model architectures including Llama, Mistral, Qwen, Falcon, Phi, and Mixtral from one server.

Infrastructure & Ops · llm-serving · inference-server · openai-compatible · paged-attention · high-throughput · self-hosted · gpu-inference
vllm
High-throughput LLM inference server with OpenAI-compatible API. Serves Llama, Mistral, Qwen, and 30+ model families with PagedAttention. pip install vllm.
fields
name: vLLM
provider: vLLM Project
url: https://docs.vllm.ai
categories: infrastructure
access: api · cli
auth: none
streaming: true
push: false
verified: true
tags: llm-serving, inference-server, openai-compatible, paged-attention, high-throughput, self-hosted, gpu-inference
skills
openai-serving · OpenAI-Compatible Serving · Serve any HuggingFace model as an OpenAI-compatible API endpoint with full streaming and function calling.
paged-attention · PagedAttention Engine · Handle thousands of concurrent requests via PagedAttention KV cache — 24x throughput over naive HuggingFace inference.
multi-model · Multi-Model Support · Deploy 30+ model architectures including Llama, Mistral, Qwen, Falcon, Phi, and Mixtral from one server.