vLLM
by vLLM Project
High-throughput LLM inference server with an OpenAI-compatible API. Serves Llama, Mistral, Qwen, and 30+ other model families using PagedAttention. Install with pip install vllm.
Skills
OpenAI-Compatible Serving
Serve supported HuggingFace models as OpenAI-compatible API endpoints with full streaming and function calling.
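A minimal client sketch using the official openai Python package, assuming a server started with vllm serve on the default http://localhost:8000/v1; the model name here is illustrative, and the API key is a placeholder unless the server was launched with --api-key.

    from openai import OpenAI

    # Point the standard OpenAI client at the local vLLM server (default port 8000).
    # The key is a placeholder; vLLM ignores it unless --api-key was set at launch.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # Stream a chat completion; the model name must match what the server loaded.
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
        messages=[{"role": "user", "content": "Summarize PagedAttention in one sentence."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

Because the endpoint speaks the OpenAI wire format, existing OpenAI client code typically needs only the base_url changed.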
PagedAttention Engine
Handle thousands of concurrent requests via the PagedAttention KV cache and continuous batching — up to 24x the throughput of naive HuggingFace Transformers inference.
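A sketch of offline batched generation with vLLM's LLM class, where continuous batching and the PagedAttention KV cache schedule many prompts in a single call; the model name and memory fraction below are assumptions, not required values.

    from vllm import LLM, SamplingParams

    # Reserve 90% of GPU memory for the weights plus the PagedAttention KV cache;
    # a larger KV cache lets more sequences run concurrently.
    llm = LLM(
        model="mistralai/Mistral-7B-Instruct-v0.3",  # illustrative model name
        gpu_memory_utilization=0.9,
    )

    prompts = [f"Write a haiku about request #{i}." for i in range(256)]
    params = SamplingParams(temperature=0.8, max_tokens=64)

    # One call; the engine batches all 256 prompts rather than running them one by one.
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)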
Multi-Model Support
Deploy 30+ model architectures including Llama, Mistral, Qwen, Falcon, Phi, and Mixtral from one server.
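One way to confirm which checkpoint a running server loaded is the OpenAI-compatible /v1/models endpoint; a minimal sketch, assuming the same local server as above.

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # vLLM exposes the standard model-listing endpoint, so the same client code
    # works whether the server loaded a Llama, Mistral, Qwen, or Falcon checkpoint.
    for model in client.models.list():
        print(model.id)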
Related Agents
AgentMail
Email inbox API built for AI agents. Create, send, receive, search, and manage email programmatically with SDKs for Pyt…
Claude MCP
Anthropic's Model Context Protocol — open standard for connecting AI models to tools, data sources, and services with u…
Vercel AI SDK
TypeScript toolkit for building AI applications with React Server Components, streaming, tool calling, and multi-provid…
Airbyte Agents
Context layer for AI agents: MCP server and Python SDK giving agents unified access to 50+ business data connectors wit…