# Local LLM Server with GPU and NPU Acceleration
The Lemonade SDK makes it easy to run Large Language Models (LLMs) on your PC. Our focus is on using the best available tools, such as neural processing units (NPUs) and Vulkan GPU acceleration, to maximize LLM speed and responsiveness.
The Lemonade SDK comprises the following:

- `lemonade` CLI: lets you mix and match LLMs (ONNX, GGUF, SafeTensors) with measurement tools to characterize your models on your hardware. The available tools are:
Maximum LLM performance requires the right hardware accelerator with the right inference engine for your scenario. Lemonade supports the following configurations, while also making it easy to switch between them at runtime.
| Hardware | 🛠️ Engine: OGA | 🛠️ Engine: llamacpp | 🛠️ Engine: HF | 🖥️ Windows (x86/x64) | 🖥️ Linux (x86/x64) |
|---|---|---|---|---|---|
| 🧠 CPU | All platforms | All platforms | All platforms | ✅ | ✅ |
| 🎮 GPU | — | Vulkan: All platforms<br>Focus: Ryzen™ AI 7000/8000/300, Radeon™ 7000/9000 | — | ✅ | ✅ |
| 🤖 NPU | AMD Ryzen™ AI 300 series | — | — | ✅ | — |
| Engine | Description |
|---|---|
| OnnxRuntime GenAI (OGA) | Microsoft engine that runs `.onnx` models and enables hardware vendors to provide their own execution providers (EPs) for specialized hardware, such as neural processing units (NPUs). |
| llamacpp | Community-driven engine with strong GPU acceleration, support for thousands of `.gguf` models, and advanced features such as vision-language models (VLMs) and mixture-of-experts (MoEs). |
| Hugging Face (HF) | Hugging Face's `transformers` library runs the original `.safetensors` trained weights on Meta's PyTorch engine, providing a source of truth for accuracy measurement. |
Lemonade Server supports integration with applications written in languages including Python, C++, Java, C#, Node.js, Go, Ruby, Rust, and PHP. For the full list and integration details, see docs/server/README.md.
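As a sketch of what integration from any of these languages looks like: Lemonade Server exposes an OpenAI-style chat-completions endpoint over HTTP, so a client only needs to POST a small JSON payload. The base URL, port, and model name below are placeholder assumptions for illustration; check your server's startup output and docs/server/README.md for the actual values.

```python
import json
import urllib.request

# Assumed default for illustration -- your Lemonade Server instance may
# listen on a different host/port or API prefix.
BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload to the server and return the parsed JSON reply.

    Requires a running Lemonade Server instance at BASE_URL.
    """
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Building the payload requires no server; the model name is hypothetical.
payload = build_chat_request("Llama-3.2-1B-Instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

To actually get a completion, call `send_chat_request(payload)` with the server running and read `response["choices"][0]["message"]["content"]`, following the OpenAI chat-completions response shape. Clients in the other listed languages follow the same pattern with their own HTTP libraries.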
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.
This project is sponsored by AMD. It is maintained by @danielholanda, @jeremyfowers, @ramkrishna, and @vgodsoe in equal measure. You can reach us by filing an issue or emailing [email protected].
This project is licensed under the Apache 2.0 License. Portions of the project are licensed as described in NOTICE.md.