LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.
Abstract: The fan-out package, with its many distinct advantages, is widely adopted in state-of-the-art mobile applications and has great potential in server applications. In this paper, the ...