LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.
Abstract: The fan-out package, which offers many distinct advantages, is widely adopted in state-of-the-art mobile applications and has great potential in server applications. In this paper, the ...