Toolathlon is a benchmark to assess language agents' general tool use in realistic environments. It features 600+ diverse tools based on real-world software environments. Each task requires ...
The dataset used for fine-tuning the model. Code for generating the dataset. Scripts for fine-tuning the model on high-performance GPUs. Inference scripts for real-time task execution. SG_VLM utilizes ...