← All models
bytedance

UI-TARS 7B

bytedance/ui-tars-1.5-7b

Ratings & Rankings

Input Cost
$0.1000
84/100
Output Cost
$0.2000
89/100
Context Window
128K
12/100
Overall
62/100
62/100

Key Specifications

Context Window128K
Max Output2K
TokenizerOther
Input Modalitiesimage, text
Output Modalitiestext

Pricing

Input (per M tokens)$0.1000per million tokens
Output (per M tokens)$0.2000per million tokens

Input & Output

You send
imagetext
You get
text

Capabilities

Function calling
Structured outputs
Unmoderated

About

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

Supported parameters

frequency_penaltylogit_biasmax_tokenspresence_penaltyrepetition_penaltyseedstoptemperaturetop_ktop_p