Computer use

A computer-use agent lets the model decide what to do on a screen — click, type, scroll, take a screenshot — and hand each action back to your application to execute. Supercompat normalizes this into a single tool shape and a single agent loop; the run/client adapters translate to whatever the provider expects on the wire.

The end-to-end computer-use tests are inherently flaky: they depend on a live browser environment and on the model actually choosing to act. Expect occasional retries in CI. The setup patterns below are the same ones the tests use.

OpenAI

Pair openaiClientAdapter with openaiResponsesRunAdapter:

    import OpenAI from 'openai'
    import {
      supercompat,
      openaiClientAdapter,
      openaiResponsesRunAdapter,
      memoryStorageAdapter,
    } from 'supercompat/openai'

    const client = supercompat({
      clientAdapter: openaiClientAdapter({ openai: new OpenAI() }),
      storageAdapter: memoryStorageAdapter(),
      runAdapter: openaiResponsesRunAdapter(),
    })

Declare the tool

Use the computer type. Supercompat forwards it to OpenAI as computer_use_preview.

    await client.responses.create({
      model: 'computer-use-preview',
      input: 'Search for "supercompat" on Google.',
      tools: [{ type: 'computer' }],
      truncation: 'auto',
    })

For a concrete environment, supply dimensions and platform:

    await client.responses.create({
      model: 'computer-use-preview',
      input: 'Search for "supercompat" on Google.',
      tools: [
        {
          type: 'computer',
          computer: {
            display_width: 1280,
            display_height: 720,
            environment: 'mac',
          },
        },
      ],
      truncation: 'auto',
    })

Fields:

display_width (default 1280)

display_height (default 720)

environment — 'mac' | 'windows' | 'linux'

Azure OpenAI

Use the same declaration against an Azure OpenAI deployment that exposes computer-use:

    import { AzureOpenAI } from 'openai'
    import {
      supercompat,
      azureOpenaiClientAdapter,
      azureResponsesRunAdapter,
      memoryStorageAdapter,
    } from 'supercompat/openai'

    const azureOpenai = new AzureOpenAI({
      endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
      apiKey: process.env.AZURE_OPENAI_API_KEY,
      apiVersion: '2024-10-21',
    })

    const client = supercompat({
      clientAdapter: azureOpenaiClientAdapter({ azureOpenai }),
      storageAdapter: memoryStorageAdapter(),
      runAdapter: azureResponsesRunAdapter(),
    })

    await client.responses.create({
      model: 'my-computer-use-deployment',
      input: 'Open a browser and search for "supercompat".',
      tools: [{ type: 'computer' }],
      truncation: 'auto',
    })

Run loop

The agent emits a computer_call item with an action describing what to do next. Perform the action, take a screenshot, and return it as a computer_call_output. Repeat until the model stops requesting actions.

    let response = await client.responses.create({
      model: 'computer-use-preview',
      input: initialPrompt,
      tools: [
        {
          type: 'computer',
          computer: { display_width: 1280, display_height: 720, environment: 'mac' },
        },
      ],
      truncation: 'auto',
    })

    while (true) {
      const call = response.output.find((item) => item.type === 'computer_call')
      if (!call) break

      // call.action is a discriminated union: { type: 'click', x, y },
      // { type: 'type', text }, { type: 'screenshot' }, etc.
      await executeOnVM(call.action)
      const screenshot = await captureScreenshot()

      response = await client.responses.create({
        model: 'computer-use-preview',
        previous_response_id: response.id,
        input: [
          {
            type: 'computer_call_output',
            call_id: call.call_id,
            output: {
              type: 'computer_screenshot',
              image_url: screenshot,
            },
          },
        ],
        truncation: 'auto',
      })
    }

Your executeOnVM function maps each action type to a real input event on the environment you control — a virtual machine, a browser instance, or any sandbox with mouse and keyboard APIs.
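As a rough illustration of that mapping, the sketch below builds an executeOnVM-style function on top of a driver object. Everything here is an assumption made for the example: the InputDriver interface and the createExecutor helper are hypothetical, and the ComputerAction union is a simplified local subset of the action types rather than the library's own type.

```typescript
// Simplified local subset of the action union; the real one has more members.
type MouseButton = 'left' | 'right' | 'middle'

type ComputerAction =
  | { type: 'click'; x: number; y: number; button?: MouseButton }
  | { type: 'double_click'; x: number; y: number }
  | { type: 'type'; text: string }
  | { type: 'keypress'; keys: string[] }
  | { type: 'scroll'; x: number; y: number; scroll_x: number; scroll_y: number }
  | { type: 'screenshot' }

// Hypothetical driver: back it with a browser automation library, a VM agent, etc.
interface InputDriver {
  click(x: number, y: number, button: MouseButton, clickCount: number): Promise<void>
  typeText(text: string): Promise<void>
  pressKeys(keys: string[]): Promise<void>
  scrollAt(x: number, y: number, deltaX: number, deltaY: number): Promise<void>
}

// Returns a function with the same call shape as executeOnVM in the loop above.
function createExecutor(driver: InputDriver): (action: ComputerAction) => Promise<void> {
  return async (action) => {
    switch (action.type) {
      case 'click':
        return driver.click(action.x, action.y, action.button ?? 'left', 1)
      case 'double_click':
        return driver.click(action.x, action.y, 'left', 2)
      case 'type':
        return driver.typeText(action.text)
      case 'keypress':
        return driver.pressKeys(action.keys)
      case 'scroll':
        return driver.scrollAt(action.x, action.y, action.scroll_x, action.scroll_y)
      case 'screenshot':
        // Nothing to perform: the loop captures a screenshot after every action.
        return
      default: {
        // Compile-time exhaustiveness check; throws if a new action type slips through.
        const unsupported: never = action
        throw new Error(`Unsupported action: ${JSON.stringify(unsupported)}`)
      }
    }
  }
}
```

With a driver in hand, `const executeOnVM = createExecutor(myDriver)` gives the loop the single-argument function it expects. Throwing on unknown action types is a deliberate choice in this sketch: silently skipping an action would leave the model reasoning about a screen state that never changed.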

Anthropic

On the Assistants surface, anthropicClientAdapter + completionsRunAdapter forward Anthropic's native computer_20250124 (or computer_20251124) tool and normalize the resulting computer_use blocks into OpenAI-compatible computer_call items. The adapter automatically attaches the matching computer-use-* beta header based on the tool type you declare.

    import Anthropic from '@anthropic-ai/sdk'
    import { PrismaClient } from '@prisma/client'
    import {
      supercompat,
      anthropicClientAdapter,
      completionsRunAdapter,
      prismaStorageAdapter,
    } from 'supercompat/openai'

    const client = supercompat({
      clientAdapter: anthropicClientAdapter({ anthropic: new Anthropic() }),
      storageAdapter: prismaStorageAdapter({ prisma: new PrismaClient() }),
      runAdapter: completionsRunAdapter(),
    })

    const assistant = await client.beta.assistants.create({
      model: 'claude-sonnet-4-6',
      instructions: 'You control a browser via the computer tool. Take a screenshot before acting.',
      tools: [
        {
          type: 'computer_20250124',
          computer_20250124: {
            name: 'computer',
            display_width_px: 1280,
            display_height_px: 720,
          },
        } as any,
      ],
    })

Drive it through the standard Assistants run loop — requires_action surfaces computer_call items; execute the action on your environment and submit a computer_call_output with the resulting screenshot.
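To make the beta-header selection mentioned earlier in this section concrete, here is a minimal sketch of how a dated tool type could be turned into the matching computer-use-* flag. The function name betaHeaderForTool is hypothetical, and deriving the flag from the date suffix is an assumption about one workable approach; the adapter's actual implementation is not shown here and may use a lookup table instead.

```typescript
// Derives the beta flag from a dated tool type,
// e.g. 'computer_20250124' -> 'computer-use-2025-01-24'.
// The resulting value is what would be sent in the `anthropic-beta` request header.
function betaHeaderForTool(toolType: string): string | undefined {
  const match = /^computer_(\d{4})(\d{2})(\d{2})$/.exec(toolType)
  if (!match) return undefined // not a dated Anthropic computer tool
  const [, year, month, day] = match
  return `computer-use-${year}-${month}-${day}`
}
```

Returning undefined for anything that does not match keeps non-Anthropic shapes such as computer_use_preview from picking up a header they do not need.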

OpenRouter (Qwen, GLM, Kimi, Gemini Flash)

OpenRouter's catalog includes several vision models that can be adapted into computer-use agents. openRouterClientAdapter handles each model's quirks — GLM emits normalized coordinates (0–1000) that get denormalized to pixels; Qwen returns slightly malformed JSON that's parsed with a fuzzy extractor; Kimi-VL uses the standard pixel format.

    import { OpenRouter, HTTPClient } from '@openrouter/sdk'
    import { PrismaClient } from '@prisma/client'
    import {
      supercompat,
      openRouterClientAdapter,
      completionsRunAdapter,
      prismaStorageAdapter,
    } from 'supercompat/openai'

    const httpClient = new HTTPClient({
      fetcher: (request: Request) => {
        request.headers.set('Connection', 'close')
        return fetch(request)
      },
    })

    const client = supercompat({
      clientAdapter: openRouterClientAdapter({
        openRouter: new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY!, httpClient }),
      }),
      storageAdapter: prismaStorageAdapter({ prisma: new PrismaClient() }),
      runAdapter: completionsRunAdapter(),
    })

    const assistant = await client.beta.assistants.create({
      model: 'z-ai/glm-4.6v',
      instructions: 'You control a computer via the computer tool. Take a screenshot before acting.',
      tools: [
        {
          type: 'computer_use_preview',
          computer_use_preview: {
            display_width: 1280,
            display_height: 720,
          },
        },
      ],
    })

Tested OpenRouter vision models:

z-ai/glm-4.6v

qwen/qwen-2.5-vl-72b-instruct

moonshotai/kimi-vl-a3b-thinking

google/gemini-2.5-flash

google/gemma-4-26b-a4b-it, google/gemma-4-31b-it
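For intuition about the coordinate handling described at the top of this section, the sketch below shows one plausible way to convert 0–1000 normalized coordinates into pixels for a given display size. The names (Point, Display, denormalize) are made up for this example, and the rounding and clamping choices are assumptions; the adapter may behave differently at the edges.

```typescript
interface Point { x: number; y: number }
interface Display { width: number; height: number }

// Models such as GLM report positions on a 0-1000 grid regardless of resolution.
const NORMALIZED_MAX = 1000

function denormalize(point: Point, display: Display): Point {
  const scale = (value: number, size: number): number => {
    const px = Math.round((value / NORMALIZED_MAX) * size)
    // Clamp so that a coordinate of exactly 1000 still lands on the last pixel.
    return Math.min(Math.max(px, 0), size - 1)
  }
  return {
    x: scale(point.x, display.width),
    y: scale(point.y, display.height),
  }
}

// The centre of the normalized grid maps to the centre of a 1280x720 display.
const centre = denormalize({ x: 500, y: 500 }, { width: 1280, height: 720 })
// centre is { x: 640, y: 360 }
```

The display size passed here should match the display_width and display_height declared on the tool, otherwise clicks will drift proportionally.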

Ollama (local)

ollamaClientAdapter + completionsRunAdapter drive a local vision model through the same computer_use_preview shape. Gemma 4 is the best candidate today: it is multimodal and bounding-box grounded, and it emits normalized coordinates (0–1000) that the adapter denormalizes to pixels. Expect lower reliability than hosted computer-use models; local coordinate accuracy is the weak link.

    import OpenAI from 'openai'
    import { PrismaClient } from '@prisma/client'
    import {
      supercompat,
      ollamaClientAdapter,
      completionsRunAdapter,
      prismaStorageAdapter,
    } from 'supercompat/openai'

    const client = supercompat({
      clientAdapter: ollamaClientAdapter({
        ollama: new OpenAI({
          apiKey: 'ollama',
          baseURL: 'http://localhost:11434/v1',
        }),
      }),
      storageAdapter: prismaStorageAdapter({ prisma: new PrismaClient() }),
      runAdapter: completionsRunAdapter(),
    })

    const assistant = await client.beta.assistants.create({
      model: 'gemma4:e4b',
      instructions: 'You control a computer via the computer tool. Take a screenshot before acting.',
      tools: [
        {
          type: 'computer_use_preview',
          computer_use_preview: {
            display_width: 1280,
            display_height: 720,
          },
        },
      ],
    })

Ollama drops tool-role images

One Ollama-specific caveat worth calling out: both the OpenAI-compatible endpoint and the native /api/chat endpoint silently drop images attached to role: "tool" messages. The image content part is accepted without error but never reaches the model, which then hallucinates plausible-sounding output.

The adapter handles this for computer-use screenshots automatically: after a computer_call returns a screenshot, the tool-result message is kept as a short text receipt (so the tool_call_id pairing stays valid) and the image is relayed through a synthetic user message immediately after. The run loop is unchanged from your side.
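The relay described above can be pictured as a small message rewrite. The following is only a sketch under assumptions: the ChatMessage and ContentPart types are simplified local stand-ins for the OpenAI-compatible chat format, relayToolImages is a hypothetical name, and the receipt wording is invented for the example. The adapter's real implementation may differ in all of these.

```typescript
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string | ContentPart[]
  tool_call_id?: string
}

// Moves images out of tool-role messages into a synthetic user message
// placed immediately after, leaving a text receipt behind.
function relayToolImages(messages: ChatMessage[]): ChatMessage[] {
  const out: ChatMessage[] = []
  for (const message of messages) {
    if (message.role !== 'tool' || typeof message.content === 'string') {
      out.push(message)
      continue
    }
    const images = message.content.filter((part) => part.type === 'image_url')
    if (images.length === 0) {
      out.push(message)
      continue
    }
    // Spreading the original message keeps role and tool_call_id,
    // so the pairing with the assistant's tool call stays valid.
    out.push({ ...message, content: 'Screenshot captured; the image follows in the next message.' })
    // The image travels in a user message, which Ollama does pass to the model.
    out.push({
      role: 'user',
      content: [{ type: 'text', text: 'Screenshot after the last action:' }, ...images],
    })
  }
  return out
}
```

Note that this sketch discards any text parts that accompanied the image in the tool message; a fuller version would fold them into the receipt.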

Tested local models:

gemma4:e4b, gemma4:26b, gemma4:31b

Google (Gemini)

Pair googleClientAdapter with completionsRunAdapter on the Assistants surface. The tool uses the same computer_use_preview shape:

    import { GoogleGenAI } from '@google/genai'
    import { PrismaClient } from '@prisma/client'
    import {
      supercompat,
      googleClientAdapter,
      completionsRunAdapter,
      prismaStorageAdapter,
    } from 'supercompat/openai'

    const client = supercompat({
      clientAdapter: googleClientAdapter({ google: new GoogleGenAI() }),
      storageAdapter: prismaStorageAdapter({ prisma: new PrismaClient() }),
      runAdapter: completionsRunAdapter(),
    })

    const assistant = await client.beta.assistants.create({
      model: 'gemini-2.5-flash',
      instructions: 'You control a browser via the computer tool. Take a screenshot before acting.',
      tools: [
        {
          type: 'computer_use_preview',
          computer_use_preview: {
            display_width: 1280,
            display_height: 720,
          },
        },
      ],
    })

Compatibility summary

Pairing	Tool shape	Tested models
`openaiClientAdapter` + `openaiResponsesRunAdapter`	`{ type: 'computer' }` (Responses API)	`computer-use-preview`
`azureOpenaiClientAdapter` + `azureResponsesRunAdapter`	`{ type: 'computer' }` (Responses API)	Azure deployment exposing computer-use
`anthropicClientAdapter` + `completionsRunAdapter`	`{ type: 'computer_20250124', computer_20250124: {...} }` (Assistants API)	`claude-sonnet-4-6`
`googleClientAdapter` + `completionsRunAdapter`	`{ type: 'computer_use_preview', computer_use_preview: {...} }` (Assistants API)	`gemini-2.5-flash`
`openRouterClientAdapter` + `completionsRunAdapter`	`{ type: 'computer_use_preview', computer_use_preview: {...} }` (Assistants API)	GLM, Qwen-VL, Kimi-VL, Gemini Flash, Gemma 4 via OpenRouter
`ollamaClientAdapter` + `completionsRunAdapter`	`{ type: 'computer_use_preview', computer_use_preview: {...} }` (Assistants API)	`gemma4` (local)

The Responses API path uses the shorter { type: 'computer' } shape; the Assistants API path uses the longer { type: 'computer_use_preview', computer_use_preview: {...} } shape. Supercompat translates both into whatever the provider actually expects on the wire.