Ollama
Ollama runs open-weight models on your own machine. Supercompat points an OpenAI client at the Ollama server's OpenAI-compatible endpoint and layers on the tool-calling and computer-use normalization the chat-completions surface needs.
Install
npm install supercompat openai
Make sure Ollama is installed and running, and that you've pulled the model you want:
ollama serve
ollama pull gemma4:e4b
Minimal setup
import OpenAI from 'openai'
import {
  supercompat,
  ollamaClientAdapter,
  completionsRunAdapter,
  memoryStorageAdapter,
} from 'supercompat/openai'
const ollama = new OpenAI({
  // The OpenAI SDK requires an API key; Ollama ignores its value.
  apiKey: 'ollama',
  baseURL: 'http://localhost:11434/v1',
})
const client = supercompat({
  clientAdapter: ollamaClientAdapter({ ollama }),
  storageAdapter: memoryStorageAdapter(),
  runAdapter: completionsRunAdapter(),
})
const response = await client.responses.create({
  model: 'gemma4:e4b',
  input: 'Say hello.',
})
Remote Ollama
Change the baseURL to reach an Ollama host on another machine:
const ollama = new OpenAI({
  apiKey: 'ollama',
  baseURL: 'http://gpu-box.local:11434/v1',
})
Tool calling
Ollama's OpenAI-compatible endpoint supports function calling on every model that was trained for tool use upstream (Llama 3.1+, Qwen 2.5+, Gemma 4, Mistral Nemo, GPT-OSS, and others). Declare tools as you would on OpenAI:
const assistant = await client.beta.assistants.create({
  model: 'gemma4:e4b',
  instructions: 'You can only look up information by calling functions.',
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_account_balance',
        description: 'Look up the account balance for a given user ID.',
        parameters: {
          type: 'object',
          properties: { user_id: { type: 'string' } },
          required: ['user_id'],
        },
      },
    },
  ],
})
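Once the model decides to call get_account_balance, your code still has to run the function and hand the result back. The sketch below shows one way that dispatch step could look. It assumes the standard OpenAI function tool-call shape, where arguments arrives as a JSON-encoded string. The handler registry, the runToolCall helper, and the balance data are hypothetical names made up for illustration; they are not part of Supercompat or Ollama.

```typescript
// Shape of a function tool call in OpenAI-style output.
type FunctionToolCall = {
  id: string
  type: 'function'
  function: { name: string; arguments: string } // arguments is a JSON string
}

type ToolHandler = (args: Record<string, unknown>) => unknown

// Hypothetical local handlers keyed by tool name. The balances are made-up data.
const toolHandlers = new Map<string, ToolHandler>([
  [
    'get_account_balance',
    (args) => {
      const userId = String(args['user_id'])
      const balances = new Map<string, number>([['user-1', 125.5]])
      return { user_id: userId, balance: balances.get(userId) ?? null }
    },
  ],
])

// Runs one tool call and returns a { tool_call_id, output } pair.
// Errors are returned as JSON so the model can see what went wrong.
function runToolCall(call: FunctionToolCall): { tool_call_id: string; output: string } {
  const fail = (message: string) => ({
    tool_call_id: call.id,
    output: JSON.stringify({ error: message }),
  })

  const handler = toolHandlers.get(call.function.name)
  if (handler === undefined) return fail(`Unknown tool: ${call.function.name}`)

  let parsed: unknown
  try {
    parsed = JSON.parse(call.function.arguments || '{}')
  } catch {
    return fail('Arguments were not valid JSON')
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return fail('Arguments must be a JSON object')
  }

  const result = handler(parsed as Record<string, unknown>)
  return { tool_call_id: call.id, output: JSON.stringify(result) }
}
```

Smaller local models are more likely than hosted ones to emit malformed arguments, which is why this sketch reports parse failures back to the model instead of throwing.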
Computer use
Supercompat will drive Gemma 4 (and other Ollama vision models) through the same computer_use_preview tool shape as every other provider. The adapter:
Rewrites computer_use_preview into a plain computer_call function tool the model can actually call.
Denormalizes coordinates per model: Gemma 4 emits coordinates normalized to a 0-1000 range, and the adapter rescales them to real pixels.
Fuzzy-extracts box_2d bounding boxes, which Gemma 4 occasionally returns instead of a computer_call, and converts each one to a click at the box center.
Relays screenshots through a synthetic user message, because Ollama silently drops images from role: "tool" messages at both the OpenAI-compat and native /api/chat layers (see caveat below).
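The coordinate handling described above can be sketched as plain arithmetic. This is an illustration of the idea, not the adapter's actual code. The function and type names are hypothetical, and the [yMin, xMin, yMax, xMax] ordering for box_2d is an assumption, so check it against what your model actually emits.

```typescript
type PixelPoint = { x: number; y: number }
type ScreenSize = { width: number; height: number }

// Upper bound of the normalized coordinate range described above.
const NORMALIZED_MAX = 1000

function clampToRange(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value))
}

// Converts one 0-1000 normalized axis value into a pixel index in [0, size - 1].
function denormalizeAxis(value: number, size: number): number {
  const fraction = clampToRange(value, 0, NORMALIZED_MAX) / NORMALIZED_MAX
  return Math.min(size - 1, Math.round(fraction * size))
}

// Rescales a normalized point to real pixel coordinates.
function denormalizePoint(point: PixelPoint, screen: ScreenSize): PixelPoint {
  return {
    x: denormalizeAxis(point.x, screen.width),
    y: denormalizeAxis(point.y, screen.height),
  }
}

// Turns a normalized bounding box into a click at its center.
// ASSUMPTION: the box is ordered [yMin, xMin, yMax, xMax].
function boxCenterToClick(
  box: [number, number, number, number],
  screen: ScreenSize,
): PixelPoint {
  const [yMin, xMin, yMax, xMax] = box
  return denormalizePoint({ x: (xMin + xMax) / 2, y: (yMin + yMax) / 2 }, screen)
}
```

For a 1920x1080 screenshot, a normalized point of (500, 250) maps to pixel (960, 270). Values outside the 0-1000 range are clamped so a stray coordinate cannot produce a click off screen.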
Caveat — images in tool results
Ollama (both /v1/chat/completions and native /api/chat) does not pass images on tool-role messages to the model. If you send a tool result with image_url content parts or an images: [...] field, the image never reaches the model and you'll get hallucinated responses.
ollamaClientAdapter works around this for computer-use screenshots automatically: it relays the tool output's image through a synthetic user message immediately after the tool-result message, preserving the tool_call_id pairing. If you're hand-rolling multimodal tool flows outside computer use, you'll need to apply the same pattern yourself.
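If you do need to apply the pattern by hand, the sketch below shows one possible shape for it: a pure function that rewrites a chat-completions message list before it is sent. The message types are simplified (for example, assistant tool_calls are omitted), the relayToolImages name is hypothetical, and the placeholder strings are assumptions rather than the wording ollamaClientAdapter uses.

```typescript
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image_url'; image_url: { url: string } }
type MessagePart = TextPart | ImagePart

// Simplified message shapes; real requests carry more fields.
type RelayMessage =
  | { role: 'system' | 'user' | 'assistant'; content: string | MessagePart[] }
  | { role: 'tool'; tool_call_id: string; content: string | MessagePart[] }

// Moves images out of tool-role messages into a user message placed
// immediately after, keeping the tool message (and its tool_call_id) intact.
function relayToolImages(messages: RelayMessage[]): RelayMessage[] {
  const relayed: RelayMessage[] = []

  for (const message of messages) {
    if (message.role !== 'tool') {
      relayed.push(message)
      continue
    }
    const content = message.content
    if (typeof content === 'string') {
      relayed.push(message)
      continue
    }

    const images = content.filter((part): part is ImagePart => part.type === 'image_url')
    if (images.length === 0) {
      relayed.push(message)
      continue
    }

    const text = content
      .filter((part): part is TextPart => part.type === 'text')
      .map((part) => part.text)
      .join('\n')

    // Text-only tool result, so the tool call still has its paired answer.
    relayed.push({
      role: 'tool',
      tool_call_id: message.tool_call_id,
      content: text || 'Image output follows in the next message.',
    })
    // Synthetic user message that actually carries the image to the model.
    relayed.push({
      role: 'user',
      content: [
        { type: 'text', text: `Image output for tool call ${message.tool_call_id}:` },
        ...images,
      ],
    })
  }

  return relayed
}
```

Messages without images pass through unchanged, so the function can be applied to every request.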
Models
Some current examples:
gemma4:e4b / gemma4:26b / gemma4:31b — multimodal, bounding-box grounded; recommended for computer use. Pick :e4b for fast iteration, :26b/:31b for quality.
qwen2.5vl:32b — multimodal