Ollama

Ollama runs open-weight models on your own machine. Supercompat points an OpenAI client at the Ollama server's OpenAI-compatible endpoint and layers on the tool-calling and computer-use normalization the chat-completions surface needs.

Install

npm install supercompat openai

Make sure Ollama is installed and running, and that you've pulled the model you want:

ollama serve
ollama pull gemma4:e4b
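If you would rather confirm from code that the server is up and the model is present, one option is to query Ollama's native GET /api/tags endpoint, which lists locally pulled models. The sketch below is a minimal illustration and not part of Supercompat: `hasModel` and `isModelPulled` are hypothetical helper names, and the default host assumes a local install on the standard port.

```typescript
// The part of Ollama's GET /api/tags response this check relies on.
interface OllamaTagsResponse {
  models: { name: string }[]
}

// Pure helper: true when `model` (including its tag, e.g. "gemma4:e4b")
// appears in the list of locally pulled models.
function hasModel(tags: OllamaTagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model)
}

// Hypothetical wrapper: asks a running Ollama server which models it has.
// Rejects if the server is unreachable or answers with a non-2xx status.
async function isModelPulled(
  model: string,
  host: string = 'http://localhost:11434', // assumed default local address
): Promise<boolean> {
  const res = await fetch(`${host}/api/tags`)
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`)
  return hasModel((await res.json()) as OllamaTagsResponse, model)
}
```

The comparison is an exact match on the full name, so pass the tag as well (`gemma4:e4b`, not `gemma4`).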

Minimal setup

import OpenAI from 'openai'
import {
  supercompat,
  ollamaClientAdapter,
  completionsRunAdapter,
  memoryStorageAdapter,
} from 'supercompat/openai'

const ollama = new OpenAI({
  apiKey: 'ollama', // Ollama accepts any non-empty value.
  baseURL: 'http://localhost:11434/v1',
})

const client = supercompat({
  clientAdapter: ollamaClientAdapter({ ollama }),
  storageAdapter: memoryStorageAdapter(),
  runAdapter: completionsRunAdapter(),
})

const response = await client.responses.create({
  model: 'gemma4:e4b',
  input: 'Say hello.',
})

Remote Ollama

Change the baseURL to reach an Ollama host on another machine:

const ollama = new OpenAI({
  apiKey: 'ollama',
  baseURL: 'http://gpu-box.local:11434/v1',
})
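If the host comes from configuration rather than a literal, a small normalizer can help avoid a missing scheme or a doubled slash before /v1. This is a sketch under stated assumptions: `ollamaBaseURL` is a hypothetical helper, not something Supercompat or the OpenAI SDK provides, and it assumes plain HTTP whenever no scheme is given.

```typescript
// Hypothetical helper: turn "gpu-box.local:11434" or
// "http://gpu-box.local:11434/" into a base URL ending in /v1.
function ollamaBaseURL(host: string = 'localhost:11434'): string {
  // Add a scheme when the caller passed a bare host:port (assumes http).
  const withScheme = /^https?:\/\//.test(host) ? host : `http://${host}`
  // Drop trailing slashes so the result never contains "//v1".
  const trimmed = withScheme.replace(/\/+$/, '')
  // Leave the value alone when it already ends in /v1.
  return trimmed.endsWith('/v1') ? trimmed : `${trimmed}/v1`
}
```

The result can be passed as `baseURL` when constructing the OpenAI client.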

Tool calling

Ollama's OpenAI-compatible endpoint supports function calling on every model that was trained for tool use upstream (Llama 3.1+, Qwen 2.5+, Gemma 4, Mistral Nemo, GPT-OSS, and others). Declare tools as you would with OpenAI:

const assistant = await client.beta.assistants.create({
  model: 'gemma4:e4b',
  instructions: 'You can only look up information by calling functions.',
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_account_balance',
        description: 'Look up the account balance for a given user ID.',
        parameters: {
          type: 'object',
          properties: { user_id: { type: 'string' } },
          required: ['user_id'],
        },
      },
    },
  ],
})
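Once the model asks for `get_account_balance`, your code still has to run the function and return its result. The sketch below shows one way to dispatch an OpenAI-style function tool call to a local handler. It is illustrative only: the `balances` data, the `handlers` map, and `runToolCall` are hypothetical names, and the `{ tool_call_id, output }` return shape assumes you are submitting tool outputs in the Assistants style.

```typescript
// Shape of a function tool call as it appears in OpenAI-style responses.
interface FunctionToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string } // arguments is a JSON string
}

interface ToolOutput { tool_call_id: string; output: string }

type Handler = (args: Record<string, unknown>) => unknown

// Hypothetical in-memory data standing in for a real account service.
const balances = new Map<string, number>([['user_123', 42.5]])

// Local implementations keyed by tool name.
const handlers = new Map<string, Handler>([
  ['get_account_balance', (args) => {
    const userId = String(args['user_id'])
    const balance = balances.get(userId)
    return balance === undefined
      ? { error: `unknown user: ${userId}` }
      : { user_id: userId, balance }
  }],
])

// Run one tool call and build the payload to send back to the model.
// Failures are reported as JSON so the model can react instead of the run crashing.
function runToolCall(call: FunctionToolCall): ToolOutput {
  const reply = (payload: unknown): ToolOutput => ({
    tool_call_id: call.id,
    output: JSON.stringify(payload),
  })
  const handler = handlers.get(call.function.name)
  if (!handler) return reply({ error: `unknown tool: ${call.function.name}` })
  let args: unknown
  try {
    args = JSON.parse(call.function.arguments)
  } catch {
    // A model may emit malformed JSON; surface that rather than throwing.
    return reply({ error: 'arguments were not valid JSON' })
  }
  if (typeof args !== 'object' || args === null) {
    return reply({ error: 'arguments must be a JSON object' })
  }
  return reply(handler(args as Record<string, unknown>))
}
```

Returning errors as tool output rather than throwing gives the model a chance to retry with corrected arguments.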

Computer use

Supercompat will drive Gemma 4 (and other Ollama vision models) through the same computer_use_preview tool shape as every other provider. The adapter:

Rewrites computer_use_preview into a plain computer_call function tool the model can actually call.

Denormalizes coordinates per-model — Gemma 4 emits 0-1000 normalized coordinates; the adapter rescales to real pixels.

Fuzzy-extracts the box_2d bounding boxes that Gemma 4 occasionally returns instead of a computer_call, and converts each one into a click at the box center.

Relays screenshots through a synthetic user message, because Ollama silently drops images from role: "tool" messages at both the OpenAI-compat and native /api/chat layers (see caveat below).

See Computer use → Ollama for the full run-loop example.
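To make the coordinate handling above concrete, here is a rough sketch of the arithmetic involved. It is not the adapter's actual code: `denormalizePoint` and `boxCenterToClick` are hypothetical names, rounding and clamping are choices made for this example, and the `[ymin, xmin, ymax, xmax]` ordering of box_2d is an assumption you should check against your model's real output.

```typescript
interface Point { x: number; y: number }
interface Size { width: number; height: number }

// Map a point on a 0..scale normalized grid to pixel coordinates,
// clamped so the result always lies inside the screen.
function denormalizePoint(p: Point, screen: Size, scale: number = 1000): Point {
  const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max)
  return {
    x: clamp(Math.round((p.x / scale) * screen.width), screen.width - 1),
    y: clamp(Math.round((p.y / scale) * screen.height), screen.height - 1),
  }
}

// Turn a bounding box into a click at its center.
// Assumption: box is [ymin, xmin, ymax, xmax] on the same 0-1000 grid.
function boxCenterToClick(
  box: [number, number, number, number],
  screen: Size,
): Point {
  const [ymin, xmin, ymax, xmax] = box
  return denormalizePoint(
    { x: (xmin + xmax) / 2, y: (ymin + ymax) / 2 },
    screen,
  )
}
```

On a 1280x800 screenshot, for example, the normalized point (500, 500) maps to pixel (640, 400).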

Caveat — images in tool results

Ollama (both /v1/chat/completions and native /api/chat) does not pass images on tool-role messages to the model. If you send a tool result with image_url content parts or an images: [...] field, the image never reaches the model and you'll get hallucinated responses.

ollamaClientAdapter works around this for computer-use screenshots automatically: it relays the tool output's image through a synthetic user message immediately after the tool-result message, preserving the tool_call_id pairing. If you're hand-rolling multimodal tool flows outside computer use, you'll need to apply the same pattern yourself.
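For hand-rolled flows, the pattern could look roughly like the sketch below, which rewrites a chat-completions message list before it is sent. It is an illustration of the idea, not the code ollamaClientAdapter uses: `relayToolImages`, the simplified `ChatMessage` type, and the placeholder strings are all hypothetical.

```typescript
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

// Simplified message shape; real chat-completions messages carry more fields.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string | ContentPart[]
  tool_call_id?: string
}

// Move images out of tool messages into a synthetic user message that
// immediately follows, leaving every other message untouched.
function relayToolImages(messages: ChatMessage[]): ChatMessage[] {
  const out: ChatMessage[] = []
  for (const message of messages) {
    const { content } = message
    if (message.role !== 'tool' || typeof content === 'string') {
      out.push(message)
      continue
    }
    const images = content.filter((part) => part.type === 'image_url')
    if (images.length === 0) {
      out.push(message)
      continue
    }
    const text = content
      .filter((part): part is Extract<ContentPart, { type: 'text' }> => part.type === 'text')
      .map((part) => part.text)
      .join('\n')
    // Keep the tool message and its tool_call_id so the call/result pairing survives.
    out.push({ ...message, content: text || 'See the image in the next message.' })
    // Synthetic user message that carries the image(s) to the model.
    out.push({
      role: 'user',
      content: [{ type: 'text', text: 'Image returned by the tool call above:' }, ...images],
    })
  }
  return out
}
```

This sketch handles one tool result at a time. If a turn contains several parallel tool results, you may need to collect the images and append a single user message after the last result instead.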

Models

You can use any model you've pulled with ollama pull. Browse the library at ollama.com/library.

Some current examples:

gemma4:e4b / gemma4:26b / gemma4:31b — multimodal, bounding-box grounded; recommended for computer use. Pick :e4b for fast iteration, :26b/:31b for quality.

llama3.1:70b

qwen2.5vl:32b — multimodal

mistral-nemo

gpt-oss:20b