# Ollama
| Metadata | Value |
|---|---|
| Category | ai |
| Capabilities | http, shell |
| Website | https://ollama.com |
## Returns shapes

- `loaded_model[]` — from `ps`
- `model[]` — from `list_models`, `list_models_cli`
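For orientation, these shapes wrap two stock Ollama endpoints; exactly which fields the skill keeps is an assumption here, but the raw payloads look like this:

```sh
# Raw endpoints behind the shapes (field selection by the skill is assumed):
curl -s http://localhost:11434/api/tags   # model[]        <- { "models": [ { "name", "size", "modified_at", ... } ] }
curl -s http://localhost:11434/api/ps     # loaded_model[] <- { "models": [ { "name", "size_vram", "expires_at", ... } ] }
```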
## Connections

- `api` — Ollama REST API — fast inference path, requires the server to be running
- `cli` — Ollama CLI — can start the server, pull/delete models, management ops
## Readme

Run local AI models — inference, tool calling, and model management — entirely on your machine. No API keys, no cloud, no cost per token.
Install Ollama (if not already installed):
```sh
brew install ollama
brew services start ollama
```

Pull a model to use:

```sh
ollama pull qwen3.5:9b-q8_0   # recommended: 11GB, tools + vision + thinking, 256K ctx
ollama pull glm-4.7-flash     # coding specialist: 19GB, best local SWE-bench
```

The skill auto-starts the Ollama server if it is not running — no manual setup required.
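If you want to verify the setup by hand, a quick sanity check with stock Ollama commands (optional; the skill's `status` tool performs the equivalent check):

```sh
curl -s http://localhost:11434/    # prints "Ollama is running" when the server is up
ollama list                        # pulled models should appear here
```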
### Connections

| Connection | What it does |
|---|---|
| `api` (default) | REST API at `localhost:11434` — fast inference, most operations |
| `cli` | Ollama CLI binary — server start/stop, model pulls, management |
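As a rough illustration of the two paths (plain Ollama commands; how the skill invokes them internally is not documented here):

```sh
# cli: can bring the server up when the REST path is unreachable
ollama serve &

# api: fast-path requests go over REST once the server is up
curl -s http://localhost:11434/api/version
```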
### Tools

| Tool | Connection | Description |
|---|---|---|
| `status` | cli | Check if the server is running; start it if not |
| `chat` | api / cli | Multi-turn chat with tool calling and thinking mode |
| `generate` | api | One-shot text generation (faster for simple prompts) |
| `list_models` | api / cli | List all downloaded models with size and metadata |
| `pull_model` | cli / api | Download a model from the Ollama registry |
| `delete_model` | api / cli | Delete a model to free disk space |
| `ps` | api | Show models currently loaded in memory |
| `show_model` | api | Show model details: arch, context length, template |
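For reference, the raw REST calls that plausibly sit behind a few of these tools (the endpoints are stock Ollama; the model name is just the one pulled above):

```sh
# ps: what is currently loaded in memory
curl -s http://localhost:11434/api/ps

# show_model: architecture, context length, chat template
curl -s http://localhost:11434/api/show -d '{"model": "qwen3.5:9b-q8_0"}'

# delete_model: free disk space
curl -s -X DELETE http://localhost:11434/api/delete -d '{"model": "qwen3.5:9b-q8_0"}'
```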
### Tool calling

`chat` normalizes Ollama's tool call format to the canonical AgentOS shape: `{ id, name, input }`. Pass tools as `{ name, description, input_schema }` — the same format used by the Anthropic skill.
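Underneath, Ollama's own chat endpoint takes OpenAI-style tool definitions and returns `message.tool_calls`; presumably the skill translates `input_schema` to `parameters` and flattens the result to `{ id, name, input }`. A raw request looks like this (the model and tool are illustrative):

```sh
curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3.5:9b-q8_0",
  "stream": false,
  "messages": [{"role": "user", "content": "Weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
# the response carries message.tool_calls[].function.{name, arguments}
```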
### Thinking mode

For models that support extended reasoning (qwen3, glm-4.7-flash, etc.), pass `thinking: true` to enable it. The reasoning trace is returned in the `thinking` field, separate from `content`.
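The raw API equivalent uses Ollama's `think` flag, and the response then carries a separate `message.thinking` field; the skill's `thinking: true` presumably maps onto this:

```sh
curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3.5:9b-q8_0",
  "stream": false,
  "think": true,
  "messages": [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}]
}'
# message.thinking holds the trace; message.content holds the final answer
```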
### Notes

- `chat` with `connection: cli` collapses message history into a single prompt — suitable for single-turn use only
- `pull_model` defaults to the CLI connection for reliable progress on large downloads (19GB models)
- `ps` shows what is currently warm in unified memory — useful before running a new model to estimate reload time
- The `generate` operation skips chat overhead — use it for classification, extraction, or quick single-turn tasks where speed matters (see the sketch after this list)
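A minimal raw `generate` call, assuming the server is running and the model above is pulled:

```sh
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:9b-q8_0",
  "stream": false,
  "prompt": "Classify the sentiment as positive or negative: \"great product\""
}'
# returns a single response object; no message history is kept between calls
```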