
# Run Local AI with Ollama

import { Aside, Steps } from '@astrojs/starlight/components';

Ollama lets you run large language models (LLMs) locally on your Mac, Linux, or Windows machine.
When configured in Teachback, all AI question-generation happens on your own hardware — no internet connection required.


## Prerequisites

- A Mac (Apple Silicon recommended) or Linux/Windows machine on the same local network as your iPhone/iPad.
- Ollama installed and running.
- At least one model pulled (e.g. `llama3.2`, `qwen2.5`, `mistral`).

## Step 1: Install Ollama

```sh
# macOS (Homebrew)
brew install ollama

# Or download the installer from https://ollama.ai/download
```

Start the Ollama server:

```sh
ollama serve
```

Ollama listens on `http://localhost:11434` by default.
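To confirm the server is up, you can query the `/api/tags` endpoint, which returns the models you have pulled (an empty list is fine before Step 2):

```sh
# List locally available models as JSON
curl http://localhost:11434/api/tags
```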


## Step 2: Pull a model

```sh
# Recommended: fast and capable
ollama pull llama3.2

# Alternative: reasoning-focused
ollama pull qwen2.5:7b

# Smaller, for low-memory machines
ollama pull mistral:7b
```

Verify it works:

```sh
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
```
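By default `/api/chat` streams its reply as a sequence of JSON objects, one per line. To get a single complete JSON response instead, set `stream` to `false` in the request body:

```sh
# Ask for one non-streaming JSON reply
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","stream":false,"messages":[{"role":"user","content":"Hello!"}]}'
```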

## Step 3: Make Ollama accessible over your LAN

By default, Ollama only listens on localhost — your iPhone can’t reach it directly.
You need to bind it to your LAN IP address.

On macOS, set the host persistently, then restart Ollama:

```sh
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Then restart Ollama
```

Or set the variable just for the current session:

```sh
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Find your Mac’s LAN IP:

```sh
ipconfig getifaddr en0   # Wi-Fi
# or
ifconfig | grep "inet 192"
```
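With the server bound to `0.0.0.0`, you can sanity-check reachability from any other machine on the same network (the IP below is an example; substitute your Mac's actual LAN IP):

```sh
# From another machine on the LAN — should return the model list
curl http://192.168.1.100:11434/api/tags
```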

## Step 4: Configure Teachback

  1. Open the app and go to Settings.
  2. Tap API Keys & Endpoints.
  3. Scroll to the Ollama section.
  4. Set Base URL to your Mac’s LAN IP, e.g. `http://192.168.1.100:11434`
  5. Set Model to the model you pulled, e.g. `llama3.2`
  6. Tap Save.

Teachback automatically uses the Ollama path when `useCloudWorker` is disabled (i.e., your app is built without a `TEACHBACK_WORKER_URL` dart-define).
The dev build of the app always uses Ollama.
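Assuming the app is built with Flutter (implied by the dart-define mechanism), the two build flavors look roughly like this; the worker URL is a placeholder, not a real endpoint:

```sh
# Cloud-backed build: AI requests go to the hosted worker (hypothetical URL)
flutter build ios --dart-define=TEACHBACK_WORKER_URL=https://teachback.example.workers.dev

# Local-only build: no TEACHBACK_WORKER_URL, so Teachback falls back to Ollama
flutter build ios
```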

Start a new session — AI questions will be generated by your local model.


## Recommended models

| Model | VRAM | Best for |
| --- | --- | --- |
| `llama3.2:3b` | ~2 GB | Low-power Macs, quick responses |
| `llama3.2` | ~4 GB | Balanced (default suggestion) |
| `qwen2.5:7b` | ~5 GB | Strong reasoning |
| `qwen3.5:9b` | ~6 GB | Advanced analysis |
| `mistral:7b` | ~5 GB | Long context |

Apple Silicon Macs (M1/M2/M3/M4) run 7B-parameter models smoothly, since the GPU shares the machine’s unified memory via the Metal backend.


## Troubleshooting

| Problem | Fix |
| --- | --- |
| Connection refused | Ollama isn’t running or is bound to the wrong host |
| Timeout / very slow | Model too large for your hardware; try `llama3.2:3b` |
| Empty response | Check `ollama serve` logs for model-loading errors |
| iOS HTTP transport error | Make sure the iOS `NSAppTransportSecurity` exception is in `Info.plist` for your LAN IP |
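For the iOS HTTP transport error case, one minimal `Info.plist` sketch is below, assuming iOS 14+ where `NSAllowsLocalNetworking` permits plain-HTTP traffic to local-network hosts (note that ATS `NSExceptionDomains` entries cannot name raw IP addresses, so a per-IP exception is not an option):

```xml
<!-- Info.plist: allow plain-HTTP requests to LAN hosts such as your Ollama server -->
<key>NSAppTransportSecurity</key>
<dict>
    <key>NSAllowsLocalNetworking</key>
    <true/>
</dict>
```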