# Run Local AI with Ollama
import { Aside, Steps } from '@astrojs/starlight/components';
Ollama lets you run large language models (LLMs) locally on your Mac, Linux, or Windows machine.
When configured in Teachback, all AI question-generation happens on your own hardware — no internet connection required.
## Prerequisites

- A Mac (Apple Silicon recommended) or Linux/Windows machine on the same local network as your iPhone/iPad.
- Ollama installed and running.
- At least one model pulled (e.g. `llama3.2`, `qwen2.5`, or `mistral`).
## Step 1: Install and start Ollama

```sh
# macOS (Homebrew)
brew install ollama

# Or download the installer from https://ollama.ai/download
```

Start the Ollama server:

```sh
ollama serve
```

Ollama listens on `http://localhost:11434` by default.
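Before going further, you can confirm the server is answering. A minimal sketch using Python's standard library (`/api/tags` is Ollama's list-models endpoint; the helper name is ours):

```python
from urllib.request import urlopen
from urllib.error import URLError

def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # GET /api/tags lists locally pulled models; any 200 means the
        # server is up, even if no models are installed yet.
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print(ollama_reachable("http://localhost:11434"))
```

If this prints `False` with the server running, check that `ollama serve` didn't exit with an error.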
## Step 2: Pull a model

```sh
# Recommended: fast and capable
ollama pull llama3.2

# Alternative: reasoning-focused
ollama pull qwen2.5:7b

# Smaller, for low-memory machines
ollama pull mistral:7b
```

Verify it works:

```sh
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
```

## Step 3: Make Ollama accessible over your LAN
By default, Ollama only listens on localhost — your iPhone can’t reach it directly.
You need to bind it to your LAN IP address.
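If you'd rather not hunt through `ifconfig` output, the LAN IP can also be discovered programmatically. A sketch using Python's standard library; the UDP "connect" below never sends a packet, it only asks the OS which local address would route outward:

```python
import socket

def lan_ip() -> str:
    """Best-effort guess of this machine's LAN IP address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket just selects a route; no traffic is sent.
        s.connect(("192.0.2.1", 80))  # TEST-NET-1 address, never reached
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no outward route found; fall back to loopback
    finally:
        s.close()

print(f"Ollama base URL for Teachback: http://{lan_ip()}:11434")
```

The printed URL is what you'll enter in the app in Step 4.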
### macOS (launchctl)

```sh
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Then restart Ollama
```

### Direct environment variable

```sh
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Find your Mac’s LAN IP:

```sh
ipconfig getifaddr en0  # Wi-Fi
# or
ifconfig | grep "inet 192"
```

## Step 4: Configure Teachback
1. Open the app and go to Settings.
2. Tap API Keys & Endpoints.
3. Scroll to the Ollama section.
4. Set Base URL to your Mac’s LAN IP, e.g. `http://192.168.1.100:11434`.
5. Set Model to the model you pulled, e.g. `llama3.2`.
6. Tap Save.
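It's worth confirming that the model name you entered actually exists on the server. Ollama's `/api/tags` endpoint lists installed models; a sketch of the check (the sample payload below is illustrative, trimmed to the fields used):

```python
def model_available(tags_response: dict, wanted: str) -> bool:
    """Check whether `wanted` matches a model listed by GET /api/tags.

    Ollama stores untagged pulls as `<name>:latest`, so a configured
    model of `llama3.2` should match an entry named `llama3.2:latest`.
    """
    names = {m["name"] for m in tags_response.get("models", [])}
    return wanted in names or f"{wanted}:latest" in names

# Illustrative /api/tags response body (trimmed)
sample = {"models": [{"name": "llama3.2:latest"}, {"name": "qwen2.5:7b"}]}
print(model_available(sample, "llama3.2"))    # matches llama3.2:latest
print(model_available(sample, "mistral:7b"))  # not pulled in this sample
```

A mismatch here is the most common cause of "model not found" errors after saving.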
## Step 5: Run a session

Teachback automatically uses the Ollama path when `useCloudWorker` is disabled (i.e., your app is built without a `TEACHBACK_WORKER_URL` dart-define).
The dev build of the app always uses Ollama.
Start a new session — AI questions will be generated by your local model.
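Under the hood, each question-generation call is a POST to Ollama's `/api/chat`. A minimal sketch of the request and response shape, with `stream` disabled so the reply arrives as one JSON object (the prompt and helper names are ours, not Teachback's actual code):

```python
import json
from urllib.request import Request

def build_chat_request(base_url: str, model: str, prompt: str) -> Request:
    """Build a non-streaming /api/chat request for an Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of NDJSON chunks
    }
    return Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def reply_text(chat_response: dict) -> str:
    """Extract the assistant's text from a /api/chat response body."""
    return chat_response["message"]["content"]

req = build_chat_request("http://192.168.1.100:11434", "llama3.2",
                         "Ask me one quiz question about photosynthesis.")
# To send it against a reachable server:
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       print(reply_text(json.load(resp)))
```

With `"stream": true` (Ollama's default), the response is instead a sequence of newline-delimited JSON chunks whose `message.content` fragments must be concatenated.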
## Recommended models

| Model | VRAM | Best for |
|---|---|---|
| `llama3.2:3b` | ~2 GB | Low-power Macs, quick responses |
| `llama3.2` | ~4 GB | Balanced (default suggestion) |
| `qwen2.5:7b` | ~5 GB | Strong reasoning |
| `qwen3.5:9b` | ~6 GB | Advanced analysis |
| `mistral:7b` | ~5 GB | Long context |
Apple Silicon Macs (M1/M2/M3/M4) run 7B-parameter models smoothly thanks to unified memory and the Metal GPU backend.
## Troubleshooting

| Problem | Fix |
|---|---|
| Connection refused | Ollama isn’t running or is bound to the wrong host |
| Timeout / very slow | Model too large for your hardware; try `llama3.2:3b` |
| Empty response | Check `ollama serve` logs for model-loading errors |
| iOS HTTP transport error | Make sure the `NSAppTransportSecurity` exception is in `Info.plist` for your LAN IP |