
# Run Local AI with Ollama

import { Aside, Steps } from '@astrojs/starlight/components';

Ollama lets you run large language models (LLMs) locally on your Mac, Linux, or Windows machine.
When configured in Teachback, all AI question-generation happens on your own hardware — no internet connection required.


## Prerequisites

- A Mac (Apple Silicon recommended) or Linux/Windows machine on the same local network as your iPhone/iPad.
- Ollama installed and running.
- At least one model pulled (e.g. `llama3.2`, `qwen2.5`, `mistral`).

## Step 1: Install Ollama

```sh
# macOS (Homebrew)
brew install ollama

# Or download the installer from https://ollama.ai/download
```

Start the Ollama server:

```sh
ollama serve
```

Ollama listens on `http://localhost:11434` by default.
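To confirm the server is up, you can query the `/api/tags` endpoint, which returns the models you have pulled (an empty list is fine before Step 2):

```sh
# List locally available models as JSON
curl http://localhost:11434/api/tags
```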


## Step 2: Pull a model

```sh
# Recommended: fast and capable
ollama pull llama3.2

# Alternative: reasoning-focused
ollama pull qwen2.5:7b

# Smaller, for low-memory machines
ollama pull mistral:7b
```

Verify it works:

```sh
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
```
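By default `/api/chat` streams its reply as a sequence of JSON objects, one per line. To get a single complete JSON response instead, set `stream` to `false` in the request body:

```sh
# Ask for one non-streaming JSON reply
curl http://localhost:11434/api/chat \
  -d '{"model":"llama3.2","stream":false,"messages":[{"role":"user","content":"Hello!"}]}'
```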

## Step 3: Make Ollama accessible over your LAN

By default, Ollama only listens on localhost — your iPhone can’t reach it directly.
You need to bind it to your LAN IP address.

On macOS, set the host persistently, then restart Ollama:

```sh
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Then restart Ollama
```

Or set the variable just for the current session:

```sh
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

Find your Mac’s LAN IP:

```sh
ipconfig getifaddr en0   # Wi-Fi
# or
ifconfig | grep "inet 192"
```
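With the server bound to `0.0.0.0`, you can sanity-check reachability from any other machine on the same network (the IP below is an example; substitute your Mac's actual LAN IP):

```sh
# From another machine on the LAN — should return the model list
curl http://192.168.1.100:11434/api/tags
```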

## Step 4: Configure Teachback

  1. Open the app and go to Settings.
  2. Tap API Keys & Endpoints.
  3. Scroll to the Ollama section.
  4. Set Base URL to your Mac’s LAN IP, e.g. `http://192.168.1.100:11434`
  5. Set Model to the model you pulled, e.g. `llama3.2`
  6. Tap Save.

Teachback automatically uses the Ollama path when `useCloudWorker` is disabled (i.e., your app is built without a `TEACHBACK_WORKER_URL` dart-define).
The dev build of the app always uses Ollama.
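Assuming the app is built with Flutter (implied by the dart-define mechanism), the two build flavors look roughly like this; the worker URL is a placeholder, not a real endpoint:

```sh
# Cloud-backed build: AI requests go to the hosted worker (hypothetical URL)
flutter build ios --dart-define=TEACHBACK_WORKER_URL=https://teachback.example.workers.dev

# Local-only build: no TEACHBACK_WORKER_URL, so Teachback falls back to Ollama
flutter build ios
```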

Start a new session — AI questions will be generated by your local model.


## Recommended models

| Model | VRAM | Best for |
| --- | --- | --- |
| `llama3.2:3b` | ~2 GB | Low-power Macs, quick responses |
| `llama3.2` | ~4 GB | Balanced (default suggestion) |
| `qwen2.5:7b` | ~5 GB | Strong reasoning |
| `qwen3.5:9b` | ~6 GB | Advanced analysis |
| `mistral:7b` | ~5 GB | Long context |

Apple Silicon Macs (M1/M2/M3/M4) run 7B-parameter models smoothly, since the GPU shares the machine’s unified memory via the Metal backend.


## Troubleshooting

| Problem | Fix |
| --- | --- |
| Connection refused | Ollama isn’t running or is bound to the wrong host |
| Timeout / very slow | Model too large for your hardware; try `llama3.2:3b` |
| Empty response | Check `ollama serve` logs for model-loading errors |
| iOS HTTP transport error | Make sure the iOS `NSAppTransportSecurity` exception is in `Info.plist` for your LAN IP |
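For the iOS HTTP transport error case, one minimal `Info.plist` sketch is below, assuming iOS 14+ where `NSAllowsLocalNetworking` permits plain-HTTP traffic to local-network hosts (note that ATS `NSExceptionDomains` entries cannot name raw IP addresses, so a per-IP exception is not an option):

```xml
<!-- Info.plist: allow plain-HTTP requests to LAN hosts such as your Ollama server -->
<key>NSAppTransportSecurity</key>
<dict>
    <key>NSAllowsLocalNetworking</key>
    <true/>
</dict>
```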