# Running AI Locally with Ollama
## Why Local AI?
Cloud AI is convenient, but for personal data analysis, privacy matters. When you're extracting beliefs, tracking contradictions, and generating narratives from your most personal writing, you want that processing to happen on your machine.
Ollama makes this practical. It runs open-source LLMs locally with GPU acceleration — no internet required.
## Setup (2 Minutes)
- Download Ollama from ollama.com
- Install and run it (it starts a local server at localhost:11434)
- Pull two models:
```
# Embedding model (required for search, 274 MB)
ollama pull nomic-embed-text

# LLM model (pick by your VRAM)
ollama pull qwen2.5:14b-instruct-q4_K_M
```
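Once both pulls finish, it's worth a quick sanity check that the local server is reachable and the models are registered. The commands below assume a default install on port 11434; exact output can differ between Ollama versions.

```
# The API server answers on the default port
curl http://localhost:11434/
# -> "Ollama is running"

# Confirm both models were pulled
ollama list
```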
## Which Model For Your Hardware?
| VRAM | Model | Quality | Speed |
|---|---|---|---|
| 4 GB | llama3.2:3b | Good for basic extraction | ~40 tok/s |
| 8 GB | llama3.1:8b | Solid all-around | ~35 tok/s |
| 12 GB | qwen2.5:14b-instruct-q4_K_M | Best quality/speed balance | ~25 tok/s |
| 16 GB+ | qwen2.5:32b-instruct-q4_K_M | Near-cloud quality | ~15 tok/s |
Recommendation: If you have 12 GB VRAM (e.g., RTX 4070 or 5070), the 14B Qwen model is the sweet spot. It's significantly better than 8B models at structured JSON extraction and nuanced belief analysis.
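As a rough way to confirm that your chosen model handles structured output, you can send one generate request with Ollama's JSON mode. This is only a sketch: the prompt is illustrative, and the model name should match whatever you pulled for your VRAM tier.

```
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:14b-instruct-q4_K_M",
  "prompt": "Extract the core belief from this journal line as JSON with keys belief and confidence: I keep saying I will write daily, but I never do.",
  "format": "json",
  "stream": false
}'
```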
## Embedding Model: Always Use nomic-embed-text
For search to work, you need an embedding model. nomic-embed-text is:
- Only 274 MB
- 768 dimensions (efficient)
- Fast enough for real-time search
- Runs on any hardware
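Generating an embedding is a single request to the local API. A minimal sketch (this uses the long-standing /api/embeddings endpoint; newer Ollama releases also expose /api/embed, which takes an input field instead of prompt):

```
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "I want to spend more evenings writing and fewer scrolling."
}'
# Response: {"embedding": [0.012, -0.084, ...]}  -- a 768-dimensional vector
```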
## Hybrid Setup: Cloud LLM + Local Embeddings
The best of both worlds: use a free cloud LLM (Gemini Flash, Groq) for analysis + local Ollama for embeddings. This way:

- Analysis uses a powerful cloud model (free)
- Your search index stays fully private (local)
- No VRAM needed for the large LLM
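As a sketch of how the split looks in practice: the analysis request goes out to a cloud provider, while the embedding request stays on localhost. Assumptions here: Groq's OpenAI-compatible endpoint and the model name are illustrative (check their current docs for exact model IDs), and GROQ_API_KEY is an environment variable you set yourself.

```
# Analysis: cloud LLM (endpoint and model name are illustrative)
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b-instant",
       "messages": [{"role": "user", "content": "Summarize the recurring themes in this entry: ..."}]}'

# Embeddings: never leave your machine
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "journal entry text"}'
```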
In MemryLab: Settings > Embedding Provider > Select "Ollama (local, private)"
## Cost Comparison
| Setup | Monthly Cost | Privacy | Speed |
|---|---|---|---|
| Full local (Ollama) | $0 + electricity | Full | 15-40 tok/s |
| Gemini Flash (cloud) | ~$0 (free tier) | Partial (analysis sent to cloud) | 100+ tok/s |
| GPT-4o (cloud) | ~$5-20 | None | 60+ tok/s |
For MemryLab's typical usage (~100 LLM calls per analysis), even cloud costs are negligible. But local gives you zero dependency on external services.