Running AI Locally: A Guide to Private LLM Analysis with Ollama

Tushar Laad · 5 min read · tutorial · ollama

Running AI Locally with Ollama

Why Local AI?

Cloud AI is convenient, but for personal data analysis, privacy matters. When you're extracting beliefs, tracking contradictions, and generating narratives from your most personal writing, you want that processing to happen on your machine.

Ollama makes this practical. It runs open-source LLMs locally with GPU acceleration — no internet required.

Setup (2 Minutes)

  1. Download Ollama from ollama.com
  2. Install and run it (it starts a local server at localhost:11434)
  3. Pull two models:

```
# Embedding model (required for search, 274 MB)
ollama pull nomic-embed-text

# LLM model (pick by your VRAM)
ollama pull qwen2.5:14b-instruct-q4_K_M
```
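Once the server is running, you can verify it from code. Here is a minimal health-check sketch against Ollama's `/api/tags` endpoint (which lists pulled models); the helper names are illustrative, not part of Ollama:

```python
# Minimal health check for a local Ollama server.
# Assumes the default port 11434; helper names are hypothetical.
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"

def api_url(path: str) -> str:
    """Build a full URL for an Ollama API endpoint."""
    return f"{OLLAMA_BASE}/{path.lstrip('/')}"

def list_models() -> list[str]:
    """Return the names of locally pulled models, or [] if the server is down."""
    try:
        with urllib.request.urlopen(api_url("/api/tags"), timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:
        return []

if __name__ == "__main__":
    print(list_models())
```

If both `ollama pull` commands succeeded, you should see `nomic-embed-text` and the Qwen model in the list.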

Which Model For Your Hardware?

| VRAM | Model | Quality | Speed |
| --- | --- | --- | --- |
| 4 GB | llama3.2:3b | Good for basic extraction | ~40 tok/s |
| 8 GB | llama3.1:8b | Solid all-around | ~35 tok/s |
| 12 GB | qwen2.5:14b-instruct-q4_K_M | Best quality/speed balance | ~25 tok/s |
| 16 GB+ | qwen2.5:32b-instruct-q4_K_M | Near-cloud quality | ~15 tok/s |

Recommendation: If you have 12 GB VRAM (RTX 4070/5070 Ti), the 14B Qwen model is the sweet spot. It's significantly better than 8B models at structured JSON extraction and nuanced belief analysis.
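Structured JSON extraction is exactly what Ollama's `format: "json"` option on `POST /api/generate` is for (it constrains the model to emit valid JSON). A sketch of such a call; the prompt wording and the `extract_beliefs` helper are illustrative, not MemryLab's actual code:

```python
# Sketch of a structured-extraction call against Ollama's REST API.
# format="json" makes Ollama constrain the output to valid JSON.
import json
import urllib.request

def build_extract_request(text: str,
                          model: str = "qwen2.5:14b-instruct-q4_K_M") -> dict:
    """Build the request payload for a belief-extraction prompt (illustrative)."""
    return {
        "model": model,
        "prompt": ('Extract beliefs from this journal entry as JSON with a '
                   f'"beliefs" list of strings:\n\n{text}'),
        "format": "json",   # force valid-JSON output
        "stream": False,    # return one complete response
    }

def extract_beliefs(text: str, base: str = "http://localhost:11434") -> dict:
    """POST the extraction prompt to a local Ollama server and parse the JSON reply."""
    req = urllib.request.Request(
        base + "/api/generate",
        data=json.dumps(build_extract_request(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(json.load(resp)["response"])
```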

Embedding Model: Always Use nomic-embed-text

For search to work, you need an embedding model. nomic-embed-text is:

- Only 274 MB
- 768 dimensions (efficient)
- Fast enough for real-time search
- Runs on any hardware
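Getting an embedding is a single POST to Ollama's `/api/embeddings` endpoint. A minimal sketch (the `embed` helper name is mine, not an Ollama client API):

```python
# Embed a piece of text with a local Ollama server.
# Uses Ollama's /api/embeddings endpoint; helper names are illustrative.
import json
import urllib.request

def build_embed_request(text: str, model: str = "nomic-embed-text") -> dict:
    """Payload shape for Ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text: str, base: str = "http://localhost:11434") -> list[float]:
    """Return the embedding vector (768 floats for nomic-embed-text)."""
    req = urllib.request.Request(
        base + "/api/embeddings",
        data=json.dumps(build_embed_request(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embedding"]
```

Because the vectors are only 768-dimensional, indexing and cosine-similarity search over thousands of entries stays fast on CPU.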

Hybrid Setup: Cloud LLM + Local Embeddings

The best of both worlds: use a free cloud LLM (Gemini Flash, Groq) for analysis + local Ollama for embeddings. This way:

- Analysis uses a powerful cloud model (free)
- Your search index stays fully private (local)
- No VRAM needed for the large LLM

In MemryLab: Settings > Embedding Provider > Select "Ollama (local, private)"
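Conceptually, the hybrid setup is just per-task provider routing. A hypothetical config sketch (the keys and provider names are illustrative, not MemryLab's actual settings schema):

```python
# Hypothetical provider-routing config for a hybrid setup:
# cloud LLM for analysis, local Ollama for embeddings.
HYBRID_CONFIG = {
    "llm": {
        "provider": "gemini-flash",    # cloud model handles analysis
    },
    "embeddings": {
        "provider": "ollama",          # local: indexed text never leaves your machine
        "model": "nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
}

def embedding_is_local(config: dict) -> bool:
    """True when the search index is built entirely on-device."""
    return config["embeddings"]["provider"] == "ollama"
```

The key property: only short analysis prompts go to the cloud, while the full-text search index is built and queried locally.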

Cost Comparison

| Setup | Monthly Cost | Privacy | Speed |
| --- | --- | --- | --- |
| Full local (Ollama) | $0 + electricity | Full | 15-40 tok/s |
| Gemini Flash (cloud) | ~$0 (free tier) | Analysis only | 100+ tok/s |
| GPT-4o (cloud) | ~$5-20 | None | 60+ tok/s |

For MemryLab's typical usage (~100 LLM calls per analysis), even cloud costs are negligible. But local gives you zero dependency on external services.

Download MemryLab | View All Providers

Ready to explore your data?

Download MemryLab — free, open source, privacy-first.
