Self-Hosting Ollama: A Home Lab Journey

Published on December 27, 2025

Running large language models (LLMs) locally in a home lab environment offers unique advantages beyond just cost savings. It provides an invaluable learning opportunity to understand AI infrastructure from the ground up, experiment freely without API rate limits, maintain complete privacy over your data, and gain deep technical knowledge about model deployment, resource optimization, and system architecture.

In this guide, I'll walk you through my home lab setup for running Ollama across two Lenovo ThinkCentre M720q machines, each with different GPU configurations. This dual-node setup demonstrates how commodity hardware can power a practical, educational AI infrastructure.

Why Run LLMs Locally?

Before diving into the technical details, let's recap the benefits of self-hosting:

- Privacy: prompts and data never leave your network.
- Freedom to experiment: no API rate limits or per-token costs.
- Hands-on learning: you work through model deployment, resource optimization, and system architecture from the ground up.
- Cost: commodity hardware you already own replaces a recurring cloud bill.

Hardware Setup

My setup consists of two Lenovo ThinkCentre M720q tiny desktops - compact, efficient machines that pack surprising power:

Node 1 - Intel Arc Setup

- GPU: Intel Arc A310 (4GB VRAM, low-profile)

Node 2 - NVIDIA Setup

- GPU: NVIDIA Quadro T1000 (8GB VRAM, low-profile)

Why These GPUs? Both the Intel Arc A310 and NVIDIA Quadro T1000 are low-profile cards, which is crucial for the ThinkCentre M720q's compact form factor. These tiny desktop machines have extremely limited internal space, and standard-height GPUs simply won't fit. The low-profile design allows for powerful GPU acceleration while maintaining the small footprint that makes these machines ideal for home lab deployments.

Node 1: Intel Arc A310 Setup

The Intel Arc A310 is an excellent budget GPU for AI workloads. While its 4GB of VRAM limits model size, it's perfect for embedding models and smaller LLMs. Ollama uses its experimental Vulkan backend for Intel Arc GPUs, which delivers solid performance once enabled.

Step 1: System Preparation

Start with a fully updated system:

sudo apt update
sudo apt upgrade -y

Step 2: Verify GPU Detection

Check if the Intel Arc GPU is detected:

lspci | grep -i vga

You should see output similar to:

03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A310] (rev 05)
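
You can also confirm that the kernel driver has created a render node for the card (a minimal sanity check; the device numbering may differ on your system):

# The Intel kernel driver exposes the GPU under /dev/dri
ls -l /dev/dri/

# Expect entries like card0 and renderD128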

Step 3: Install Intel GPU Drivers

Install the Intel GPU drivers and required dependencies:

sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
sudo apt update

# Install Intel GPU drivers and OpenCL support
sudo apt install -y \
  libze-intel-gpu1 \
  libze1 \
  intel-metrics-discovery \
  intel-opencl-icd \
  clinfo \
  intel-gsc

# Install media acceleration drivers
sudo apt install -y \
  intel-media-va-driver-non-free \
  libmfx-gen1 \
  libvpl2 \
  libvpl-tools \
  libva-glx2 \
  va-driver-all \
  vainfo

# Install development libraries
sudo apt install -y \
  libze-dev \
  intel-ocloc \
  libze-intel-gpu-raytracing

Note: The kobuk-team/intel-graphics PPA provides up-to-date Intel GPU drivers optimized for Ubuntu. These drivers include support for Intel Arc discrete GPUs.
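
Once the packages are installed, a quick way to confirm the OpenCL runtime can see the card is clinfo (installed above); the exact device name string may vary by driver version:

# List OpenCL platforms and devices; look for the Arc A310
clinfo | grep -E "Platform Name|Device Name"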

Step 4: Install Ollama

Install Ollama using the official installation script:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

sudo systemctl status ollama.service

You should see output indicating the service is running:

● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running) since Wed 2025-12-31 09:30:48 UTC; 41s ago
   Main PID: 1274 (ollama)
      Tasks: 9 (limit: 76930)
     Memory: 9.8M (peak: 21.1M)
        CPU: 67ms
     CGroup: /system.slice/ollama.service
             └─1274 /usr/local/bin/ollama serve
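
Beyond systemctl, you can confirm the API itself is answering locally (the version number will of course differ on your install):

# The version endpoint is a cheap liveness check
curl http://localhost:11434/api/version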

Step 5: Configure Ollama for Intel Arc with Vulkan

Edit the Ollama systemd service configuration:

sudo systemctl edit --full ollama.service

Replace the contents with the following configuration:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
Environment="GGML_VK_VISIBLE_DEVICES=1"
Environment="OLLAMA_VULKAN=1"
Environment="OLLAMA_NEW_ENGINE=1"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"

[Install]
WantedBy=default.target

Understanding the Environment Variables

Let's break down each environment variable and its purpose:

- GGML_VK_VISIBLE_DEVICES=1: restricts the Vulkan backend to device index 1, so inference lands on the Arc A310 rather than another Vulkan device (index 0 is often a software rasterizer such as llvmpipe).
- OLLAMA_VULKAN=1: enables Ollama's experimental Vulkan backend, which is what gets the Arc A310 used at all.
- OLLAMA_NEW_ENGINE=1: opts into Ollama's newer model engine.
- OLLAMA_DEBUG=1: enables verbose logging, invaluable while verifying GPU detection.
- OLLAMA_HOST=0.0.0.0:11434: binds the API to all network interfaces so other machines on the LAN can reach it (the default binds to localhost only).

Vulkan Backend: Intel Arc GPUs perform exceptionally well with Vulkan. The experimental Vulkan support is disabled by default, so explicitly enabling it with OLLAMA_VULKAN=1 is crucial for GPU acceleration.
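
If you're unsure which device index to use for GGML_VK_VISIBLE_DEVICES, you can enumerate the Vulkan devices first; the index 1 above is from my machine and may differ on yours:

# vulkaninfo ships in the vulkan-tools package
sudo apt install -y vulkan-tools

# List Vulkan devices; note the position of the Arc A310
vulkaninfo --summary | grep -i deviceName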

Step 6: Apply Changes and Restart

Reload systemd and restart Ollama:

sudo systemctl daemon-reload
sudo systemctl restart ollama.service
sudo systemctl status ollama.service

Check the logs to verify GPU detection:

sudo journalctl -u ollama.service -n 50

Look for lines indicating Vulkan is enabled and the GPU is detected.
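
A quick filter narrows the logs down to the relevant lines (the exact log wording varies between Ollama versions):

# Search the service logs for Vulkan/GPU detection messages
sudo journalctl -u ollama.service | grep -iE "vulkan|arc|gpu"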

Node 2: NVIDIA Quadro T1000 Setup

The NVIDIA Quadro T1000 with 8GB of VRAM is a solid professional GPU, well suited to running quantized 7B-8B parameter models; 13B models can squeeze in at tighter quantizations. NVIDIA has excellent Linux driver support, making the setup straightforward.

Step 1: System Preparation

Update the system:

sudo apt update
sudo apt upgrade -y

Step 2: Verify GPU Detection

Check if the NVIDIA GPU is detected:

lspci | grep -i vga

You should see output similar to:

01:00.0 VGA compatible controller: NVIDIA Corporation TU117GL [T1000 8GB] (rev a1)

Step 3: Install NVIDIA Drivers

Add the graphics drivers PPA and install the NVIDIA driver:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install -y nvidia-driver-580

Driver Version: Version 580 is the latest at the time of writing. You can check available versions with ubuntu-drivers devices or use sudo ubuntu-drivers autoinstall to automatically install the recommended driver.

Reboot the system to load the new drivers:

sudo reboot

After reboot, verify the driver installation:

nvidia-smi

This should display information about your GPU, including temperature, memory usage, and driver version.
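
For a compact, scriptable check, nvidia-smi also supports query mode:

# Print just the fields we care about, in CSV form
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv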

Step 4: Install Ollama

Install Ollama using the official script:

curl -fsSL https://ollama.com/install.sh | sh

Step 5: Configure Ollama for NVIDIA

Edit the Ollama service configuration:

sudo systemctl edit --full ollama.service

Replace the contents with:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
Environment="OLLAMA_NEW_ENGINE=1"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"

[Install]
WantedBy=default.target

Understanding the Configuration

The NVIDIA configuration is simpler than Intel Arc because NVIDIA GPU support is native to Ollama: OLLAMA_NEW_ENGINE, OLLAMA_DEBUG, and OLLAMA_HOST serve the same purposes as on Node 1, and no backend-selection variables are needed because Ollama detects CUDA devices automatically.

The simplicity of this configuration reflects NVIDIA's mature CUDA support - no special flags needed; it just works.
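
As on Node 1, the service logs confirm which backend was picked up (wording varies by version):

# Look for CUDA library and device discovery in the logs
sudo journalctl -u ollama.service | grep -i cuda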

Step 6: Apply Changes and Restart

Reload and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama.service
sudo systemctl status ollama.service

Monitor GPU usage while running models:

watch -n 1 nvidia-smi

Testing Your Setup

Now that both nodes are configured, let's test them!

Download Models

On Node 1 (Intel Arc - 4GB VRAM), download smaller models:

ollama pull llama3.2:3b
ollama pull nomic-embed-text
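
nomic-embed-text is an embedding model rather than a chat model, so test it through the embeddings endpoint instead of ollama run. A minimal sketch using the long-standing /api/embeddings route:

# Request an embedding vector for a sample string
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering."
}'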

On Node 2 (NVIDIA - 8GB VRAM), download larger models:

ollama pull llama3.1:8b
ollama pull mistral:7b

Run Interactive Sessions

Test each node with an interactive session:

# On Node 1
ollama run llama3.2:3b

# On Node 2
ollama run llama3.1:8b

API Access from Other Machines

Since both services are bound to 0.0.0.0:11434, you can access them from other machines:

# From your workstation - access Node 1 (assuming 192.168.1.100)
curl http://192.168.1.100:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# Access Node 2 (assuming 192.168.1.101)
curl http://192.168.1.101:11434/api/generate -d '{
  "model": "llama3.2:8b",
  "prompt": "Explain quantum computing.",
  "stream": false
}'
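
For multi-turn use, the chat endpoint accepts a message history; this sketch targets Node 2 at the same assumed 192.168.1.101 address:

# Chat-style request with role-tagged messages
curl http://192.168.1.101:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [
    {"role": "user", "content": "Give me a one-line summary of Rayleigh scattering."}
  ],
  "stream": false
}'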

Performance Expectations

Based on real-world usage, here's what to expect from each node:

Node 1 - Intel Arc A310 Performance

With 4GB of VRAM, the A310 is best suited to embedding workloads (nomic-embed-text) and small quantized models in the 1B-3B range; anything larger spills to the CPU and slows down sharply.

Node 2 - NVIDIA Quadro T1000 Performance

With 8GB of VRAM, the T1000 should hold quantized 7B-8B models entirely on the GPU, leaving headroom for longer contexts on smaller models.
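
Rather than taking my word for it, you can measure throughput yourself: ollama run with --verbose prints timing statistics (including eval rate in tokens per second) after each response:

# On Node 1, for example:
ollama run --verbose llama3.2:3b "Why is the sky blue?"

# On Node 2:
ollama run --verbose llama3.1:8b "Why is the sky blue?"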

Monitoring and Maintenance

Systemd Service Management

# Check service status
sudo systemctl status ollama.service

# View recent logs
sudo journalctl -u ollama.service -n 100

# Follow logs in real-time
sudo journalctl -u ollama.service -f

# Restart service
sudo systemctl restart ollama.service
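
A few Ollama CLI commands are also useful for day-to-day housekeeping:

# List downloaded models and their sizes
ollama list

# Show models currently loaded in memory
ollama ps

# Remove a model you no longer need
ollama rm mistral:7b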

GPU Monitoring

On Node 1 (Intel Arc):

# Install GPU monitoring tool
sudo apt install intel-gpu-tools

# Monitor GPU usage
sudo intel_gpu_top

On Node 2 (NVIDIA):

# Real-time GPU monitoring
watch -n 1 nvidia-smi

# Detailed monitoring
nvidia-smi dmon

Troubleshooting

Intel Arc GPU Not Utilized

If the Intel Arc GPU isn't being used:

# Check Vulkan support
vulkaninfo --summary

# Verify GPU is visible
lspci | grep -i vga

# Check service logs
sudo journalctl -u ollama.service | grep -i vulkan

# Ensure OLLAMA_VULKAN=1 is set
sudo systemctl show ollama.service | grep OLLAMA_VULKAN

NVIDIA GPU Not Detected

If NVIDIA GPU isn't working:

# Check driver status
nvidia-smi

# If command not found, reinstall drivers
sudo apt install --reinstall nvidia-driver-580

# Check if GPU is visible
lspci | grep -i nvidia

# Reboot if needed
sudo reboot

Network Access Issues

If you can't access Ollama from other machines:

# Check if service is listening on correct interface
sudo ss -tlnp | grep 11434

# Test locally first
curl http://localhost:11434/api/tags

# Check firewall
sudo ufw status
sudo ufw allow 11434/tcp
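
Since OLLAMA_HOST=0.0.0.0 exposes the API to anything that can reach the machine, consider restricting the firewall rule to your LAN rather than opening the port globally (192.168.1.0/24 is my subnet; adjust to yours):

# Replace the open rule with one scoped to the local subnet
sudo ufw delete allow 11434/tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp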

Lessons Learned

Running this dual-node setup has taught me valuable lessons about AI infrastructure:

- Form factor drives everything: the M720q chassis made low-profile GPUs the only option, which in turn constrained VRAM and model choice.
- VRAM is the real bottleneck: 4GB confines the A310 to embeddings and small models, while 8GB comfortably fits 7B-8B models.
- Driver ecosystems differ enormously: NVIDIA's CUDA path worked out of the box, while Intel Arc needed a PPA, an experimental Vulkan backend, and explicit environment variables.
- Systemd overrides and verbose logging make per-node configuration and debugging far easier than hand-started processes.

Conclusion

Building a dual-node Ollama setup demonstrates that running production-grade AI infrastructure doesn't require expensive cloud services or enterprise hardware. Two compact ThinkCentre M720q machines provide a capable, educational, and private AI platform.

The home lab approach offers unmatched learning opportunities - from driver installation and systemd configuration to GPU optimization and model selection. Every challenge solved builds deeper understanding of how AI systems work at a fundamental level.

Whether you're a student learning AI deployment, a developer building AI-powered applications, or a privacy-conscious user wanting control over your data, self-hosting Ollama provides a practical, cost-effective solution that grows with your needs.