Running large language models (LLMs) locally in a home lab environment offers unique advantages beyond just cost savings. It provides an invaluable learning opportunity to understand AI infrastructure from the ground up, experiment freely without API rate limits, maintain complete privacy over your data, and gain deep technical knowledge about model deployment, resource optimization, and system architecture.
In this guide, I'll walk you through my home lab setup for running Ollama across two Lenovo ThinkCentre M720q machines, each with different GPU configurations. This dual-node setup demonstrates how commodity hardware can power a practical, educational AI infrastructure.
Why Run LLMs Locally?
Before diving into the technical details, let's explore the benefits of self-hosting:
- Learning and Education: Hands-on experience with model deployment, GPU utilization, system optimization, and troubleshooting builds deep technical understanding
- Privacy and Control: Your data never leaves your infrastructure - no third-party services, no data retention policies to worry about
- Cost Efficiency: After initial hardware investment, no per-token costs or monthly subscriptions
- Experimentation Freedom: Test different models, configurations, and use cases without worrying about API costs
- Network Independence: Once models are downloaded, you can operate without internet connectivity
- Home Lab Integration: Integrate AI capabilities into your existing home automation, monitoring, and development workflows
Hardware Setup
My setup consists of two Lenovo ThinkCentre M720q tiny desktops - compact, efficient machines that pack surprising power:
Node 1 - Intel Arc Setup
- Model: Lenovo ThinkCentre M720q
- CPU: Intel i5-8600T (6 cores / 6 threads)
- RAM: 64GB DDR4
- GPU: Intel Arc A310 (4GB VRAM)
- OS: Ubuntu 24.04 LTS
- Purpose: Embedding generation and smaller models
Node 2 - NVIDIA Setup
- Model: Lenovo ThinkCentre M720q
- CPU: Intel i5-8600T (6 cores / 6 threads)
- RAM: 32GB DDR4
- GPU: NVIDIA Quadro T1000 (8GB VRAM)
- OS: Ubuntu 24.04 LTS
- Purpose: Text generation and larger models
Node 1: Intel Arc A310 Setup
The Intel Arc A310 is an excellent budget GPU for AI workloads. While its 4GB of VRAM limits model size, it's perfect for embedding models and smaller LLMs. On Intel Arc, Ollama uses the Vulkan backend for acceleration, which performs well for this class of hardware.
Step 1: System Preparation
Start with a fully updated system:
sudo apt update
sudo apt upgrade -y
Step 2: Verify GPU Detection
Check if the Intel Arc GPU is detected:
lspci | grep -i vga
You should see output similar to:
03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A310] (rev 05)
Step 3: Install Intel GPU Drivers
Install the Intel GPU drivers and required dependencies:
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
sudo apt update
# Install Intel GPU drivers and OpenCL support
sudo apt install -y \
libze-intel-gpu1 \
libze1 \
intel-metrics-discovery \
intel-opencl-icd \
clinfo \
intel-gsc
# Install media acceleration drivers
sudo apt install -y \
intel-media-va-driver-non-free \
libmfx-gen1 \
libvpl2 \
libvpl-tools \
libva-glx2 \
va-driver-all \
vainfo
# Install development libraries
sudo apt install -y \
libze-dev \
intel-ocloc \
libze-intel-gpu-raytracing
The kobuk-team/intel-graphics PPA provides up-to-date Intel GPU drivers for Ubuntu, including support for Intel Arc discrete GPUs.
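With the drivers installed, a quick sanity check confirms that the OpenCL runtime can see the card (clinfo was installed above):
# List detected OpenCL platforms and devices - the Arc A310 should appear
clinfo -l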
Step 4: Install Ollama
Install Ollama using the official installation script:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
sudo systemctl status ollama.service
You should see output indicating the service is running:
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-12-31 09:30:48 UTC; 41s ago
Main PID: 1274 (ollama)
Tasks: 9 (limit: 76930)
Memory: 9.8M (peak: 21.1M)
CPU: 67ms
CGroup: /system.slice/ollama.service
└─1274 /usr/local/bin/ollama serve
Step 5: Configure Ollama for Intel Arc with Vulkan
Edit the Ollama systemd service configuration:
sudo systemctl edit --full ollama.service
Replace the contents with the following configuration:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
Environment="GGML_VK_VISIBLE_DEVICES=1"
Environment="OLLAMA_VULKAN=1"
Environment="OLLAMA_NEW_ENGINE=1"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"
[Install]
WantedBy=default.target
Understanding the Environment Variables
Let's break down each environment variable and its purpose:
- PATH: Standard system PATH ensuring all required binaries are accessible
- GGML_VK_VISIBLE_DEVICES=1: Specifies which Vulkan device to use (device index 1). This tells the GGML library (used by Ollama) to run on the Intel Arc GPU via Vulkan. On systems with integrated graphics the iGPU usually enumerates as device 0, so index 1 selects the Arc A310 here; a quick way to confirm the index is shown after this list
- OLLAMA_VULKAN=1: Enables Vulkan backend for GPU acceleration. Essential for Intel Arc GPUs as they work best with Vulkan
- OLLAMA_NEW_ENGINE=1: Enables the new inference engine with improved performance and features
- OLLAMA_DEBUG=1: Enables debug logging for troubleshooting and monitoring GPU utilization
- OLLAMA_HOST=0.0.0.0:11434: Makes Ollama accessible from other machines on the network (binds to all interfaces). Change to 127.0.0.1:11434 if you only want local access
Of these, OLLAMA_VULKAN=1 is the crucial one: without it, Ollama won't use the Arc GPU for acceleration.
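If you're unsure which index your Arc card gets, vulkaninfo (from the vulkan-tools package, not installed above) shows how Vulkan enumerates the devices on your system - a quick check before committing a value to GGML_VK_VISIBLE_DEVICES:
# vulkaninfo ships in the vulkan-tools package
sudo apt install -y vulkan-tools
# Devices are listed as GPU0, GPU1, ... - use the Arc A310's number
# as the value for GGML_VK_VISIBLE_DEVICES
vulkaninfo --summary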
Step 6: Apply Changes and Restart
Reload systemd and restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
sudo systemctl status ollama.service
Check the logs to verify GPU detection:
sudo journalctl -u ollama.service -n 50
Look for lines indicating Vulkan is enabled and the GPU is detected.
Node 2: NVIDIA Quadro T1000 Setup
The NVIDIA Quadro T1000 with 8GB of VRAM is a solid professional GPU, well suited to 7B-parameter models and, with 4-bit quantization, 13B-class models. NVIDIA has excellent Linux driver support, making the setup straightforward.
Step 1: System Preparation
Update the system:
sudo apt update
sudo apt upgrade -y
Step 2: Verify GPU Detection
Check if the NVIDIA GPU is detected:
lspci | grep -i vga
You should see output similar to:
01:00.0 VGA compatible controller: NVIDIA Corporation TU117GL [T1000 8GB] (rev a1)
Step 3: Install NVIDIA Drivers
Add the graphics drivers PPA and install the NVIDIA driver:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install -y nvidia-driver-580
Tip: run ubuntu-drivers devices to see which driver is recommended for your GPU, or use sudo ubuntu-drivers autoinstall to install it automatically.
Reboot the system to load the new drivers:
sudo reboot
After reboot, verify the driver installation:
nvidia-smi
This should display information about your GPU, including temperature, memory usage, and driver version.
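For a terser, script-friendly check, nvidia-smi can report just the fields of interest:
# Print GPU name, VRAM usage, and driver version as CSV
nvidia-smi --query-gpu=name,memory.total,memory.used,driver_version --format=csv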
Step 4: Install Ollama
Install Ollama using the official script:
curl -fsSL https://ollama.com/install.sh | sh
Step 5: Configure Ollama for NVIDIA
Edit the Ollama service configuration:
sudo systemctl edit --full ollama.service
Replace the contents with:
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
Environment="OLLAMA_NEW_ENGINE=1"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"
[Install]
WantedBy=default.target
Understanding the Configuration
The NVIDIA configuration is simpler than Intel Arc because NVIDIA GPU support is native to Ollama:
- No GPU-specific variables needed: Ollama automatically detects NVIDIA GPUs and uses CUDA for acceleration
- OLLAMA_NEW_ENGINE=1: Enables the improved inference engine
- OLLAMA_DEBUG=1: Enables debug logging for monitoring
- OLLAMA_HOST=0.0.0.0:11434: Exposes Ollama on all network interfaces for remote access
The simplicity of this configuration reflects NVIDIA's mature CUDA support - no special flags needed, it just works.
Step 6: Apply Changes and Restart
Reload and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
sudo systemctl status ollama.service
Monitor GPU usage while running models:
watch -n 1 nvidia-smi
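Beyond live utilization, the service logs confirm whether Ollama detected the card at startup rather than silently falling back to CPU:
# Look for CUDA/GPU detection messages after the restart
sudo journalctl -u ollama.service | grep -iE "cuda|gpu"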
Testing Your Setup
Now that both nodes are configured, let's test them!
Download Models
On Node 1 (Intel Arc - 4GB VRAM), download smaller models:
ollama pull llama3.2:3b
ollama pull nomic-embed-text
On Node 2 (NVIDIA - 8GB VRAM), download larger models:
ollama pull llama3.1:8b
ollama pull mistral:7b
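Once the pulls finish, confirm what's available on each node:
# List locally available models and their sizes
ollama list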
Run Interactive Sessions
Test each node with an interactive session:
# On Node 1
ollama run llama3.2:3b
# On Node 2
ollama run llama3.1:8b
API Access from Other Machines
Since both services are bound to 0.0.0.0:11434, you can access them from other machines:
# From your workstation - access Node 1 (assuming 192.168.1.100)
curl http://192.168.1.100:11434/api/generate -d '{
"model": "llama3.2:3b",
"prompt": "Why is the sky blue?",
"stream": false
}'
# Access Node 2 (assuming 192.168.1.101)
curl http://192.168.1.101:11434/api/generate -d '{
"model": "llama3.2:8b",
"prompt": "Explain quantum computing.",
"stream": false
}'
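Since Node 1's stated role is embedding generation, it's worth exercising that path too. Here's a minimal call against Ollama's embeddings endpoint, using the same hypothetical 192.168.1.100 address as above:
# Generate an embedding with nomic-embed-text on Node 1
curl http://192.168.1.100:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox jumps over the lazy dog"
}'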
Performance Expectations
Based on real-world usage, here's what to expect from each node:
Node 1 - Intel Arc A310 Performance
- Llama 3.2 3B: 40-50 tokens/second
- Phi 3 3.8B: 45-55 tokens/second
- Embedding models: 200-300 embeddings/second
- Best for: Embedding generation, smaller models, high-throughput tasks
Node 2 - NVIDIA Quadro T1000 Performance
- Llama 3.1 8B: 30-35 tokens/second
- Mistral 7B: 35-40 tokens/second
- 13B-class models (Q4): 15-20 tokens/second
- Best for: Text generation, conversational AI, larger models
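These figures will vary with quantization, context length, and prompt size. To measure throughput on your own hardware, Ollama's verbose mode prints timing statistics - including the eval rate in tokens per second - after each response:
# Run a one-off prompt and print timing statistics
ollama run llama3.2:3b --verbose "Explain DNS in one paragraph."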
Monitoring and Maintenance
Systemd Service Management
# Check service status
sudo systemctl status ollama.service
# View recent logs
sudo journalctl -u ollama.service -n 100
# Follow logs in real-time
sudo journalctl -u ollama.service -f
# Restart service
sudo systemctl restart ollama.service
GPU Monitoring
On Node 1 (Intel Arc):
# Install GPU monitoring tool
sudo apt install intel-gpu-tools
# Monitor GPU usage
sudo intel_gpu_top
On Node 2 (NVIDIA):
# Real-time GPU monitoring
watch -n 1 nvidia-smi
# Detailed monitoring
nvidia-smi dmon
Troubleshooting
Intel Arc GPU Not Utilized
If the Intel Arc GPU isn't being used:
# Check Vulkan support
vulkaninfo --summary
# Verify GPU is visible
lspci | grep -i vga
# Check service logs
sudo journalctl -u ollama.service | grep -i vulkan
# Ensure OLLAMA_VULKAN=1 is set
sudo systemctl show ollama.service | grep OLLAMA_VULKAN
NVIDIA GPU Not Detected
If NVIDIA GPU isn't working:
# Check driver status
nvidia-smi
# If command not found, reinstall drivers
sudo apt install --reinstall nvidia-driver-580
# Check if GPU is visible
lspci | grep -i nvidia
# Reboot if needed
sudo reboot
Network Access Issues
If you can't access Ollama from other machines:
# Check if service is listening on correct interface
sudo ss -tlnp | grep 11434
# Test locally first
curl http://localhost:11434/api/tags
# Check firewall
sudo ufw status
sudo ufw allow 11434/tcp
Lessons Learned
Running this dual-node setup has taught me valuable lessons about AI infrastructure:
- GPU Selection Matters: The 4GB vs 8GB VRAM difference significantly impacts model choice and performance
- Vulkan vs CUDA: Intel Arc requires explicit Vulkan configuration, while NVIDIA "just works" with CUDA
- Debug Logging is Essential: OLLAMA_DEBUG=1 provides crucial insights into GPU utilization and model loading
- Network Flexibility: Exposing services on 0.0.0.0 enables flexible deployment patterns and remote access
- Resource Allocation: 64GB RAM on Node 1 allows for larger embedding batches, while 32GB on Node 2 is sufficient for generation
- Home Lab Integration: These nodes integrate seamlessly with other home lab services via simple HTTP APIs
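As a concrete example of that integration point, here's a minimal shell helper - a sketch using the same hypothetical IPs and models from earlier - that routes embedding requests to Node 1 and generation requests to Node 2:
#!/usr/bin/env bash
# Route requests to the appropriate node via Ollama's HTTP API
EMBED_NODE="http://192.168.1.100:11434"   # Node 1 - Intel Arc, embeddings
GEN_NODE="http://192.168.1.101:11434"     # Node 2 - NVIDIA, generation

embed() {
  # Naive JSON quoting - fine for a sketch, not for untrusted input
  curl -s "$EMBED_NODE/api/embeddings" \
    -d "{\"model\": \"nomic-embed-text\", \"prompt\": \"$1\"}"
}

generate() {
  curl -s "$GEN_NODE/api/generate" \
    -d "{\"model\": \"llama3.1:8b\", \"prompt\": \"$1\", \"stream\": false}"
}

# Example usage:
# embed "A sentence to turn into a vector"
# generate "Summarize the benefits of self-hosting LLMs"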
Conclusion
Building a dual-node Ollama setup demonstrates that running production-grade AI infrastructure doesn't require expensive cloud services or enterprise hardware. Two compact ThinkCentre M720q machines provide a capable, educational, and private AI platform.
The home lab approach offers unmatched learning opportunities - from driver installation and systemd configuration to GPU optimization and model selection. Every challenge solved builds deeper understanding of how AI systems work at a fundamental level.
Whether you're a student learning AI deployment, a developer building AI-powered applications, or a privacy-conscious user wanting control over your data, self-hosting Ollama provides a practical, cost-effective solution that grows with your needs.