Run with Kubernetes
For installing LocalAI in Kubernetes, the deployment file from the examples can be used and customized as preferred:
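A minimal example, assuming the example manifests are published at this path (check the examples repository for the current layout):

```bash
# Apply the example deployment; review and customize the manifest first
kubectl apply -f https://raw.githubusercontent.com/mudler/LocalAI-examples/main/kubernetes/deployment.yaml
```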
For Nvidia GPUs:
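Assuming a GPU-enabled variant of the manifest exists alongside the CPU one (file name illustrative; the cluster must have the NVIDIA device plugin installed):

```bash
# Apply the Nvidia GPU deployment
kubectl apply -f https://raw.githubusercontent.com/mudler/LocalAI-examples/main/kubernetes/deployment-nvidia.yaml
```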
Alternatively, the Helm chart can be used:
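A typical install flow, assuming the chart is published in the go-skynet helm-charts repository (dump and review the default values first, as they vary between chart versions):

```bash
# Add the LocalAI helm repository
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
helm repo update

# Dump the default values so they can be customized
helm show values go-skynet/local-ai > values.yaml

# Install the chart with the customized values
helm install local-ai go-skynet/local-ai -f values.yaml
```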
Security Context Requirements
LocalAI spawns child processes to run model backends (e.g., llama.cpp, diffusers, whisper). To properly stop these processes and free resources like VRAM, LocalAI needs permission to send signals to its child processes.
If you’re using restrictive security contexts, ensure the CAP_KILL capability is available:
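For example, a container security context that drops all capabilities but adds back `KILL` (container name and image tag here are illustrative):

```yaml
# Fragment of a Deployment's pod template
containers:
  - name: local-ai
    image: localai/localai:latest   # pin a specific tag in production
    securityContext:
      capabilities:
        drop:
          - ALL    # keep the container minimal...
        add:
          - KILL   # ...but allow signalling child backend processes
```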
Without the KILL capability, LocalAI cannot terminate backend processes when models are stopped, leading to:
- VRAM and memory not being freed
- Orphaned backend processes holding GPU resources
- Error messages like `error while deleting process error=permission denied`
Troubleshooting
Issue: VRAM is not freed when stopping models
Symptoms:
- Models appear to stop but GPU memory remains allocated
- Logs show `(deleteProcess) error while deleting process error=permission denied`
- Backend processes remain running after model unload
Common Causes:
- All capabilities are dropped without adding back `CAP_KILL`
- Using user namespacing (`hostUsers: false`) with certain configurations
- Overly restrictive seccomp profiles that block signal-related syscalls
- Pod Security Policies or Pod Security Standards blocking required capabilities
Solution:
- Add the `KILL` capability to your container's security context as shown in the example above.
- If you're using a Helm chart, configure the security context in your `values.yaml`:
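  A sketch of the relevant values; the exact key name depends on the chart version, so verify it against the chart's default `values.yaml`:

  ```yaml
  # values.yaml -- "securityContext" key assumed, check your chart's schema
  securityContext:
    capabilities:
      add:
        - KILL
  ```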
- Verify the capability is present in the running pod:
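  For example, by inspecting the running pod's spec (pod name is a placeholder):

  ```bash
  kubectl get pod <local-ai-pod> -o jsonpath='{.spec.containers[*].securityContext.capabilities}'
  ```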
- If running in privileged mode works but the above doesn't, check your cluster's Pod Security Policies or Pod Security Standards. You may need to adjust cluster-level policies to allow the `KILL` capability.
- Ensure your seccomp profile (if custom) allows the `kill` syscall. The `RuntimeDefault` profile typically includes this (see the snippet after this list).
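If you need to set the profile explicitly, Kubernetes exposes it on the security context; a minimal pod-level sketch:

```yaml
# Pod template fragment: opt in to the container runtime's default seccomp profile
securityContext:
  seccompProfile:
    type: RuntimeDefault
```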