Not sure about ollama, but llama-server does have a transparent kv cache.
You can run it with:
llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none
The web UI is then at http://localhost:8080, and the same server also exposes an OpenAI-compatible API.
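For example, you can point the standard OpenAI Python client at that endpoint (a minimal sketch; the model name and API key are placeholders, since llama-server doesn't require a real key):

from openai import OpenAI

# Point the client at the local llama-server endpoint; the key is ignored.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="gpt-oss-20b",  # placeholder; llama-server serves whatever model it loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)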