arle

A runtime-first Rust workspace. infer serves OpenAI-compatible traffic on CUDA, Metal, and CPU; arle is the unified front door for run, serve, train, and data flows.

cuda stable · ampere+  |  metal beta · apple silicon  |  cpu dev only  |  api OpenAI v1  |  release v0.1.4 · 2026-04-28
$ arle --doctor
cuda    ok    # nvidia-smi · cuda 12.x · ampere+
metal   beta  # apple m-series detected
cpu     ok    # dev-only smoke path
model   ok    # Qwen3-4B reachable
api     ok    # /v1/chat/completions · streaming

$ arle serve --backend cuda --model Qwen3-4B
listening on http://0.0.0.0:8000  · ready in 1.4s
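
With the server up, any OpenAI-compatible client can talk to it. A minimal smoke request against /v1/chat/completions; the model name mirrors the serve command above, and the prompt is arbitrary:

$ curl -s http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "Qwen3-4B",
         "messages": [{"role": "user", "content": "Say hello."}]}'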

Install

One runnable line per platform. Pre-built tarballs and SHA256 checksums ship with each GitHub Release; the curl installer verifies the checksum before extracting.

Apple Silicon · Homebrew zsh / bash
$ brew install cklxx/tap/arle
$ arle --doctor
Linux x86_64 / macOS · curl sh-compatible
$ curl -fsSL https://github.com/cklxx/arle/releases/latest/download/install.sh \
    | sh
$ arle --doctor
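
Prefer to verify by hand? A minimal sketch, assuming the release publishes a tarball with a sibling .sha256 file; the artifact names below are illustrative, so check the Release page for the real ones.

$ base=https://github.com/cklxx/arle/releases/latest/download
$ curl -fsSLO $base/arle-x86_64-linux.tar.gz          # hypothetical artifact name
$ curl -fsSLO $base/arle-x86_64-linux.tar.gz.sha256
$ sha256sum -c arle-x86_64-linux.tar.gz.sha256 \
    && tar -xzf arle-x86_64-linux.tar.gz              # `shasum -a 256 -c` on macOS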
CUDA · GPU container docker / nvidia
$ docker run --rm --gpus all -p 8000:8000 \
    -v /path/to/Qwen3-4B:/model:ro \
    ghcr.io/cklxx/arle:latest \
    serve --backend cuda --model-path /model
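
Once the container reports ready, a quick liveness probe from the host; /v1/models is the standard OpenAI model-listing route, assumed here to be exposed alongside chat completions:

$ curl -s http://localhost:8000/v1/models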
Source · Cargo workspace
$ git clone https://github.com/cklxx/arle && cd arle
$ cargo install --path crates/cli --features cuda
# --features cuda is opt-in; cpu builds out of the box

Bench

Dated, reproducible snapshots taken straight from docs/experience/wins/. Numbers come from scripts/bench_guidellm.sh and the canonical step-driver smokes; nothing is cherry-picked.

2026-04-28 · stable · ci-gated
cuda · NVIDIA L4 · Qwen3-4B · BF16 + FP8 paged KV (auto) · c=16

output 197 tok/s · itl p50 77.9 ms · vs legacy +64% · kv util 69%
snapshot: scripts/bench_guidellm.sh cuda-l4-hbm-tier-fp8-auto

2026-04-27 · beta · validated
metal · Apple M4 Pro · Qwen3.5-0.8B Q4_K_M · GGUF decode

gen 211 tok/s · e2e 202 tok/s · decode 4.7 ms/tok · ttft 223 ms
snapshot: metal_bench --model Qwen3.5-0.8B-Q4_K_M.gguf
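
To eyeball ttft and inter-token gaps on your own box (a rough spot check, not the guidellm harness behind these snapshots), stream a completion and watch the chunk timing:

$ curl -sN http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "Qwen3-4B", "stream": true,
         "messages": [{"role": "user", "content": "Count to ten."}]}'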

Support matrix

Three backends, one runtime contract. Authoritative truth lives in docs/support-matrix.md.

backend   stability   os / hardware             models                    quants                     api
cuda      stable      Linux + NVIDIA Ampere+    Qwen3 / Qwen3.5           FP16 / BF16, GGUF Q4_K     OpenAI v1
metal     beta        Apple Silicon (M1+)       Qwen3 / Qwen3.5           FP16 / BF16, dense GGUF    OpenAI v1
cpu       dev only    portable smoke            Qwen3 / Qwen3.5 (small)   FP16 / BF16                OpenAI v1
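
Switching backends is a flag, not a rebuild. For example, serving the quantized GGUF from the Metal snapshot above; this assumes --model-path accepts a local GGUF file, as in the Docker example:

$ arle serve --backend metal --model-path ~/models/Qwen3.5-0.8B-Q4_K_M.gguf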

Files

The repo at a glance. Everything links back to canonical paths in cklxx/arle.