An on-device, full-pipeline voice AI CLI for macOS Apple Silicon, integrating STT, LLM, TTS, VLM and local RAG with zero cloud dependency and <200ms end-to-end latency.
RCLI is an on-device voice AI CLI developed by RunAnywhere, Inc. (Y Combinator backed) and designed exclusively for macOS on Apple Silicon. It unifies voice activity detection (Silero VAD), streaming and offline STT (Zipformer, Whisper, Parakeet), small-parameter LLM inference (Qwen3, LFM2, etc., with KV cache continuation and Flash Attention), double-buffered TTS, and vision-language models (Qwen3 VL, SmolVLM, etc.) in a single locally executed pipeline with sub-200ms end-to-end latency.
The project uses a dual-engine inference architecture. The proprietary MetalRT engine reaches up to 668 tok/s LLM decode (on M4 Max) and 101ms STT latency through hand-written Metal Shading Language kernels, operator fusion, and unified memory optimization; it requires an M3 or later chip with Metal 3.1. On M1/M2 chips, RCLI automatically falls back to llama.cpp. VLM currently runs on llama.cpp, with MetalRT VLM support coming soon.
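The engine fallback rule described here can be sketched as a small shell function. This only illustrates the stated rule (M3 and later use MetalRT, M1/M2 use llama.cpp) and is not RCLI's actual detection logic; the function name select_engine and its output strings are hypothetical.

```sh
# Hypothetical helper, not part of RCLI: map a chip name to the engine
# that the rule above would select.
# $1: chip name, e.g. "Apple M4 Max"
select_engine() {
  # Extract the generation number that follows "Apple M" (empty if no match).
  gen=$(printf '%s\n' "$1" | sed -n 's/^Apple M\([0-9][0-9]*\).*/\1/p')
  if [ -z "$gen" ]; then
    echo "unsupported"   # not an Apple Silicon chip name
  elif [ "$gen" -ge 3 ]; then
    echo "metalrt"       # M3 or later: MetalRT
  else
    echo "llamacpp"      # M1/M2: llama.cpp fallback
  fi
}
```

On a Mac, the chip name could be passed in from `sysctl -n machdep.cpu.brand_string`, assuming it reports a string such as "Apple M2 Pro" on Apple Silicon.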
Beyond voice conversation, RCLI provides local RAG document Q&A (hybrid vector + BM25 retrieval, ~4ms at 5K+ chunks), 40 pre-built macOS voice actions (executed via AppleScript/Shell), and an interactive TUI with Push-to-Talk, hardware monitoring, and model management. All inference runs locally and no data is uploaded, which makes it suitable for privacy-sensitive scenarios. The CLI itself is MIT-licensed; MetalRT is proprietary. Latest version: v0.3.7.
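Hybrid retrieval needs some way to merge the vector ranking and the BM25 ranking into one list. The source does not say how RCLI does this; the sketch below uses reciprocal rank fusion (RRF), a common choice, purely as an assumption. The function name rrf_fuse, the input format (one chunk ID per line, best match first), and the constant k=60 are illustrative.

```sh
# Hypothetical helper, not part of RCLI: fuse two ranked lists with
# reciprocal rank fusion (assumed technique, see lead-in).
# $1, $2: files with one chunk ID per line, best match first.
# Prints "score id" lines, highest fused score first.
rrf_fuse() {
  awk -v k=60 '
    # FNR is the line number within the current file, i.e. the rank.
    NF { score[$1] += 1 / (k + FNR) }
    END { for (id in score) printf "%.6f %s\n", score[id], id }
  ' "$1" "$2" | LC_ALL=C sort -k1,1nr -k2,2
}
```

Given a vector ranking of c1, c2, c3 and a BM25 ranking of c3, c1, c4, chunk c1 comes out on top because it ranks near the top of both lists, while chunks that appear in only one list fall to the bottom.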
Installation
curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
Or via Homebrew:
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup
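Since RCLI targets macOS on Apple Silicon only, a script that automates installation might check the host first. A minimal sketch, assuming `uname -s` reports Darwin and `uname -m` reports arm64 on supported machines; the function name is_supported_host is hypothetical and not part of RCLI.

```sh
# Hypothetical pre-install check, not part of RCLI.
# $1: OS name (as from `uname -s`), $2: architecture (as from `uname -m`).
is_supported_host() {
  [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]
}

# Example: report whether this host looks supported before installing.
if is_supported_host "$(uname -s)" "$(uname -m)"; then
  echo "Host looks supported; continue with installation."
else
  echo "RCLI requires macOS on Apple Silicon." >&2
fi
```

A shell running under Rosetta 2 typically reports x86_64, so this check errs on the side of rejecting such sessions.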
Quick Start
rcli # Interactive TUI
rcli listen # Continuous voice mode
rcli ask "open Safari" # Single command
rcli vlm photo.jpg "what's in this image?" # Ask a VLM about an image
rcli camera # Camera VLM
rcli screen # Screen capture VLM
rcli rag ingest ~/Documents/notes # Ingest documents for local RAG
Default Model Set (~1GB): LFM2 1.2B (LLM) + Whisper base.en (STT) + Piper (TTS) + Silero VAD + Snowflake Embeddings
Key Commands: rcli models (model management), rcli voices (TTS voice switching), rcli metalrt install/status (MetalRT engine management), rcli llamacpp (llama.cpp engine management), rcli actions (view 40 macOS actions), rcli cleanup (clean unused models).