High-performance LLM infrastructure — built entirely in Rust.
Tensorust is building the next generation of AI infrastructure: fast, safe, and uncompromising, powered by Rust.
Where others glue together Python and C++, we build from scratch. Every layer of the stack is authored in systems-grade Rust. Why? Because the future of AI demands speed, reliability, and control down to the metal.
We're not here to optimize someone else's mess. We're here to redefine the stack.
These aren’t side repos — they’re Tensorust’s foundational layers.
| Project | Description | Link |
|---|---|---|
| everyother-token | A custom Rust tokenizer engine optimized for transformer-based models | ‣ |
| rust-perf-bench | Performance benchmarking suite for LLM layers, written in pure Rust | [Private/in development] |
| gpt-json-parser | Streaming JSON parser for LLM-generated content in Rust | [Private/in development] |
| fastllama-rs (upcoming) | Lightweight Llama 2 / Llama 3 inference engine in Rust with Metal and CUDA backends | [Private/in development] |