≡

(AB)ᵢⱼ = Σₖ AᵢₖBₖⱼ

tr(AB) = tr(BA)

∇L = ∂L/∂w

⊙

A = U Σ Vᵀ

rank(A) = #{σᵢ > 0}

‖A‖_F = √Σ σᵢ²

|A ∪ B| = |A| + |B| − |A ∩ B|

C(n,k) = n! / (k!(n−k)!)

↻ iterate

wₜ₊₁ = wₜ − η ∇L(wₜ)

L = − Σ y log ŷ

∂

P(A|B) = P(B|A) P(A) / P(B)

E[X] = Σ x P(x)

A = Q Λ Qᵀ

det(AB) = det A · det B

∀ε > 0, ∃δ

⟳

|⟨x,y⟩| ≤ ‖x‖·‖y‖

‖x+y‖ ≤ ‖x‖ + ‖y‖

∇

ai infrastructure

The stack we build to train on.

Self-hosting math-native compiler, derivation-first OS, open-source agent runtimes, validated GPU training stack — built bottom-up, exercised in parallel across Blackwell, Hopper, and ARM-DGX.

⟳

self-host

MAX

Math-native language and self-hosting compiler. Unicode operators, triple-bootstrap fixed point, CUDA / HIP / Metal back-ends. Used in-lab to ship real neural workloads.

prototype

KeiSei OS

Operating system written in MAX with page-replace and scheduling policies derived from lab research. Boots in QEMU with userspace, ext2-rw, and EL0 loader.

⊙

open source

KeiSeiKit

Agent runtime — published Rust crates, agent manifests, hooks, skills. Open foundation for Claude Code workflows.

⊠

open source

KeiSeiSDK

Standalone vertical-stack agent runtime + CLI for self-hosted LLM deployments.

∇

fleet-validated

GPU LLM

From-scratch GPU forward + full GPU backward for Qwen3-32B. Parity-validated against CPU reference (rel-L2 < 5e-6). Exercised on B300, H200, H100, and 2× GB10 ARM-DGX.

Workshop — hardware actually exercised

■

B300

Blackwell SXM6 — heaviest tier

■

H200

Hopper 141 GB — single-GPU primary

■

H100

Hopper 80 GB — multi-GPU DDP

■

GB10

ARM-DGX local #1 — gx10

■

GB10

ARM-DGX local #2

Cloud and local in parallel. The same kernels run across Blackwell, Hopper, and ARM-DGX — no vendor lock. Every launch goes through a pre-flight memory + cost gate before the meter starts. No fire-and-forget pods.