RadixArk
Category: AI Infrastructure
A high-performance AI inference platform and programming framework designed to accelerate and optimize the deployment of large language models (LLMs) at scale. RadixArk was founded in 2024 and is led by Ying Sheng. It is based in Berkeley, USA, and has a team of 1-10 people. Its latest round, a Series A led by Accel in January 2026, valued the company at $400M. Key investors include Accel and Lip-Bu Tan.
- Founded
- 2024
- Headquarters
- Berkeley, USA
- Team size
- 1-10
- Valuation
- $400M (Accel-led Series A, January 2026)
Value proposition
Reduces inference latency and operating costs by reusing KV cache entries across requests and by providing a specialized programming interface for structured, multi-step LLM workflows.
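For a sense of scale, the KV cache holds one key vector and one value vector per token, per layer, per KV head, so each token costs 2 × layers × KV heads × head dimension × bytes per element. The sketch below evaluates that formula for a hypothetical 7B-class configuration (32 layers, 32 KV heads, head dimension 128, fp16). The configuration, the 4,000-token shared prompt and the 100 concurrent requests are illustrative assumptions, not RadixArk figures.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, dtype_bytes: int) -> int:
    """KV cache bytes for one token: a key and a value vector per layer per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes


# Hypothetical 7B-class configuration (assumption, for illustration only).
CONFIG = {"n_layers": 32, "n_kv_heads": 32, "head_dim": 128, "dtype_bytes": 2}

PROMPT_TOKENS = 4000  # hypothetical shared system prompt length
NUM_REQUESTS = 100    # hypothetical number of concurrent requests using that prompt
GIB = 2**30

per_token = kv_bytes_per_token(**CONFIG)
prompt_gib = PROMPT_TOKENS * per_token / GIB

# Without prefix sharing, every request holds its own copy of the prompt's KV entries;
# with sharing, a single copy serves all of them.
unshared_gib = NUM_REQUESTS * prompt_gib
shared_gib = prompt_gib

print(f"{per_token} bytes/token")  # 524288 bytes/token (0.5 MiB)
print(f"unshared: {unshared_gib:.2f} GiB, shared: {shared_gib:.2f} GiB")  # unshared: 195.31 GiB, shared: 1.95 GiB
```

The point of the arithmetic is that KV memory grows with every duplicated prefix, which is why sharing cached prefixes across requests matters for both memory and recomputation.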
Products and solutions
- SGLang (Structured Generation Language) Open-Source Core
- RadixArk Enterprise Inference Engine
- Managed Inference Optimization Services
- Structured Output API (JSON/regex-constrained generation)
Unique value
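Commercializes SGLang, a framework for 'structured generation': rather than treating the model as a raw token generator, developers specify the required output format (for example, a regex or JSON schema) and multi-step control flow, and a specialized compiler and runtime enforce those constraints while accelerating execution.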
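The general idea behind regex- or JSON-constrained generation can be sketched as logit masking against a finite-state machine: at each decoding step, tokens that would drive the automaton into a dead state are excluded before the next token is chosen. The toy below hand-writes a DFA for the regex `-?[0-9]+` and uses a hypothetical scoring function in place of a real model. All names are illustrative; this is not SGLang's implementation or API.

```python
from typing import Callable, Dict, List, Optional

EOS = "<eos>"
ACCEPTING = {2}  # DFA for the regex -?[0-9]+ : 0 = start, 1 = saw "-", 2 = saw a digit


def step(state: int, ch: str) -> Optional[int]:
    """Advance the DFA by one character; None is the dead (rejecting) state."""
    if ch in "0123456789":
        return 2
    if ch == "-" and state == 0:
        return 1
    return None


def advance(state: int, token: str) -> Optional[int]:
    """Run every character of a (possibly multi-character) token through the DFA."""
    for ch in token:
        next_state = step(state, ch)
        if next_state is None:
            return None
        state = next_state
    return state


def allowed_tokens(state: int, vocab: List[str]) -> List[str]:
    """Token mask: keep tokens that avoid the dead state; allow EOS only in an accepting state."""
    allowed = []
    for tok in vocab:
        if tok == EOS:
            if state in ACCEPTING:
                allowed.append(tok)
        elif advance(state, tok) is not None:
            allowed.append(tok)
    return allowed


def constrained_greedy(scores: Callable[[str], Dict[str, float]], vocab: List[str], max_steps: int = 16) -> str:
    """Greedy decoding where disallowed tokens are masked out before taking the argmax."""
    state, out = 0, []
    for _ in range(max_steps):
        allowed = allowed_tokens(state, vocab)
        if not allowed:
            break
        logits = scores("".join(out))
        best = max(allowed, key=lambda tok: logits.get(tok, float("-inf")))
        if best == EOS:
            break
        out.append(best)
        state = advance(state, best)
    return "".join(out)


VOCAB = ["hello", "abc", "42", "7", "-", EOS]


def toy_scores(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for model logits: it 'prefers' free text over numbers."""
    scores = {"hello": 5.0, "abc": 4.0, "42": 3.0, "7": 2.0, "-": 1.0, EOS: 0.0}
    if len(prefix) >= 4:
        scores[EOS] = 10.0  # the toy model wants to stop after a few characters
    return scores


print(constrained_greedy(toy_scores, VOCAB))  # 4242
```

Unconstrained greedy decoding with the same scores would emit "hello"; the mask is what forces the output into the integer format.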
Target customer
AI labs, enterprise software companies, cloud service providers, and developers building high-throughput LLM applications.
Industries served
- Artificial Intelligence Infrastructure
- Cloud Computing
- Enterprise Software (SaaS)
- Software Development Tools
Technology advantage
Features RadixAttention, a technique for automatic KV cache sharing across requests (prefix caching), and a high-performance runtime that the company positions as outperforming existing solutions such as vLLM in complex, multi-turn interactions.
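The core bookkeeping behind RadixAttention-style prefix caching can be sketched as a radix tree over token IDs: each request walks the tree to find how many of its leading tokens already have cached KV entries, and inserting a request splits an edge wherever two conversations diverge. This is a minimal sketch under stated assumptions (integer token IDs, no actual KV tensors stored on nodes, no eviction or reference counting); `PrefixTree` and its methods are hypothetical names, not SGLang's API.

```python
from __future__ import annotations

from typing import Dict, Sequence, Tuple


class Node:
    """A radix-tree node; each outgoing edge is labeled with a run of token IDs."""

    def __init__(self) -> None:
        # first token of the edge -> (full edge label, child node)
        self.children: Dict[int, Tuple[Tuple[int, ...], Node]] = {}


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class PrefixTree:
    """Records which token prefixes already have KV entries cached (bookkeeping only)."""

    def __init__(self) -> None:
        self.root = Node()

    def match_prefix(self, tokens: Sequence[int]) -> int:
        """Number of leading tokens whose KV entries could be reused from the cache."""
        node, i = self.root, 0
        while i < len(tokens) and tokens[i] in node.children:
            edge, child = node.children[tokens[i]]
            n = common_prefix_len(edge, tokens[i:])
            i += n
            if n < len(edge):  # request diverges (or ends) partway along this edge
                break
            node = child
        return i

    def insert(self, tokens: Sequence[int]) -> None:
        """Record that KV entries now exist for every prefix of `tokens`."""
        tokens = tuple(tokens)
        node, i = self.root, 0
        while i < len(tokens):
            first = tokens[i]
            if first not in node.children:
                node.children[first] = (tokens[i:], Node())  # new leaf for the unseen suffix
                return
            edge, child = node.children[first]
            n = common_prefix_len(edge, tokens[i:])
            if n < len(edge):
                # Split the edge so the shared part becomes its own node.
                mid = Node()
                mid.children[edge[n]] = (edge[n:], child)
                node.children[first] = (edge[:n], mid)
                child = mid
            node, i = child, i + n


tree = PrefixTree()
system = [1, 2, 3, 4]                           # hypothetical token IDs of a shared system prompt
tree.insert(system + [10, 11])                  # first conversation
print(tree.match_prefix(system + [20, 21]))     # 4: the system prompt is reusable
tree.insert(system + [20, 21])                  # second conversation splits the shared edge
print(tree.match_prefix(system + [10, 11, 12])) # 6: a follow-up turn reuses its whole history
```

Compressing runs of tokens into single edges keeps the tree small when many requests share long prefixes, while lookups still resolve matches at single-token granularity.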
How they differentiate
RadixArk differentiates primarily through RadixAttention. Unlike vLLM's prefix caching, which hashes fixed-size blocks of tokens, SGLang's RadixAttention stores cached prefixes in a token-level radix tree, so a request can reuse a shared prefix at any token boundary. The company positions this as significantly reducing latency and compute costs for multi-turn conversations and complex, structured LLM workflows.
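The granularity difference between block-level hashing and token-level matching can be illustrated by counting reusable tokens for two prompts that share a 20-token prefix. The sketch assumes a block size of 16 and compares a new request against a single cached prompt. It simplifies both designs (the token-level side is just a longest-common-prefix check, which is what a radix-tree lookup computes along one branch) and is not code from vLLM or SGLang; all names are hypothetical.

```python
from typing import List, Optional, Sequence

BLOCK_SIZE = 16  # assumed block size, for illustration


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Hash each *full* block chained with its predecessor's hash, so one hash identifies
    the whole prefix up to the end of that block. A trailing partial block gets no hash
    and therefore cannot be reused. Python's hash() stands in for a real content hash."""
    hashes: List[int] = []
    parent: Optional[int] = None
    for start in range(0, len(tokens) - block_size + 1, block_size):
        parent = hash((parent, tuple(tokens[start:start + block_size])))
        hashes.append(parent)
    return hashes


def reusable_block_level(cached: Sequence[int], new: Sequence[int], block_size: int = BLOCK_SIZE) -> int:
    """Tokens of `new` reusable when only whole, hash-matched blocks can be shared."""
    cached_hashes = set(block_hashes(cached, block_size))
    reused = 0
    for h in block_hashes(new, block_size):
        if h not in cached_hashes:
            break
        reused += block_size
    return reused


def reusable_token_level(cached: Sequence[int], new: Sequence[int]) -> int:
    """Tokens of `new` reusable when matching at token granularity (longest common prefix)."""
    n = 0
    while n < min(len(cached), len(new)) and cached[n] == new[n]:
        n += 1
    return n


shared = list(range(100, 120))  # hypothetical 20-token shared prefix
first = shared + [1, 2, 3]
second = shared + [7, 8, 9]
print(reusable_block_level(first, second))  # 16: the last 4 shared tokens sit in a partial block
print(reusable_token_level(first, second))  # 20
```

In this toy case block hashing reuses 16 of the 20 shared tokens. The per-request gap is always less than one block, so how much it matters in practice depends on block size and prompt structure.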
Main competitors
- vLLM / Inferact
- Anyscale
- NVIDIA (TensorRT-LLM)
- Together AI
Key partnerships
- UC Berkeley SkyLab (academic origin)
- LMSYS Org (co-founding relationship)
- Accel (lead investor)
- Early adopters including xAI (Grok) and Cursor (Anysphere)
Notable customers
- xAI (Grok)
- Cursor (Anysphere)
Major milestones
- Open-source launch of SGLang at UC Berkeley SkyLab (2024)
- Commercial spin-out from UC Berkeley as RadixArk (January 2026)
- Secured funding at a $400M valuation in a round led by Accel (January 2026)
- Integration as the core inference engine for xAI's Grok models
Growth metrics
SGLang has seen rapid adoption within the AI developer community over the last six months, becoming a primary alternative to vLLM for high-performance inference.
Market positioning
High-performance AI inference infrastructure provider targeting enterprise AI labs and high-throughput LLM application developers.
Geographic focus
North America (Berkeley/San Francisco based), with a global developer community.
Patents and IP
No specific registered patents disclosed; intellectual property is centered on proprietary optimizations of the SGLang architecture and trade secrets in distributed inference.
About Ying Sheng
Ying Sheng is the co-founder and CEO of RadixArk. She was previously a software engineer at xAI, where she worked on inference systems for Grok, and a research scientist at Databricks. She is a co-founder of LMSYS Org and a primary contributor to the SGLang project. She holds a PhD in Computer Science from Stanford University, where her research focused on high-throughput generative inference.
Official website: https://www.radixark.ai/