Inferact
Category: AI Infrastructure
Inferact is a high-performance AI inference platform designed to commercialize and scale the vLLM engine for enterprise-grade Large Language Model (LLM) deployment. The company was founded in 2025, is based in Berkeley, USA, and is led by Simon Mo. Team size: 10-50. Total funding raised: $150.0M. Latest round: Seed round ($150.0M, Jan 2026). Key investors include Andreessen Horowitz (a16z), Lightspeed Venture Partners, Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund.
- Founded: 2025
- Headquarters: Berkeley, USA
- Team size: 10-50
- Total funding: $150.0M
Value proposition
Reduces LLM operational costs and latency by maximizing GPU throughput through advanced memory management and distributed systems optimization.
Products and solutions
- Enterprise vLLM Managed Service
- Inferact Inference Optimization Engine
- Distributed LLM Orchestration Layer
- High-Throughput Model Serving APIs
Unique value
Founded by the original creators of vLLM and the pioneers of PagedAttention, the company possesses the deepest technical expertise in the industry's most widely adopted open-source inference engine.
Target customer
Enterprise AI engineering teams, LLM application developers, cloud service providers, and organizations deploying generative AI at scale.
Industries served
- Artificial Intelligence & Machine Learning
- Enterprise Software (SaaS)
- Cloud Infrastructure
- Financial Services
- Healthcare & Life Sciences
Technology advantage
Uses PagedAttention, which manages attention key-value (KV) caches in fixed-size blocks, much as an operating system pages virtual memory. This nearly eliminates memory fragmentation and is claimed to allow 10x-20x higher throughput than standard systems.
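The virtual-memory analogy can be illustrated with a minimal sketch. This is not vLLM's or Inferact's actual implementation; the names `BlockAllocator`, `Sequence`, `wasted_slots`, and the block size of 16 tokens are assumptions chosen for illustration. The idea shown is that each request keeps a block table mapping logical token positions to physical cache blocks, so memory is reserved one block at a time and the unused space per request stays below one block.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (illustrative value, an assumption)


class OutOfBlocks(RuntimeError):
    """Raised when the pool has no free physical blocks left."""


@dataclass
class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool (hypothetical)."""
    num_blocks: int
    free: list = field(init=False)

    def __post_init__(self):
        self.free = list(range(self.num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise OutOfBlocks("no free KV blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """Tracks one request's logical-to-physical block mapping (hypothetical)."""
    allocator: BlockAllocator
    num_tokens: int = 0
    block_table: list = field(default_factory=list)

    def append_token(self) -> None:
        # A new physical block is needed only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def slot_of(self, token_index: int) -> tuple:
        # Translate a logical token position into (physical block, offset).
        return self.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE

    def free_all(self) -> None:
        # Return every block to the pool so other requests can reuse it.
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


def wasted_slots(seq: Sequence) -> int:
    """Reserved-but-unused token slots; always less than BLOCK_SIZE."""
    return len(seq.block_table) * BLOCK_SIZE - seq.num_tokens
```

In this sketch, a request that has generated 20 tokens holds two blocks and leaves 12 slots unused, instead of reserving one contiguous region sized for the maximum possible output length. Physical blocks need not be adjacent, which is why freed blocks from one request can be reused by any other.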
How they differentiate
Founded by the original creators of vLLM and PagedAttention, offering the deepest technical optimizations for the industry's most widely adopted open-source inference engine.
Main competitors
- Together AI
- Fireworks AI
- Anyscale
- Groq
Key partnerships
- UC Berkeley Sky Computing Lab (Research & Talent Pipeline)
- Andreessen Horowitz (a16z)
- Anyscale (Ray distributed framework integration)
- NVIDIA (Inference hardware optimization)
Notable customers
- Enterprise AI engineering teams
- Cloud Service Providers
- Generative AI Application Developers
Major milestones
- Secured $150M Seed funding at an $800M valuation in January 2026
- Commercialized the vLLM open-source project for enterprise-grade use
- Pioneered PagedAttention technology for high-throughput memory management
Growth metrics
Reached an $800M valuation at the Seed stage. The underlying vLLM project is among the most widely adopted open-source engines for LLM inference.
Market positioning
Premium AI infrastructure provider specializing in high-throughput, low-latency LLM serving for enterprise-scale deployments.
Geographic focus
Global (Headquartered in Berkeley, USA)
Patents and IP
Proprietary enterprise-grade optimizations built on top of the open-source vLLM (Apache 2.0) framework; specific commercial patents pending for advanced distributed memory management.
About Simon Mo
Simon Mo is a co-creator and co-lead of the vLLM open-source project, which pioneered PagedAttention for high-throughput LLM inference. He was previously a Software Engineer at Anyscale, where he worked on the Ray distributed computing framework. He is a PhD candidate at the UC Berkeley Sky Computing Lab, specializing in AI infrastructure and distributed systems.
Official website: https://inferact.ai