Inferact

Category: AI Infrastructure

Inferact is a high-performance AI inference platform designed to commercialize and scale the vLLM engine for enterprise-grade large language model (LLM) deployment. Founded in 2025 and led by Simon Mo, the company is based in Berkeley, USA, with a team of 10-50. It has raised $150.0M in total; its latest round was a $150.0M seed round in January 2026. Key investors include Andreessen Horowitz (a16z), Lightspeed Venture Partners, Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund.

Founded
2025
Headquarters
Berkeley, USA
Team size
10-50
Total funding
$150.0M

Value proposition

Reduces LLM operational costs and latency by maximizing GPU throughput through advanced memory management and distributed systems optimization.

Products and solutions

Enterprise vLLM Managed Service, Inferact Inference Optimization Engine, Distributed LLM Orchestration Layer, High-Throughput Model Serving APIs

Unique value

Founded by the original creators of vLLM and the pioneers of PagedAttention, the company possesses the deepest technical expertise in the industry's most widely adopted open-source inference engine.

Target customer

Enterprise AI engineering teams, LLM application developers, cloud service providers, and organizations deploying generative AI at scale.

Industries served

Artificial Intelligence & Machine Learning, Enterprise Software (SaaS), Cloud Infrastructure, Financial Services, Healthcare & Life Sciences

Technology advantage

Uses PagedAttention, which manages attention key-value (KV) caches in fixed-size blocks, much as an operating system pages virtual memory. This nearly eliminates memory fragmentation and reportedly allows 10x-20x higher throughput than standard serving systems.
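The paging idea can be illustrated with a toy allocator. This is a minimal sketch under stated assumptions, not vLLM's or Inferact's implementation: the class `PagedKVCache`, its method names, and the block sizes are hypothetical and chosen only for illustration. Each sequence gets a block table that maps its logical positions to physical blocks, and a new block is taken from a shared pool only when the previous one is full, so unused space is limited to the tail of each sequence's last block.

```python
class OutOfBlocks(RuntimeError):
    """Raised when the shared pool has no free physical blocks left."""


class PagedKVCache:
    """Toy block-table allocator (hypothetical; illustrates the paging idea only)."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size               # tokens per block (assumed value)
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # seq_id -> list of physical block ids
        self.seq_lens = {}                          # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token and return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # Allocate a new block only when the sequence has none or its last one is full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise OutOfBlocks("no free KV blocks; a real system would preempt or swap")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

    def wasted_slots(self):
        """Slots allocated but unused: at most block_size - 1 per live sequence."""
        allocated = sum(len(t) for t in self.block_tables.values()) * self.block_size
        return allocated - sum(self.seq_lens.values())


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=8, block_size=4)
    for _ in range(5):
        cache.append_token("request-a")
    for _ in range(3):
        cache.append_token("request-b")
    print(cache.block_tables, cache.wasted_slots())
```

Because blocks need not be contiguous, a freed sequence's blocks can be reused immediately by any other request, which is what lets more requests share the same GPU memory and be batched together.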

How they differentiate

Founded by the original creators of vLLM and PagedAttention, offering the deepest technical optimizations for the industry's most widely adopted open-source inference engine.

Main competitors

Together AI, Fireworks AI, Anyscale, Groq

Key partnerships

UC Berkeley Sky Computing Lab (research and talent pipeline); Andreessen Horowitz (a16z); Anyscale (Ray distributed framework integration); NVIDIA (inference hardware optimization)

Notable customers

Enterprise AI engineering teams, cloud service providers, generative AI application developers

Major milestones

Secured $150M in Seed funding at an $800M valuation in January 2026
Commercialized the vLLM open-source project for enterprise-grade use
Pioneered PagedAttention technology for high-throughput memory management

Growth metrics

Reached an $800M valuation at the Seed stage. The underlying vLLM project is the industry standard for LLM inference.

Market positioning

Premium AI infrastructure provider specializing in high-throughput, low-latency LLM serving for enterprise-scale deployments.

Geographic focus

Global (Headquartered in Berkeley, USA)

Patents and IP

Proprietary enterprise-grade optimizations built on top of the open-source vLLM (Apache 2.0) framework; specific commercial patents pending for advanced distributed memory management.

About Simon Mo

Simon Mo is a co-creator and co-lead of the vLLM open-source project, which pioneered PagedAttention for high-throughput LLM inference. He was previously a Software Engineer at Anyscale, where he worked on the Ray distributed computing framework. He is a PhD candidate at the UC Berkeley Sky Computing Lab, specializing in AI infrastructure and distributed systems.
