Inferact

Category: AI Infrastructure

Inferact is a high-performance AI inference platform designed to commercialize and scale the vLLM engine for enterprise-grade Large Language Model (LLM) deployment. The company was founded in 2025, is led by Simon Mo, and is based in Berkeley, USA. Team size: 10-50. Total funding raised: $150.0M. Latest round: Seed ($150.0M, Jan 2026). Key investors include Andreessen Horowitz (a16z), Lightspeed Venture Partners, Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund.

Founded
2025
Headquarters
Berkeley, USA
Team size
10-50
Total funding
$150.0M

Value proposition

Reduces LLM operational costs and latency by maximizing GPU throughput through advanced memory management and distributed systems optimization.

Products and solutions

Enterprise vLLM Managed Service, Inferact Inference Optimization Engine, Distributed LLM Orchestration Layer, High-Throughput Model Serving APIs

Unique value

Founded by the original creators of vLLM and the pioneers of PagedAttention, the company possesses the deepest technical expertise in the industry's most widely adopted open-source inference engine.

Target customer

Enterprise AI engineering teams, LLM application developers, cloud service providers, and organizations deploying generative AI at scale.

Industries served

Artificial Intelligence & Machine Learning, Enterprise Software (SaaS), Cloud Infrastructure, Financial Services, Healthcare & Life Sciences

Technology advantage

Uses PagedAttention, which manages attention key-value (KV) caches the way an operating system manages virtual memory: the cache is split into fixed-size blocks that do not need to be contiguous, which nearly eliminates memory fragmentation. The company cites 10x-20x higher throughput than standard serving systems.
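
The virtual-memory analogy can be illustrated with a small Python sketch. This is not vLLM's or Inferact's implementation; the names (BlockAllocator, Sequence, append_token, physical_slot) and the block size of 16 tokens are assumptions chosen for illustration. It shows only the bookkeeping idea: each sequence keeps a block table mapping logical token positions to physical cache blocks, and blocks are handed out on demand rather than reserved up front.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK_SIZE = 16  # tokens per cache block (illustrative value, an assumption)


class BlockAllocator:
    """Hypothetical pool of fixed-size physical KV-cache blocks."""

    def __init__(self, num_blocks: int) -> None:
        # Stack of free block ids; the lowest id is handed out first.
        self.free: List[int] = list(range(num_blocks - 1, -1, -1))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """One request: a token count plus its logical-to-physical block table."""

    num_tokens: int = 0
    block_table: List[int] = field(default_factory=list)


def append_token(seq: Sequence, allocator: BlockAllocator) -> None:
    # A new physical block is needed only when the last one is full.
    if seq.num_tokens % BLOCK_SIZE == 0:
        seq.block_table.append(allocator.allocate())
    seq.num_tokens += 1


def physical_slot(seq: Sequence, token_index: int) -> Tuple[int, int]:
    # Translate a logical token position to (physical block id, offset),
    # much like a page-table lookup.
    if not 0 <= token_index < seq.num_tokens:
        raise IndexError("token index out of range")
    return seq.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE


def free_sequence(seq: Sequence, allocator: BlockAllocator) -> None:
    # Return every block to the pool so other sequences can reuse them.
    for block_id in seq.block_table:
        allocator.release(block_id)
    seq.block_table.clear()
    seq.num_tokens = 0


if __name__ == "__main__":
    pool = BlockAllocator(num_blocks=8)
    a, b = Sequence(), Sequence()
    for _ in range(20):  # two requests growing in an interleaved fashion
        append_token(a, pool)
        append_token(b, pool)
    print(a.block_table, b.block_table, physical_slot(a, 17))
```

In this sketch the two sequences end up with interleaved, non-contiguous physical blocks, and the only unused space per sequence is the tail of its last block (fewer than BLOCK_SIZE slots), which is the sense in which paging limits fragmentation.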

How they differentiate

Founded by the original creators of vLLM and PagedAttention, offering the deepest technical optimizations for the industry's most widely adopted open-source inference engine.

Main competitors

Together AI, Fireworks AI, Anyscale, Groq

Key partnerships

UC Berkeley Sky Computing Lab (Research & Talent Pipeline), Andreessen Horowitz (a16z), Anyscale (Ray distributed framework integration), NVIDIA (Inference hardware optimization)

Notable customers

Enterprise AI engineering teams, Cloud Service Providers, Generative AI Application Developers

Major milestones

Secured $150M Seed funding at an $800M valuation in January 2026; commercialized the vLLM open-source project for enterprise-grade use; pioneered PagedAttention technology for high-throughput memory management

Growth metrics

Reached an $800M valuation at the Seed stage; underlying vLLM project is the industry standard for LLM inference.

Market positioning

Premium AI infrastructure provider specializing in high-throughput, low-latency LLM serving for enterprise-scale deployments.

Geographic focus

Global (Headquartered in Berkeley, USA)

Patents and IP

Proprietary enterprise-grade optimizations built on top of the open-source vLLM (Apache 2.0) framework; specific commercial patents pending for advanced distributed memory management.

About Simon Mo

Simon Mo is a co-creator and co-lead of the vLLM open-source project, which pioneered PagedAttention for high-throughput LLM inference. He was previously a Software Engineer at Anyscale, where he worked on the Ray distributed computing framework. He is a PhD candidate at the UC Berkeley Sky Computing Lab, specializing in AI infrastructure and distributed systems.

Official website: