Skip to main content

DeepInfra

Category: AI Infrastructure

Purpose-built cloud inference platform for high-throughput AI, enabling companies to run open-source and proprietary AI models at scale via API. DeepInfra was founded in 2022. The company is led by Nikola Borisov. Based in Palo Alto, California. Team size: 11-50. Total funding raised: $135.6M. Latest round: Series B. Key investors include 500 Global; Georges Harik; A.Capital Ventures; Crescent Cove; Felicis; NVIDIA; Peak6; Samsung Next; Supermicro; Upper90.

Founded
2022
Headquarters
Palo Alto, California
Team size
11-50
Total funding
$135.6M

Value proposition

DeepInfra provides a vertically integrated, inference-optimized cloud platform that owns the full stack from GPU hardware to API layer, delivering low-cost, low-latency, high-throughput AI inference without long-term contracts or vendor lock-in.

Products and solutions

DeepInfra API (190+ open-source models via OpenAI-compatible API); DeepStart (startup program); DeepCluster (dedicated GPU clusters); On-demand DGX B300 GPU rentals; Model hosting for text generation, text-to-image, text-to-speech, text-to-video, embeddings, reranker, ASR, zero-shot image classification

Unique value

Lowest cost-per-token among major inference providers; vertically integrated stack (owns GPU hardware across 8 US data centers); zero data retention; SOC 2 and ISO 27001 certified; 190+ open-source models; purpose-built for agentic-era continuous token generation

Target customer

Developers, startups, scaleups, and enterprises deploying production AI workloads — particularly agentic and high-throughput inference applications.

Industries served

AI/ML infrastructure; Enterprise AI; Agentic AI; Developer tools; SaaS

Technology advantage

Full-stack vertical integration from GPU hardware to API; early deployment of NVIDIA Blackwell GPUs with upcoming Vera Rubin; early collaborator in NVIDIA's open AI ecosystem (Nemotron, NemoClaw, Dynamo); inference-optimized infrastructure vs general-purpose cloud; team built imo messenger infrastructure serving 200M+ MAU

How they differentiate

DeepInfra differentiates on cost leadership (lowest cost-per-token), full vertical integration (owning hardware through API), purpose-built inference-only infrastructure (vs general-purpose cloud), zero data retention privacy policy, and SOC 2/ISO 27001 enterprise compliance — while competitors like Together AI emphasize research (FlashAttention) and Fireworks AI focuses on speed/reliability with ex-PyTorch engineering talent.

Main competitors

Together AI; Fireworks AI; Groq

Key partnerships

NVIDIA (early infrastructure collaborator, investor, Nemotron/NemoClaw/Dynamo ecosystem); 500 Global (lead investor); Supermicro (investor); Samsung Next (investor); Vercel (integration partner)

Notable customers

Venice AI

Major milestones

Founded September 2022; $8M Seed round (Nov 2023); $20.6M Series A led by Felicis (Dec 2024); $107M Series B co-led by 500 Global and Georges Harik (May 2026); SOC 2 and ISO 27001 certified; Early NVIDIA Blackwell GPU deployment; 190+ open-source models on platform; Revenue tripled since beginning of 2026

Growth metrics

~5 trillion tokens processed per week; 25x token volume growth since Series A; 8,000x processing volume growth since seed stage; Revenue tripled since beginning of 2026; 8 US data centers

Market positioning

Cost leader in the AI inference API market, positioned as the most efficient and affordable option for high-throughput open-source model inference, competing against hyperscalers and specialized inference clouds.

Geographic focus

US (8 data centers), with planned global expansion

About Nikola Borisov

Ex-Engineering Director at imo.im; Founding member of HalloApp; Backend Software Engineer at Microsoft. Northwestern University.

Official website: