Protege
Category: AI Infrastructure
Protege is a platform that enables ethical sourcing of hard-to-find, multimodal, real-world AI training data at scale, connecting data holders with AI developers through a governed marketplace. Protege was founded in 2024, is led by Bobby Samuels, and is based in New York City, United States. Team size: 10-50. Total funding raised: $65.0M. Latest round: Series A Extension. Key investors include Andreessen Horowitz (a16z), Footwork, CRV, Bloomberg Beta, Flex Capital, Shaper Capital, Liquid 2 Ventures, and SV Angel.
- Founded: 2024
- Headquarters: New York City, United States
- Team size: 10-50
- Total funding: $65.0M
Value proposition
Accelerates AI development by providing seamless, compliant access to curated real-world datasets while handling licensing, compliance, and technical preparation, reducing months or years of negotiation and integration work to a streamlined process.
Products and solutions
- AI Training Data Platform
- Healthcare Data Products
- Media/Video Data Catalog
- Audio and Speech Data
- Motion Capture Data
- Spatial & Physical Intelligence Data
- Evaluation Datasets & Benchmarks
- Data Lab
Unique value
First centralized platform to create a governed marketplace specifically for AI training data. Provides an end-to-end solution covering data aggregation, curation, licensing, compliance controls, and technical preparation to make datasets AI-ready. Takes a source-centric approach that empowers data holders while ensuring ethical sourcing.
Target customer
Foundation model companies (including a majority of the 'Magnificent Seven'), AI application developers, enterprise AI teams, healthcare AI companies, and generative AI companies that require proprietary training data.
Industries served
- Healthcare AI
- Generative AI
- Media and Entertainment
- Motion Capture and Animation
- Spatial & Physical Intelligence / Robotics
- Enterprise AI
- Foundation Model Development
- Speech Recognition
- Computer Vision
Technology advantage
Combines three critical capabilities: (1) Curated marketplace with 100+ active data providers across four verticals, (2) Automated compliance and governance layer with HIPAA-compliant data transformation and privacy controls, (3) Technical expertise to clean, classify, and structure data for immediate AI training use. This integrated approach eliminates the traditional 6-24 month negotiation and integration cycle for accessing proprietary data.
How they differentiate
First centralized governed marketplace for AI training data with direct partnerships to data holders, emphasizing ethically-sourced, private, multimodal real-world data. Source-centric approach empowering data holders with transparency and hands-on curation, unlike crowd-sourced annotation platforms.
Main competitors
- Scale AI
- Labelbox
- SuperAnnotate
Key partnerships
- Calliope Networks (acquired December 2024)
- Syndesis Health
- Amazon Web Services (AWS)
- Veritas Data Research
- Sidus Insights
Notable customers
- Majority of MAG7 (Magnificent Seven) tech companies
- Leading foundation model companies
- Enterprise AI teams
Major milestones
- Raised $10M Seed round led by CRV in September 2024
- Acquired Calliope Networks (December 2024)
- Raised $25M Series A led by Footwork in August 2025
- Achieved 20x business growth in 2025 with $30M GMV
- Raised $30M Series A extension led by Andreessen Horowitz in January 2026
- Partnered with 100+ active data providers globally
Growth metrics
20x business growth in 2025, $30M GMV in first full year of business, 100+ active data providers across healthcare, media, audio/speech, and motion capture verticals
Market positioning
Premium AI training data marketplace serving foundation model companies and enterprise AI developers, with a focus on hard-to-find, proprietary datasets across the healthcare, media, audio/speech, and motion capture verticals.
Geographic focus
Based in the United States and serving customers globally, including a majority of MAG7 tech companies.
Patents and IP
No registered patents disclosed as of latest update
About Bobby Samuels
Former General Manager of Privacy Hub at Datavant (2020-2023), where he helped drive the company's growth leading to its $7.0B merger with Ciox Health. Previously held leadership roles including Head of Talent Acquisition at Datavant. Deep expertise in regulated data exchange, healthcare data ecosystems, and privacy-preserving technologies.
Official website: https://www.withprotege.ai