
OpenAI and Broadcom debut custom AI inference chip 'Jalapeno' in nine-month design cycle
The AMW Read
The custom chip is a new product from OpenAI, an established player in AI infrastructure. It is a meaningful update to the inference hardware landscape (novelty 2), with potential cross-segment impact on model deployment costs and stack control (significance 3).
On June 24, 2026, OpenAI and Broadcom announced their first custom-designed AI inference chip, named 'Jalapeno,' which is optimized for large language model inference. The chip, described as the first step in a multi-year roadmap, was designed and manufactured in just nine months using Broadcom's silicon expertise and Celestica's manufacturing capabilities. OpenAI stated that the chip is already running production inference workloads, including GPT-5.3-Codex-Spark, but did not disclose specific performance or power figures.
Why it matters: The 'Jalapeno' chip signals a decisive move by OpenAI to vertically integrate its inference stack, reducing its reliance on third-party GPU suppliers and potentially lowering its inference costs significantly. This fits the 'hyperscaler vertical integration' pattern seen across top-tier AI labs, in which control over silicon, model architecture, and runtime is consolidated to improve performance and margin. The nine-month design cycle is also unusually fast: custom chip development typically takes 18–24 months, so this schedule is half the usual time or less, a pace that shifts the competitive landscape for AI infrastructure players.
Expert take: This development places OpenAI in direct competition with other AI labs and hyperscalers that have pursued custom silicon, such as Google with its TPU and Anthropic with its reported chip partnership. The rapid turnaround, from design to production in under a year, suggests a mature design methodology and tight collaboration between OpenAI and Broadcom. If the chip delivers the claimed 'dramatic improvement' in inference efficiency, it could pressure GPU incumbents like Nvidia and accelerate the industry's shift toward purpose-built inference hardware. However, whether production can scale across multiple facilities, and whether the chip proves reliable at high volume, remain open questions.


