Nvidia's GPU business is not in question. What is in question is the incremental hardware layer Nvidia is now trying to sell on top of it. The Groq 3 LPX rack is a specialized decode accelerator that requires a separate purchase decision from customers who have already committed to Vera Rubin GPUs. That second check only gets written if the performance gap over general GPUs justifies it.
DeepSeek's DSpark is not an isolated release. It is the latest move in a pattern of open-source software innovations that directly shrink the decode bottleneck LPX is designed to monetize. MLA compression reduced KV cache requirements by over 90% across DeepSeek's model generations. DSpark cuts the number of full decode passes required per output. Both are free, MIT-licensed, and already spreading beyond DeepSeek's own model family.
Decode disaggregation is now an industry-wide architectural bet, not Nvidia's alone. In March 2026, AWS and Cerebras announced a competing disaggregated inference solution that pairs Trainium for prefill with Cerebras CS-3 for decode and launches on Amazon Bedrock in the second half of 2026, the same timeline as LPX. Nvidia's LPX attach rate over the next two to three quarters is the single most important signal for whether the consensus Data Center revenue numbers hold.
DeepSeek released DSpark last week. The market noticed the speed numbers and moved on. What those numbers signal matters more than the benchmark itself: a Chinese AI lab keeps finding ways to make inference faster using software and open weights, at no cost to anyone who wants to use them.
Nvidia (NASDAQ: NVDA), meanwhile, is ramping a specialized decode rack called the Groq 3 LPX that requires a separate purchase decision on top of the GPU platform customers already depend on. The question the market is not asking is whether that second check gets written at scale, or whether DSpark and the architectural innovations underneath it are quietly making the answer no.
Nvidia just posted the biggest quarter in semiconductor history. Revenue of $81.6 billion. Data Center revenue of $75.2 billion. GAAP gross margin of 74.9%. The established GPU business is not in question. What is in question is the incremental layer Nvidia is now trying to monetize on top of it.
The investment case has quietly shifted layers. Selling GPUs to hyperscalers is the established business. The new ambition is to sell a specialized decode rack alongside those GPUs, positioned as a required upgrade for the most demanding agentic AI workloads.
That rack is the Groq 3 LPX, built around 256 Groq LPU accelerators. Each LPU carries 500MB of on-chip SRAM running at 150 terabytes per second of bandwidth, roughly seven times the memory bandwidth of a Rubin GPU. Paired with the Vera Rubin NVL72 GPU system, Nvidia claims the combination delivers up to 35 times higher inference throughput per megawatt for trillion-parameter models. Vera Rubin is now in full production. LPX is shipping to early customers in the second half of 2026.
The pitch is compelling. The risk is that it requires a separate purchase decision from customers who have already committed to Rubin GPUs. DSpark arrived and made that decision harder.
LLM inference splits into two distinct phases. Prefill processes the input prompt and generates the initial memory state. Decode generates output tokens one step at a time, using that memory state under sustained pressure from active users, long outputs, and large context windows.
Decode is slower, more memory-intensive, and harder to scale efficiently. The memory structure at the center of this pressure is the KV cache, which grows with context length and must be read repeatedly for every generated token. For long-context agentic workloads, the KV cache can consume the majority of available GPU memory.
This is the bottleneck LPX is designed to monetize. If decode remains the dominant constraint on inference quality and cost, LPX becomes a necessary part of any serious agentic deployment.
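The way the KV cache grows with context length can be made concrete with a rough sizing sketch. The formula below is the standard approximation for conventional attention (one key and one value vector per token, per head, per layer). The model shape is hypothetical, chosen only for illustration, and does not describe any specific Nvidia or DeepSeek product.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2, concurrent_sequences=1):
    """Approximate KV cache size in bytes for standard attention."""
    # Factor of 2: one key vector and one value vector per token, head, and layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * concurrent_sequences


# Hypothetical model shape (an assumption for illustration, not a real model).
HYPOTHETICAL_MODEL = dict(num_layers=60, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for context in (8_000, 128_000, 1_000_000):
        size_gb = kv_cache_bytes(context_tokens=context, **HYPOTHETICAL_MODEL) / 1e9
        print(f"{context:>9,} tokens -> {size_gb:,.1f} GB per sequence")
```

The cache scales linearly with both context length and the number of concurrent sequences, which is why long-context, high-concurrency serving is where memory pressure bites hardest.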
This is important context for evaluating LPX. The case for separating prefill and decode onto specialized hardware is not Nvidia's alone. It is the conclusion the entire industry has reached simultaneously, which validates the underlying thesis but complicates the investment case for LPX specifically.
In March 2026, AWS and Cerebras announced a multiyear collaboration that puts Cerebras CS-3 wafer-scale engines inside AWS data centers, with Trainium 3 handling prefill and CS-3 handling decode, connected via Amazon's Elastic Fabric Adapter networking. The architecture follows the same logic as Nvidia's LPX play: prefill and decode require different silicon, and serving them on the same hardware leaves performance on the table. AWS described the result as delivering inference an order of magnitude faster than existing GPU-only solutions. The service is launching through Amazon Bedrock in the second half of 2026, on the same timeline as LPX.
The bear case sharpens here. If every major cloud provider reaches the same architectural conclusion and builds its own answer to it, Nvidia's LPX attach rate becomes a question of whether customers who buy Rubin GPUs also buy LPX as a second rack, or whether they route their most latency-sensitive decode workloads to a hyperscaler-native alternative instead.
DeepSeek released DSpark on June 27, 2026. It is not a new model. It is a speculative decoding module attached to DeepSeek-V4-Flash and V4-Pro, now running live in production.
The mechanism is worth understanding because it directly attacks the decode bottleneck. A smaller draft model proposes multiple tokens at once. The large target model verifies them in parallel. When the draft is right, multiple tokens are accepted in a single step. Fewer full decode passes are required per output. The memory and compute burden per token falls.
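The draft-and-verify loop can be sketched in a few lines. This is a generic greedy speculative decoding toy, not DeepSeek's DSpark implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the large and small models, and the parallel verification pass is simulated with ordinary function calls.

```python
def toy_target(context):
    """Hypothetical 'large model': a deterministic next-token rule."""
    return (sum(context) * 31 + len(context)) % 50


def toy_draft(context):
    """Hypothetical 'small model': agrees with the target except at some positions."""
    guess = toy_target(context)
    return (guess + 1) % 50 if len(context) % 5 == 0 else guess


def greedy_decode(target_next, prompt, num_new_tokens):
    """Baseline: one full target pass per generated token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, num_new_tokens, draft_len=4):
    """Greedy draft-and-verify; returns (tokens, number of target passes)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new_tokens
    target_passes = 0
    while len(tokens) < goal:
        # 1) The draft model cheaply proposes several tokens in a row.
        proposal = []
        for _ in range(draft_len):
            proposal.append(draft_next(tokens + proposal))
        # 2) One target pass checks every proposed position. Real systems do
        #    this in parallel; here it is simulated position by position.
        target_passes += 1
        accepted = []
        for proposed in proposal:
            expected = target_next(tokens + accepted)
            accepted.append(expected)
            if proposed != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    output, passes = speculative_decode(toy_target, toy_draft, prompt, 20)
    print(f"20 tokens generated in {passes} target passes")
```

Every kept token is one the target model would have chosen anyway, so the output matches plain greedy decoding. The saving comes entirely from how often the draft is right, which is why acceptance rate on real workloads matters so much.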
DeepSeek reports per-user generation speed improving 60% to 85% on V4-Flash and 57% to 78% on V4-Pro over the prior baseline. Throughput at a fixed service level improved 51%. These numbers come from DeepSeek's own benchmarking and have not been independently verified as of this writing. What is verifiable: DSpark is live in production, open-sourced under the MIT license, and the companion DeepSpec training framework already extends to Qwen and Gemma model families. The efficiency gains are not staying inside DeepSeek's own ecosystem.
One nuance worth stating plainly. Nvidia's own LPX architecture supports speculative decoding. Dynamo is designed to orchestrate draft-and-verify workflows across the GPU-LPU combination. DSpark and LPX are not simply in opposition. The more pointed bear case is that DSpark running on general Rubin GPUs alone, without LPX attached, delivers enough inference efficiency that the second rack becomes optional for most workloads.
DSpark is also not the first move in this pattern. DeepSeek has been quietly shrinking the memory problem that decode hardware is designed to solve. Its MLA architecture, carried through every model generation since V2, stores a compressed representation of past context instead of the full memory state a standard model would keep. The practical result: DeepSeek-V4-Pro needs roughly 10% of the memory that V3.2 required for million-token conversations. Less memory pressure means less urgency for hardware whose main selling point is handling that pressure. DeepSeek is reducing the problem inside the model before the hardware ever sees it.
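A back-of-the-envelope comparison shows why caching a compressed representation shrinks the memory footprint. The sketch below counts cached values per token per layer for conventional multi-head attention versus a latent-style cache. All dimensions are hypothetical and picked for illustration; the result is not meant to reproduce the roughly 10% figure cited above, which compares two specific DeepSeek model generations.

```python
def standard_values_per_token(num_heads, head_dim):
    """Conventional attention caches a full key and value vector for every head."""
    return 2 * num_heads * head_dim


def latent_values_per_token(latent_dim, positional_dim):
    """A latent-style cache stores one compressed vector plus a small positional part."""
    return latent_dim + positional_dim


# Hypothetical dimensions (assumptions for illustration only).
NUM_HEADS, HEAD_DIM = 128, 128
LATENT_DIM, POSITIONAL_DIM = 512, 64

if __name__ == "__main__":
    full = standard_values_per_token(NUM_HEADS, HEAD_DIM)
    compressed = latent_values_per_token(LATENT_DIM, POSITIONAL_DIM)
    print(f"full cache:       {full} values per token per layer")
    print(f"compressed cache: {compressed} values per token per layer")
    print(f"compressed / full = {compressed / full:.1%}")
```

The comparison baseline matters: models that already share key and value heads across query heads start from a smaller cache, so the relative saving from compression is smaller than against full multi-head attention.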
The causal chain for investors is this. Decode is a memory and latency problem. LPX is a hardware solution to that problem. DSpark and MLA are software solutions to the same problem. They are open, free, and already in production.
This is not a story about Nvidia losing the AI infrastructure market. Hyperscalers have already secured Vera Rubin allocations. Nvidia CEO Jensen Huang confirmed more than $1 trillion in combined Blackwell and Rubin purchase orders through 2027. That figure covers GPU systems and associated networking. It does not include LPX racks, Vera CPU systems, or storage, all of which are incremental.
The trade-off is specific. LPX must deliver enough guaranteed latency and throughput improvement over general Rubin GPUs running DSpark-style inference to justify a separate rack purchase, and to do so while competing against hyperscaler-native decode alternatives that carry none of LPX's integration friction. That hurdle just got higher on two fronts simultaneously: software efficiency is rising and the competitive field for specialized decode hardware is widening.
Geopolitical restrictions protect some hardware supply but not the ideas. DSpark is MIT-licensed and already running on model families beyond DeepSeek's own.
Token volume can rise while hardware intensity per token falls. That is the asymmetry that the market is not pricing.
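The arithmetic behind that asymmetry is simple: hardware demand is roughly token volume multiplied by hardware required per token. A minimal sketch, with growth and efficiency figures that are purely hypothetical rather than forecasts:

```python
def relative_hardware_demand(token_volume_multiple, intensity_multiple):
    """Hardware demand relative to today = token growth x hardware needed per token."""
    return token_volume_multiple * intensity_multiple


if __name__ == "__main__":
    # Hypothetical scenario: tokens served triple while software efficiency
    # halves the hardware needed per token.
    demand = relative_hardware_demand(token_volume_multiple=3.0, intensity_multiple=0.5)
    print(f"hardware demand multiple: {demand:.2f}x")
```

In this made-up scenario, token volume triples but hardware demand grows only 1.5 times. Demand still rises, just far less than a volume-only forecast would suggest.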
The forward numbers raise one specific question. Full-year fiscal 2027 Data Center revenue consensus sits near $343 billion, per S&P Global Visible Alpha. That implies continued sequential growth through the Vera Rubin ramp. The consensus Data Center gross margin for fiscal 2027 is projected at 76.3%, slightly below fiscal 2024 and 2025 levels.
The LPX bear case does not threaten GPU demand. It threatens the incremental attach: LPX racks, disaggregated serving infrastructure, and the premium networking and storage configurations justified by worst-case decode loads. If LPX lands narrowly among the highest-concurrency, longest-context deployments rather than broadly across agentic workloads, the incremental revenue layer implied by the consensus trajectory is harder to underwrite at current multiples.
This is what would make the bear case wrong: LPX becomes a required component for agentic deployments, not a premium option. Cloud providers report service-level improvements only achievable with LPX attached to Rubin and disclose this publicly. Nvidia reports LPX rack demand separately from general Rubin GPU demand in upcoming earnings calls, with numbers large enough to matter.
The AWS-Cerebras disaggregation stack proves difficult to scale or faces latency limitations from the EFA interconnect between Trainium and CS-3, pushing customers toward the tighter GPU-LPU co-design of LPX. DSpark-style speculative decoding shows weak acceptance rates in production reasoning and complex agentic workflows, where output is less predictable. KV compression architectures prove difficult to extend beyond DeepSeek's model family. The CUDA compatibility gap closes with the LP35 generation, removing a meaningful adoption friction point.
Here's what would make the bear case right: Nvidia discusses Vera Rubin demand broadly in upcoming earnings without evidence of LPX attach rates. Hyperscalers deploy DSpark-style speculative decoding and MLA-derived attention at scale, with infrastructure reporting showing lower hardware intensity per token. The AWS-Cerebras Bedrock launch gains rapid enterprise adoption, demonstrating that customers route latency-sensitive decode workloads to hyperscaler-native alternatives rather than LPX. Token prices fall faster than Nvidia's hardware cost reductions, compressing customer payback periods for LPX investment. DeepSpec-style efficiency gains appear in Qwen, Gemma, and other major open model families, making this about architectural diffusion rather than one Chinese vendor.
Nvidia's GPU platform is not in question. The Blackwell ramp was real. The Vera Rubin orders are real. The agentic AI inflection Jensen Huang describes is real.
What is in question is whether the inference specialization layer gets purchased at the scale the consensus numbers imply. The bear case is not that inference stops growing. It is that DeepSeek and its open-source successors keep compounding software efficiency, that hyperscalers build their own decode alternatives, and that all of this happens at the exact moment Nvidia is trying to monetize a hardware solution to the same problem.
The decisive metric over the next two to three quarters is not Data Center revenue. It is whether Nvidia can show that Vera Rubin customers attach LPX because neither general Rubin GPUs running DSpark-style inference nor hyperscaler-native disaggregation alternatives can meet their latency targets at scale. That signal will either underwrite the consensus or put it in doubt. Everything else in the Nvidia story is already priced.
Beegee Alop has positions in Amazon and Nvidia. The Motley Fool has positions in and recommends Amazon and Nvidia. The Motley Fool has a disclosure policy.