Gavin Boulanger / @gavinsbaker:
As inference splits into pre-filling and decoding, Nvidia’s Groq deal could enable a “Rubin SRAM” variant optimized for ultra-low-latency agentic reasoning workloads.
Nvidia is buying Groq for two reasons, in my opinion. 1) Inference breaks down into pre-filling and decoding.
