Why does everyone think DeepSeek is so much cheaper to run? Seems like people are conflating initial pricing with serving costs?

I'm seeing lots of news articles saying DeepSeek's "costs" are far lower than OpenAI's, but all the data I see shows only that 1) the training cost and 2) the API price are far lower. And everyone is comparing this with the cost of the data centers needed to SERVE 300M+ weekly active users.

Is there data showing that their costs to SERVE are actually lower? Or is this just an unsustainable price war, like Uber (which operated at a loss for about 10 years and won)?

EDIT: Thanks to u/expertsage for the closest answer so far: a comprehensive breakdown on Twitter that summarizes the unique advances in DeepSeek R1.

  • fp8 instead of fp32 precision training = 75% less memory

  • multi-token prediction to vastly speed up token output

  • Mixture of Experts (MoE), so inference uses only part of the model rather than all of it (~37B parameters active at a time, not the full 671B), which increases efficiency (see the rough sketch after this list)

  • PTX (basically Nvidia's low-level GPU assembly) hacking to squeeze as much performance as possible out of their export-limited H800 GPUs
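
To make the fp8 and MoE bullets concrete, here's a rough back-of-envelope sketch in Python. The 671B total / 37B active parameter counts are from the V3 paper; the 2-FLOPs-per-parameter rule of thumb and the bytes-per-parameter figures are my own simplifying assumptions (this ignores activations, KV cache, attention details, etc.), so treat it as an illustration, not a serving-cost model:

```python
# Back-of-envelope comparison: dense fp32 model vs. fp8 MoE model.
# Assumptions (mine, not from the papers): ~2 FLOPs per active
# parameter per generated token, and weight memory only (no KV cache).

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return n_params * bytes_per_param / 1e9

def flops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs (multiply + add) per active parameter."""
    return 2 * active_params

TOTAL = 671e9   # total parameters (from the V3 paper)
ACTIVE = 37e9   # parameters active per token via MoE routing

dense_fp32_mem = weight_memory_gb(TOTAL, 4.0)  # fp32 = 4 bytes/param
moe_fp8_mem    = weight_memory_gb(TOTAL, 1.0)  # fp8  = 1 byte/param -> 75% less

dense_flops = flops_per_token(TOTAL)   # dense: every weight touched per token
moe_flops   = flops_per_token(ACTIVE)  # MoE: only routed experts touched

print(f"weights: {dense_fp32_mem:.0f} GB (fp32) vs {moe_fp8_mem:.0f} GB (fp8)")
print(f"compute/token: {dense_flops:.2e} vs {moe_flops:.2e} FLOPs "
      f"({dense_flops / moe_flops:.0f}x fewer)")
```

Even this crude math gives ~4x less weight memory and ~18x less compute per token compared to a hypothetical dense fp32 model of the same size, which is the core of the lower-serving-cost argument.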

All of these, combined with a bunch of other smaller tricks, allowed for highly efficient training and inference. This is why only outsiders who haven't read the V3 and R1 papers doubt the $5.5 million figure; experts in the field agree that the reduced training-run costs are plausible.

Edit: The final proof is all the independent third-party hosts in the US serving DeepSeek R1 on their own hardware (https://openrouter.ai/). The prices they charge to run the model line up with the efficiency claims in the V3 and R1 papers.