RE: Deep Dive into DeepSeek AI Models
Excellent post, with some details I hadn't noticed yet. Mixture of Experts seems like a common-sense approach; it was silly to let a language model answer math questions. It's kind of funny that we're back to 'expert systems', though: they were all the hype in the 1970s and are a rather mundane type of software today.
"...optimizing costs of the AI models, which he was surprised none of the American corporations were doing"
Maybe that's the weakness of Silicon Valley VC culture. Their whole game is raising tons of money and outspending the competition to gain market share. Cutting costs is for losers. In OpenAI's story, they are toiling at the Herculean task of achieving AGI for America, spending billions in order to earn trillions. I think a hedge fund's side project is more in line with the actual long-term economic profitability of LLMs.
I agree. I want to see how they deal with problems that require expertise in different domains.
I think that, both then and now, the goal is to emulate the expertise humans have in certain domains. There's certainly been an improvement over the old expert systems. 😀
Yes, that's what I noticed too. But if DeepSeek can be run on local computers or smartphones with good-enough models (let's see that first), that's a blow to VC culture. At least a temporary one, until the AI giants adjust course.
They may still reach AGI relatively soon. They have spent a lot of money. I think they are betting that if they are the first to reach AGI, no one will ever catch up with them, and then they'll start getting tons of money back.
They had their constraints. I wonder whether they would have chosen the same path as the American tech companies if they had had "unlimited" funds and resources (chips) at their disposal.