Why DeepSeek Will Drive Innovation In Foundation Models
Basic economics now makes other foundation models more viable
Happy Sunday and welcome to Investing in AI. If someone sent you this email and you want to get smart about investing in AI companies, please subscribe. If you want to listen to our podcast, you can find it here.
I’ve received dozens of questions since DeepSeek rose to popularity a few weeks ago about what it means for OpenAI, foundation models, NVIDIA, and AI hardware in general. The first-order thinkers are saying things like “well, big companies were buying billions of dollars worth of GPUs to train these huge models and now they don’t need to, so NVIDIA is toast.” That’s as intellectually weak as most first-order thinking usually is. While NVIDIA may soon start to see some pressure from competitors, DeepSeek is actually a positive development for the company, and for AI hardware in general.
Think about it this way… when it costs $500M to train a model, there are only certain things worth spending that much to achieve - like pursuing AGI. When it costs $5M to train a model, a whole bunch of other opportunities that were being ignored suddenly become ROI positive. When it gets down to $500K to train an awesome model, we will see millions of awesome models.
As an investor, what does that mean?
It means there is now an opportunity for more types of foundation models. When most people talk about “foundation models” they mean LLMs, because in popular culture LLMs have been the only foundation models we’ve really discussed. But that isn’t accurate. Foundation models can be built in lots of areas. A foundation model is just a large model with broad, expert-level capability that can serve as a base building block for other things. There can be foundation models that aren’t LLMs.
Historically, I’ve been negative on foundation models as a business, because when you spend $1B to train a model that is only defensible for a couple of months, that’s not a good use of capital. But I am changing my tune because of DeepSeek and other innovations like test-time compute. When you can train high-performing models in niche areas for much lower prices, it opens up a whole new world of business opportunities.
As an example, I invested a few years ago in a company called Virgo. The company built the world’s largest collection of annotated endoscopy videos, mostly from colonoscopies, and used that data to train a gastrointestinal foundation model. What does that mean? It means that, just as you would pass a phrase to an LLM for sentiment analysis, you can pass an image to Virgo’s gastro foundation model and have it classify colon polyps. That’s one simple use case, but you can extrapolate it to parallel many of the use cases we see with LLMs.
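To make the analogy concrete, here is a minimal sketch of what calling such a model could look like, in the same spirit as calling an LLM API for sentiment analysis. The endpoint URL, task name, and label set below are hypothetical placeholders for illustration only, not Virgo’s actual API.

```python
# Illustrative sketch only: the endpoint, task name, and labels are
# hypothetical placeholders, not a real product API.
import requests


def classify_polyp(image_path: str) -> dict:
    """Send an endoscopy frame to a (hypothetical) gastro foundation model
    and return its predicted polyp classification with a confidence score."""
    with open(image_path, "rb") as f:
        response = requests.post(
            "https://api.example.com/v1/gastro/classify",  # placeholder URL
            files={"image": f},
            data={"task": "polyp_classification"},
        )
    response.raise_for_status()
    # Example response shape: {"label": "adenomatous", "confidence": 0.93}
    return response.json()


if __name__ == "__main__":
    result = classify_polyp("colonoscopy_frame_001.png")
    print(result["label"], result["confidence"])
```

The point is that the calling pattern is the same as with an LLM: you hand the foundation model a raw input (an image instead of a phrase) and get back an expert-level judgment you can build an application around.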
I’m also a personal investor in Collectivei, which uses a bunch of B2B transaction data to build an economic foundation model. It can answer many questions about what is going on in the economy in near real time.
Why don’t we see more examples like this? Because foundation models have been expensive to train. As the cost gets down to under $1M, expect to see an explosion in foundation models for various niches.
As an investor, you should be prepared to invest both in those foundation models and in the applications built on top of them. The first places to look are areas with lots of data but lagging compute skills and infrastructure, where humans can benefit a lot from access to an expert model. I’m diving into law, tax, healthcare, and insurance initially.
Maybe someday compute becomes so cheap and commoditized that chip growth slows down, but I believe we are a long, long way from that happening.
Thanks for reading. And if you want more info like this please subscribe!