The Impact of Initial Data Set Tradeoffs In AI Products
Happy Sunday and welcome to Investing in AI. I’m Rob May, a partner at PJC investing in AI and robotics companies, most recently Deeplite. I also host the Investing in AI podcast. The most recent guest was Rana El Kaliouby, CEO of Affectiva. It’s a great episode if you have the time to listen.
— Interesting Links —
Integrating Knowledge Graphs With Pre-Trained Corpora. Google AI Blog.
Deepfake Satellite Imagery As A Threat. The Verge.
Balancing Performance Capacity and Budget For AI Training. Nextplatform.
How a Startup Beat Healthcare Heavyweights In An AI Contest. Statesman.
The State of AI in 15 Graphs. IEEE Spectrum.
— Research Papers —
Social Behavior Understanding Using Deep Neural Networks. Link.
Texture Generation With Neural Cellular Automata. Link.
Federated AI for Unified Credit Assessment. Link.
— Commentary —
In 2016 I was an angel investor in an AI company (which will remain nameless for now) with a strong “human in the loop” model. Several VCs called me to ask why I had invested, and they all had the same complaint: “It’s just a services company. The gross margins are only 37%.” I argued for a vision of where it could go, because the company was collecting data on what these humans did and was building a data set to automate many of their tasks.
Fast forward 4 years and the company has 8-figure revenue, a 9-figure valuation, and 80% gross margins, like a software company. The model worked. Collecting the data took time. Now the business is more defensible than ever.
This particular entrepreneur was able to push through and raise money because of his track record, despite the skepticism about his business model at the time. But this issue about data has plagued many startups, and projects at large companies as well.
In a world where data matters more than ever, companies large and small face a tradeoff. Most projects require some sort of early proof points to get and keep financing. Few can wait 4 years to show a return the way this startup did. But defensibility in AI often comes from unique data sets, which take time to build.
Value in business usually comes from defensibility, but in our short-term-thinking world, few have the patience to wait on AI defensibility. If you, as an entrepreneur, CEO, or product manager, have to make the tradeoff between early business traction and longer-term defensibility, how do you think about the data aspect of that?
Ideally you can find a path that delivers both: some near-term wins and a type of defensibility that grows linearly with your customer traction, while also starting to gather unique data and laying the groundwork for more unique, and more defensible, data sets later on. But what do you do when you can’t?
Contrary to the conventional wisdom that AI may be the “sport of kings,” I think this presents a unique opportunity for startups to disrupt some large-company AI projects. While large companies have more data and more compute, most lack the patience to invest in longer-term initiatives. If acquiring the data set to build something uniquely defensible takes several years, I think private capital is more open than public markets to those business models and the time frame required to build them.
Because investing in data, labeling, and training all takes time, by the time others realize it’s working in a certain area, it is too late to launch a competing project.
If you work for a big company, you should highlight for your team that the defensibility dividend pays very well. These investments, while slow to mature, are worth making. And if you are in a startup, your best bet for raising money is to articulate the long-term value of the defensibility you are building.
Thanks for reading.
@robmay