The growth of AI depends on many factors, and a foundational one is access to the datasets used to train LLMs, from which the models learn patterns and become able to ‘connect the dots’. Several AI developers train their LLMs on online content, including books, news articles, and other sources, so that the models can more quickly ‘learn’ to mimic human intelligence and generate new content. Dozens of entities have filed lawsuits arguing that this use of their content infringes their copyrights.
Other entities have taken the tack of licensing their content to LLM developers. The upside is, presumably, royalty revenue and additional protections (for both sides) built into the agreement. The potential downside is that the outcomes of the lawsuits could undermine the value of a content license if courts determine that such use is fair use.
The interconnection between litigation and licensing means that the scales will tip one way or the other based on what courts decide regarding fair use. A determination that LLM use of news articles, books, and other content constitutes copyright infringement will make licensing deals more valuable and more prominent. And while the value of a license may diminish if courts determine that unfettered access by LLMs is fair use, I believe licensing will still play a significant role in training LLMs. More likely, courts will find a middle ground in which fair use applies, but in a more constrained set of circumstances. That outcome would only increase the value of a license.
Licenses do more than grant access to information and data, and informed parties will recognize that a collaborative approach - even if access to certain information is considered fair use - will benefit both sides.