On open and closed machine learning models
In the ever-evolving landscape of artificial intelligence, it’s easy to assume that tech giants like OpenAI and Google hold a competitive advantage that will never fade away. With seemingly limitless resources, access to vast amounts of data, and the ability to train massive models, these companies have been at the forefront of AI innovation for years. However, a closer look reveals a possible shift in the dynamics of the AI field, suggesting that their dominance may not be as secure as believed.
A few weeks ago, SemiAnalysis published a leaked document from Google titled “We have no moat, and neither does OpenAI”. The document raised some very interesting questions about the future of large language models (LLMs) and artificial intelligence. While the source of the document remains undisclosed, SemiAnalysis claims that it was shared by an anonymous individual on a public Discord server, who has authorised its publication. They also state that the document originates from a researcher within Google and that they verified its authenticity.
Since its release, various sources, including the latent.space podcast and numerous websites and personal blogs, have delved into this document. Although neither the source of the document nor its authenticity has been independently verified, it is more interesting to focus on its content than to speculate about its origins. Regardless of who wrote it, the document conveys a significant message: open source LLMs could emerge in the near future as a superior alternative, offering advantages such as affordability, reliability, and trustworthiness when compared to models like GPT from OpenAI or Bard from Google. OpenAI and Google are engaged in a race to build the most powerful (and expensive) LLMs, but their progress is being closely tailed by the open source community. While the models created by these tech giants currently maintain a quality advantage, there are indications that “the gap is closing astonishingly quickly”.
The document presents a very interesting timeline that outlines the key milestones leading up to the current situation, where open-source models have the potential to compete with proprietary models from major companies.
Everything started when Meta launched LLaMA. They originally released the code but not the weights (i.e., not the actual model), most likely hoping to keep it “under control”.
The situation took a significant turn when the actual model was leaked to the public. This had a huge impact on the community, as it enabled everyone to experiment and tinker with the model.
Within a matter of weeks, the open-source community made significant progress. People successfully ran the model on diverse platforms, such as i) a Raspberry Pi, ii) a laptop, and iii) a CPU.
In April, Berkeley introduced Koala, “a dialogue model trained entirely using freely available data. They take the crucial step of measuring real human preferences between their model and ChatGPT. While ChatGPT still holds a slight edge, more than 50% of the time users either prefer Koala or have no preference”.
Shortly afterwards, also in April, Open Assistant launched a model and a dataset for alignment using reinforcement learning from human feedback (RLHF). This model demonstrated remarkable similarity to ChatGPT in terms of human preference, with a narrow margin of 48.3% vs. 51.7%.
Why this is important
One of the main points highlighted in this document is that companies like Google and OpenAI do not have a special “secret sauce” (which the “moat” in the title refers to). Their primary competitive advantage lies in their ability to train massive models with hundreds of billions of parameters. However, this advantage may not be sufficient in the long run. The open source community has discovered that smaller models can sometimes achieve comparable accuracy to their larger counterparts, and they offer the added benefits of faster training and execution. This efficiency arises from building upon existing models and simply adding new capabilities, rather than training from scratch each time.
A somewhat similar thing happened with the DALL-E models and Stable Diffusion for image generation from text. Because Stable Diffusion is available without usage restrictions, it has taken over the market, while DALL-E has somewhat faded into the background. In the same way, open alternatives to models like GPT may begin to gain prominence over the next few months, if not surpass them altogether. A prime example in this regard is the aforementioned Open Assistant.
Given the growing popularity of AI and the increasing user base of AI tools like GPT (for instance, it was recently announced that Mercedes will integrate ChatGPT into the infotainment system of its cars), it is important to keep up to date with the news in this field. The objective of this post is precisely that: to raise awareness about the emerging alternatives to proprietary models, which may not be fully ready at present but are likely to be soon. We hope you found this read enjoyable and informative!