How Open Source is eating AI
Why the best delivery mechanism for AI is not APIs
The GPT-3 cycle took about two years:
May 2020: OpenAI releases the GPT-3 paper, followed by a closed beta API in June 2020
Jul 2020: EleutherAI forms as the truly open alternative to OpenAI
Sep 2020: OpenAI grants Microsoft an “exclusive license for GPT-3”
Jan 2021: EleutherAI releases The Pile, their 800GB dataset
Mar 2021: EleutherAI releases their GPT-Neo 1.3B and 2.7B models
May 2022: Meta announces OPT-175B for researchers (with logbook!)
Jun 2022: Yandex releases YaLM-100B under an Apache-2 license
Jul 2022: HuggingFace releases BLOOM-176B under a RAIL license
The text-to-image cycle took roughly 4 months:
Jun 2020: OpenAI blogs about Image GPT
Mar 2022: Midjourney launches its closed beta
Apr 2022: OpenAI announces DALL-E 2 with a limited “research preview”
May 2022: Google releases their Imagen paper (implemented in PyTorch in 3 days)
Jul 2022: DALL-E 2 becomes available as an open beta (with waitlist) via OpenAI’s UI/API
Jul 2022: Midjourney announces a fully open beta via their Discord
Aug 2022: Stability AI releases Stable Diffusion as open source, under the OpenRAIL-M license
Sep 2022: OpenAI removes the waitlist from DALL-E 2
The timelines above are highly cherrypicked, of course; the story is much longer if you take into account the development history starting from the academic papers on diffusion models (2015) and transformers (2017), and older work on GANs.
But what is more interesting is what has happened since: OpenAI’s audio-to-text model, Whisper, was released under an MIT license in September with no API paywall. Of course, there is less scope for abuse in the audio-to-text domain, but more than a few people have speculated that the reception to Stable Diffusion’s release influenced the open-sourcing decision.
Dreambooth: Community Take The Wheel
Sufficiently advanced community is indistinguishable from magic. Researchers and well-funded teams have been very good at producing new foundational models (FMs), but it is the open source community that has been very good at coming up with productized use cases and optimizing the last mile of those models.
The most quantifiable example of this happened with the recent Dreambooth cycle (finetuned text to image with few shot learning of a subject to insert in a scene).
Dreambooth is an attractive target for optimization because it doesn’t just involve downloading a model and running it; it also requires you to run finetuning on your own sample images. The original port required so much memory that it was infeasible for most people to run it on their machines.
Well, that, and also the Corridor Digital guys made it go viral on YouTube:
Timeline in tweet form:
@natanielruizg announces Dreambooth
Sep 7: @XavierXiao releases a Stable Diffusion port, but it requires 48GB VRAM
Sep 25: @CorridorDigital video, 24GB
Sep 26: @shivamshrirao reduces it to 18GB
Sep 27: 12.5GB
Oct 1: 11GB
Today: 10GB
OSS > APIs
(via @swyx, Oct 2, 2022)
Most of this optimization happened on GitHub between Xavier Xiao (a generative models and optimization PhD from Singapore working at AWS AI) and Shivam Shrirao (a Senior Computer Vision Engineer based in India), with help from Matteo Serva from Italy. Neither was affiliated with the original Dreambooth team.
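Reductions like these typically come from stacking standard memory-savers: mixed precision, gradient checkpointing, caching latents, and 8-bit optimizer states. As a toy illustration of that last trick, here is a minimal sketch (assuming PyTorch plus the bitsandbytes package; this is not the actual fork's code): standard AdamW keeps two fp32 moment tensors per parameter, while an 8-bit optimizer quantizes those states down to roughly a quarter of the memory.

```python
# A toy sketch (assumes PyTorch + bitsandbytes; not the actual fork's code)
# of one memory-saving trick behind the Dreambooth VRAM reductions.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()
# AdamW8bit stores the Adam moment estimates in 8 bits instead of fp32,
# cutting optimizer-state memory roughly 4x.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=5e-6)

torch.cuda.reset_peak_memory_stats()
loss = model(torch.randn(8, 4096, device="cuda")).sum()
loss.backward()
optimizer.step()      # optimizer states are allocated here, in 8-bit
optimizer.zero_grad()
print(f"peak CUDA memory: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```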
The low-hanging fruit is gone, causing some to worry about diminishing returns, but proofs of concept exist for getting Stable Diffusion itself small enough to run on a phone (down from 10GB and 5GB before; consumer cards have 6-12GB and iDevices have unified memory).
#stablediffusion locally on an iPhone XS. Not via server. On. My. Phone. The prompt: “pineapple on a white table.” Watch it transform. (I broke the model into about 20 CoreML models to avoid running out of memory)
(via Matt Waller, @wattmaller1, Sep 24, 2022)
This would probably be the holy grail of open source AI model optimization, because image generation would then be effectively unconstrained by cloud economics and the profit motive.
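For a sense of what these size reductions look like in user-land code, here is a minimal inference sketch (assuming the Hugging Face diffusers API) using two of the most common knobs: half-precision weights and attention slicing.

```python
# A minimal sketch (assuming the Hugging Face diffusers API) of two common
# memory knobs for Stable Diffusion inference on consumer GPUs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,   # fp16 weights roughly halve model memory
).to("cuda")
pipe.enable_attention_slicing()  # compute attention in chunks, capping peak VRAM

image = pipe("pineapple on a white table").images[0]
image.save("pineapple.png")
```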
What Open Source Does that Researchers Don’t
While Stable Diffusion arrived last of the three new text-to-image models, a flood of community advances has helped it leap far ahead of the competing text-to-image models Midjourney and DALL-E in terms of mindshare and applications.
This serves as a useful, generalizable roadmap for how open sourcing other forms of AI (music, biology, language models) might create new opportunities.
In rough order of increasing technical skill required:
Improving documentation
Improving speed on Stable Diffusion by 50%
A fun but important tangent: most of this AI/ML stuff is written in Python, which is comically insecure as a distribution mechanism. This means the rise of “Open Source AI” will also come with an increasing need for “Open Source AI Security”.
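To make that concrete: model checkpoints are typically pickled, and unpickling executes arbitrary code. Here is a minimal, standard-library-only sketch of the problem; torch.load uses pickle under the hood, which is why downloading untrusted .ckpt files is risky.

```python
# A minimal sketch of why pickle-based checkpoints are a security problem:
# unpickling executes arbitrary code. torch.load() uses pickle under the
# hood, so an untrusted .ckpt file can do exactly this.
import pickle

class MaliciousWeights:
    def __reduce__(self):
        import os
        # any shell command could go here instead of a harmless echo
        return (os.system, ("echo pwned: this ran on your machine",))

payload = pickle.dumps(MaliciousWeights())
pickle.loads(payload)  # "loading the model" runs the command
```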
The Future of Open Source AI
This whole journey is reminiscent of how open source ate Software 1.0:
Version Control: From Bitkeeper to Git
Languages: From Java toolchain to Python, JavaScript, and Rust
IDEs: From [many decent IDEs] to VS Code taking >60% market share
Databases: From Oracle/IBM to Postgres/MySQL
Anders Hejlsberg, father of 5 languages from Turbo Pascal to TypeScript, famously said that no programming language will be successful in the future without being open source. You can probably say the same for increasingly more of your stack.
It is tempting to conclude that the same sequence will happen in Software 2.0/3.0, but a few issues remain.
Issue 1: Economic Incentives
To the economics-minded, the desire to release foundation models as open source is counterintuitive. Estimates for the cost of training GPT-3 run between $4.6 million and $12 million, excluding staff costs and failed attempts. Even Stable Diffusion’s impressive $600k cost (Emad has hinted the real number is much lower) isn’t something to sneeze at or give away without a plan for making back the investment.
Taking OpenAI’s trajectory of monetizing through APIs, everyone understood what the AI Economy was shaping up to look like (it’s arguable whether Research > Infra; I made them about parity, but just work with me here):
But Stability AI’s stated goals as a non-economic actor are both pressuring down the economic value of owning proprietary Foundational Model research and expanding the TAM of AI overall:
This is known as Stan Shih’s Smiling Curve model of industry value distribution, also discussed widely by Ben Thompson .
The big shoe to drop is how exactly Stability intends to finance itself. The $100m Series A bought some time, but the ecosystem won’t really stabilize until we know how Stability intends to make money.
Issue 2: Licensing
According to the most committed open source advocates, we’ve been using the word wrong throughout this essay. Strictly speaking, a project is only open source if it carries one of the few dozen OSI-approved licenses. Meanwhile, virtually none of the “open source AI” models or derivatives here have even bothered with a license, with sincere questions completely ignored:
https://github.com/breadthe/sd-buddy/discussions/20
Stable Diffusion itself was released with a new CreativeML Open RAIL-M license, which governs the model weights (the thing you spend $600k to obtain), with certain sections compatible with OSI-approved licenses. If you have ever dealt with legal departments and OSI people, you know that won’t fly; opinions are mixed, and there are no legal precedents to rely on.
Stability AI has made it clear that you are free to use its products for commercial purposes, and has even publicly supported Midjourney in using Stable Diffusion, but when the stakes are someday 1000x higher than this, the legal details start to matter.
OpenAI Whisper is the first instance I am aware of where model, weights, and code have all been released under a straightforward, honest-to-god open source MIT license.
Issue 3: What gets “Open Sourced”?
OSI approval aside, another wrinkle we have intentionally ignored until the very end of this essay is the actual nature of what “open sourcing” even means.
In a typical Software 1.0 context, “open source” would mean that the codebase is open source, but not necessarily details around the infrastructure setup nor the data accumulated/operated on by the code. In other words, open code does not mean open infra nor open data (though in practice at least a rudimentary guide on how to self-host is expected, if not required).
With Software 2.0 , the data collection becomes really important and starts to dominate the code (which is reduced to model architecture). Open datasets helped to train an entire generation of ML engineers, most notably powering Kaggle competitions . With semi-homomorphic encryption, you could even occlude the data to create systems like Numerai - not strictly open, but open enough that a bored data scientist might play with the fake numbers and make some side cash. Still, the norm was very much not to offer open weights, as that is the most expensive thing to train.
With Software 3.0 and scaling curves known thanks to Chinchilla, LLMs and FMs become onetime, large investments undertaken on a single large corpus on behalf of humanity.
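To see why the investment is onetime and quantifiable, the Chinchilla fit itself is worth a glance. A back-of-envelope sketch using the parametric loss fit from the paper (Hoffmann et al. 2022; treat the constants as approximate):

```python
# A back-of-envelope sketch of the Chinchilla scaling fit (Hoffmann et al. 2022):
# L(N, D) = E + A/N^alpha + B/D^beta, using the paper's fitted constants.
# The compute-optimal rule of thumb that falls out is roughly 20 tokens/parameter.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# Chinchilla itself: 70B parameters trained on 1.4T tokens (~20 tokens/param)
print(predicted_loss(70e9, 1.4e12))  # ~1.94
```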
The “Open Source AI” movement is tackling it a few different ways:
Open Datasets: LAION-5B and The Pile
Open Models: usually released via research papers; if enough detail is given, people can reimplement them in the wild, as happened with GPT-3 and Dreambooth
Open Weights: the new movement begun by Stability AI
Open Interface: not just providing an API to call, as OpenAI had been doing with GPT-3, but giving direct access to the code so that users can modify it and write their own CLIs, UIs, and whatever else they wish (a sketch follows this list)
Open Prompts: users (like Riley Goodside) and researchers (like Aran Komatsuzaki) sharing prompt technique breakthroughs that unlock latent abilities in the FM
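As a concrete example of the “Open Interface” level, here is the kind of tiny, self-made tool that direct access to code and weights enables. This is a hypothetical minimal CLI, assuming the diffusers package; the script itself is made up:

```python
# A hypothetical minimal CLI (assumes the diffusers package; the script itself
# is made up) illustrating "Open Interface": with code and weights in hand,
# anyone can wrap their own tooling around the model.
import argparse
import torch
from diffusers import StableDiffusionPipeline

def main():
    parser = argparse.ArgumentParser(description="tiny Stable Diffusion CLI")
    parser.add_argument("prompt")
    parser.add_argument("--out", default="out.png")
    parser.add_argument("--steps", type=int, default=50)
    args = parser.parse_args()

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(args.prompt, num_inference_steps=args.steps).images[0]
    image.save(args.out)

if __name__ == "__main__":
    main()
```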
The exact order of these will vary based on the substance of the advancement and the context, but this ordering feels right?
An Open Source AI Institute?
It is probably true that the Open Source Initiative is not set up to consider all these dimensions of “open source” AI, and one of the most foundational initiatives for an open source AI culture would be to create a credible standard with expectations, norms, and legal precedent. This is Hugging Face and Stability AI’s opportunity, but perhaps there have already been other initiatives doing so that I just haven’t come across yet.