Today we are announcing a competition with prizes ranging from $15k to $1.5M for work that informs the Future Fund’s fundamental assumptions about the future of AI, or is informative to a panel of superforecaster judges. These prizes will be open for three months—until Dec 23—after which we may change or discontinue them at our discretion. We have two reasons for launching these prizes.
First, we hope to expose our assumptions about the future of AI to intense external scrutiny and improve them. We think artificial intelligence (AI) is the development most likely to dramatically alter the trajectory of humanity this century, and it is consequently one of our top funding priorities. Yet our philanthropic interest in AI is fundamentally dependent on a number of very difficult judgment calls, which we think have been inadequately scrutinized by others.
As a result, we think it’s really possible that:
If any of those three options is right—and we strongly suspect at least one of them is—we want to learn about it as quickly as possible because it would change how we allocate hundreds of millions of dollars (or more) and help us better serve our mission of improving humanity’s longterm prospects.
Second, we are aiming to do bold and decisive tests of prize-based philanthropy, as part of our more general aim of testing highly scalable approaches to funding. We think these prizes contribute to that work. If these prizes work, it will be a large update in favor of this approach being capable of surfacing valuable knowledge that could affect our prioritization. If they don’t work, that could be an update against this approach surfacing such knowledge (depending on how it plays out).
The rest of this post will:
On our areas of interest page, we introduce our core concerns about AI as follows:

We think artificial intelligence (AI) is the development most likely to dramatically alter the trajectory of humanity this century. AI is already posing serious challenges: transparency, interpretability, algorithmic bias, and robustness, to name just a few. Before too long, advanced AI could automate the process of scientific and technological discovery, leading to economic growth rates well over 10% per year (see Aghion et al 2017, this post, and Davidson 2021). As a result, our world could soon look radically different. With the help of advanced AI, we could make enormous progress toward ending global poverty, animal suffering, early death, and debilitating disease. But two formidable new problems for humanity could also arise:

- Loss of control to AI systems: Advanced AI systems might acquire undesirable objectives and pursue power in unintended ways, causing humans to lose all or most of their influence over the future.
- Concentration of power: Actors with an edge in advanced AI technology could acquire massive power and influence; if they misuse this technology, they could inflict lasting damage on humanity’s long-term future.

For more on these problems, we recommend Holden Karnofsky’s “Most Important Century,” Nick Bostrom’s Superintelligence, and Joseph Carlsmith’s “Is power-seeking AI an existential risk?”.

Here are the questions about these scenarios that we believe are central. For each, we hold a current position (for the sake of concreteness)[1][2], and there are alternative positions, defined by the prize thresholds below, that would significantly alter the Future Fund’s thinking about the future of AI:

- “P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI.
- AGI will be developed by January 1, 2043.
- AGI will be developed by January 1, 2100.

(A simple identity showing how these three quantities relate is sketched after the prize and judging details below.)

The Future Fund will award a prize of $500k to anyone who publishes analysis that moves these probabilities to the lower or upper prize threshold. To qualify, please publish your work (or publish a post linking to it) on the Effective Altruism Forum, the AI Alignment Forum, or LessWrong with a “Future Fund worldview prize” tag. We will award larger prizes for larger changes to these probabilities, as follows:

- $1.5M for moving “P(misalignment x-risk|AGI)” below 3% or above 75%
- $1.5M for moving “AGI will be developed by January 1, 2043” below 3% or above 75%

We will award prizes of intermediate size for intermediate updates at our discretion. In addition:

- A $200k prize for publishing any significant original analysis[4] which we consider the new canonical reference on any one of the above questions, even if it does not move our current position beyond a relevant threshold. Past works that would have qualified for this prize include Yudkowsky 2008, Superintelligence, Cotra 2020, Carlsmith 2021, and Karnofsky’s Most Important Century series. (While the above sources are lengthy, we’d prefer to offer a prize for a brief but persuasive argument.)
- A $200k prize for publishing any analysis which we consider the canonical critique of the current position highlighted above on any of the above questions, even if it does not move our position beyond a relevant threshold. Past works that might have qualified for this prize include Hanson 2011, Karnofsky 2012, and Garfinkel 2021.
At a minimum, we will award $50k to each of the three published analyses that most inform the Future Fund’s overall perspective on these issues, and $15k each for the next 3-10 most promising contributions to the prize competition. (I.e., we will award a minimum of 6 prizes. If some of the larger prizes are claimed, we may accordingly award fewer of these prizes.)

As a check/balance on our reasonableness as judges, a panel of superforecaster judges will independently review a subset of highly upvoted/nominated contest entries, with the aim of identifying any contestant who did not receive a prize but would have if the superforecasters were running the contest themselves (e.g., an entrant who sufficiently shifted the superforecasters’ credences). For the $500k-$1.5M prizes, if the superforecasters think an entrant deserved a prize but we didn’t award one, we will award $200k (or more) to up to one entrant in each category (existential risk conditional on AGI by 2070, AGI by 2043, AGI by 2100), upon recommendation of the superforecaster judge panel. For the $15k-$200k prizes, if the superforecasters think an entrant deserved a prize but we didn’t award one, we will award additional prizes upon recommendation of the superforecaster judge panel.
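To make concrete how the three headline questions relate to each other, here is a standard chain-rule identity. No specific probability values are assumed; this is just the bookkeeping that connects the conditional question to the timeline questions, with the 2043 and 2100 estimates bracketing any intermediate date such as 2070:

$$
P(\text{misalignment x-risk} \,\wedge\, \text{AGI by 2070}) \;=\; P(\text{AGI by 2070}) \times P(\text{misalignment x-risk} \mid \text{AGI by 2070}),
$$
$$
P(\text{AGI by Jan 1, 2043}) \;\le\; P(\text{AGI by 2070}) \;\le\; P(\text{AGI by Jan 1, 2100}).
$$

So, for example, an argument that sharply lowers P(AGI by January 1, 2100) also caps the probability that can coherently be assigned to the loss-of-control scenario occurring by 2070.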
- Only original work published after our prize is announced is eligible to win.
- We do not plan to read everything written with the aim of claiming these prizes. We plan to rely in part on the judgment of other researchers and people we trust when deciding what to seriously engage with. We also do not plan to explain in individual cases why we did or did not engage seriously.
- If you have questions about the prizes, please ask them as comments on this post. We do not plan to respond to individual questions over email.
- All prizes will be awarded at the final discretion of the Future Fund. Our published decisions will be final and not subject to appeal. We also won’t be able to explain in individual cases why we did not offer a prize.
- Prizes will be awarded equally to coauthors unless the post indicates some other split.
- At our discretion, the Future Fund may provide partial credit across different entries if they together trigger a prize condition.
- If a single person does research leading to multiple updates, the Future Fund may, at its discretion, award the single largest prize for which the analysis is eligible (rather than the sum of all such prizes).
- We will not offer awards for any analysis that we believe was net negative to publish due to information hazards, even if it moves our probabilities significantly and is otherwise excellent.
- At most one prize will be awarded in each of the largest prize categories ($500k and $1.5M). (If, e.g., two works convince us to assign <3% subjective probability to AGI being developed in the next 20 years, we’ll award the prize to the more convincing piece, or split it in case of a tie.)
- For the first two weeks after the prize competition is announced (until October 7), its rules and conditions may be changed at the discretion of the Future Fund. After that, we reserve the right to clarify the conditions of the prizes wherever they are unclear or have wacky unintended results.
- Please be careful not to publish information that would be net harmful to publish. We think people should not publish very concrete proposals for how to build AGI (if they know of them), or things that are too close to that. If you are worried that publishing your analysis would be net harmful due to information hazards, we encourage you to a) write your draft and then b) ask about this using the “REQUEST FEEDBACK” feature on the Effective Altruism Forum or LessWrong (it appears on the draft post page, just before you would normally publish a post; see here).
Some clarifications and answers to anticipated questions

What do you mean by AGI?
Imagine a world where cheap AI systems are fully substitutable for human labor. E.g., for any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less. This includes entirely AI-run companies, with AI managers and AI workers and everything being done by AIs. How large an economic transformation would follow? Our guess is that it would be pretty large (see Aghion et al 2017, this post, and Davidson 2021), but, to the extent it is relevant, we want people competing for this prize to make whatever assumptions seem right to them. For the purposes of our definitions, we’ll count AGI as having been developed if there are AI systems that power a transformation (in economic terms or otherwise) comparably profound to what would be achieved in such a world. Some caveats/clarifications worth noting:

- A comparably large economic transformation could be achieved even if the AI systems couldn’t substitute for literally 100% of jobs (e.g., providing emotional support). E.g., Karnofsky’s notion of PASTA would probably count (though that is an empirical question), and possibly some other things would count as well.
- If weird enough things happened, the metric of GWP might stop being indicative in the way it normally is, so we want to make sure people are thinking about the overall level of weirdness rather than being attached to a specific measure or observation. E.g., causing human extinction or drastically limiting humanity’s future potential may not show up as rapid GDP growth, but it automatically counts for the purposes of this definition.

Why are you starting with such large prizes?
We really want to get closer to the truth on these issues quickly. Better answers to these questions could prevent us from wasting hundreds of millions of dollars (or more) and years of effort on our part. We could start with smaller prizes, but we’re interested in running bold and decisive tests of prizes as a philanthropic mechanism. A further consideration is that people sometimes argue that all of this futurist speculation about AI is really dumb, and that its errors could be readily explained by experts who can’t be bothered to seriously engage with these questions. These prizes will hopefully test whether that theory is true.

Can you say more about why you hold the views that you do on these issues, and what might move you?
I (Nick Beckstead) will answer these questions on my own behalf, without speaking for the Future Fund as a whole.

For “Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI”: I am pretty sympathetic to the analysis of Joe Carlsmith here. I think Joe’s estimates of the relevant probabilities are pretty reasonable (though the bottom line is perhaps somewhat low), and if someone convinced me that the probabilities on the premises in his argument should be much higher or lower, I’d probably update. There are a number of reviews of Joe Carlsmith’s work that were helpful to varying degrees but would not have won large prizes in this competition.

For assigning odds to AGI being developed in the next 20 years, I am blending a number of intuitive models to arrive at this estimate. They are mostly driven by a few high-level considerations:
- I think computers will eventually be able to do things brains can do. I’ve believed this for a long time, but if I were going to point to one article as a reference point, I’d choose Carlsmith 2020.
- Priors that seem natural to me (“beta-geometric distributions”) start us out with a non-trivial probability of developing AGI in the next 20 years, before considering more detailed models. I’ve also believed this for a long time, but I think Davidson 2021’s version is the best, and he gives 8% to AGI by 2036 through this method as a central estimate.
- I assign substantial probability to continued hardware progress, algorithmic progress, and other progress that fuels AGI development over the coming decades. I’m less sure this will continue many decades into the future, so I assign somewhat more probability to AGI in sooner decades than in later decades. Under these conditions, I think we’ll pass some limits (e.g., approaching hardware that’s close to as good as we’re ever going to get) and develop AGI if we’re ever going to develop it.
- I’m extremely uncertain about the hardware requirements for AGI (at the point where it’s actually developed by humans), to the point where my position is roughly “I dunno, log-uniform distribution over anything from the amount of compute used by the brain to a few orders of magnitude less than evolution.” Cotra 2020, which considers this question much more deeply, has a similar bottom line on this. (Though her updated timelines are shorter.)
- I’m impressed by the progress in deep learning to the point where I don’t think we can rule out AGI even in the next 5-10 years, but I’m not impressed enough by any positive argument for such short timelines to move dramatically away from any of the above models.

(I’m heavily citing reports from Open Philanthropy here because a) I think they did great work and b) I’m familiar with it. I also recommend this piece by Holden Karnofsky, which brings a lot of this work, and other work, together.)

In short, you can roughly model me as having a trapezoidal probability density function over the development of AGI from now to 2100, with a long tail extending beyond that point. There is about 2x as much weight at the beginning of the distribution as there is at the end of the century. The long tail includes a) insufficient data/hardware, or humans not being smart enough to solve it yet, b) technological or hardware stagnation, and c) reasons it’s hard that I haven’t thought of. The microfoundation of the probability density function could be: a) exponentially increasing inputs to AGI development, b) log returns to AGI development on the key inputs, c) pricing in some expected slowdown of the exponentially increasing inputs over time, and d) slow updating toward increased difficulty of the problem as time goes on; but I stand by the distribution more than the microfoundation.
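To make the shape of that distribution concrete, here is a minimal sketch in Python. It encodes only what is stated above: a linearly decreasing (“trapezoidal”) density between now and 2100 whose value at the start is twice its value at the end of the century. The start year of 2022 and the total probability P_BY_2100 are placeholders the reader should set to their own values (they are not my stated credences), and the post-2100 tail is ignored.

```python
# Minimal sketch of a trapezoidal AGI-timeline distribution (illustrative only).
# Assumptions: "now" is 2022, density at 2022 is 2x the density at 2100, and
# P_BY_2100 is a placeholder total probability of AGI by 2100; the tail
# beyond 2100 is not modeled.

START_YEAR = 2022
END_YEAR = 2100
P_BY_2100 = 0.5  # placeholder; replace with your own P(AGI by 2100)


def p_agi_by(year: float, p_by_end: float = P_BY_2100) -> float:
    """P(AGI developed by `year`) under the trapezoidal density.

    The density falls linearly from f_start at START_YEAR to f_end at
    END_YEAR with f_start = 2 * f_end, and integrates to p_by_end over
    [START_YEAR, END_YEAR].
    """
    span = END_YEAR - START_YEAR
    f_end = 2 * p_by_end / (3 * span)   # from area = (f_start + f_end)/2 * span
    f_start = 2 * f_end
    t = min(max(year, START_YEAR), END_YEAR) - START_YEAR
    # Closed-form integral of the linear density from START_YEAR to START_YEAR + t.
    return f_start * t + (f_end - f_start) * t ** 2 / (2 * span)


if __name__ == "__main__":
    for y in (2043, 2070, 2100):
        print(f"P(AGI by Jan 1, {y}) ~ {p_agi_by(y):.2f}")
```

Running this with your own value of P_BY_2100 makes it easy to check whether the implied decade-by-decade probabilities look reasonable to you.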
What do you think could substantially alter your views on these issues?
We don’t know. Most of all, we’d just like to see good arguments for specific quantitative answers to the stated questions. Some other thoughts:

- We like it when people state cleanly summarizable, deductively valid arguments and carefully investigate the premises leading to the conclusion (analytic-philosopher style). See e.g. Carlsmith 2021.
- We also like it when people quantify their subjective probabilities explicitly. See e.g. Superforecasting by Phil Tetlock.
- We like a lot of the features described here by Luke Muehlhauser, though they are not necessary to be persuasive.
- We like it when people represent opposing points of view charitably and avoid appeals to authority.
- We think it could be pretty persuasive to us if some (potentially small) group of relevant technical experts arrived at and explained quite different conclusions. It would be more likely to be persuasive if they showed signs of comfort thinking in terms of subjective probability and calibration. Ideally, they would clearly explain the errors in the best arguments cited in this post.

These are suggestions for how to be more likely to win a prize, not requirements or guarantees.

Who do we have to convince in order to claim a prize?
Final decisions will be made at the discretion of the Future Fund. We plan to rely in part on the judgment of other researchers and people we trust when deciding what to seriously engage with. Most likely, winning a large prize looks like this: someone publishes their arguments, those arguments get a lot of positive attention or are flagged to us by people we trust, we seriously engage with those arguments (probably including talking to the authors), and then we change our minds.

Are these probabilities grounded in detailed published models that are confirmed by strong empirical regularities that you’re really confident in?
No; this is a consequence of the conception of subjective probability that we are working with. As stated above in footnote [1]: “We will pose many of these beliefs in terms of subjective probabilities, which represent betting odds that we consider fair in the sense that we’d be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them.” For more on this conception of probability, I recommend The Logic of Decision by Richard Jeffrey or this SEP entry. Applicants need not agree with or use our same conception of probability, but hopefully these paragraphs help them understand where we are coming from. (A minimal worked example of the betting-odds reading appears at the end of this section.)

Why do the prizes only get awarded for large probability changes?
We think that large probability changes would have much clearer consequences for our work and would be much easier to recognize. We also think that aiming for changes of this size is less common and has higher expected upside, so we want to attract attention to it.
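As a minimal worked example of the betting-odds reading of subjective probability (with hypothetical stakes, not any bet we are proposing): suppose you assign probability $p$ to a proposition $A$, and stakes are normalized so the total pot is $1. Betting in favor of $A$ at the fair odds means risking $p$ to win $1-p$; betting against means risking $1-p$ to win $p$. Both bets then have zero expected value, which is the sense in which the odds are fair and you are indifferent between the two sides:

$$
\mathbb{E}[\text{bet on } A] = p\,(1-p) - (1-p)\,p = 0,
\qquad
\mathbb{E}[\text{bet against } A] = (1-p)\,p - p\,(1-p) = 0.
$$

If you would strictly prefer one side of the bet at those odds, your subjective probability differs from $p$.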
[1]: We will pose many of these beliefs in terms of subjective probabilities, which represent betting odds that we consider fair in the sense that we’d be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them.
[2]: For the sake of definiteness, these are Nick Beckstead’s subjective probabilities, and they don’t necessarily represent the Future Fund team as a whole or its funders.
[3]: It might be argued that this makes the prize encourage people to have views different from those presented here. This seems hard to avoid, since we are looking for information that changes our decisions, which requires changing our beliefs. People who hold views similar to ours can, however, win the $200k canonical reference prize.
[4]: A slight update/improvement on something that would have won the prize in the past (e.g. this update by Ajeya Cotra) does not automatically qualify due to being better than the existing canonical reference. Roughly speaking, the update would need to be sufficiently large that the new content would be prize-worthy on its own.