Researchers, brace yourselves. The AI revolution isn't just knocking on our ivory tower's door—it's already taken up residence in our labs.
The AI Scientist: End-to-end automated research cycle
Sakana's AI Scientist generates research ideas, designs experiments, writes code, analyzes results, and even drafts full academic papers. Its output isn't winning Nobel Prizes yet, and it is constrained to the field of machine learning1, but its creators demonstrated that some of its AI-generated papers would likely sneak past peer review. And why not? They were novel, if not groundbreaking, contributions.
From Silver Medals to Gold Standards
DeepMind just crashed the International Mathematical Olympiad, snagging a silver medal and nearly walking away with gold. This isn't your average high school math competition—it's the proving ground for the world's brightest young minds. And now, apparently, for silicon-based intellects as well.
IMO judge and Fields Medalist Sir Timothy Gowers, in a moment of startling candor, told the BBC (https://www.bbc.co.uk/sounds/play/m0021j07, at about 29:00):
"I'm quite relieved to be at the end of my career... computers will be able to solve unsolved mathematical problems and when that happens it will be a very strange moment... I feel peculiar about my legacy if the things I did with so much effort become things that you can do in no time at all on your laptop."
When our most brilliant mathematicians can see, with clarity and conviction, the possibility of their impending obsolescence, we ought to pay attention.
OpenAI’s o1 – a ChatGPT moment for reasoning
Earlier this month, OpenAI released their o1 model, which employs a lighter variant of the reasoning techniques that likely powered DeepMind’s IMO entry. But unlike DeepMind’s proprietary and unreleased model, o1 is generally available. Within a week of its release, we saw:
A YouTube video showcasing o1 solving graduate-level physics problems in real-time.
UPenn Engineering professor Rob Ghrist having a "holy 💩" moment on Twitter:
“i’ve had my first "holy 💩" moment with GPT-o1 for proving theorems.... i am still checking it for correctness, but even if it's wrong, it's such a clever move, something like it is highly likely to work. i'm so very pleased: to have this degree of creativity and precision on tap is gonna make mathematicians deliriously happy...”

Derya Unutmaz, professor at JAX, is even more bullish than AI salesman Sam Altman on applications of o1 to biomedical research:
“This may happen sooner than even @sama thinks. E.g., I’m about to test a cancer immunotherapy idea that OpenAI’s o1 model gave me last week! While it still requires significant time & effort to carry out the wet lab work, if it works, this would be an incredible AI collaboration!”
What’s notable is that, as in Ghrist’s case, the model is not churning through routine number crunching or doing some other boring procedural work. In both cases, the researchers are praising the model for generating novel ideas2.

Fields Medalist Terence Tao, one of the most important and accomplished mathematicians of our time, put o1 through its paces. His verdict?
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student.”

Coming from Tao, that's high praise indeed.
The Now Irrefutable Reality: Machines Can Do Research
Machines can do real research. Right now. Not in some speculative future. No, they are not perfect; no, they cannot do everything. But there now exist demonstrated examples of machines generating novel ideas, proving theorems, and writing papers. The kind of work we've always thought required the ineffable human spark of creativity and intuition.
And what's truly mind-bending is the rate of progress. Tao notes that last year's AI model performed like an "actually incompetent graduate student." This year? A mediocre one. Unless you believe, against most expert opinion, that progress will suddenly halt, it doesn't take a Fields Medalist to extrapolate where we'll be next year, or the year after that.
Make no mistake, we're standing at a crossroads. Blockbuster Video clutched their VHS tapes and DVDs close to their chests because they refused to imagine a world where streaming technology replaced their offerings.
If we refuse to imagine a world in which parts3 of academic research are automated by technology, we choose to forgo agency in what happens next.
Universities have mostly lived with the belief that nothing other than humans could ever do research, and that technology is more likely to disrupt our teaching mission. Surprise.
The source code is open, so you’re welcome to adapt it to your field, with the likely limitation that any experimentation needs to be computational in nature. While one can see a straight line to, e.g., robotic labs, I’m not sure the technology there is ready yet, and it would surely slow down the generation time.
One can continue to scream “but it’s just autocomplete!” into the void of social media for as long as one must, but this will not change the reality that very serious researchers find the tool to be creative and capable of generating useful novel ideas. Of course it makes mistakes. Of course it is imperfect. But focusing on what it can’t do for the purpose of feeling smug and secure will lead to being outcompeted by colleagues who would rather focus on what it can do.
Especially some parts that we once thought were the unimpeachable domain of humans alone.