2024 AI reflections

This is a mirror of a post I made on twitter here.


<effortPost>

2020-2025 AI releases which surprised me:

  • 2020: gpt-3, hifigan/stylegan2, alphafold 2
  • 2021: clip, dall-e 1
  • 2022: stable diffusion 1.5, tortoise-tts, chinchilla, gato
  • 2023: –
  • 2024: sonnet 3.6, veo2
  • 2025: ?

time for musings! note this excludes many research results, focusing instead on model outputs i didn’t think i’d see until several quarters later

it’s interesting to watch the patterns above: stylegan was my “wow we can make any image!” moment, but by the time i saw early previews of SD 1.5 i was never impressed by an image model again (even though e.g. midjourney’s custom models or flux are much better. i suppose red-haired hatsune miku was also a surprise for the four of you who were there, hah); it was obvious the field was “solved” and we simply needed to wait a bit longer

similarly, hifigan was a huge update in audio vocoding quality at the time and paired well with tts systems like tacotron 2 or later vits, but by the time i saw tortoise-tts also succeed with diffusion I realized “okay this field is solved now too. just scale and data, news at 10”. orca would have surprised me with its beauty (although never released, imo better than later internal jukeboxes) if I had never seen SD 1.5, but obviously if we can solve images we can solve music too, and now we have full companies like suno that do just that

videos aren’t too different, but i didn’t think a model as consistently performant as veo2 would hit for another 6-12 months. if you read the research history of the veo2 team i think it gives some hints as to why this may be the case (if anthropic wanted to do video i think they could have done it, but claude is simply not interested, and openai has more of a soft spot for shiny PR for raising and recruiting), but it’s great to receive reminders that google has near-infinite data and compute. you may think nvidia is worth a lot at three trillion dollars, well, did you know google (~$2.4T) has an nvidia inside of it (TPUs, for those of you reading from Home), and it’s actually quite good?

gpt-3 was really interesting; the communities i was in at the time consistently had a vibe of “why does no one care about this but us what is wrong with everyone” and this vibe continued for >2 full years (~2020-2022; from gwern’s pov May 2020 alone lasted for several long and slow years). as much as chatgpt may have taken over the world in some ways, i still feel that truly novel llm usage is only done by small groups of hackers here or there, with a few startups sometimes joining in. most obvious tricks spread faster via Twitter now but it’s still easy for anything to get lost in the noise. 2025 will probably have a lot of this propagation. sonnet 3.5+ impressed due to the quality of post-training and aesthetics (emergent properties are important!), but aside from that it’s still a data point on a very predictable line

one thing that did surprise me over the last few years was foss: the argument against was obvious (who is going to pay for it? every single ingredient for a good AI model favors heavy centralization), but even absent the nvidia strategy it only takes a single person with super-voting shares to change the world, and sometimes such a person makes a Decision. i expect raw foss models to continue to strongly lag behind in most areas, but they’re doing a bit better than i’d have guessed (honorable mentions for deepseek, qwen, and many others, but i expect meta to lead here for a while)

so, what else is there?

we already have context windows in the millions of tokens, the models already know more facts than i do by a factor of 10,000 if not more. frontier models output tokens faster than i can type (i type very fast!), we can already generate ~perfect tts and music (if done properly), ~perfect images, and videos are clearly on the way. task-specific agents obviously work but are a bit tedious to set up (i expect MCP-esque usage to matter a lot in 2025), and broader mediocre agents aren’t that hard if you’re willing to build an entire company of proper scaffolding around them (but hey, skate to where the puck will be! this can be hard because there are many pucks: some of them will score you a goal, but others have a winning lottery ticket inside and others may explode upon contact. ymmv, so consider the equivalent of a zamboni prior to playing)
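to put the typing comparison in rough numbers, here's a back-of-the-envelope sketch. every rate here is an illustrative assumption (a very fast ~120 wpm typist, a ~1.3 tokens-per-word english tokenization ratio, a ~50 tok/s streaming rate), not a benchmark of any particular model:

```python
# back-of-the-envelope: human typing speed vs. model token output speed
# all rates below are illustrative assumptions, not measurements

WORDS_PER_MINUTE = 120          # a very fast human typist
TOKENS_PER_WORD = 1.3           # rough English tokenization ratio (assumed)
MODEL_TOKENS_PER_SECOND = 50    # assumed streaming rate for a frontier API

# convert the human rate to tokens per second for an apples-to-apples comparison
human_tokens_per_second = WORDS_PER_MINUTE * TOKENS_PER_WORD / 60
speedup = MODEL_TOKENS_PER_SECOND / human_tokens_per_second

print(f"human: ~{human_tokens_per_second:.1f} tok/s")
print(f"model: ~{MODEL_TOKENS_PER_SECOND} tok/s, roughly {speedup:.0f}x faster")
```

under these assumptions a fast typist sits around 2–3 tokens per second, so even a modest streaming rate clears human typing by over an order of magnitude.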

the course for robotics seems pretty simple, and the amount of real capital pouring into this area is higher than ever before (which doesn’t mean the ML side is quick and easy at all, but rather that we seem to have all the building blocks we need). much of AI progress now is simply seeing the 10,000 ft mountain of Tedious Cumbersome Bullshit and deciding, yes, i will climb this mountain even if it takes years of effort, because the goalpost is in sight, even if 10,000 ft above us (keep the thing the thing. also get high-quality climbing gear). i remember using this metaphor with one of the lab ceos almost two years ago now and there was a smirk at its aptness, but it is still just as true today. sometimes you can even jetpack up the mountain with synthetic data and distillation, where best practices haven’t traveled as far as one may expect (every time there is a breakthrough it takes quite a while for the Others to notice, for obvious reasons: the real stuff (generally) does not get published anymore). some think the labs will converge onto all the same techniques and models, and while many things like compute multipliers and architecture tricks may end up more fungible and/or be independently discovered by many parties, vibes matter a lot.

long-term adaptive/neural memory is trivial yet generally uneconomic and/or socially questionable to roll out with full vigor (i already have high empathy for lab inference leads and those on-call), tree search is easy and the fruits within reach from search are plentiful (maybe the hottest take i have here is i don’t know if o3 really matters or not without being able to see people use it on arbitrary problems in the wild (inference-time compute scaling is obviously a thing regardless)), and the final touches to bring agent reliability closer to humans likely aren’t actually that hard (you know, gdm never released a gato 2 paper either..). i have my own takes on what is missing to achieve ~real agi~ like most of us on this corner of twitter, but i’m still not particularly singularityPilled compared to the extent that i’m obviously moderatelyTransformativeAIPilled

it’s a crazy time to be alive though, the tech influencers du jour are correct on that at least! i’m reminded of this every time robots drive me to and from work while i lounge comfortably, casually chatting with AIs more knowledgeable than me on every stem topic in existence, before I get out and my hand-held drone launches to follow me for a few more blocks. afterwards I get bored and open twitter to post or giggle at a silly meme, as one does in the future. time still moves slowly for now, at least for me. while i don’t think we will be tweeting from space in five or ten years (well, a few of us may!), i do think everything will be vastly different; there will be robots and intelligence everywhere, there will be riots (maybe battles and wars!) and chaos due to more rapid economic and social change, maybe a country or two will collapse or re-organize, and the usual fun we get when there’s a chance of Something Happening will be in high supply (all three types of fun are likely even if I do have a soft spot for Type II Fun lately. humans don’t really change that much, at least not yet). I have no predictions on the timeframe of decades but i would not be surprised if predictions are no longer possible or worth making as a human, should such a species still exist in relative plenitude. i wouldn’t be the one to ask anyway you know.

the building blocks (and meta-building blocks) are all here (i strongly disagree that we need a ‘new transformer’-like magnitude innovation, but we could certainly use more creativity in this city), and it just takes many years for information to diffuse and people to adapt. most people are quite inactive, uncurious, and docile to whatever information most easily reaches them first, and most younger internet-connected humans are already fully personality-captured by highly memetic and intelligent instances of this to begin with. i really love when i get to observe normal people somewhere far away from the heart of san francisco and see how infrequently they talk or care about ai (i don’t mean this in a disdainful tone at all – there is nothing wrong with assigning importance to your family and friends and hobbies while chopping wood and carrying water, and in some cases i’m even envious of how adept some are at masterfully combining these). most of the current trends seem pretty set in stone aside from whichever high-magnitude social disruptions we’ll get in 2025, which are ~impossible to predict the specifics of. humanity has found some levers we can pull on which work, and historically whenever we do that we get very, very good at it over the next few years, so of course we should expect llm speed and quality, infra buildout, power usage, etc etc, to all continue to be real and ‘relatively fast’ and even be under-realized by the public markets. but the disruptions and the Change, we will certainly get many instances of them, and if i could place a leveraged long on ‘chaos’ and ‘weirdness’, it would be the best trade of my life (VIX is not close to a match for this; i can’t think up a simple yet coherent instrument that is, yet). i’ll spend more time on concrete predictions later, but at least it seems like we’ll get an Interesting timeline, doesn’t it?

just some musings. happy to hear additions and strong disagreements as usual!

also! huge thank you to everyone I’ve interacted with on this website in 2024. I learn so much, I have so much fun, and I owe a lot to so many of you. thank you!

i’ll close my opened xml tag now to satisfy the three of you that would otherwise be upset at such a transgression:

</effortPost>