LLMs are strangely-shaped tools

few hit AI apps exist because LLMs are what i’d call “strangely-shaped” tools.

most tools are built with a specific purpose in mind, like a screwdriver or a car.

but LLMs were something we stumbled upon by predicting text and playing with RL; we didn’t design their shape beforehand, we just let them naturally evolve into whatever the loss optimized for.

it’s clear they’re good at many things, but they are so strangely shaped that it’s easy to fall into traps when making products.

AI agents are a good example of a trap (for now), where it’s easy to spend months trying to perfect your scaffolding yet never quite reach the level of reliability you’d hope for.
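
to make “scaffolding” concrete, here’s a minimal sketch of the kind of loop people spend months tuning. every name in it (`call_llm`, `TOOLS`, `run_agent`) is a hypothetical placeholder, not any real API; the point is the shape of the loop, not the implementation.

```python
# a toy agent scaffold: a loop of model calls and tool dispatch.
# call_llm is a stand-in for a real model call -- a real scaffold would
# prompt an LLM to pick the next action from the transcript so far.

def call_llm(history: list[str]) -> tuple[str, str]:
    # toy policy so the sketch runs: search once, then finish
    if any(line.startswith("search(") for line in history):
        return "finish", "done: " + history[-1]
    return "search", history[0].removeprefix("task: ")

TOOLS = {
    "search": lambda q: f"results for {q!r}",  # stand-in tool
}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        # ask the model what to do next, given everything so far
        action, arg = call_llm(history)
        if action == "finish":
            return arg
        # dispatch to a tool and feed the observation back in
        observation = TOOLS[action](arg)
        history.append(f"{action}({arg!r}) -> {observation}")
    return "gave up"  # the reliability gap: real agents end up here a lot

print(run_agent("find the best coffee in SF"))
```

every box in that loop (prompting, action parsing, tool errors, the step budget) is a place where reliability quietly leaks, which is why perfecting it takes so long.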

long-term memory implemented solely via RAG is another trap. it’s just tempting enough to try, but the results aren’t as good as they should be.
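
here’s a minimal sketch of what “memory solely via RAG” usually means in practice: embed past snippets, retrieve the nearest ones at query time, and paste them into the prompt. `RAGMemory` and its methods are hypothetical names, and the bag-of-words embedding is a toy stand-in for a real embedding model.

```python
# toy "long-term memory as RAG": store snippets as vectors, retrieve the
# nearest ones by cosine similarity, and stuff them into the prompt.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # toy bag-of-words stand-in for a real embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class RAGMemory:
    def __init__(self) -> None:
        self.store: list[tuple[Counter, str]] = []

    def remember(self, snippet: str) -> None:
        self.store.append((embed(snippet), snippet))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.store, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [snippet for _, snippet in ranked[:k]]

memory = RAGMemory()
memory.remember("user prefers short answers")
memory.remember("user is building a chess engine in rust")
# retrieval only surfaces *similar* snippets; it can't synthesize across
# memories or update stale ones, which is roughly why RAG-only memory
# disappoints.
print(memory.recall("what language is the user's project in?"))
```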

other common inadequacies include poor search, hallucinations, and high inference costs. but there’s a long list of subtle weaknesses which few tinkerers ever notice, as well as many weaknesses (and strengths) which remain undiscovered.

much of the frontier of LLM post-training is currently concerned with these inadequacies: figuring out how to mold the strangely-shaped LLMs we have grown into a form slightly more suitable for the problems we face.

this is hard, even for the major labs. as we slowly make progress on it, i’d expect most AI products to keep attacking the same problems via the same methods, suffering from a lack of distinctness in both performance and aesthetic, because they don’t have the right connection between research teams and product teams (or perhaps the right vision to begin with).

it’s telling that, for the few recent consumer successes like midjourney or perplexity, competitors are hyper-focused on directly copying the winners rather than exploring the vast new frontier of things which could be built instead. this makes sense: the frontier is strangely-shaped because the underlying catalyst is itself strangely-shaped.

it’s not uncommon for services to launch a feature literally called “AI”, composed primarily of magic-wand and glitter emoji, simply because the product designer has no idea how to convey the intended experience to the user. 2024 is certainly not a year one would be fired for using too much AI.

i expect it to get more interesting later this year and especially in 2025, but it’s still been a surreal experience continually contrasting my day-to-day life in san francisco with that of the actual real world (note: SF is not real in this example).

the above is also part of why i have longer agi timelines than i did a few years ago. agi is not a strangely-shaped tool. in fact, it is quite literally the opposite.

Originally posted via Twitter/X, but mirrored here for convenience.
