rlhf | near.blog

Personality basins are a mental model that I use to reason about humans within their environment: why people are the way they are, how they change over time, how mental illnesses and addiction function and how we should look for their cures, and how the attention economy optimizes itself to consume all of your free time.

What is personality?

Note: This post contains many analogies to concepts from deep learning. Please do not interpret these comparisons too literally!

Your personality is formed by a process conceptually similar to RLHF. You are first born with a set of traits in a given environment. After this, you perform many interactions with your environment. If an interaction goes well, you’re likely to do it more often, and if it goes poorly, you’ll probably do less of it.

See the learning agent? That’s You!
If your interaction with this post goes well, you’re likely to read more of them later.

If you were born tall and with a commanding voice, you might find that you get what you want by confidently demanding it, and this will push you toward a confident personality. If you attempt this strategy as someone born small with a soft voice, it will probably have weaker results and encourage you to try something else instead.
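
To make the analogy slightly more concrete, here is a minimal sketch of "do more of what goes well" as a toy learning loop. Everything in it is an assumption made for illustration: the two strategy names come from the example above, but the payoff numbers, the function names, and the choice of an expected-value update (rather than sampling one action at a time) are hypothetical and not a claim about how brains or RLHF actually work.

```python
import math

def softmax(prefs):
    """Turn raw preferences into probabilities of choosing each strategy."""
    peak = max(prefs.values())
    exps = {a: math.exp(p - peak) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def shape_personality(rewards, steps=200, learning_rate=0.5):
    """Expected-value version of 'do more of what goes well'.

    rewards: hypothetical average payoff of each strategy in one environment.
    """
    prefs = {action: 0.0 for action in rewards}  # start with no preference
    for _ in range(steps):
        probs = softmax(prefs)
        # Baseline: how well things go on average with the current habits.
        baseline = sum(probs[a] * rewards[a] for a in rewards)
        for a in rewards:
            # Strategies that beat the baseline are reinforced, the rest fade.
            prefs[a] += learning_rate * probs[a] * (rewards[a] - baseline)
    return softmax(prefs)

# Same two strategies, two different (made-up) bodies and environments.
tall_loud = shape_personality({"demand confidently": 1.0, "ask quietly": 0.3})
small_soft = shape_personality({"demand confidently": 0.2, "ask quietly": 0.8})
print(tall_loud)
print(small_soft)
```

The same update rule produces two different habits purely because the payoffs differ, which is the whole point of the analogy: the learner is identical, the environment's feedback is not.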

Genetics has a large influence on most traits including personality. This topic is outside of the scope of this post: it is best to think of this post as providing some scaffolding for the question of “why is this person X, when they could have instead been Y if they had been in a different environment” or “what helps to explain the differences in outcome between two genetically identical people” (see also: niche construction and gene-environment correlation).

Periods of high social and environmental entropy during adolescence are the most formative because you will learn the most information about which actions perform well in your environment and which don’t. Of course, our meta-learning algorithm knows this, which is why you have higher neuroplasticity (and thus a higher learning rate) and more energy during this period. It’s time to learn how to succeed in your newfound environment!

Your personality basin

As you go about your life, you will continue to modify your personality in response to your environment, and eventually you will end up in something that resembles a basin. Maybe you were born tall and attractive and then this led you to engage in a lot of athletic activities and socialization, and at the end of all of the positive feedback you have ended up with a jock personality that goes on to become a professional football player.

This is a landscape of personalities. The black line is your personality over time, and the last point is the person you currently are. Just like in machine learning, the way that you’ve progressed as a person has been by trying out many things and then doing more of the things that worked well.

If instead you grew up scrawny yet intelligent you might have found things go well for you when you adopt a more quiet persona and focus on solving technical problems in programming or mathematics, perhaps eventually leading to a career as a software engineer or academic. Just like training a model in machine learning, the general gist is that you will try out a lot of things and then do more of the things that went well.

The above image is of a loss landscape in machine learning. Since we are discussing personality, all of the points on the landscape represent different personalities you could have, with the lower points being personalities which are more successful. The personality basin that you find yourself in solidifies over time as you find out who you are and choose your friend group, career path, social and aesthetic preferences, and so on.
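
For readers who have not seen a loss landscape before, here is a minimal one-dimensional sketch of what "ending up in a basin" means. The landscape function, its numbers, and the names `loss`, `slope`, and `settle` are all made up for illustration; real personality-space would have vastly more dimensions and no neat formula.

```python
def loss(x):
    """A hypothetical 1-D personality landscape with two basins (lower is better)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def slope(x):
    """Derivative of loss: tells you which direction is currently uphill."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def settle(x, learning_rate=0.05, steps=500):
    """Plain gradient descent: keep taking small steps downhill."""
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x

# Two people who start out only slightly different end up in different basins.
left = settle(-0.5)
right = settle(0.5)
print(f"start -0.5 -> x = {left:.2f}, loss = {loss(left):.2f}")
print(f"start +0.5 -> x = {right:.2f}, loss = {loss(right):.2f}")
```

In this toy landscape the left basin is deeper than the right one, yet someone who starts at +0.5 never finds it: small downhill steps only ever lead to the floor of the basin you are already in.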

Most personality changes are unconscious

Most of your movement within personality-space happens outside of your conscious awareness. Although there are many times in life you’ll consciously decide to act in a certain way, this is the exception, not the norm. Your brain is always making millions of gradient updates a day based on what is and isn’t going well and often the most you can do is try to be as observant as possible. This is why techniques like nonviolent communication, dialectical behavior therapy, and mindfulness have observation and introspection as a core facet, because it’s something that you have to consciously practice to become good at rather than something you’re born with.

Most addictive behaviors start without us noticing what is happening until we are sufficiently addicted such that the habit is hard to break. Relatedly, if you introspect on many seemingly-innate preferences you will often notice some of the environmental and social gradients that have helped shape them. An interesting thought experiment you can perform on yourself is to pick a random personality trait that you have and try to answer the questions “why am I like this? could I imagine a version of myself that is not like this, and if so, what happened differently to them?”

Many people think their music and fashion preferences are innate to them and are solely based off of how their favorite music sounds and their favorite outfits look. But if their most hated political party (or often in the case of adolescents, their parents) adopted the same aesthetic preferences, you can imagine they might start to literally like them less!

Your conscious experience of a stimulus is not dictated by a single-variable function f(stimulus), but rather f(stimulus, personality, environment), for broad definitions of ‘personality’ and ‘environment’. If you have a favorite song that your friend thinks sounds terrible, this is because they are literally experiencing it differently from you due to the latter two variables given to this function. They don’t think the thing that you hear sounds terrible, they think the thing that they hear sounds terrible, and it is probably very dissimilar from what you hear. The average conscious experiences of most people are likely wildly different from one another (see also: What Universal Human Experiences Are You Missing Without Realizing It). For more thoughts on the signaling, environmental, and self-deceptive aspects here I’d suggest reading about signaling theory and checking out The Elephant in the Brain by Robin Hanson and Kevin Simler.

How do you know if you’re in the right basin?

If you’re reading this you probably have a vague idea of what type of personality basin you’re currently in which you can recall by asking yourself the question “What type of person am I?” But an important question remains: how can you find out if this is the right basin to be in?

A simple answer would be that you could try out other basins to see how they feel. Maybe you’re having a great life as a devops programmer, but you could try to become an artist or a woodworker or a stay-at-home parent and see how that fares for you.

The reason why this is hard is that the optimal personality for this basin is not immediately accessible to you – to truly test optimality you will need to go through a full RLHF process. If you want to know how good of a life you’d have as a professional pianist, you will have to practice the instrument for a decade to find out.

You may wonder if you could simply try your hand at the piano for a month or two and see how it goes, and of course you can do this too. Your time (and your meta-learning algorithm’s number of epochs and learning rate) is limited, and it’s reasonable to make the trade-off of sacrificing depth-first search in favor of more breadth-first search.

As you progress in life, you will usually perform less exploration for new personalities and more exploiting with your developed personality

Usually this breadth-first search of trying out many different and creative strategies for life (prioritizing exploration over exploitation) automatically happens during your adolescence, but one of the magic things about the modern world is that there are so many societies, cultures, countries, and fields of work one can move into, and for each different environment could exist a slightly-different-you which finds their own distinct personality that maximizes success. Had you been born as a hunter-gatherer or within the Roman Empire or in ancient China, you’d probably have ended up quite different as a person. Similarly, if you decide to move countries or communities or careers, the optimal-you-for-your-environment will change a lot too.
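
The exploration-versus-exploitation trade-off can be sketched with a standard epsilon-greedy rule whose exploration rate shrinks over time. This is only a toy: the three "ways of living", their payoffs, the linear schedule, and the function names are hypothetical choices made for the example, not something measured.

```python
import random

def exploration_rate(step, total_steps, start=1.0, end=0.05):
    """Linearly anneal from mostly exploring (youth) to mostly exploiting."""
    fraction = step / max(1, total_steps - 1)
    return start + (end - start) * fraction

def live(payoffs, total_steps=2000, seed=0):
    """Epsilon-greedy over hypothetical 'ways of living' with noisy payoffs."""
    rng = random.Random(seed)
    names = list(payoffs)
    estimate = {n: 0.0 for n in names}  # running average payoff per option
    tries = {n: 0 for n in names}
    explored = [0, 0]                   # exploratory steps in first / second half
    for step in range(total_steps):
        if rng.random() < exploration_rate(step, total_steps):
            choice = rng.choice(names)                      # try something at random
            explored[step * 2 // total_steps] += 1
        else:
            choice = max(names, key=lambda n: estimate[n])  # stick with what works
        reward = payoffs[choice] + rng.gauss(0.0, 0.1)
        tries[choice] += 1
        # Incremental mean: nudge the estimate toward the latest outcome.
        estimate[choice] += (reward - estimate[choice]) / tries[choice]
    return tries, explored

tries, explored = live({"pianist": 0.4, "programmer": 0.7, "woodworker": 0.5})
print(tries)     # how often each option was lived
print(explored)  # exploratory steps: first half vs second half
```

Moving to a new country or career is roughly like resetting `exploration_rate` back toward its starting value: the old estimates no longer describe the environment, so trying things at random becomes worthwhile again.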

Personality-space is adversarial

One interesting thing to note about personality-space is that it is adversarial. Rather than a static training set to iterate through, your training data consists of other RL agents, many of which are other people, and all of whom want different things from you.

This is what leads to the concept of Personality Capture. Personality capture is when your environment RLHFs you into becoming a personality that benefits the agents around you rather than yourself.

If a school bully threatens to hurt you unless you do their homework for them, they are attempting to modify your RLHF process so that it results in an agent which is beneficial to them, hopefully resulting in someone who will always give in to their demands.

Those familiar with high school psychology will notice strong similarities between this concept and classical and operant conditioning, as well as the concept of a Skinner box. The attempted addition here is to model personality as a reinforcement learning process and changes in personality as gradient updates, which allows us to view personality-space as a high-dimensional space and gives us some interesting tools to think with. As the saying goes, all models are wrong, but some are useful.

Luckily for humans there exist many symbiotic equilibria where multiple parties can find mutually-beneficial feedback loops within the epochs of personality-space. Parent/child relationships, marriages, and best friends are often good examples of such a situation.

Personality Capture

It’s easy to become susceptible to various forms of personality capture when your environment changes. When asked why he isn’t on Twitter, Dario Amodei, CEO of Anthropic, responds to Dwarkesh Patel with:

I’ve just seen cases with a number of people I’ve worked with, where attaching your incentives very strongly to the approval or cheering of a crowd can destroy your mind, and in some cases, it can destroy your soul.

I’ve deliberately tried to be a little bit low profile because I want to defend my ability to think about things intellectually in a way that’s different from other people and isn’t tinged by the approval of other people.

Illustration of a monkey being personality captured by excessive Twitter usage

Most people around you want to personality-capture you in some way. Your boss might want you to work harder, your children might want you to give them more attention, and political parties want you to vote for them. Some of these things will be beneficial for you as well, but it’s easy to get trapped into bad habits when your adversary is sufficiently motivated and intelligent (e.g. social media feeds).

One interesting way to frame personality capture is by combining it with the concept of attention economics. All of the apps on your phone want to turn you into the type of person that uses them all day because that is beneficial for their revenue models. In many cases this is mutually beneficial, but it’s nonetheless clear that the cat and mouse game is starting to favor the felines more and more over the last two decades as they have learned to perfect their craft of user acquisition, retention, and ARPU maximization.

As I discussed in where are the builders, the game becomes particularly skewed when there is a large difference in ability or judgement between counterparties, with one common example being children and adolescents. It’s easy to become personality-captured by Minecraft or Roblox at the age of 10 – such games are not only fun and addictive, but a child also has little understanding of the level of optimization their counterparty has put into making sure that they remain a user for life. The reason it’s so hard to put your phone away is that it’s a battlefield of yourself versus thousands of intelligent and well-compensated engineers trying their hardest to ensure you do just the opposite.

How do I leave my personality basin?

Perhaps you have decided that you don’t like your personality basin. Maybe it used to be working out for you but no longer is, or maybe you’ve always been unhappy with it. Or maybe you just have reason to believe you’re trapped in a local optimum which is far inferior to the global one. What should you do?

The first thing you’ll want to do is to change your environment. If both you and your environment are a constant, you shouldn’t expect to end up in a different basin any time soon. For every new environment exists a new optimal-you, and the world offers many environments to choose from.

The second thing you’ll want to do is increase your learning rate. There are a lot of ways to do this. One interesting note is that your learning rate will automatically increase if your environment changes. This may be why so many people find they are able to be more thoughtful and creative while going on long walks in nature rather than sitting in a cubicle.

This is also a reason why it’s good to constantly be trying new things, because new things will likely involve new environments and new people. If you wonder why trying new things is hard, it is likely because this trait was more maladaptive in our ancestral environment than it is today, as we had less control over our surroundings in the past (If anything, we may have too many options in some cases of the present: our society is so large that defection from a group is less costly as you can simply find a new group to join afterwards. This seems to create challenging game-theoretic equilibria in match-making where commitment to a partner is devalued due to the ease of finding alternatives, the effects of which can be seen by how discontent much of the population is with dating apps).

A common mistake in life is to let your personality basin solidify too early. Your parents and schooling environment have a disproportionately large influence on who you become as an adolescent. But as soon as you gain the freedom to act independently as an adult, it’s usually a good idea to force yourself to try as many new things as you can, including moving cities (or countries!) and considering drastically different lines of work. Even if you feel content with where you are, the potential return is literally life-changing. Moving away from where I was born was one of my most important life choices, but it still took me several years longer than it should have to give it a shot.

Although you have a general learning rate curve for how quickly your personality adapts to new environments, different stimuli will also be paired with differing gradient magnitudes. High-magnitude experiences which result in strong gradient updates can move you within personality space much more quickly.

Humans have many sets of learning rate curves which govern different parts of their brain. In addition to the baseline learning curve, our learning curves are heavily modified by our environment.

If someone uses a psychedelic drug which explicitly gives them high-magnitude gradients they will probably move a lot more in personality space than if they had stayed sober. Similarly if someone undergoes a highly traumatic event, it may push them a long distance within personality space as they quickly adapt to ensure that they don’t have to go through the same experience again. Both of these activities involve large gradient updates.
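
Here is a minimal sketch of why the size of an update matters so much, using a toy one-dimensional landscape with two basins. The landscape, the event gradient of 3.0, the two learning rates, and the function names are all hypothetical numbers chosen so the effect is visible; nothing here is calibrated to real people or real drugs.

```python
def slope(x):
    """Slope of a hypothetical two-basin landscape, (x^2 - 1)^2 + 0.3x."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def settle(x, learning_rate=0.05, steps=500):
    """Everyday life: many small gradient steps toward the nearest basin floor."""
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x

def after_event(x, event_gradient, learning_rate):
    """Apply one experience as a single update, then let ordinary life resume."""
    x -= learning_rate * event_gradient
    return settle(x)

home = settle(0.5)  # floor of the shallower basin, near x = 0.96

# The same experience, absorbed at an ordinary and at a boosted learning rate.
ordinary = after_event(home, event_gradient=3.0, learning_rate=0.05)
intense = after_event(home, event_gradient=3.0, learning_rate=0.40)
print(f"home = {home:.2f}, ordinary = {ordinary:.2f}, intense = {intense:.2f}")
```

At the ordinary learning rate the experience is absorbed and you slide back to where you were. At the boosted rate the very same experience carries you over the ridge and you settle somewhere else entirely. Where you land depends on the direction of the push, which is exactly why large updates are high-variance.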

Common activities which seem to give the largest gradient updates to humans are meditation, drug usage, trauma, religious events, love, gambling, and sex.

Some of these concepts are more negatively-coded than others, for example trauma. But the purpose of trauma is obvious: to keep really bad things from happening to you again in the future. One of the reasons why overcoming trauma isn’t hard-coded into us as strongly as we might hope is that our present society is so much larger than the one we evolved in, so there’s far more opportunity to change your environment so as to remove the potential source of trauma. Trauma was likely more adaptive in our ancestral environment than it is today, because in the past you could not drastically change your surroundings and social group.

This is why strong psychedelic drugs like ayahuasca can be dangerous: whatever happens to you during your experience will be fed to you via high-magnitude gradients. Because users may experience hallucinations and delusional thinking during usage of such drugs, it’s possible for their location in personality-space to be thrown far out-of-distribution and into an area which has little overlap with the rest of humanity (See also: Psychedelics reopen the social reward learning critical period; Ketamine: ~48 hours, Psilocybin/MDMA: ~2 weeks, LSD: ~3 weeks, Ibogaine: ~4 weeks).

This isn’t to say there can’t be high-magnitude positive outcomes as well, but just that there is a high potential for variance when large gradients are involved. Romantic love can be a similarly dangerous force and has pushed thousands to suicide, yet our society near-universally regards it as a good thing! While there are many other reasons for this, high variance is not inherently bad and is likely necessary at the societal level in order to promote long-term antifragility (this is also the very reason I am so bullish on America).

Personality basins and mental illness

Personality basins are an interesting way to model many mental illnesses. Similar to attractor states or trapped priors, they give us a simple model that we can plan to manipulate in order to solve our problems. Just as your personality basin decides how introverted you are, how funny you are, and what type of music you enjoy, it also helps to curate which psychiatric conditions affect you.

One of the reasons why curing depression is so hard is that you need a very large gradient update to escape the basin you’re trapped in. This gradient update could come all at once via an excessively strong positive stimulus, for example a drug which explicitly increases your learning rate like ketamine. But this is often hard to reliably induce, and so the gradient updates instead usually have to be small and continual over a long period of time.

This is what most cognitive behavioral therapy techniques are: we find a simple way to make a small positive gradient update to push you ever-so-slightly out of the personality basin you’re trapped in, and then we keep doing it for months or years until we finally push you all the way out of the undesirable basin.

This is also a nice way to model something like drug addictions: drugs personality-capture you into a basin which feeds off of and depends on them, and this basin can become arbitrarily deep due to the high magnitude of gradients drugs can apply to you (and thus be very hard to escape from). The concept of relapsing on a drug is equivalent to falling back down to the bottom of the basin, and the concept of tapering off dosage over time is equivalent to providing small and continual gradient updates over time.

One of my many hot takes is that society is collectively becoming so efficient at some forms of personality capture that we will end up inducing various psychiatric conditions in the majority of our population. Societies end up with their own hyperdimensional personality basins just as people do, and just like us, the two ways they can move out of their basin are either gradually via many slow updates (e.g. the Industrial Revolution), or all at once via a very strong update (e.g. the French Revolution). It’s worth thinking about the effects that different types of memetic information may have on our society’s collective personality basins as we become more and more efficient at communication.

Can’t I be in multiple personality basins?

One thing you may notice from the above sections is that your personality appears much more malleable and dynamic than one described by a static point: you probably act differently around your family than you do around your friends or your co-workers.

To solve this discrepancy you can simply model personality space and your personality basin with additional dimensions, allowing you to model yourself not as a 1d point, but as a three-dimensional landscape.

I model my own personality basin with an extra dimension (i.e., 4d): at any given point in time there exists a “me” which implements a given personality landscape in a given personality basin, but I also have many sub-basins which implement my different moods. The set of actions I might perform when I’m angry is very different from that when I’m sad, and these are simply different sub-basins within the containing higher-dimensional basin. You could similarly increment the model’s dimensionality in order to model yourself using internal family systems or even dissociative identity disorder.

Personality Basins