Analogies offer a view of the unknown from the comfort of the known. Our experience shapes our representations of processes and relationships, and these representations in turn offer a form of their own for us to play and imagine with. Our attempt to grasp at the vastness of reality forms compact models that not only map back to the reality from which they were conceived, but offer glimpses of others.
We can formalize this notion of an analogy as any mutation to the mutual-information bits of a compressed program — a simple shortcut for discovering reasonable hypotheses about the world. Take a representation, vary one of its core theses, and see what happens.
The focus of AI research has so far rested solely on the world around us. We inspect it, try out different interactions with it, infer and reason about it, and hold fixed prior assumptions and axioms to help us traverse it. What we rarely look at is the self in this world, and what information can be gleaned from it.
There is an implicit prior assumption at the root of local optimization: wherever we find ourselves, it’s a good place to start. We’ll try out many different things around us and eventually, or so we hope, we’ll get somewhere better.
But there is more information in us to unlock. As we find ourselves existing and persisting in the world, we know that whatever it is we’re doing is working, at least so far, at least somewhat. There has to be something in us that captures and reflects — even by complete happenstance — a pattern in the world that carries enough vitality to sustain us.
We can try to look at it as a program — a model, a hypothesis — and see if it helps us in discovering other such programs that might tell us where to go next or how to survive an unexpected change.
To do that we need to have some kind of compressed representation of ourselves, for otherwise we wouldn’t have the capacity to do something about it, only be it. We can use some fixed interpreter to unroll this representation into the complete state of ourselves. When we do that, we find that there are two very distinct types of bits of information in our compressed self-representation.
One only cares for itself. Flipping it will only change one corresponding bit in the unrolled representation. It can be used for local optimization: try out a small variation of movement, jiggle things slightly in the local neighborhood. The results are usually a bit worse, sometimes a bit better, but hardly ever drastically so, and usually only in aggregate and after a while.
The other kind, the hyperbit, contains mutual information: a more concentrated form of knowledge about the self program at hand. Flipping these bits can drastically affect the entire program and teleport the unrolled representation to faraway and opposite locations in the state space. This is the most basic form of an analogy: it is looking at a higher-level function and trying out different things in its domain, using it as a high-level vector of search.
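To make the distinction concrete, here is a minimal, hypothetical sketch: the pattern/repeat/tail layout and the interpret function are inventions for illustration, not a claim about any real system. A fixed interpreter unrolls a tiny compressed program; flipping a bit of the literal tail behaves like an ordinary bit, while flipping a bit of the repeated pattern behaves like a hyperbit.

```python
# Toy model of a compressed self-representation under a fixed interpreter.
# (Illustrative only: the pattern/repeat/tail structure is an assumption.)

def interpret(prog):
    """Unroll the compressed program into its full state."""
    return prog["pattern"] * prog["repeat"] + prog["tail"]

def hamming(a, b):
    """Distance between two unrolled states of equal length."""
    return sum(x != y for x, y in zip(a, b))

base_prog = {"pattern": [0, 1], "repeat": 4, "tail": [1, 0, 1]}
base = interpret(base_prog)                     # [0,1,0,1,0,1,0,1,1,0,1]

# Ordinary bit: flip one tail bit -> exactly one bit of the unrolled state changes.
local_prog = dict(base_prog, tail=[1, 0, 0])

# Hyperbit: flip one pattern bit -> it shares mutual information with every
# repetition, so the unrolled state jumps far away in a single move.
analogy_prog = dict(base_prog, pattern=[1, 1])

print(hamming(base, interpret(local_prog)))     # 1  (a local jiggle)
print(hamming(base, interpret(analogy_prog)))   # 4  (a structured teleport)
```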
Why does it matter? Why should these forays, guesses, and high-level shortcuts result in anything better than picking a new spot at random? It sounds counterintuitive to look at the bits that capture more of what you are doing — and doing well enough to survive so far — and think that this is the place to start shaking the system from. But this is also the only place where we can start forming a higher-level language that describes the space we find ourselves in.
Another way to look at it is through the lens of algorithmic information theory. Almost all possible programs are random noise. It’s very rare to find at random an interesting compact program, one with low Kolmogorov complexity. But here we see that interesting programs are connected in clusters. When we find one, we can reach many: two to the number of hyperbits in the compressed program.
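A back-of-the-envelope count, hedged on the simple model above: if the compressed program has $k$ hyperbits, each of them can be flipped or left alone independently, so the set of programs reachable this way has size

$$\underbrace{2 \cdot 2 \cdots 2}_{k\ \text{hyperbits}} = 2^{k},$$

exponentially many structured variants clustered around one interesting program, whereas a blind draw from all programs of the unrolled length would almost surely return noise.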
Will all ‘Uber for X’ or ‘Stripe for Y’ work? Not by a long shot. But it does give us a promising and constrained space to look through. Maybe we’ll land on a possibility that’s even more exciting than the original. Maybe we’ll reach a more bullheaded one that’s more robust to an unexpected forthcoming change. Or maybe it will only give us the choice of a new outlook, a new base to form our representations from, so that we can continue our search in different ways.
At a higher level, these analogies can themselves be encoded in the self-representation. As they also must be compressed, they offer us a window to meta-analogies. This form of self-representation allows universality of expression and complete metaprogramming.
Compression acts both as a way to consolidate your standing in the local neighborhood of strategies and as a shortcut to other possibilities. In a dynamic environment — where a flood can come crashing in unexpectedly to fill your local minimum — it’s good to have these quick getaways to faraway places.
The implications of universal intelligence paint a very different picture of AGI than the superintelligence explosion commonly imagined. Intelligence everywhere is and will forever be inherently constrained by the same underlying limits humans endure. The growth of knowledge is an exponential curve — it always has been — but one with fits and starts, with stagnation and breakthroughs. One in which we’ll be able to keep up, have a say, and contribute.
When we get closer to manufacturing universal intelligence at scale it will look more like countries and corporations than omnipotent gods. The problems we’ll face will have more to do with consciousness and human rights than with aligning all-powerful forces to our needs.
Alignment is really more about automation at an incomprehensible scale, where the clash between dimensionality reduction and Goodhart’s Law of skewed incentives becomes absurd. Its intellectual heritage should come from Norbert Wiener, Marshall McLuhan, and Neil Postman — writers and thinkers concerned with the complete opposite of all-knowing, almighty systems — who point out how blind these systems are to the individual when operating at the aggregate, and how much it is technology that controls us.
Alignment is about wielding an automation so grand it cannot be controlled. It’s about Ashby’s Law and the fundamental conundrum that “any system simple enough to be understandable will not be complicated enough to behave intelligently, while any system complicated enough to behave intelligently will be too complicated to understand”.
Universal intelligence is what cannot be automated and what should not be controlled. How do you align a human? How do you align a group, a corporation, a country? And to what, exactly?
Whenever an optimization process declares its intentions of pursuing some goal, an adversary can conceive of degenerate cases to halt it in its tracks. This is true no matter the approach taken by the optimization process, as any assumption it makes can be subverted to veer it off course. For the search to be directed, it must be based on assumptions to drive it. It is a pointless exercise to search when you already have the answers, so the assumptions used must be an imperfect reflection of the answers. This separation — between the premises used and the goal’s complete description — provides the adversary all the ingredients it needs to craft a worst-case scenario.
Once an adversarial scenario is produced, the only choice an optimization process has is to repeatedly bang its head against the wall, unaware of its blind spots. The only way to restart the process is for someone to actively go in and adapt its assumptions to circumvent the obstacle.
In contrast, a learning strategy that constantly undermines, overturns, and negates its own prior beliefs and premises can avoid this fate. An adversary cannot trap it forever using a single unrelenting task. It must come up with more and more things to saddle it with, again and again. No single specific challenge can end its hope. It can always keep chugging away at anything thrown at it. There is no guarantee that such a learning strategy will always prevail, but — as long as the rules of the game allow for self-reproduction and self-representation in a universal way — it will always persevere.
Random search has no bias. It tries out everything, with no preference for where it lands. Evolution, done by random mutations on uncompressed program representations, does have a bias: toward local changes, as the cloud of mutations forms a normal distribution around the current state.
Blind evolution, like any biased approach, can be trapped and exploited by negating the premise of its bias: what if the correct solutions are far away, beyond a ridge impassable to incremental change? Meanwhile, random search is impervious to this attack. You cannot negate the premise of its bias because it has none — it has an equal amount of belief in everything. Conversely: it does not assume a thing. Still, its purposeless net is cast too wide and too shallow, making it impractical and infeasible to use in most cases.
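A hedged sketch of this trade-off on an invented one-dimensional landscape (the ridge, the step size, and the sample budget are all illustrative assumptions): a hill climber mutating with a small normal cloud around its current state never crosses the punishing ridge in front of it, while an unbiased uniform search eventually lands in the better basin beyond it, at the price of spending almost all of its samples on noise.

```python
# A toy landscape with a local basin at x = 0, a punishing ridge around x ~ 5,
# and a better basin at x = 10. All values here are illustrative assumptions.
import random

def fitness(x):
    if 4.0 < x < 6.0:
        return -100.0                              # the impassable ridge
    return -min(x ** 2, (x - 10.0) ** 2 - 5.0)     # two basins; the far one peaks higher (5 vs 0)

def local_evolution(steps=10_000, sigma=0.3):
    """Blind evolution: a normal cloud of mutations around the current state."""
    x = 0.0
    for _ in range(steps):
        candidate = x + random.gauss(0.0, sigma)
        if fitness(candidate) >= fitness(x):       # keep only non-worsening moves
            x = candidate
    return x, fitness(x)

def random_search(steps=10_000, low=-20.0, high=20.0):
    """Unbiased search: every point in the range is equally likely."""
    best = None
    for _ in range(steps):
        candidate = random.uniform(low, high)
        if best is None or fitness(candidate) > fitness(best):
            best = candidate
    return best, fitness(best)

print("local mutation:", local_evolution())   # stays trapped at the basin near x = 0
print("random search :", random_search())     # usually finds the better basin near x = 10
```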
Random search avoids bias by brute force, allowing for all counterfactuals. A local search bias is more efficient in some cases, but can be fooled when the negation of its assumptions holds true. Is the trade-off unavoidable? Can there be something in-between?
We’re looking for some strategy in which we do not have to resort to a fixed locality or an overambitious globality, but can instead use temporary, fleeting biases. Some way to have “strong opinions, loosely held”: dynamic foundations that can be rearranged when required, a higher-level representation that pushes us in certain directions and forces us to follow through, yet is always malleable enough not to be actively hoodwinked, that allows us to escape, that is always alert to the breakdown of its assumptions and to the possibility of the negation of its premises.
Once a separation exists between the true objective and the heuristic used to navigate towards it, no matter how small, an adversary can exploit it to deceive the agent into a blind alley.
This is not a theoretical and hypothetical argument made only to show the inherent limits of directed learning in the abstract. It immediately offers us a clear recipe for constructing a channel through which we can speak in a way inaccessible to our optimization methods — making it impossible for them to follow us — like parents slipping into a language their children don’t understand.
A synthetic reward is a heuristic approximation of the true goal, and as such it can be exploited by an adversary to misdirect any process that follows it away from its true objective. A natural reward, where the models receive feedback directly from the world itself and not by proxy, cannot be manipulated in this way, as an adversary cannot change the rules of the universe.
Unfortunately, simply following outcomes, even natural ones, means learning can only be done in retrospect and after the fact, an imitation of what has already happened. There are many stories we can tell about the future direction of the world; those who only anticipate what has already been seen are unlikely to keep telling the right ones. Our models must take a stand, they must form an opinion, they must assume things never before realized, to have any chance of seeing truly where it is that we are going.
There is merit, in some cases, to learning from scratch and disposing of the old. The learner, starting from a blank slate, sees a very idiosyncratic pattern of phenomena and a history of ideas. Consequently, the theories, explanations, and representations they may come up with will be uniquely their own, offering them a chance to form their own independent view of the world that might shed light on areas never before traveled, offer an alternative view of the common knowledge, or make accurate predictions of the yet-unseen.
In contrast, a learned model cannot avoid closing the doors on many opportunities for discovery. This follows from the same process: understanding the world through a lens predisposes it to miss anything outside that lens’s scope. Only by discarding preconceptions and backtracking from existing efforts, by putting hard-earned methods in temporary limbo and unlearning rules of thumb, can learned models open themselves to further discovery.
During the twentieth century one could scarcely avoid hitting limiting results in almost any field one looked at, with metamathematics and computing leading the charge. But these limits were not only sobering: a border intent on prohibiting cannot resist defining what is indeed allowed. When it came to the limits of knowledge, these results arrived hand-in-hand with universal systems that could in principle express anything imaginable. Yet each had its own Achilles’ heel, one that felt almost intentionally and frustratingly handpicked.
Many of our known unknowns are reflections on the unknown unknowns: they give us systems that allow us to universally express everything we can think about, yet all are directly pointing to the existence of things beyond their reach.
Current Machine Learning best practice assumes that all local minima are roughly equivalent in their quality. In fact, seeing results vary wildly based on the initial seed is treated as a code smell, hinting at a malfunction and, in other people’s code, at a whiff of data dredging and p-hacking. This is backed empirically, but the explanation is sobering: it seems that model capacity and data quantity trump everything else, so why even bother finding a global minimum.
Further, once we consider model size as part of our criterion, it’s clear from the pigeonhole principle and the definition of Kolmogorov complexity that there can only be a small number of minimal-minima programs that truly capture a phenomenon. On the same grounds we can observe that the more idiosyncratic a situation, the fewer minimal minima we can find for describing it, and the more generalized, the more ways it can be compressed, expressed, and explained.
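One way to make the pigeonhole step concrete, as a standard counting argument rather than anything specific to this setting: the number of binary programs shorter than $m$ bits is

$$\sum_{i=0}^{m-1} 2^{i} = 2^{m} - 1,$$

so if the shortest faithful description of a phenomenon has length $K$, its Kolmogorov complexity, then fewer than $2^{K+1}$ programs of that length or shorter exist at all, and only the handful among them that actually reproduce the phenomenon can qualify as minimal minima.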
Things are different in problem spaces with shifting landscapes. Some minima with equivalent current performance are more robust to change than others, or allow for quick escape hatches in the case of a calamity.
Generalization measured as performance on unseen observations is only a weak proxy for a far more vivid behavior: the adaptability to other, unforeseen fitness functions, ones only indirectly similar to already encountered environments.
Consider variations on the concept of micromanagement, a much higher-level, condensed, tangled, and evocative notion than the primitive bit patterns we usually think through, and see where mutations might take us.
It’s easy to start by playing with the measurement unit hinted at — macro-management is straightforward to reach. How about king-size-management, half-management, infinitesimal-management, imaginary-management? Or, moving to related areas, what do you think of micro-governance, micro-leadership, or micro-care?
Above the level of wordplay, there’s the emotion itself, the exasperation of having every inconsequential choice and insignificant move checked. Do chess players feel micromanaged by computer analysis? Is juggling micromanagement? Knitting? Can you micromanage yourself?