2023: The Year Everything Changed

This is just the beginning.

Haomiao Huang’s article in Ars brings us up to speed on the decade of machine-learning advancements that got us to this point.


The latest image models like Stable Diffusion use a process called latent diffusion. Instead of generating an image directly, the model uses a text prompt to incrementally modify an initial image. The idea is simple: if you take an image and keep adding noise to it, it will eventually become a noisy blur. But if you start with a noisy blur, you can “subtract” noise from it to get an image back. The trick is to “denoise” smartly, that is, in a way that moves you closer to a desired image.

In this case, instead of a transformer generating pictures, you have a transformer model that takes latent encodings of an image and a text string and modifies the image so it better matches the text. After running a few dozen iterations, you can go from a noisy blur to a sharp AI-generated picture.
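The shape of that iterative denoising loop can be sketched in a few lines. This is only a toy illustration: the `denoise_step` function below is a hypothetical stand-in for the real conditioned transformer (which predicts and removes noise in latent space), and the 4-element vectors are stand-ins for real latents and text embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, text_embedding):
    """Hypothetical stand-in for the model's denoising step.

    A real model would predict the noise present in `latent`,
    conditioned on the text embedding and the timestep. Here we just
    nudge the latent toward the text embedding to show the loop's shape.
    """
    predicted_noise = latent - text_embedding
    return latent - 0.1 * predicted_noise

# Pretend text encoding of the prompt (toy 4-dimensional latent space).
text_embedding = np.array([1.0, -2.0, 0.5, 3.0])

# Start from pure noise: the "noisy blur".
latent = rng.normal(size=4)

# "A few dozen iterations" of denoising, each conditioned on the text.
for _ in range(50):
    latent = denoise_step(latent, text_embedding)

print(np.allclose(latent, text_embedding, atol=0.1))  # → True
```

Each pass removes a little of the estimated noise, so after enough steps the latent has drifted from random noise to something consistent with the prompt.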

But you don’t have to start with a noisy blur. You can start with another picture, and the transformer will simply adjust from this image toward something it thinks better matches the text prompt. This is how you can have an AI model that takes rough, basic sketches and turns them into photorealistic images.
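The sketch-to-image variant changes only the starting point: instead of pure noise, you partially noise an existing image and run proportionally fewer denoising steps. Again a toy sketch, reusing the same hypothetical `denoise_step` stand-in; the `strength` knob here is my own illustrative parameter for how much noise gets added.

```python
import numpy as np

rng = np.random.default_rng(1)

def denoise_step(latent, text_embedding):
    # Hypothetical stand-in for the model's conditioned denoising step.
    return latent - 0.1 * (latent - text_embedding)

text_embedding = np.array([1.0, -2.0, 0.5, 3.0])  # toy prompt encoding
sketch = np.array([0.8, -1.5, 0.0, 2.0])          # toy latent of a rough sketch

# Add only *partial* noise, so the result stays anchored to the input
# image rather than starting from a featureless blur.
strength = 0.5
noised = (1 - strength) * sketch + strength * rng.normal(size=4)

# Fewer denoising steps, since less noise was added.
latent = noised
for _ in range(int(50 * strength)):
    latent = denoise_step(latent, text_embedding)
```

With low `strength` the output stays close to the original sketch; with high `strength` the prompt dominates, which is why the same mechanism covers both light touch-ups and full reimaginings.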

My new analogy is a sort of reverse engineering, then using the recovered information to create something new. It is probably not the exact mechanism that human or dog neurons use, but one that leads to a similar outcome. Surely in living creatures the neural network contains more complexity and more sophisticated mechanisms, which may include analogues of these transformers or tokens. Human minds use “engrams”. What the difference between a “token” and an “engram” turns out to be will be an interesting field of inquiry. I expect engrams are more complicated than tokens. Maybe “memories” are sophisticated “tokens”?

“The unconscious is structured like a language” —Jacques Lacan

What an incredible time to be alive.

“Another consideration is that these AI models are fundamentally stochastic. They’re trained using a technique called gradient descent. The training algorithm compares the training data to the output of the AI model and calculates a “direction” to move closer to the right answer. There’s no explicit concept of a right or wrong answer—just how close it is to being correct.”

I am not certain I understand the concern here. This simply describes trial and error and the scientific method: it is how a being should ideally come to draw conclusions. I’m interested in seeing how these models evolve over time to incorporate re-examining previous conclusions. We have great models of belief. I wonder how closely they’ll correlate with any emergent behavior of neural nets, or if there are models and matrices yet to be developed that will provide these emerging intelligences with a more sapient experience.
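The gradient descent described in the quote can be shown with a toy example: fit a single parameter `w` so that `w * x` approximates `y`. The update rule contains no explicit notion of a right answer, only a direction that reduces the error, which is exactly the trial-and-error character noted above.

```python
# Toy gradient descent: learn w such that w * x ≈ y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by the "true" answer w = 2

w = 0.0
learning_rate = 0.01
for _ in range(1000):
    # Gradient of the mean squared error with respect to w: it only
    # says which way to move to get closer, never what "correct" is.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(round(w, 3))  # → 2.0
```

Each step shrinks the distance to the best-fitting value by a constant factor, so the parameter converges on 2 without the algorithm ever being told that 2 was the answer.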

“Predictions are hard. Perhaps the only thing we can say is that these AI tools will continue to get more powerful, easier to use, and cheaper. We’re in the early stages of a revolution that could be as profound as Moore’s Law, and I’m both excited and terrified about what’s yet to come.”


Still, I believe that whatever dangers AI will bring, our greatest dangers will always be posed by each other. Long before any AGI becomes autonomous, a few humans will try to use what power it does have to rule over the rest of us. We are already so very close to that without the help of AGI.

Welcome to the party, Hal.

P.S. This arrives hot on the heels of the article about how bad ChatGPT is at math:

from the OpenAI Discord channel https://discord.gg/openai