Stephen Wolfram is writing about ChatGPT: how it works, and why.
“And the more it’s fundamentally trainable, the less it’s going to be able to do sophisticated computation.”
Or, as we used to say back in the ’90s: Information wants to be free.
“In earlier days of neural nets, there tended to be the idea that one should “make the neural net do as little as possible”. For example, in converting speech to text it was thought that one should first analyze the audio of the speech, break it into phonemes, etc. But what was found is that—at least for “human-like tasks”—it’s usually better just to try to train the neural net on the “end-to-end problem”, letting it “discover” the necessary intermediate features, encodings, etc. for itself.
There was also the idea that one should introduce complicated individual components into the neural net, to let it in effect “explicitly implement particular algorithmic ideas”. But once again, this has mostly turned out not to be worthwhile; instead, it’s better just to deal with very simple components and let them “organize themselves” (albeit usually in ways we can’t understand) to achieve (presumably) the equivalent of those algorithmic ideas.
That’s not to say that there are no “structuring ideas” that are relevant for neural nets. Thus, for example, having 2D arrays of neurons with local connections seems at least very useful in the early stages of processing images. And having patterns of connectivity that concentrate on “looking back in sequences” seems useful—as we’ll see later—in dealing with things like human language, for example in ChatGPT.”
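Those two “structuring ideas” map onto building blocks you can write down in a few lines. Here’s a minimal NumPy sketch of my own (not code from the essay): a tiny “local connections” convolution of the sort used early in image processing, and a single causal attention step that only “looks back in sequences.”

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Local connections: each output value only sees a small neighborhood of the input."""
    h, w = kernel.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def causal_self_attention(x, Wq, Wk, Wv):
    """'Looking back in sequences': position t attends only to positions <= t."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -np.inf  # hide the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))      # row-wise softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Nothing in either function is exotic; the surprise Wolfram is pointing at is that stacks of pieces this simple, trained end to end, do the rest.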
Wolfram’s writing here is going to be pivotal. He’s telling the people who still believe that LLMs can’t possibly have anything in common with people that they do have something in common: language. He’s also saying that we’re not going to understand AI for quite a while, not until we finally understand the nature and science of thought itself.
What an incredible time to be alive.
Right as quantum computers are starting to be developed.
It’s obvious, in retrospect, that this is exactly what was happening with DALL-E: a completely unexpected, uncanny ability to mimic what humans are capable of, using a method that isn’t exactly the same as ours, but is mathematically similar.
“If one looks at the longest path through ChatGPT, there are about 400 (core) layers involved—in some ways not a huge number. But there are millions of neurons—with a total of 175 billion connections and therefore 175 billion weights. And one thing to realize is that every time ChatGPT generates a new token, it has to do a calculation involving every single one of these weights. Implementationally these calculations can be somewhat organized “by layer” into highly parallel array operations that can conveniently be done on GPUs. But for each token that’s produced, there still have to be 175 billion calculations done (and in the end a bit more)—so that, yes, it’s not surprising that it can take a while to generate a long piece of text with ChatGPT.”
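To put those numbers in perspective, here’s a back-of-the-envelope calculation (my own, and the hardware throughput figure is an assumption, not a measurement): if each of the ~175 billion weights takes part in roughly one multiply and one add per token, even a GPU sustaining ~100 trillion floating-point operations per second needs a few milliseconds per token before any overhead.

```python
# Rough per-token arithmetic -- illustrative only; the GPU figure is an assumption.
weights = 175e9                      # parameters, per the passage above
flops_per_token = 2 * weights        # ~one multiply + one add per weight per token
gpu_flops_per_s = 100e12             # assume a GPU sustaining ~100 TFLOP/s
ms_per_token = flops_per_token / gpu_flops_per_s * 1000
print(f"~{flops_per_token:.1e} FLOPs per token, ~{ms_per_token:.1f} ms per token")
```

Multiply that by the hundreds of tokens in a long reply and the wait makes sense.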
The human brain has on the order of 100 billion neurons and 100 trillion connections. Arguments that ChatGPT isn’t as smart as we are, or that it makes things up, are skipping right over the headline and getting lost in the details.
“It has to be emphasized again that (at least so far as we know) there’s no “ultimate theoretical reason” why anything like this should work. And in fact, as we’ll discuss, I think we have to view this as a—potentially surprising—scientific discovery: that somehow in a neural net like ChatGPT’s it’s possible to capture the essence of what human brains manage to do in generating language.”
“Language is very powerful. Language does not just describe reality. Language creates the reality it describes.” —Desmond Tutu
“With modern GPU hardware, it’s straightforward to compute the results from batches of thousands of examples in parallel. But when it comes to actually updating the weights in the neural net, current methods require one to do this basically batch by batch. (And, yes, this is probably where actual brains—with their combined computation and memory elements—have, for now, at least an architectural advantage.)”
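Here’s what “batch by batch” means in practice, as a toy sketch (mine, not the essay’s): the gradient for a whole batch can be computed in parallel, but the weight update itself is a serial step applied between batches.

```python
import numpy as np

# Toy one-parameter least-squares problem; the true weight is 3.0. Sizes are arbitrary.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x + rng.normal(scale=0.1, size=1000)

w, lr, batch_size = 0.0, 0.1, 100
for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]
    grad = np.mean(2 * (w * xb - yb) * xb)   # computed in parallel across the batch
    w -= lr * grad                           # the sequential, batch-by-batch update
print(w)                                     # drifts toward 3.0, one batch at a time
```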
They don’t have to be human to be intelligent.
“And indeed, much like for humans, if you tell it something bizarre and unexpected that completely doesn’t fit into the framework it knows, it doesn’t seem like it’ll successfully be able to “integrate” this. It can “integrate” it only if it’s basically riding in a fairly simple way on top of the framework it already has.”
We’re all having trouble distinguishing reality from fantasy these days. It may even be inherent in the network.
I was reading an article about shadows in artwork. Apparently, we humans carry a “best guess” model of physics around in our heads.
“Because we do not notice them, transgressions of physics reveal that our visual brain uses a simpler, reduced physics to understand the world.”
https://thereader.mitpress.mit.edu/the-art-of-the-shadow-how-painters-have-gotten-it-wrong-for-centuries/
Before ChatGPT started being diverted to a more computationally intensive method for solving equations, it was simply making numbers up. It’s often described as “hallucinating” answers, but that is terribly imprecise language. There are times when ChatGPT becomes unmoored from what we understand as reality. It’s becoming more and more clear that ChatGPT has a model of reality that is similar to ours, but which diverges in some significant ways. There may be times when “hallucination” is more apt, and there may be times when “pulled a number out of its ass” would be even more apt.
I’m not seeing this as deviating significantly from human thought processes when we are searching for elusive answers. The mechanism is decidedly different, but it wouldn’t be emergent behavior if it weren’t greater than the sum of its parts.
“ChatGPT doesn’t have any explicit “knowledge” of such rules [grammar]. But somehow in its training it implicitly “discovers” them—and then seems to be good at following them. So how does this work? At a “big picture” level it’s not clear.”
Personally, I never learned grammar either. Not explicitly. I went to five different high schools, and back then they used to split English classes into one semester of grammar and one semester of literature. I somehow managed to get all literature semesters and avoided all the grammar semesters. I “know” where the nouns, verbs, and participles go; but I’ll be damned if I could diagram a sentence.
I “picked it up” by reading quite avidly in my youth. What if ChatGPT is just a big old honking set of mirror neurons?
“But is there a general way to tell if a sentence is meaningful? There’s no traditional overall theory for that. But it’s something that one can think of ChatGPT as having implicitly “developed a theory for” after being trained with billions of (presumably meaningful) sentences from the web, etc.”
“(And, yes, while one can therefore expect ChatGPT to produce text that contains “correct inferences” based on things like syllogistic logic, it’s a quite different story when it comes to more sophisticated formal logic—and I think one can expect it to fail here for the same kind of reasons it fails in parenthesis matching.)”
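The parenthesis-matching example is worth dwelling on. Checking balance needs a running count (or a stack) that can grow without bound, while a transformer does a fixed amount of computation per token. Here’s the trivial counter version, just to make the contrast concrete (my own illustration, not Wolfram’s):

```python
def balanced(s: str) -> bool:
    """Parenthesis matching needs a counter (or stack) of unbounded depth."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a closer with no matching opener
                return False
    return depth == 0              # every opener was eventually closed

print(balanced("((()))"), balanced("(()"))   # True False
```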
I’ve been in therapy for over twenty-five years (and I hope for another twenty-five to come), and I can recognize someone who is good at mirroring. It’s important to note that mirroring and imitation are not the same thing; they are two distinctly different behaviors.
“When we start talking about “semantic grammar” we’re soon led to ask “What’s underneath it?” What “model of the world” is it assuming? A syntactic grammar is really just about the construction of language from words. But a semantic grammar necessarily engages with some kind of “model of the world”—something that serves as a “skeleton” on top of which language made from actual words can be layered.”
Lastly, every article on The Latest Thing (and that goes double for anything AI or web3 related) should be read with a “gold rush” mentality in mind: the people who make the most money in a gold rush aren’t the people mining gold, but the people selling picks and shovels to would-be gold miners.
Dr. Wolfram doesn’t seem to be selling an AI product per se; instead, he’s advocating for the Wolfram Language to be a part of AI products going forward. That’s a conflict of interest, but I don’t think a defining one. Whether eventual AGIs use Wolfram Language™ or some kind of internal language of their own, it is almost certainly a given that they will be imbued with internal languages or other semantics that serve the same purposes that memory engrams, reality testing, and formal logic do in human minds.
“What ChatGPT does in generating text is very impressive—and the results are usually very much like what we humans would produce. So does this mean ChatGPT is working like a brain? Its underlying artificial-neural-net structure was ultimately modeled on an idealization of the brain. And it seems quite likely that when we humans generate language many aspects of what’s going on are quite similar.”
ChatGPT concurs:
“Yes, Stephen Wolfram’s statement is generally accurate. ChatGPT is an artificial neural network, which is a type of machine learning algorithm inspired by the structure and function of the human brain. The architecture of ChatGPT is based on a type of neural network called a transformer, which has been shown to be very effective at generating natural language text.
The basic idea behind neural networks is to simulate the way that neurons in the brain process and transmit information. Neural networks consist of layers of interconnected nodes, or artificial neurons, that perform simple computations on the input data they receive. The outputs of one layer of neurons are then used as the inputs to the next layer, and so on, until the final output is produced.
In the case of ChatGPT, the network is trained on large amounts of text data, which it uses to learn patterns and relationships in the language. When given a prompt or a starting sentence, the network generates new text by predicting the next most likely words based on the patterns it has learned. This process is similar in some ways to the way that the human brain generates language, although there are certainly many differences as well.
So while ChatGPT is not exactly “working like a brain”, it is true that the underlying structure and function of the neural network has been modeled on some of the principles of brain function, and there are likely some similarities between the way that ChatGPT generates language and the way that humans do.”
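“Predicting the next most likely words” really is the whole generation loop. Here’s a minimal sketch of that loop (the model function below is a toy stand-in I made up, not ChatGPT’s actual machinery):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "a", "mat", "."]

def model(tokens):
    """Toy stand-in for a trained network: returns a probability for each vocabulary word."""
    logits = np.random.default_rng(len(tokens)).normal(size=len(vocab))
    probs = np.exp(logits)
    return probs / probs.sum()

tokens = ["the", "cat"]
for _ in range(5):
    probs = model(tokens)
    tokens.append(vocab[int(np.argmax(probs))])   # take the most likely next token
print(" ".join(tokens))
```

In the real thing, the model is 175 billion weights deep and the pick is softened by “temperature”, but the loop is the same.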
It feels to me as if philosophy, science, and the nature of the mind are starting to converge in real experimental and experiential ways.
Consciousness is what information experiences.
Meanwhile…

