
Measurements can only be right when we measure the right thing

It’s my understanding that OpenAI’s goal at this time isn’t necessarily to make an AI that is able to spontaneously regurgitate every fact on every subject, but instead to create an AI that has sufficient reasoning ability to be able to work out truth and falsehood based on known and verified sources of information it has available to it.

The goal is to dispense knowledge, not trivia. But trivia is something the rest of us out here can witness in an accessible way. Few of us spend much time creating thought problems to benchmark GPT-4’s reasoning ability, and most of us wouldn’t be able to assess the results very well if we did. So, instead, we ask it, “What is the capital of Nebraska?” and if it replies, “Bismarck”, we can feel superior; or, if it replies, “Lincoln”, we can fret over the impending loss of millions of customer service jobs.

The magic lies elsewhere.

“I think the right way to think of the models we create is as a reasoning engine, not a fact database. They can also act as a fact database, but that’s not really what is special about them.” —Sam Altman

Rebecca Jarvis interviews Sam Altman for ABC News

For us humans, the extraneous facts that float around in our heads are good for snap decisions when it’s do-or-die time, but when it is time for serious problem solving, we reason it out and double-check against the facts. ChatGPT is just winging it, and it shows. That should be completely unsurprising.

But a super-capable AGI just has to be smart enough to phase out any internal Dunning-Kruger effects, take the 0.0001ms to gather the data, analyze it, generate solutions along with their knock-on effects, and present them.

The data stored in the language isn’t what we want to extract. If this technology pans out, it will be because we distilled the essence of thought from the language we’ve used for 50,000-odd years to express ourselves. Turns out, we did a really good job. In the Prometheus story, it is fire that Prometheus steals from the gods; in our story, we are reaching inward to find our own fire.

It is still too early to tell what exactly we’ve discovered. Maybe LLMs will pan out to be a useful tool if one can figure out how to work around the rough edges. For right now, though, I would encourage folks not to judge these models by how well they remember trivia. Imagine asking Einstein to work out a math problem in his head versus giving him a pencil and paper to work it out: one of those answers is going to be more consistently reliable.

That GPT-4 is able to conjure up facts at all without access to an external data source is a pretty incredible feat of engineering. Still, no one should be making tools that rely on the built-in knowledge of an LLM.
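For the curious, here is a rough sketch of what not relying on built-in knowledge could look like in practice. It is only an illustration under my own assumptions: the `VERIFIED_FACTS` table and the `look_up` and `build_prompt` helpers are made-up names, not part of any real library, and a real tool would query an actual trusted data source and then hand the resulting prompt to whatever model it uses.

```python
from typing import Optional

# Hypothetical stand-in for a trusted, verified data source.
VERIFIED_FACTS = {
    "capital of nebraska": "The capital of Nebraska is Lincoln.",
    "capital of north dakota": "The capital of North Dakota is Bismarck.",
}


def look_up(question: str) -> Optional[str]:
    """Return a verified fact covering the question, or None if there is none."""
    normalized = question.lower().strip(" ?")
    for topic, fact in VERIFIED_FACTS.items():
        if topic in normalized:
            return fact
    return None


def build_prompt(question: str) -> str:
    """Build a prompt that supplies the fact instead of trusting model recall."""
    fact = look_up(question)
    if fact is None:
        # Nothing verified to lean on, so ask the model to admit it.
        return (
            "No verified source covers this question. "
            "Say that you do not know.\n"
            f"Question: {question}"
        )
    return (
        "Answer using only the verified fact below.\n"
        f"Verified fact: {fact}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What is the capital of Nebraska?"))
```

The model’s job here is the reasoning and the phrasing; the fact itself comes from somewhere we can check.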

Instead, ask, “does it do useful work, independent of facts?”

The answer seems to be an unqualified yes, so far. Smarter folk than you and me will have to do research and innovate. This is an exciting time, full of potential.

Let’s engage with it so that we can make the best use of these tools and prevent them from being used against us by other humans; for we may or may not have to worry about machines taking advantage of us, but we know for certain that humans will. Let’s not give in to cynicism and miss out on an opportunity to build a tool that could very realistically lead to eliminating a lot of suffering.


Image: AI Generated Image by Bing Image “Prometheus stealing fire but it is a robot stealing a human’s thoughts. Greek statuary.”