Does AI Know What an Apple Is? She Aims to Find Out.
Source: https://www.quantamagazine.org/does-ai-know-what-an-apple-is-she-aims-to-find-out-20240425/

What does “understanding” or “meaning” mean, empirically? What, specifically, do you look for?

When I was starting my research program at Brown, we decided that meaning involves concepts in some way. I realize this is a theoretical commitment that not everyone makes, but it seems intuitive. If you use the word “apple” to mean apple, you need the concept of an apple. That has to be a thing, whether or not you use the word to refer to it. That’s what it means to “have meaning”: there needs to be the concept, something you’re verbalizing.

I want to find concepts in the model. I want something that I can grab within the neural network, evidence that there is a thing that represents “apple” internally, that allows it to be consistently referred to by the same word. Because there does seem to be this internal structure that’s not random and arbitrary. You can find these little nuggets of well-defined function that reliably do something.

I’ve been focusing on characterizing this internal structure. What form does it have? It can be some subset of the weights within the neural network, or some kind of linear algebraic operation over those weights, some kind of geometric abstraction. But it has to play a causal role [in the model’s behavior]: It’s connected to these inputs but not those, and these outputs and not those.
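To picture the kind of causal test described here, one could take a candidate "concept" direction in a hidden layer, remove it from the activations, and check whether the model's behavior changes on exactly the inputs that should depend on it. The sketch below is a hypothetical illustration with placeholder tensors, not the method used in this research; the model and hook machinery are assumed, not quoted from the work.

```python
# A minimal sketch (assumed setup, not the researcher's actual procedure) of
# testing whether a candidate direction plays a causal role: project it out
# of a hidden state and see how much the relevant examples are affected.
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along `direction`."""
    d = direction / direction.norm()
    return hidden - (hidden @ d).unsqueeze(-1) * d

# Toy stand-ins: random tensors in place of real activations.
hidden = torch.randn(4, 768)      # hypothetical hidden states (batch, dim)
candidate = torch.randn(768)      # candidate "concept" direction
patched = ablate_direction(hidden, candidate)

# In a real experiment you would continue the forward pass from this layer
# with `patched` and compare predictions on inputs that should (and should
# not) depend on the concept; only the former should change.
print((hidden - patched).norm(dim=-1))   # how much each example was altered
```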

That feels like something you could start to call “meaning.” It’s about figuring out how to find this structure and establish relationships, so that once we get it all in place, then we can apply it to questions like “Does it know what ‘apple’ means?”

Have you found any examples of this structure?

Yes, one result involves when a language model retrieves a piece of information. If you ask the model, “What is the capital of France?” it needs to say “Paris,” and “What is the capital of Poland?” should return “Warsaw.” It very readily could just memorize all these answers, and they could be scattered all around [within the model] — there’s no real reason it needs to have a connection between those things.

Instead, we found a small place in the model where it basically boils that connection down into one little vector. If you add it to “What is the capital of France?” it will retrieve “Paris”; and that same vector, if you ask “What is the capital of Poland?” will retrieve “Warsaw.” It’s like this systematic “retrieve-capital-city” vector.
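As a rough sketch of what such a vector could look like in code (a simplified illustration, not the exact procedure from this research), one could estimate a relation vector from the model's activations on country-capital questions, add it to the hidden state of a bare country name partway through the forward pass, and see whether the capital becomes the top prediction. The model name, layer index, and prompts below are arbitrary stand-ins.

```python
# Sketch of extracting and patching in a "retrieve-capital-city" vector,
# assuming a Hugging Face-style causal LM. All specifics are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # stand-in model purely for illustration
LAYER = 6        # hypothetical layer at which the vector is read out / added

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final token after block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[0] is the embeddings, so block LAYER's output is LAYER + 1.
    return out.hidden_states[LAYER + 1][0, -1]

# Crude estimate of the relation vector: the average shift from a bare
# country name to the corresponding capital question.
pairs = [("France", "What is the capital of France?"),
         ("Poland", "What is the capital of Poland?")]
relation_vec = torch.stack(
    [last_token_state(q) - last_token_state(c) for c, q in pairs]
).mean(dim=0)

# Patch the vector into the forward pass on a new input.
def add_vector(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, -1, :] += relation_vec   # nudge the last token's state
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_vector)
ids = tok("Poland", return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits
handle.remove()

# With a well-estimated vector and a capable model, the next-token
# prediction should shift toward the capital city.
print(tok.decode(logits[0, -1].argmax().item()))
```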

That’s a really exciting finding because it seems like [the model is] boiling down these little concepts and then applying general algorithms over them. And even though we’re looking at these really [simple] questions, it’s about finding evidence of these raw ingredients that the model is using. In this case, it would be easier to get away with memorizing — in many ways, that’s what these networks are designed to do. Instead, it breaks [information] down into pieces and “reasons” about it. And we hope that as we come up with better experimental designs, we might find something similar for more complicated kinds of concepts.

