Jacob Browning, Author at NOEMA
https://www.noemamag.com/author/jacobbrowning/

AI Chatbots Don’t Care About Your Social Norms
https://www.noemamag.com/ai-chatbots-dont-care-about-your-social-norms
Tue, 07 Mar 2023

With artificial intelligence now powering Microsoft’s Bing and Google’s Bard search engines, brilliant and clever conversational AI is at our fingertips. But there have been many uncanny moments — including casually delivered disturbing comments like calling a reporter ugly, declaring love for strangers or rattling off plans for taking over the world. 

To make sense of these bizarre moments, it’s helpful to start by thinking about the phenomenon of saying the wrong thing. Humans are usually very good at avoiding spoken mistakes, gaffes and faux pas. Chatbots, by contrast, screw up a lot. Understanding why humans excel at this clarifies when and why we trust each other — and why current chatbots can’t be trusted. 

Getting It Wrong

For GPT-3, there is only one way to say the wrong thing: By making a statistically unlikely response to whatever the last few words were. Its understanding of context, situation and appropriateness concerns only what can be derived from the user’s prompt. For ChatGPT, this is modified slightly in a novel and interesting way. In addition to producing something statistically likely, the model’s responses are also reinforced by human evaluators: The system outputs a response, and evaluators judge whether or not it is a good one (a grueling, traumatizing process for the evaluators). The upshot is a system that is not just saying something plausible, but also (ideally) something a human would judge to be appropriate — if not the right thing, at least not offensive.
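
To make that two-stage idea concrete, here is a minimal sketch in Python (not OpenAI’s actual pipeline) in which candidate replies are scored both by a language model’s likelihood and by a preference model standing in for the human evaluators’ judgments; the names lm_logprob and reward_model are hypothetical placeholders.

```python
# Conceptual sketch only: combine "statistically likely" with "human-approved."
# lm_logprob and reward_model are hypothetical stand-ins for a pretrained
# language model and a preference model learned from evaluator ratings.
from typing import Callable, List

def choose_reply(prompt: str,
                 candidates: List[str],
                 lm_logprob: Callable[[str, str], float],
                 reward_model: Callable[[str, str], float]) -> str:
    """Pick the reply that is both plausible and judged appropriate."""
    def score(reply: str) -> float:
        likelihood = lm_logprob(prompt, reply)    # GPT-3-style objective
        preference = reward_model(prompt, reply)  # learned from human feedback
        return likelihood + preference
    return max(candidates, key=score)

# Toy usage with stand-in scorers: the preference term vetoes the rude reply.
best = choose_reply("Hello!", ["Hi there.", "You are ugly."],
                    lm_logprob=lambda p, r: -0.01 * len(r),
                    reward_model=lambda p, r: -10.0 if "ugly" in r else 1.0)
print(best)  # "Hi there."
```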

“Chatbots don’t recognize there are things they shouldn’t say.”

But this approach makes visible a central challenge facing any speaker — mechanical or otherwise. In human conversation, there are countless ways to say the wrong thing: We can say something inappropriate, dishonest, confusing, irrelevant, offensive or just plain stupid. We can even say the right thing but be faulted for saying it with the wrong tone or emphasis. Our whole lives are spent navigating innumerable conversational landmines in our dealings with other people. Not saying the wrong thing isn’t just an important part of a conversation; it is often more important than the conversation itself. Sometimes, keeping our mouths shut may be the only right course of action. 

Given how few ways there are to say the right thing, and how many different ways there are to say something wrong, it is shocking that humans don’t make more mistakes than they do. How do we navigate this perilous landscape of not saying the wrong thing, and why aren’t chatbots navigating it as effectively?

How Conversations Should Work

While human conversations can be about anything, our lives are mostly scripted: ordering at a restaurant, making small talk, apologizing for running late and so on. These aren’t literal scripts — there is plenty of improvisation — but rather general patterns or loose rules that stipulate how certain kinds of interaction should go. This puts improvisation within narrow bounds: No matter what you decide to order at the restaurant, there’s a right way to do it, and it can be shocking if someone doesn’t get the script.

Scripts are not primarily governed by words. The same script can work even if you don’t speak the language, as tourists worldwide prove by gesturing and pointing. Social norms govern these scripts — shared social institutions, practices and expectations that help us navigate life. These norms specify how everyone should behave in certain scenarios, assigning roles to everyone and giving broad guidance for how to act. An impassive and bored clerk conforms to the same script as the irate person yammering at them, as do the frustrated people in line. 

This works because humans are natural conformists. Norm-following is useful: It simplifies our interactions by standardizing and streamlining them, making us all much more predictable to ourselves and each other. 

We’ve come up with conventions and norms to govern almost every aspect of our social lives, from what fork to use to how long you should wait before honking at a light. This is essential for surviving in a world of billions, where most people we encounter are complete strangers with beliefs we may disagree with. Putting these shared norms in place makes conversation not just possible but fruitful, laying out what we should talk about — and all the things we shouldn’t.

The Other Side Of Norms

But humans are not just conforming to norms; they are bound by them. Norms are distinct from mere conventions because humans are inclined to sanction those who violate a norm — sometimes overtly, other times just by avoiding them. Social norms make it easy to evaluate strangers and determine whether they are trustworthy — on a first date, people scan how the other person acts, what words they use, which questions they ask. If the other person violates any norms — if they act boorish or inappropriate, for example — we often judge them and deny them a second date.

For humans, these judgments aren’t just a matter of dispassionate evaluations. They are further grounded in our emotional responses to the world. Part of our education as children is a thorough emotional training, ensuring we feel the proper emotions at the right times in conversations: anger when someone violates norms of decency, disgust when someone says something offensive and shame when we’re caught in a lie. Our moral conscience allows us to respond rapidly in conversations to anything inappropriate, as well as predict how others will react to our remarks — when a norm violation will land as a brilliant joke or a career-ending blunder. 

The same emotions, though, also push us to enormous lengths to punish violators. If someone has done something egregiously wrong, we often feel compelled to gossip about them. Part of this is simple resentment: If someone does wrong, we might feel like they deserve public condemnation. 

But it is more than that. Someone who violates even a simple norm has their whole character called into question. If they’d lie about one thing, what wouldn’t they lie about? Making it public is meant to cause shame and, in the process, force the other person to apologize for (or at least defend) their actions. It also strengthens the norm — the MeToo movement convinced victims of sexual violence to speak up because people would finally take their claims seriously.  

In short, people are expected to follow the norms closely, or else. There are high stakes to speaking because we can be held accountable for anything we say, so even self-interested jerks will tend to stay in line to avoid a shaming. So we choose our words carefully, and we expect the same of those we surround ourselves with.

Unbound Chatbots

The high stakes of human conversation shed light on what makes chatbots so unnerving. By merely predicting how a conversation will go, chatbots end up loosely conforming to our norms, but they are not bound by them. When we engage them in casual conversation or test their ability to solve linguistic puzzles, they usually come up with plausible-sounding answers and behave in a normal, human-like way. Someone might even be fooled into thinking they are a person.

But if we change the prompt slightly or adopt a different script, they will suddenly spew conspiracy theories, go on racist tirades or bullshit us. These things are not statistically implausible; there are plenty of conspiracy nuts and trolls out there, and chatbots are trained on what they’ve written on Reddit and elsewhere too.

Any of us could say the same words as these trolls. But we shouldn’t say them because they are nonsense, offensive, cruel and dishonest, and most of us don’t say them because we don’t believe them and we might be run out of town if we did. The norms of decency have pushed offensive behavior to the margins of society (or, at least, what used to be the margins), so most of us wouldn’t dare say such things.

“By merely predicting how a conversation will go, chatbots end up loosely conforming to our norms, but they are not bound by them.”

Chatbots, by contrast, don’t recognize there are things they shouldn’t say regardless of how statistically likely they are. They don’t recognize social norms that define the territory between what a person should and shouldn’t say — they’re oblivious to the underlying social pressures that shape how we use language. Even when a chatbot acknowledges a screw-up and apologizes, it doesn’t understand why; it might even apologize for getting an answer right if we tell it it’s wrong. 

This illuminates the deeper issue: We expect human speakers to be committed to what they say and we hold them accountable for it. We don’t need to examine their brain or know any psychology to do this — we just trust them if they have a history of being reliable, following norms and acting respectfully. 

The problem with chatbots isn’t that they are black boxes or that the technology is unfamiliar. It’s that they have a long history of being unreliable and offensive, yet they make no effort to improve on it — or even realize there is a problem.

Programmers, of course, are aware of these problems. They (and the companies hoping their AI technologies will be widely used) are concerned about the reputations of their chatbots and expend enormous amounts of time retooling their systems to avoid difficult conversations or iron out improper responses. While this helps make them safer, programmers will struggle to stay ahead of the people trying to break the system. The programmer’s approach is reactive and will always be behind the curve: There are just too many ways of being wrong to predict them all. 

Smart, But Not Human

This shouldn’t lead us to smug self-righteousness about how smart humans are and how dumb chatbots are. On the contrary, their capacity to talk about anything reveals an impressive — if superficial — knowledge of human social life and the world. They are plenty smart — or, at least, capable of doing well on tests or referencing useful information. The panic these tools have raised among educators is evidence enough of their impressive book learning. 

The problem is that they don’t care. They don’t have any intrinsic goals they want to accomplish through conversation and aren’t motivated by what others think or how they are reacting. They don’t feel bad about lying and they gain nothing by being honest. They are shameless in a way even the worst people aren’t — even Donald Trump cares enough about his reputation to at least claim he’s truthful. 

This makes their conversations pointless. For humans, conversations are a means to getting things we want — to form a connection, get help on a project, pass the time or learn about something. Conversations require that we take some interest in the people we talk to — and, ideally, that we care about them.

“Chatbots don’t have any intrinsic goals they want to accomplish through conversation and aren’t motivated by what others think or how they are reacting.”

Even if we don’t care about them, we at least care about what they think of us. We’re deeply cognizant that our success in life — our ability to have loving relationships, do good work and play in the local shuffleboard league — depends on having a good reputation. If our social standing drops, we can lose everything. Conversations shape who people think we are. And many of us use internal monologues to shape who we think we are.

But chatbots don’t have a story to tell about themselves or a reputation to defend. They don’t feel the pull of acting responsibly like the rest of us. They can be, and are, useful in many highly scripted situations with lots of leeway, from playing Dungeon Master to writing plausible copy or helping an author explore ideas. But they lack the grasp of themselves and of other people needed to be trustworthy social agents — the kind of person we expect we’re talking to most of the time. Without some grasp of the norms governing honesty and decency and some concern about their reputation, there are limits to how useful these systems can be — and real dangers to relying on them.

Uncanny Conversation

The upshot is that chatbots aren’t conversing in a human way, and they’ll never get there solely by saying statistically likely things. Without a genuine understanding of the social world, these systems are just idle chatterboxes — no matter how witty or eloquent. 

This is helpful for framing why these systems are such interesting tools — but also why we shouldn’t anthropomorphize them. Humans aren’t just dispassionate thinkers or speakers; we’re intrinsically normative creatures, emotionally bound to one another by shared, enforced expectations. Human thought and speech result from our sociality, not vice versa. 

Mere talk, divorced from broader engagement in the world, has little in common with humans. Chatbots aren’t using language like we are — even when they say exactly the same things we do. Ultimately, we’re talking past each other. They don’t get why we talk the way we do, and it shows.

AI And The Limits Of Language
https://www.noemamag.com/ai-and-the-limits-of-language
Tue, 23 Aug 2022

Credits

Jacob Browning is a postdoc in NYU’s Computer Science Department working on the philosophy of AI.

Yann LeCun is a Turing Award-winning machine learning researcher, an NYU professor and the chief AI scientist at Meta.

When a Google engineer recently declared Google’s AI chatbot a person, pandemonium ensued. The chatbot, LaMDA, is a large language model (LLM) designed to predict the likely next words for whatever lines of text it is given. Since many conversations are somewhat predictable, these systems can infer how to keep a conversation going productively. LaMDA did this so impressively that the engineer, Blake Lemoine, began to wonder whether there was a ghost in the machine.

Reactions to Lemoine’s story ran the gamut: some people scoffed at the mere idea that a machine could ever be a person. Others suggested that this LLM isn’t a person, but the next one might be. Still others pointed out that deceiving humans isn’t very challenging; we see saints in toast, after all.

But the diversity of responses highlights a deeper problem: as these LLMs become more common and powerful, there seems to be less and less agreement over how we should understand them. These systems have bested many “common sense” linguistic reasoning benchmarks over the years, many of which promised to be conquerable only by a machine that “is thinking in the full-bodied sense we usually reserve for people.” Yet these systems rarely seem to have the common sense promised when they defeat the test and are usually still prone to blatant nonsense, non sequiturs and dangerous advice. This leads to a troubling question: how can these systems be so smart, yet also seem so limited?

The underlying problem isn’t the AI. The problem is the limited nature of language. Once we abandon old assumptions about the connection between thought and language, it is clear that these systems are doomed to a shallow understanding that will never approximate the full-bodied thinking we see in humans. In short, despite being among the most impressive AI systems on the planet, these AI systems will never be much like us.

Saying It All

A dominant theme for much of the 19th and 20th centuries in philosophy and science was that knowledge just is linguistic — that knowing something simply means thinking the right sentence and grasping how it connects to other sentences in a big web of all the true claims we know. The ideal form of language, by this logic, would be a purely formal, logical-mathematical one composed of arbitrary symbols connected by strict rules of inference, but natural language could serve as well if you took the extra effort to clear up ambiguities and imprecisions. As Wittgenstein put it, “The totality of true propositions is the whole of natural science.” This position was so established in the 20th century that psychological findings of cognitive maps and mental images were controversial, with many arguing that, despite appearances, these must be linguistic at base.

This view is still assumed by some overeducated, intellectual types: everything that can be known can be contained in an encyclopedia, so just reading it all might give us a comprehensive knowledge of everything. It also motivated a lot of the early work in Symbolic AI, where symbol manipulation — arbitrary symbols being bound together in different ways according to logical rules — was the default paradigm. For these researchers, an AI’s knowledge consisted of a massive database of true sentences logically connected with one another by hand, and an AI system counted as intelligent if it spit out the right sentence at the right time — that is, if it manipulated symbols in the appropriate way. This notion is what underlies the Turing test: if a machine says everything it’s supposed to say, that means it knows what it’s talking about, since knowing the right sentences and when to deploy them exhausts knowledge.


But this was subject to a withering critique which has dogged it ever since: just because a machine can talk about anything, that doesn’t mean it understands what it is talking about. This is because language doesn’t exhaust knowledge; on the contrary, it is only a highly specific, and deeply limited, kind of knowledge representation. All language — whether a programming language, a symbolic logic or a spoken language — turns on a specific type of representational schema; it excels at expressing discrete objects and properties and the relationships between them at an extremely high level of abstraction. But there is a massive difference between reading a musical score and listening to a recording of the music, and a further difference from having the skill to play it.

All representational schemas involve a compression of information about something, but what gets left in and left out in the compression varies. The representational schema of language struggles with more concrete information, such as describing irregular shapes, the motion of objects, the functioning of a complex mechanism or the nuanced brushwork of a painting — much less the finicky, context-specific movements needed for surfing a wave. But there are nonlinguistic representational schemes which can express this information in an accessible way: iconic knowledge, which involves things like images, recordings, graphs and maps; and the distributed knowledge found in trained neural networks — what we often call know-how and muscle memory. Each scheme expresses some information easily even while finding other information hard — or even impossible — to represent: what does “Either Picasso or Twombly” look like?

The Limits Of Language

One way of grasping what is distinctive about the linguistic representational schema — and how it is limited — is recognizing how little information it passes along on its own. Language is a very low-bandwidth method for transmitting information: isolated words or sentences, shorn of context, convey little. Moreover, because of the sheer number of homonyms and pronouns, many sentences are deeply ambiguous: does “the box was in the pen” refer to an ink pen or a playpen? As Chomsky and his acolytes have pointed out for decades, language on its own is just not a clear and unambiguous vehicle of communication.

But humans don’t need a perfect vehicle for communication because we share a nonlinguistic understanding. Our understanding of a sentence often depends on our deeper understanding of the contexts in which this kind of sentence shows up, allowing us to infer what it is trying to say. This is obvious in conversation, since we are often talking about something directly in front of us, such as a football game, or communicating about some clear objective given the social roles at play in a situation, such as ordering food from a waiter. But the same holds in reading passages — a lesson that undermines not only common-sense language tests in AI but also a popular method of teaching context-free reading comprehension skills to children. This method focuses on using generalized reading comprehension strategies to understand a text — but research suggests that the amount of background knowledge a child has on the topic is actually the key factor for comprehension. Understanding a sentence or passage depends on an underlying grasp of what the topic is about.

“It is clear that these systems are doomed to a shallow understanding that will never approximate the full-bodied thinking we see in humans.”

The inherently contextual nature of words and sentences is at the heart of how LLMs work. Neural nets in general represent knowledge as know-how, the skillful ability to grasp highly context-sensitive patterns and find regularities — both concrete and abstract — necessary for handling inputs in nuanced ways that are narrowly tailored to their task. In LLMs, this involves the system discerning patterns at multiple levels in existing texts, seeing both how individual words are connected in the passage and how the sentences all hang together within the larger passage that frames them. The result is that its grasp of language is ineliminably contextual; every word is understood not in terms of its dictionary meaning but in terms of the role it plays in a diverse collection of sentences. Since many words — think “carburetor,” “menu,” “debugging” or “electron” — are almost exclusively used in specific fields, even an isolated sentence with one of these words wears its context on its sleeve.
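
One way to see this contextual grasp directly is to pull out the vector a network assigns to the same word in two different sentences and compare them. The sketch below assumes the Hugging Face transformers and torch packages and the public “bert-base-uncased” checkpoint (a masked language model, used here purely as an accessible illustration rather than a GPT-style system).

```python
# A minimal sketch, not part of any production LLM: extract the contextual
# vector for one word in two different sentences and compare them.
# Assumes the Hugging Face `transformers` and `torch` packages are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the in-context embedding of the first occurrence of `word`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (tokens, dim)
    idx = enc["input_ids"][0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = word_vector("the box was in the pen", "pen")              # enclosure sense
v2 = word_vector("she signed the contract with a pen", "pen")  # writing sense
print(torch.cosine_similarity(v1, v2, dim=0).item())  # noticeably below 1.0
```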

In short, LLMs are trained to pick up on the background knowledge for each sentence, looking to the surrounding words and sentences to piece together what is going on. This allows them to take an infinite possibility of different sentences or phrases as input and come up with plausible (though hardly flawless) ways to continue the conversation or fill in the rest of the passage. A system trained on passages written by humans, often conversing with each other, should come up with the general understanding necessary for compelling conversation.

Shallow Understanding

While some balk at using the term “understanding” in this context or calling LLMs “intelligent,” it isn’t clear what semantic gatekeeping is buying anyone these days. But critics are right to accuse these systems of being engaged in a kind of mimicry. This is because LLMs’ understanding of language, while impressive, is shallow. This kind of shallow understanding is familiar; classrooms are filled with jargon-spouting students who don’t know what they’re talking about — effectively engaged in a mimicry of their professors or the texts they are reading. This is just part of life; we often don’t know how little we know, especially when it comes to knowledge acquired from language.

LLMs have acquired this kind of shallow understanding about everything. A system like GPT-3 is trained by masking the future words in a sentence or passage and forcing the machine to guess which word is most likely, then correcting it for bad guesses. The system eventually gets proficient at guessing the most likely words, making it an effective predictive system.
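
A toy version of that training objective looks like this (a PyTorch sketch at miniature scale, nothing like GPT-3 itself): shift the text by one position so that every word must predict the next one, score the guesses, and nudge the weights when they are wrong.

```python
# A toy sketch of next-word-prediction training, not GPT-3 itself.
import torch
import torch.nn as nn

vocab_size, dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, dim),   # token -> vector
                      nn.Linear(dim, vocab_size))      # vector -> word scores
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 21))    # stand-in for a real passage
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # each word predicts the next

logits = model(inputs)                                  # the system's guesses
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                         # correction for bad guesses
optimizer.step()
print(float(loss))
```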

This brings with it some genuine understanding: for any question or puzzle, there are usually only a few right answers but an infinite number of wrong answers. This forces the system to learn language-specific skills, such as explaining a joke, solving a word problem or figuring out a logic puzzle, in order to regularly predict the right answer on these types of questions. These skills, and the connected knowledge, allow the machine to explain how something complicated works, simplify difficult concepts, rephrase and retell stories, along with a host of other language-dependent abilities. Instead of a massive database of sentences linked by logical rules, as Symbolic AI assumed, the knowledge is represented as context-sensitive know-how for coming up with a plausible sentence given the prior line.

“Abandoning the view that all knowledge is linguistic permits us to realize how much of our knowledge is nonlinguistic.”

But the ability to explain a concept linguistically is different from the ability to use it practically. The system can explain how to perform long division without being able to perform it or explain what words are offensive and should not be said while then blithely going on to say them. The contextual knowledge is embedded in one form — the capacity to rattle off linguistic knowledge — but is not embedded in another form — as skillful know-how for how to do things like being empathetic or handling a difficult issue sensitively.

The latter kind of know-how is essential to language users, but that doesn’t make them linguistic skills — the linguistic component is incidental, not the main thing. This applies to many concepts, even those learned from lectures and books: while science classes do have a lecture component, students are graded primarily based on their lab work. Outside the humanities especially, being able to talk about something is often less useful or important than the nitty-gritty skills needed to get things to work right.

Once we scratch beneath the surface, it is easier to see how limited these systems really are: they have the attention span and memory of roughly a paragraph. This can easily be missed in conversation because we tend to attend to just the last comment or two and think only about our next response.

But the know-how for more complex conversations — active listening, recall and revisiting prior comments, sticking to a topic to make a specific point while fending off distractors, and so on — all require more attention and memory than the system possesses. This reduces even further what kind of understanding is available to them: it is easy to trick them simply by being inconsistent every few minutes, changing languages or gaslighting the system. If it is too many steps back, the system will just start over, accepting your new views as consistent with older comments, switching languages with you or acknowledging it believes whatever you said. The understanding necessary for developing a coherent view of the world is far beyond their ken.  
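
The mechanical reason for this forgetfulness is mundane: the model only ever sees a fixed-size window of recent text, so anything said earlier simply falls out of view. A rough sketch of that truncation (counting words rather than the subword tokens real systems use):

```python
# A rough sketch of a fixed context window: older turns silently drop out.
from typing import List

def visible_context(conversation: List[str], window: int = 100) -> str:
    """Return only the most recent words that fit in the model's window."""
    words: List[str] = []
    for turn in reversed(conversation):           # walk back from the newest turn
        for word in reversed(turn.split()):
            if len(words) == window:
                return " ".join(reversed(words))  # everything earlier is gone
            words.append(word)
    return " ".join(reversed(words))

history = ["My name is Ada and I only speak French."] + ["Some filler chat."] * 40
print(visible_context(history, window=50))  # the self-introduction never appears
```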

Beyond Language

Abandoning the view that all knowledge is linguistic permits us to realize how much of our knowledge is nonlinguistic. While books contain a lot of information we can decompress and use, so do many other objects: IKEA manuals don’t even bother including written instructions alongside their drawings; AI researchers often look at the diagrams in a paper first, grasp the network architecture and only then glance through the text; visitors can navigate NYC by following the red or green lines on a map.

This goes beyond simple icons, graphs and maps. Humans learn a lot directly from exploring the world, which shows us how objects and people can and cannot behave. The structures of artifacts and the human environment convey a lot of information intuitively: doorknobs are at hand height, hammers have soft grips and so on. Nonlinguistic mental simulation, in animals and humans, is common and useful for planning out scenarios and can be used to craft, or reverse-engineer, artifacts. Similarly, social customs and rituals can convey all kinds of skills to the next generation through imitation, extending from preparing foods and medicines to maintaining the peace at times of tension. Much of our cultural knowledge is iconic or in the form of precise movements passed on from skilled practitioner to apprentice. These nuanced patterns of information are hard to express and convey in language but are still accessible to others. This is also the precise kind of context-sensitive information that neural networks excel at picking up and perfecting.

“A system trained on language alone will never approximate human intelligence, even if trained from now until the heat death of the universe.”

Language is important because it can convey a lot of information in a small format and, especially after the creation of the printing press and the internet, makes it easy to reproduce that information and make it widely available. But compressing information in language isn’t cost-free: it takes a lot of effort to decode a dense passage. Humanities classes may require a lot of reading out of class, but a good chunk of class time is still spent going over difficult passages. Building a deep understanding is time-consuming and exhausting, however the information is provided.

This explains why a machine trained on language can know so much and yet so little. It is acquiring a small part of human knowledge through a tiny bottleneck. But that small part of human knowledge can be about anything, whether it be love or astrophysics. It is thus a bit akin to a mirror: it gives the illusion of depth and can reflect almost anything, but it is only a centimeter thick. If we try to explore its depths, we bump our heads.

Exorcising The Ghost

This doesn’t make these machines stupid, but it also suggests there are intrinsic limits concerning how smart they can be. A system trained on language alone will never approximate human intelligence, even if trained from now until the heat death of the universe. This is just the wrong kind of knowledge for developing awareness or being a person. But they will undoubtedly seem to approximate it if we stick to the surface. And, in many cases, the surface is enough; few of us really apply the Turing test to other people, aggressively querying the depth of their understanding and forcing them to do multidigit multiplication problems. Most talk is small talk.

But we should not confuse the shallow understanding LLMs possess for the deep understanding humans acquire from watching the spectacle of the world, exploring it, experimenting in it and interacting with culture and other people. Language may be a helpful component which extends our understanding of the world, but language doesn’t exhaust intelligence, as is evident from many species, such as corvids, octopi and primates.

Rather, the deep nonlinguistic understanding is the ground that makes language useful; it’s because we possess a deep understanding of the world that we can quickly understand what other people are talking about. This broader, context-sensitive kind of learning and know-how is the more basic and ancient kind of knowledge, one which underlies the emergence of sentience in embodied critters and makes it possible to survive and flourish. It is also the more essential task that AI researchers are focusing on when searching for common sense in AI, rather than this linguistic stuff. LLMs have no stable body or abiding world to be sentient of — so their knowledge begins and ends with more words and their common sense is always skin-deep. The goal is for AI systems to focus on the world being talked about, not the words themselves — but LLMs don’t grasp the distinction. There is no way to approximate this deep understanding solely through language; it’s just the wrong kind of thing. Dealing with LLMs at any length makes apparent just how little can be known from language alone.

What AI Can Tell Us About Intelligence
https://www.noemamag.com/what-ai-can-tell-us-about-intelligence
Thu, 16 Jun 2022

Credits

Jacob Browning is a postdoc in NYU’s Computer Science Department working on the philosophy of AI.

Yann LeCun is a Turing Award-winning machine learning researcher, an NYU professor and the chief AI scientist at Meta.

If there is one constant in the field of artificial intelligence it is exaggeration: There is always breathless hype and scornful naysaying. It is helpful to occasionally take stock of where we stand.

The dominant technique in contemporary AI is deep learning (DL) neural networks, massive self-learning algorithms which excel at discerning and utilizing patterns in data. Since their inception, critics have prematurely argued that neural networks had run into an insurmountable wall — and every time, it proved a temporary hurdle. In the 1960s, they could not solve non-linear functions. That changed in the 1980s with backpropagation, but the new wall was how difficult it was to train the systems. The 1990s saw a rise of simplifying programs and standardized architectures which made training more reliable, but the new problem was the lack of training data and computing power.

In 2012, when contemporary graphics cards could be trained on the massive ImageNet dataset, DL went mainstream, handily besting all competitors. But then critics spied a new problem: DL required too much hand-labeled data for training. The last few years have rendered this criticism moot, as self-supervised learning has resulted in incredibly impressive systems, such as GPT-3, which do not require labeled data.

Today’s seemingly insurmountable wall is symbolic reasoning, the capacity to manipulate symbols in the ways familiar from algebra or logic. As we learned as children, solving math problems involves a step-by-step manipulation of symbols according to strict rules (e.g., multiply the furthest right column, carry the extra value to the column to the left, etc.). Gary Marcus, author of “The Algebraic Mind” and co-author (with Ernie Davis) of “Rebooting AI,” recently argued that DL is incapable of further progress because neural networks struggle with this kind of symbol manipulation. By contrast, many DL researchers are convinced that DL is already engaging in symbolic reasoning and will continue to improve at it.

At the heart of this debate are two different visions of the role of symbols in intelligence, both biological and mechanical: One holds that symbolic reasoning must be hard-coded from the outset and the other holds it can be learned through experience, by machines and humans alike. As such, the stakes are not just about the most practical way forward, but also how we should understand human intelligence — and, thus, how we should pursue human-level artificial intelligence.

Kinds Of AI

Symbolic reasoning demands precision: Symbols can come in a host of different orders, and the difference between (3-2)-1 and 3-(2-1) is important, so performing the right rules in the right order is essential. Marcus contends this kind of reasoning is at the heart of cognition, essential for providing the underlying grammatical logic to language and the basic operations underlying mathematics. More broadly, he holds this extends into our more basic abilities, where there is an underlying symbolic logic behind causal reasoning and reidentifying the same object over time.

The field of AI got its start by studying this kind of reasoning, typically called Symbolic AI, or “Good Old-Fashioned” AI. But distilling human expertise into a set of rules and facts turns out to be very difficult, time-consuming and expensive. This was called the “knowledge acquisition bottleneck.” While it is simple to program rules for math or logic, the world itself is remarkably ambiguous, and it proved impossible to write rules governing every pattern or define symbols for vague concepts.

This is precisely where neural networks excel: discovering patterns and embracing ambiguity. Neural networks are a collection of relatively simple equations that learn a function designed to provide the appropriate output for whatever is inputted to the system. For example, training a visual recognition system will ensure all the chair images cluster together, allowing the system to tease out the vague set of indescribable properties of such an amorphous category. This allows the network to successfully infer whether a new object is a chair, simply by how close it is to the cluster of other chair images. Doing this with enough objects and with enough categories results in a robust conceptual space, with numerous categories clustered in overlapping but still distinguishable ways.
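
A stripped-down sketch of that clustering idea, with random vectors standing in for the embeddings a trained network would produce, shows how “chair-ness” can amount to nothing more than proximity to other chair examples:

```python
# A minimal sketch: classify a new embedding by the closest category cluster.
# The random vectors below stand in for embeddings a trained network produces.
import numpy as np

def nearest_category(embedding: np.ndarray, examples: dict) -> str:
    """Return the category whose cluster centroid is closest to `embedding`."""
    centroids = {name: vecs.mean(axis=0) for name, vecs in examples.items()}
    return min(centroids, key=lambda n: np.linalg.norm(embedding - centroids[n]))

rng = np.random.default_rng(0)
examples = {
    "chair": rng.normal(loc=0.0, scale=0.5, size=(50, 8)),  # one fuzzy cluster
    "table": rng.normal(loc=3.0, scale=0.5, size=(50, 8)),  # another cluster
}
new_object = rng.normal(loc=0.2, scale=0.5, size=8)          # chair-ish input
print(nearest_category(new_object, examples))                # "chair"
```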

“At stake are questions not just about contemporary problems in AI, but also questions about what intelligence is and how the brain works.”

These networks can be trained precisely because the functions implemented are differentiable. Put differently, if Symbolic AI is akin to the discrete tokens used in symbolic logic, neural networks are the continuous functions of calculus. This allows for slow, gradual progress by tweaking the variables slightly in the direction of learning a better representation — meaning a better fit between all the data points and the numerous boundaries the function draws between one category and another. This fluidity poses problems, however, when it comes to strict rules and discrete symbols: When we are solving an equation, we usually want the exact answer, not an approximation.  
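
The contrast can be seen in a few lines of plain Python: a single boundary parameter is nudged, over and over, through a smooth function, with no step at which a discrete rule is ever applied. This is a deliberately tiny illustration, not a claim about how any particular system is built.

```python
# A tiny example of differentiable learning: nudge one parameter, many times,
# in the direction that reduces a smooth error. Nothing here is a discrete rule.
import math

def train_boundary(points, labels, steps=5000, lr=0.1):
    """Learn a 1-D boundary w so that inputs above w count as class 1."""
    w = 0.0
    for _ in range(steps):
        for x, y in zip(points, labels):
            pred = 1 / (1 + math.exp(-(x - w)))        # smooth "soft" decision
            grad = -(pred - y) * pred * (1 - pred)     # d(squared error)/dw
            w -= lr * grad                             # a slight tweak, repeated
    return w

print(train_boundary([0.0, 1.0, 4.0, 5.0], [0, 0, 1, 1]))  # settles near 2.5
```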

Since this is where Symbolic AI shines, Marcus recommends simply combining the two: Inserting a hard-coded symbolic manipulation module on top of a pattern-completion DL module. This is attractive since the two methods complement each other well, so it seems plausible a “hybrid” system with modules working in different ways would provide the best of both worlds. And it seems like common sense, since everyone working in DL agrees that symbolic manipulation is a necessary feature for creating human-like AI.

But the debate turns on whether symbolic manipulation needs to be built into the system, where the symbols and capacity for manipulating are designed by humans and installed as a module that manipulates discrete symbols and is consequently non-differentiable — and thus incompatible with DL. Underlying this is the assumption that neural networks can’t do symbolic manipulation — and, with it, a deeper assumption about how symbolic reasoning works in the brain.

Symbolic Reasoning In Neural Nets

This assumption is very controversial and part of an older debate. The neural network approach has traditionally held that we don’t need to hand-craft symbolic reasoning but can instead learn it: Training a machine on examples of symbols engaging in the right kinds of reasoning will allow it to be learned as a matter of abstract pattern completion. In short, the machine can learn to manipulate symbols in the world, despite not having hand-crafted symbols and symbolic manipulation rules built in.

Contemporary large language models — such as GPT-3 and LaMDA — show the potential of this approach. They are capable of impressive abilities to manipulate symbols, displaying some level of common-sense reasoning, compositionality, multilingual competency, some logical and mathematical abilities and even creepy capacities to mimic the dead. If you’re inclined to take symbolic reasoning as coming in degrees, this is incredibly exciting.

But they do not do so reliably. If you ask DALL-E to create a Roman sculpture of a bearded, bespectacled philosopher wearing a tropical shirt, it excels. If you ask it to draw a beagle in a pink harness chasing a squirrel, sometimes you get a pink beagle or a squirrel wearing a harness. It does well when it can assign all the properties to a single object, but it struggles when there are multiple objects and multiple properties. The attitude of many researchers is that this is a hurdle for DL — larger for some, smaller for others — on the path to more human-like intelligence.

“Does symbolic manipulation need to be hard-coded, or can it be learned?”

However, this is not how Marcus takes it. He broadly assumes symbolic reasoning is all-or-nothing — since DALL-E doesn’t have symbols and logical rules underlying its operations, it isn’t actually reasoning with symbols. Thus, the numerous failures in large language models show they aren’t genuinely reasoning but are simply going through a pale imitation. For Marcus, there is no path from the stuff of DL to the genuine article; as the old AI adage goes, you can’t reach the Moon by climbing a big enough tree. Thus he takes the current DL language models as no closer to genuine language than Nim Chimpsky with his few signs of sign language. The DALL-E problems aren’t quirks of a lack of training; they are evidence the system doesn’t grasp the underlying logical structure of the sentences and thus cannot properly grasp how the different parts connect into a whole.

This is why, from one perspective, the problems of DL are hurdles and, from another perspective, walls. The same phenomena simply look different based on background assumptions about the nature of symbolic reasoning. For Marcus, if you don’t have symbolic manipulation at the start, you’ll never have it.

By contrast, people like Geoffrey Hinton contend neural networks don’t need to have symbols and algebraic reasoning hard-coded into them in order to successfully manipulate symbols. The goal, for DL, isn’t symbol manipulation inside the machine, but the right kind of symbol-using behaviors emerging from the system in the world. The rejection of the hybrid model isn’t churlishness; it’s a philosophical difference based on whether one thinks symbolic reasoning can be learned.

The Nature Of Human Thought

Marcus’s critique of DL stems from a related fight in cognitive science (and a much older one in philosophy) concerning how intelligence works and, with it, what makes humans unique. His ideas are in line with a prominent “nativist” school in psychology, which holds that many key features of cognition are innate — effectively, that we are largely born with an intuitive model of how the world works.

A central feature of this innate architecture is a capacity for symbol manipulation (though whether this is found throughout nature or whether it is human-specific is debated). For Marcus, this symbol manipulation capacity grounds many of the essential features of common sense: rule-following, abstraction, causal reasoning, reidentifying particulars, generalization and a host of other abilities. In short, much of our understanding of the world is given by nature, with learning as a matter of fleshing out the details.

There is an alternate, empiricist view that inverts this: Symbolic manipulation is a rarity in nature, primarily arising as a learned capacity for communication, acquired gradually by our hominin ancestors over the last two million years. On this view, the primary cognitive capacities are non-symbolic learning abilities bound up with improving survival, such as rapidly recognizing prey, predicting their likely actions and developing skillful responses. This assumes that the vast majority of complex cognitive abilities are acquired through a general, self-supervised learning capacity, one that acquires an intuitive world-model capable of the central features of common sense through experience. It also assumes that most of our complex cognitive capacities do not turn on symbolic manipulation; they make do, instead, with simulating various scenarios and predicting the best outcomes.

“The inevitable failure of deep learning has been predicted before, but it didn’t pay to bet against it.”

This empiricist view treats symbols and symbolic manipulation as simply another learned capacity, one acquired by the species as humans increasingly relied on cooperative behavior for success. This regards symbols as inventions we used to coordinate joint activities — things like words, but also maps, iconic depictions, rituals and even social roles. These abilities are thought to arise from the combination of an increasingly long adolescence for learning and the need for more precise, specialized skills, like tool-building and fire maintenance. This treats symbols and symbolic manipulations as primarily cultural inventions, dependent less on hard wiring in the brain and more on the increasing sophistication of our social lives.

The difference between these two views is stark. For the nativist tradition, symbols and symbolic manipulation are originally in the head, and the use of words and numerals is derived from this original capacity. This view attractively explains a whole host of abilities as stemming from an evolutionary adaptation (though proffered explanations for how or why symbolic manipulation might have evolved have been controversial). For the empiricist tradition, symbols and symbolic reasoning are useful inventions for communication purposes, which arose from general learning abilities and our complex social world. This treats the internal calculations and inner monologue — the symbolic stuff happening in our heads — as derived from the external practices of mathematics and language use.

The fields of AI and cognitive science are intimately intertwined, so it is no surprise these fights recur there. Since the success of either view in AI would partially (but only partially) vindicate one or the other approach in cognitive science, it is also no surprise these debates are intense. At stake are questions not just about the proper approach to contemporary problems in AI, but also questions about what intelligence is and how the brain works.

What The Stakes Are — And Aren’t

The high stakes explain why claims that DL has hit a wall are so provocative. If Marcus and the nativists are right, DL will never get to human-like AI, no matter how many new architectures it comes up with or how much computing power it throws at it. It is just confusion to keep adding more layers, because genuine symbolic manipulation demands an innate symbolic manipulator, full stop. And since this symbolic manipulation is at the base of several abilities of common sense, a DL-only system will never possess anything more than a rough-and-ready understanding of anything.

By contrast, if DL advocates and the empiricists are right, it’s the idea of inserting a module for symbolic manipulation that is confused. In that case, DL systems are already engaged in symbolic reasoning and will continue to improve at it as they become better at satisfying constraints through more multimodal self-supervised learning, an increasingly useful predictive world model and an expansion of working memory for simulating and evaluating outcomes. Introducing a symbolic manipulation module would not lead to more human-like AI, but would instead force all “reasoning” operations through an unnecessary and unmotivated bottleneck that would take us further from human-like intelligence. This threatens to cut off one of the most impressive aspects of deep learning: its ability to come up with far more useful and clever solutions than the ones human programmers conceive of.

As big as the stakes are, though, it is also important to note that many issues raised in these debates are, at least to some degree, peripheral. These include whether the high-dimensional vectors in DL systems should be treated like discrete symbols (probably not), whether the lines of code needed to implement a DL system make it a “hybrid” system (semantics), and whether winning at complex games requires handcrafted, domain-specific knowledge or whether it can be learned (too soon to tell). There’s also a question of whether hybrid systems will help with the ethical problems surrounding AI (no).

And none of this is to justify the silliest bits of hype: Current systems aren’t conscious, they don’t understand us, reinforcement learning isn’t enough, and you can’t build human-like intelligence just by scaling up. But all these issues are peripheral from the main debate: Does symbolic manipulation need to be hard-coded, or can it be learned?  

Is this a call to stop investigating hybrid models (i.e., models with a non-differentiable symbolic manipulator)? Of course not. People should go with what works. But researchers have worked on hybrid models since the 1980s, and they have not proven to be a silver bullet — or, in many cases, even remotely as good as neural networks. More broadly, people should be skeptical that DL is at the limit; given the constant, incremental improvement on tasks seen just recently in DALL-E 2, Gato and PaLM, it seems wise not to mistake hurdles for walls. The inevitable failure of DL has been predicted before, but it didn’t pay to bet against it.

Making Common Sense
https://www.noemamag.com/making-common-sense
Tue, 29 Jun 2021

Credits

Jacob Browning is a postdoc in NYU’s Computer Science Department working on the philosophy of AI.

For as long as people have fantasized about thinking machines, there have been critics assuring us of what machines can’t do. Central to many of these criticisms is the idea that machines don’t have “common sense,” such as an artificial intelligence system recommending you add “hard cooked apple mayonnaise” or “heavy water” to a cookie recipe. 

In a seminal paper, “Representational Genera,” the late philosopher of AI John Haugeland argued that a unique feature of human understanding, one machines lack, is an ability to describe a picture or imagine a scene from a description. Understanding representations, Haugeland wrote, depends on “general background familiarity with the represented contents — that is, on worldly experience and skill.” It is our familiarity with representations, like the “logical representations” of words and the “iconic representations” of images, that allow us to ignore scribbles on paper or sounds and instead grasp what they are about — what they are representing in the world. 

Which is why OpenAI’s recently released neural networks, CLIP and DALL-E, are such a surprise. CLIP can provide descriptions of what is in an image; DALL-E functions as a computational imagination, conjuring up objects or scenes from descriptions. Both are multimodal neural networks, artificial intelligence systems that discover statistical regularities in massive amounts of data from two different ways of accessing the same situation, such as vision and hearing. 

“For as long as people have fantasized about thinking machines, there have been critics assuring us of what machines can’t do.”

CLIP and DALL-E are fed words and images and must discern correspondences between specific words and objects, phrases and events, names and places or people, and so on. Although the results — as with all contemporary AI — have their mix of jaw-dropping successes and embarrassing failures, their abilities reveal some insight into how representations inform us about the world. 
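
The core matching step can be gestured at in a few lines. In the sketch below, the random vectors are toy stand-ins for the outputs of CLIP-style image and text encoders (whose internals are not reproduced here); an image is paired with whichever caption points in the most similar direction in a shared embedding space.

```python
# A conceptual sketch of multimodal matching: embed image and captions into
# one space, then pick the caption most similar to the image. The random
# vectors stand in for real encoder outputs; this is not OpenAI's CLIP.
import numpy as np

def best_caption(image_vec: np.ndarray, caption_vecs: dict) -> str:
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(caption_vecs, key=lambda c: cosine(image_vec, caption_vecs[c]))

rng = np.random.default_rng(1)
image_vec = rng.normal(size=16)                    # stands in for a photo's embedding
captions = {
    "a dog on a beach": image_vec + rng.normal(scale=0.1, size=16),  # near match
    "a city at night": rng.normal(size=16),                          # unrelated
}
print(best_caption(image_vec, captions))  # "a dog on a beach"
```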

In many criticisms of AI, one that CLIP and DALL-E expose, the meaning of common sense is ambiguous. Many observers seem to imagine common sense as a matter of words, such as a bunch of sentences in the head cataloging the beliefs a person holds. Another approach would be to base common sense around mental images, like a massive model of the world our brains can consult. 

Haugeland opened up yet another approach to common sense — which he did not take — centered around neural networks, which are a kind of “distributed representation.” This way of representing the world is less familiar than the logical and iconic kinds, but it is arguably the most common. It treats common sense not as a matter of knowing things about the world, but as a matter of doing things in the world.

The Distinction Between Logical And Iconic

In his article “Representational Genera,” Haugeland noted that humans use many kinds of representations, like the pictures we frame and hang around the house or the descriptions that fill books. He argued that what distinguishes logical, iconic and distributed representations is what they can or cannot represent about the world. Each only represents a small portion of the world and can do so in a peculiar way — capturing some features but ignoring many others.

Humans absorb these representations using background knowledge, “fleshing out” missing details based on common sense. Shorn of background knowledge, logical contents — a single word or phrase, a few notes on a music score, the markings in an equation or sentence — typically represent only what philosophers call “discrete facts”: objects and properties, musical phrases or the relation of numbers in an equation. 

By contrast, iconic representations — images, maps, music recordings or videos — involve elements that only make sense in relation to each other: shapes in a picture, the location of a mountain range or the various positions and movements of actors in a movie. Iconic representations depend on the relationship between elements and their locations, like how a black-and-white photograph represents certain wavelengths of light at different locations. Both kinds of representation are expressive, but logical representations cannot capture relations between elements without adding more information, whereas iconic representations cannot depict elements non-relationally. 

Neither of these forms of representation reflects how we experience them. Musicians looking at a familiar musical score — a logical representation — will instantly imagine their favorite recording of the piece: an iconic representation. But this is the work of our background familiarity with both kinds of representation. 

“DALL-E and CLIP recognize and reproduce not just skeletal content but also flesh it out.”

Take an article about a recent New York mayoral debate. An image might show a series of human bodies standing awkwardly behind podiums with bright red, white and blue shapes and patterns behind them. By contrast, the article discusses policy ideas, personal attacks, one-liners and sharp rebukes about policing. At the skeletal level, these refer to entirely different things: a group of bodies on the one hand and a group of topics on the other. That we grasp the text and image as related is based on our background understanding of how news articles work, because we understand the bodies are people running for office who are talking to and about each other.

These are the kinds of skills needed for switching between representations that Haugeland understood as beyond the abilities of machines. And this is why the success of DALL-E and CLIP is so surprising. These systems recognize and reproduce not just skeletal content but also flesh it out, contextualizing it with tacit information implied by the logical modality that bears on what should be depicted in the iconic modality. 

Take a specific example: There is no generic image DALL-E can generate when faced with the phrase “football player evading a defender,” no one-for-one correspondence the machine can learn that would allow it to memorize the right answer. Instead, it needs to discern a many-to-many correspondence that captures all kinds of different features: two players, fully clothed, on a field, under lighting, with either a soccer ball at their feet or a football in their hand (but not both), close up or from a distance, surrounded by other players or maybe a referee but no eagles or bikes — and on and on. 

This means DALL-E needs to represent the world — or, at least, the visible world made available in static images — in terms of what matters based on the kinds of descriptions people give of a scene. Distributed representations, with neural nets being the most common kind, provide their own distinct way of representing things, one capable of pulling from both logical and iconic representations in the effortless ways humans do. 

Getting Distributed Representations Into View

We are familiar with logical and iconic representations because they are ubiquitous artifacts of our everyday lives. Distributed representations, on the other hand, have only recently become artifacts because of the success of deep learning, though they are both older and more common than anything artificial. Evolution stumbled onto this kind of solution for brains early on, since these networks provide an incredibly efficient means for representing the world in terms of what matters for the agent in order to act appropriately. Contemporary AI roughly mimics some of the architectural design and learning tactics present in all brains to approximate feats pulled off by nature.

Haugeland suggested we think of distributed representations as representing skills or know-how. It may seem strange to say a skill “represents” something, but skills depend on recognizing the relevant patterns in a task, grasping which nuances and differences matter and which responses are the most appropriate. 

The skill of playing ping-pong, for example, needs to represent how a ball with spin looks in relation to the peculiar swing of the paddle that produced it, as well as which responses will be effective. The speed of the game requires recognition and response to happen instantaneously, far faster than we can consciously understand spin and decide how to react. Neural networks, biological and artificial, condense recognition and response into the same act.

For a familiar example in AI, take highway driving. It is a relatively simple task: keep the car centered between the lane markers, maintain a constant distance to the next car, and — when lane changes are necessary — track the relative positions of nearby cars. This means the system can be precisely tuned to these patterns of visual data — lane markers, car shapes and relative distance — and ignore all the other stuff, like the colors of the cars or the chipped paint on the lane markers. There are only a few outputs available — maintain speed, go faster, go slower, stop, turn left, turn right — and the correct one is largely determined by the visual inputs: brake if too close, turn slightly to stay in lane and so on. 
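To make the shape of that mapping concrete, here is a minimal sketch in Python of a hand-written highway policy. The function name, the features and the thresholds are all hypothetical; a real driving system learns this mapping as a distributed representation rather than as explicit rules, but the structure is the same: a few visual features in, one of a few actions out.

```python
# A minimal, hypothetical sketch of the input-to-output mapping described
# above. A real system learns this mapping from data; the hand-picked
# thresholds here are for illustration only.

def highway_policy(lane_offset: float, gap_to_lead_car: float) -> str:
    """Map two visual features to one of a few discrete driving actions.

    lane_offset: lateral distance from lane center in meters
        (negative = drifting left, positive = drifting right).
    gap_to_lead_car: distance to the car ahead in meters.
    """
    if gap_to_lead_car < 10.0:
        return "brake"            # too close: slow down
    if lane_offset < -0.3:
        return "steer_right"      # drifting left: nudge back toward center
    if lane_offset > 0.3:
        return "steer_left"       # drifting right: nudge back toward center
    if gap_to_lead_car > 60.0:
        return "speed_up"         # plenty of room: close the gap
    return "maintain_speed"


# Example: slightly left of center, comfortable following distance.
print(highway_policy(lane_offset=-0.5, gap_to_lead_car=40.0))  # steer_right
```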

“Contemporary AI roughly mimics some of the architectural design and learning tactics present in all brains to approximate feats pulled off by nature.”

The skeletal content of a distributed representation of highway driving, then, is just the association between the relevant visual patterns in the input that will trigger one output rather than another. The result is a highly detailed representation of the situation, but one that is different from logical or iconic representations. The distributed representation has nothing in it that “looks like” a car or acts as a “description” of the road. It instead encodes how certain visual patterns fit together in a way that reliably tracks cars and, thus, should be handled in a certain way. When humans go on “autopilot” while driving, they plausibly resort to a similar representation, effortlessly and unconsciously responding to lanes, cars and potholes — largely without noticing much of anything.

The main challenge for these skills is the same one facing humans: preventing a “deer in headlights” moment. Many infrequent events will be represented in the model, like driving on a slippery road or under limited visibility. But really rare events will fail to be represented at all and will instead be treated as something else; there likely won’t be a representation of a deer in the road, so the system will (hopefully) lump it into the broad category of nondescript obstacles and respond by slamming on the brakes. 

This indicates a limit of the representation, which is that many possible inputs simply won’t be sufficiently distinct because they aren’t statistically common enough to be relevant. These distributed representations, in this sense, have a kind of tunnel vision — they represent what elements are most essential for the task and leave out the rest. But this goes for both biological and artificial networks, as well as logical and iconic representations; no representation can represent everything. 

“Neural networks, biological and artificial, condense recognition and response into the same act.”

With CLIP and DALL-E, what matters is capturing how things should look in relation to a particular phrase. This obviously requires insight into how words describe objects. But they also need to figure out what is tacitly indicated by the phrase — whether the object is in the foreground or background, posing or in action, looking at the camera or engaged in some task and so on.

Understanding what matters based on a phrase requires building up rough multimodal representations that, on the one hand, map the relationship of words with other words and, on the other hand, words with various kinds of images. A phrase with the word “democrat” needs to pull up not just Joe Biden but also blue flags, tacky bumper stickers and anthropomorphic donkeys in suits. The ability of CLIP and DALL-E to pull off these feats suggests they have something like common sense, since representing any particular element in a plausible way demands a tacit general understanding of many other elements and their interconnections — that is, all the other potential ways something could look or be described. 
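As a rough illustration of that matching, the toy sketch below scores captions against an image by similarity in a shared embedding space, which is the core operation usually attributed to CLIP-style models. The encoders here are faked with hash-seeded random vectors, and the image embedding is deliberately placed near the matching caption; in a trained model, that nearness is precisely what contrastive training on caption-image pairs produces.

```python
# A toy sketch of caption-image matching in a shared embedding space.
# The "encoders" are fake (hash-seeded random vectors); only the shape of
# the computation (embed both modalities, then rank by similarity) is real.
import hashlib
import numpy as np


def fake_embed(label: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a learned encoder: a deterministic pseudo-embedding."""
    seed = int(hashlib.md5(label.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)


captions = [
    "a football player evading a defender",
    "a dog catching a frisbee",
    "a politician at a podium",
]

# Pretend an image encoder produced a vector close to the first caption's
# vector; in a real model, that closeness is what contrastive training buys.
noise = np.random.default_rng(42).normal(size=8) * 0.1
image_embedding = fake_embed(captions[0]) + noise
image_embedding /= np.linalg.norm(image_embedding)

# Vectors are unit length, so a dot product is cosine similarity.
scores = {c: float(image_embedding @ fake_embed(c)) for c in captions}
print(max(scores, key=scores.get))  # the football caption wins
```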

But ascribing common sense to CLIP and DALL-E doesn’t feel quite right, since the task is so narrow. No living species would need to acquire a skill just for connecting captions and images. Both captions and images are social artifacts, governed by norms to keep them formulaic: short and sweet descriptions and crisp, focused images. The systems are useless at seemingly similar tasks, such as producing captions for videos or creating short movies. The whole activity is just too artificial — too specific and disconnected from the world. It seems like common sense, if it is anything, should capture more generality than this.

Rethinking Common Sense

An old philosophical tradition took common sense to be the place where all our modalities come together — where touch, taste and vision united in the mind to form a multimodal iconic model of the external world. For AI researchers operating in the 20th century, it was more common to think of a giant written encyclopedia, where our beliefs were written down in cross-referencing sentences — a database of logical representations. 

But in either case, it required someone to consult these models or databases: a central reasoner who would pick out what matters in them to figure it all out. It is no surprise people struggled with creating common-sense AI, since it seemed you’d need a system that could both know everything and know how to access all the relevant stuff when solving a common-sense puzzle.

But when normal people talk of common sense, it tends to be because someone lacks it — someone behaving awkwardly or saying stupid things. When we ascribe common sense, it is to people who behave normally — people who have the skills and know-how to navigate the world. This model of common sense is less like the logical and iconic versions, where common sense is expected to be some giant body of knowledge in the brain, and instead hews closer to what we see in distributed representations. 

“What is meaningful to each species is relative to the world they inhabit, and what isn’t meaningful just doesn’t need to be represented.”

Neural networks often generate a distributed representation that captures the right way to understand and act given a specific task. Multimodal neural networks allow these distributed representations to become much more robust. In CLIP and DALL-E’s case, the rich connections between logical and iconic representations provide them with a background familiarity with the world — discerning not just how words hang together but also what they imply about what things look like. 

This approach to understanding makes more sense from an evolutionary perspective: let each species come up with the appropriate representations relative to its body, modalities and skills. What is meaningful to each species is relative to the world they inhabit, and what isn’t meaningful just doesn’t need to be represented. The common sense of a dog is its ability to do lots of dog-like things well, but there certainly isn’t any central reasoner inside a dog, or any database of language-like sentences specifying its beliefs and desires. A species represents its world in terms of how it should respond, and leaves the rest unrepresented.

This more modest take on common sense has implications for supposed worries about superintelligent machines hoovering up vast amounts of data — perhaps the encyclopedia of beliefs or the model of everything — which then leads to an omnicompetent general reasoner. But CLIP and DALL-E demonstrate that this is backwards: doing precedes knowing, and what we need to do determines what we know. Any representation of the world — logical, iconic or distributed — involves an assumption about what does and does not matter; you don’t take a picture of a sound. Humans know a lot because they do a lot — not vice-versa.

Machine understanding is not an all-or-nothing matter. Machines will continue to understand more through the piecemeal accumulation of skills that expand what they can do. This means artificial general intelligence won’t look like what we thought, but will likely be similar to us — a bundle of skills with rough-and-ready representations of what it needs to know to accomplish its various tasks. There is no more to general intelligence than that.

The post Making Common Sense appeared first on NOEMA.

Learning Without Thinking https://www.noemamag.com/learning-without-thinking Tue, 29 Dec 2020 13:38:46 +0000 https://www.noemamag.com/learning-without-thinking The post Learning Without Thinking appeared first on NOEMA.

Credits

Jacob Browning is a postdoc in NYU’s Computer Science Department working on the philosophy of AI.

“Mindless learning.” The phrase looks incoherent — how could there be learning without a learner?

Learning — broadly defined as improving at solving a given task over time — seems to require some conscious agent reflecting on what is happening, drawing out connections and deciding which strategies will work better. Yet, contrary to intuition, learning is possible in the absence of any thought or even the capacity to think — agentless, mindless learning. In fact, this kind of learning is central to contemporary artificial intelligence.

Tracing a history of the idea of mindless learning can replace our anthropocentric intuitions about learning and thinking with an awareness of the different ways — both natural and artificial — that problems can be solved. It can also reshape our sense of what is capable of learning, and the benefits attached to non-human kinds of learning.

As it is commonly understood, thinking is a matter of consciously trying to connect the dots between ideas. It’s only a short step for us to assume that thinking must precede learning, that we need to consciously think something through in order to solve a problem, understand a topic, acquire a new skill or design a new tool. This assumption — an assumption shared by early AI researchers — suggests that thinking is the mechanism that drives learning. Learning depends on reasoning, our capacity to detect the necessary connections — causal, logical and mathematical — between things.

Think of how someone learns to grasp a few geometric proofs about the length of lines and then the area of squares, moving and turning imaginary shapes in their head until they discern how the pieces relate. Identifying the essential features of lines and squares allows them to draw out necessary connections between other shapes and their interrelations — using old rules to generate novel inferences about circles, triangles and a host of irregular shapes.

“Learning is possible in the absence of any thought or even the capacity to think.”

Our capacity to reason so impressed Enlightenment philosophers that they took this as the distinctive character of thought — and one exclusive to humans. The Enlightenment approach often simply identified the human by its impressive reasoning capacities — a person understood as synonymous with their mind.

This led to the Enlightenment view that took the mind as the motor of history: Where other species toil blindly, humans decide their own destiny. Each human being strives to learn more than their parents and, over time, the overall species is perfected through the accumulation of knowledge. This picture of ourselves held that our minds made us substantively different and better than mere nature — that our thinking explains all learning, and thus our brilliant minds explain “progress.”

This picture, though flattering to our vanity as a species, was met with skepticism by less anthropocentric thinkers like David Hartley, David Hume and James Mill. These skeptics instead argued that human learning is better understood as similar to the stimulus-responsive learning seen in animals, which hinges on creating associations between arbitrary actions that become lifelong patterns of behavior.

Training an animal often involves getting it to associate a sign (like snapping fingers) with an action (like sitting) by rewarding it with a treat for success. The same associations work equally well among humans: A teacher can reward correct answers with treats or gold stars to promote more studying, and a manager can praise good work to encourage similar behavior in the future.

This approach — later popularized by the behaviorists — held that rewarded associations can account not just for trained behavior but for any aspect of animal behavior, even seemingly “thoughtful” behavior. My cat seems to understand how can openers work and that cans contain cat food, but there isn’t necessarily any reasoning involved. Anything that sounds like the can opener, or even looks like a can, will result in the same behavior: running into the kitchen, meowing expectantly, scratching at the empty food bowl. And what works for animals works in humans as well. Humans can accomplish many tasks through repetition without understanding what they are doing, as when children learn to multiply by memorizing multiplication tables through tedious practice and recitation.
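For what such reward-strengthened association amounts to computationally, here is a minimal sketch in the spirit of classic associative-learning models such as Rescorla-Wagner. The stimulus, the learning rate and the reward values are all invented for illustration.

```python
# A minimal sketch of reward-driven association, in the spirit of classic
# models like Rescorla-Wagner. Names and numbers are invented for illustration.

def update(strength: float, reward: float, learning_rate: float = 0.2) -> float:
    """Nudge the association strength toward the reward actually received."""
    return strength + learning_rate * (reward - strength)


strength = 0.0  # association between the snap of fingers and sitting
for trial in range(20):
    reward = 1.0  # a treat follows every successful sit
    strength = update(strength, reward)

print(round(strength, 3))  # approaches 1.0: the sign now reliably predicts the treat
```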

“Where other species toil blindly, humans decide their own destiny.”

Many philosophers and scientists have argued that associative learning need not be limited to explaining how an individual animal learns. They contend that arbitrary events in diverse and non-cooperative agents could still lead to problem-solving behavior — a spontaneous organization of things without any organizer.

The 18th-century economist Adam Smith, for example, formulated the concept of the “invisible hand of the market,” which revealed the capacity for millions of localized interactions between strangers to push a market toward a dynamic, efficient and thoroughly amoral deployment of resources over time. This perspective enables casting off the rationalist conviction that the mind is the acting force of history and all progress is the result of “geniuses” and “great men” (because it was always men in these male philosophers’ telling). Rather, “progress” — if it is appropriate to use the term — in politics, language, law and science results not from any grand plan but instead from countless, undirected interactions over time that adaptively shape groups towards some stable equilibrium amongst themselves and their environment.

As Charles Darwin saw, this kind of adaptation was not unique to humans or their societies. The development of species emerges from chance events over time snowballing into increasingly complex capacities and organs. This doesn’t involve progress or the appearance of “better” species, as some assumed at the time. Rather, it suggests that members of a species will eventually develop an adequate fit with other species in a given environment. As such, mindless learning is more natural and commonplace than the minded variety we value so highly.

This history helps us recognize contemporary artificial intelligence as deploying a familiar, well-used technique for solving problems: many trials, a mechanism for varying responses on each trial and some indication of whether a solution is a better or worse fit for the task at hand. The mathematical sophistication and sheer computing power behind contemporary AI systems can deceive us into thinking the process is a kind of technological wizardry only possible through human ingenuity.
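That recipe, which comprises many trials, random variation on each trial and a score for better or worse fit, can be written down in a few lines. The toy task below (matching a target string) is invented purely for illustration; nothing in the loop reflects, imagines or reasons, yet it reliably solves the problem.

```python
# A minimal sketch of mindless trial-and-error: vary a candidate at random,
# keep the variant only when it scores better, repeat. The task is made up.
import random

TARGET = "mindless learning"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate: str) -> int:
    """How many characters already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


random.seed(0)
best = "".join(random.choice(ALPHABET) for _ in TARGET)

for trial in range(100_000):
    # Vary the current best solution by mutating one random character.
    i = random.randrange(len(best))
    candidate = best[:i] + random.choice(ALPHABET) + best[i + 1:]
    # Keep the variant only if it is a better fit; discard it otherwise.
    if fitness(candidate) > fitness(best):
        best = candidate
    if best == TARGET:
        print(f"solved after {trial + 1} trials: {best}")
        break
```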

But it’s possible for us to think about machine learning otherwise. The artist Philipp Schmitt, for example, built his physical neural network out of simple parts, wooden boards and very dumb sensors to highlight the underlying simplicity of machine learning that’s lost when it gets gussied up in complex diagrams and mathematical formalisms. The machine arrives at appropriate answers without consciousness, imagination or reasoning; these are all irrelevant for the kind of learning that Schmitt’s neural network undertakes.

As with evolution, there are far more trials ending in failure than in success. However, these failures are a feature, not a bug. Repeated failure is essential for mindless learning to test out many possible solutions — as in the 30 million trials leading up to AlphaGo’s famous move 37 in its game against the second-best Go player in the world, Lee Sedol. Going through so many trials meant AlphaGo played many moves no human had ever made, principally because these moves usually result in losing. Move 37 was just such a long shot, and it was so counterintuitive that it made Lee visibly uncomfortable.

“‘Progress’ in politics, language, law and science results not from any grand plan but instead from countless, undirected interactions over time.”

New mindless learning machines — like the headline-grabbing GPT-3, a tool that uses deep learning to create human-readable text — help point toward the expansive possibilities of non-human intelligence. GPT-3 is trained by hiding the next word in millions upon millions of texts, having the machine guess what comes next and then telling it how far its guess is from the original. In the process, it learns when to use a noun instead of a verb, which adjective works, when a preposition is needed and so on.
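A toy version of that training signal, shrunk from billions of parameters to a bigram counter, looks something like the sketch below. The corpus is invented, and GPT-3 adjusts its weights by gradient descent rather than by tallying counts, but the underlying game of learning which word tends to follow which, so the next word can be guessed, is the same.

```python
# A toy stand-in for next-word prediction: learn which word tends to follow
# which in a (made-up) corpus, then guess continuations from those statistics.
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()

# "Training": tally which word follows which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1


def guess_next(word: str) -> str:
    """Predict the continuation seen most often in training."""
    return following[word].most_common(1)[0][0] if following[word] else "?"


print(guess_next("sat"))  # -> "on"
print(guess_next("the"))  # -> the most frequent word after "the" in this corpus
```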

Many critics rightly note that GPT-3 doesn’t “understand” the world it is talking about, but this shouldn’t be seen as a criticism. The algorithm adapts to the world of text itself: the statistically relevant ways humans deploy symbols — not to the world as such. It does not occupy our niche, the niche of social beings who use language for diverse reasons, but its own: the regular interplay of signs.

It was a failure of imagination that led to the Turing Test, which assumed language couldn’t be dissociated from human-like intelligence. That same limited viewpoint rejected the idea that language might form its own ecosystem, into which a being might fit without human-like intelligence.

Undermining the anthropocentrism of earlier assumptions, GPT-3 displays a wildly diverse facility with language. It can write poetry in many styles, code in many programming languages, create immersive interactive adventures, design websites and prescribe medicines, conduct interviews in the guise of famous people, explain jokes and generate surprisingly plausible undergraduate philosophy papers. To mock its failure to be human is to overlook how distinctly successful it is at being itself: an impossibly well-adapted machine for generating plausible scripts in almost any text-based scenario.

As machine learning continues to develop, the intuition that thinking necessarily precedes learning — much less that humans alone learn — should wane. We will eventually read surprising headlines, such as the recent finding that a machine is a champion poker player or that slime molds learn, without immediately asking: “Does it have a mental life? Is it conscious?” Mindless learning proves that many of our ideas about what a mind is can be broken into distinct mindless capacities. It pushes us to ask more insightful questions: If learning doesn’t need a mind, why are there minds? Why is any creature conscious? How did consciousness evolve in the first place? These questions help clarify both how minded learning works in humans, and why it would be parochial to treat this as the only possible kind of learning AI should aspire to.  

“Much human learning has itself been mindless.”

We should be clear that much human learning has itself been, and still is, mindless. The history of human tools and technologies — from the prehistoric hammer to the current search for effective medicines — reveals that conscious deliberation plays a much less prominent role than trial and error. And there are plenty of gradations between the mindless learning at work in bacteria and the minded learning seen in a college classroom. It would be needlessly reductive to claim, as some have, that human learning is the only “real learning” or “genuine cognition,” with all other kinds — like association, evolution and machine learning — as mere imitations.

Rather than singling out the human, we need to identify those traits essential for learning to solve problems without exhaustive trial and error. The task is figuring out how minded learning plays an essential role in minimizing failures. Simulations sidestep fatal trials; discovering necessary connections rules out pointless efforts; communicating discards erroneous solutions; and teaching passes on success. Identifying these features helps us come up with machines capable of similar skills, such as those with “internal models” able to simulate trials and grasp necessary connections, or systems capable of being taught by others.

A second insight is broader. While there are good reasons to make machines that engage in human-like learning, artificial intelligence need not — and should not — be confined to simply imitating human intelligence. Evolution is fundamentally limited because it can only build on solutions it has already found, permitting limited changes in DNA from one individual to the next before a variant is unviable. The result is (somewhat) clear paths in evolutionary space from dinosaurs to birds, but no plausible path from dinosaurs to cephalopods. Too many design choices, and their corresponding trade-offs, are already built in.

If we imagine a map of all possible kinds of learning, the living beings that have popped up on Earth take up only a small territory, and came into being along connected (if erratic) lines. In this scenario, humans occupy only a tiny dot at the end of one of a multitude of strands. Our peculiar mental capacities could only arise in a line from the brains, physical bodies and sense-modalities of primates. Constraining machines to retrace our steps — or the steps of any other organism — would squander AI’s true potential: leaping to strange new regions and exploiting dimensions of intelligence unavailable to other beings. There are even efforts to pull human engineering out of the loop, allowing machines to evolve their own kinds of learning altogether.

The upshot is that mindless learning makes room for learning without a learner, for rational behavior without any “reasoner” directing things. This helps us better understand what is distinctive about the human mind, at the same time that it underscores why the human mind isn’t the key to understanding the natural universe, as the rationalists believed. The existence of learning without consciousness permits us to cast off the anthropomorphizing of problem-solving and, with it, our assumptions about intelligence.

The post Learning Without Thinking appeared first on NOEMA.
