Not-So-Elementary Watson: What IBM’s Jeopardy! Computer Means for Turing Tests and the Future of Artificial Intelligence

Previously, I discussed how Watson, IBM’s Jeopardy!-playing computer prodigy, manages to match or surpass the performance of humans on the game. The computer—which is really a cluster of servers reading hundreds of millions of pages’ worth of stored information about the world—can simultaneously generate and evaluate thousands of hypothetical responses to clues on the game board. It decides which of those possible answers seems best supported by all the facts (including what Watson learns about the kinds of responses the game’s producers want) and then, if any of those answers meets its “confidence” criteria, the computer buzzes in. To be a good player, Watson has to accomplish all of these feats, from interpreting the text of the clue to choosing a best response, in less time than its human competitors—on average, about three seconds.
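That decision procedure (generate candidate answers, score each against the evidence, and buzz in only when the top score clears a confidence threshold) can be sketched in miniature. The following is a toy illustration of the idea only, not IBM's actual DeepQA code; the function name, the scores, and the threshold value are my own assumptions:

```python
def choose_response(scored_candidates, confidence_threshold=0.7):
    """Pick the best-supported candidate answer, or decline to buzz in.

    scored_candidates: list of (answer, confidence) pairs, where confidence
    is the system's estimate, between 0 and 1, that the answer is correct.
    Returns the top answer if it clears the threshold, else None, meaning
    the player stays silent rather than risk a wrong buzz.
    """
    if not scored_candidates:
        return None
    # Keep whichever hypothesis the evidence supports most strongly.
    best_answer, best_confidence = max(scored_candidates, key=lambda c: c[1])
    if best_confidence >= confidence_threshold:
        return best_answer
    return None
```

With made-up scores, a pool like `[("Toronto", 0.3), ("Chicago", 0.9)]` yields `"Chicago"`, while a pool whose best score falls below the threshold yields `None` and the system declines to buzz.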

To any of us who have gritted our teeth through the feebleness and inadequacies of most automated response systems on phone directories and help lines, that level of performance can seem breathtaking. In fact, it so closely mimics the kind of artificial intelligence (AI) seen in science fiction that one might wonder whether researchers really are on the verge of creating robotic brains to rival our own any day now.

The short answer is no, although Watson’s success probably does augur a satisfyingly rapid boost to come in the usefulness of a wide variety of automatic systems. Bullish believers in the prospects for AI may see Watson as emphatic proof that they are on the right track, but I think a slightly more restrained reaction is probably in order. The crucial issue, I’ll argue, is scale.

To state the obvious, IBM didn’t build Watson to make money from game shows. With its ongoing DeepQA program for building systems like Watson that can knowledgeably respond to questions asked of it in standard, casual English, the company has its eyes on far bigger prizes. Research manager Eric Brown notes that IBM is looking at specific applications in medicine and healthcare, as well as “things like help desk, tech support, and business intelligence applications … basically anyplace where you need to go beyond just document search, but you have deeper questions or scenarios that require gathering evidence and evaluating all of that evidence to come up with meaningful answers.”

Similarly, in his June 16, 2010 article on Watson for the New York Times Magazine, Clive Thompson reported (emphasis added):

I.B.M. plans to begin selling versions of Watson to companies in the next year or two. John Kelly, the head of I.B.M.’s research labs, says that Watson could help decision-makers sift through enormous piles of written material in seconds. Kelly says that its speed and quality could make it part of rapid-fire decision-making, with users talking to Watson to guide their thinking process.

“I want to create a medical version of this,” he adds. “A Watson M.D., if you will.” He imagines a hospital feeding Watson every new medical paper in existence, then having it answer questions during split-second emergency-room crises. “The problem right now is the procedures, the new procedures, the new medicines, the new capability is being generated faster than physicians can absorb on the front lines and it can be deployed.” He also envisions using Watson to produce virtual call centers, where the computer would talk directly to the customer and generally be the first line of defense, because, “as you’ve seen, this thing can answer a question faster and more accurately than most human beings.”

(I will be interested to see if M.D.s take easily to relying on such DeepQA/Watson-type automated helpers. The stakes in medicine being as high as they are, and physicians’ egos being what they are, I can picture many doctors resisting putting too much faith into these systems when they are new. On the other hand, how might juries in malpractice suits look on physicians who went against the advice of such digital helpers?)

All of us who suffer through the aforementioned types of automated help-line hells may therefore also hope for a reprieve within just a few years. Machines capable of Watson-level general knowledge may remain out of many companies’ budgets for years, but that may not matter. To play Jeopardy!, Watson needs a strongly diverse base of knowledge in history, literature, science, the arts and so on. The automated systems of most businesses would not need such breadth. My travel agency’s automated booking system, for example, wouldn’t need to know the poems of Emily Dickinson or the date of the Norman conquest; it would need to know geography, vacation packages, airline schedules, my travel history, visa and vaccination requirements and the like.

It’s true that in any conversation with a member of the public, a digital helper could encounter references it would not understand. But part of the beauty of the approach to answering queries that Watson is helping to pioneer is that the system recognizes when it lacks a trustworthy answer. In a conversation, the computer can then ask for additional information to improve its deliberations, rather than blurting out its best guess when that guess still is not good enough. Thus, this DeepQA approach—and similar approaches that other companies could develop (because I don’t want to rule them out)—could scale down quite well for many applications, I think.
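That fallback can be made concrete with a small sketch. Again, this is my own illustrative code under assumed names and a made-up threshold, not any vendor's API: when no candidate answer is confident enough, the helper asks a clarifying question instead of guessing.

```python
def respond(scored_candidates, confidence_threshold=0.8):
    """Answer when confident; otherwise ask the user for more detail.

    scored_candidates: list of (answer, confidence) pairs.
    Returns ("answer", text) or ("clarify", question).
    """
    best_answer, best_confidence = max(
        scored_candidates, key=lambda c: c[1], default=(None, 0.0)
    )
    if best_confidence >= confidence_threshold:
        return ("answer", best_answer)
    # Below threshold: gather more evidence rather than guess.
    return ("clarify", "Could you give me a bit more detail about that?")
```

In a dialogue loop, the clarifying reply would feed back into the candidate pool, so each turn can raise (or lower) the system's confidence before it commits to an answer.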

Now come the more philosophical questions. How close does something like Watson bring us to the goal of creating true artificial intelligence? The longstanding benchmark for an AI to pass is the Turing test: a machine passes if its replies cannot be distinguished from a human’s.

Even those close to the Watson project dismiss the idea that the system represents a Turing-level intelligence. Eric Brown, for example, remarks that Watson might be indistinguishable from a human playing Jeopardy!, but it lacks any good capability for general conversation. Stephen Wolfram, the computer scientist behind Mathematica and Wolfram Alpha, argues that Watson can only answer questions with objectively knowable facts and that it cannot offer a judgment.

Nevertheless, David Ferrucci, who headed the Watson project, seems hopeful that it offers useful lessons for bringing computer scientists closer to that Turing goal, or even more ambitious ones (via NYTimes):

At best, Ferrucci suspects that Watson might be simulating, in a stripped-down fashion, some of the ways that our human brains process language. Modern neuroscience has found that our brain is highly “parallel”: it uses many different parts simultaneously, harnessing billions of neurons whenever we talk or listen to words. “I’m no cognitive scientist, so this is just speculation,” Ferrucci says, but Watson’s approach — tackling a question in thousands of different ways — may succeed precisely because it mimics the same approach. Watson doesn’t come up with an answer to a question so much as make an educated guess, based on similarities to things it has been exposed to.

Ferrucci may indeed be right, and Watson may embody the kernel of an insight into human thinking that could steer scientists toward better high-level artificial intelligences. However, it is also surely true that the human brain does not think simply by hatching and evaluating thousands of possible responses to every situation in the way that Watson does. Machine intelligences certainly do not need to work the way our brains do. But if the goal is to create an artificial intelligence that can match a human one, science will also need to be alert to efficient alternatives in our neurosystems that can help machines scale up.

As effective a general savant as Watson is in the context of Jeopardy!, it is still a computer optimized to do one thing: play that game. A machine with exactly the same approach that could be equally versatile “in the wild” would need to be much, much more powerful. That sort of brute-force approach might work; it is, after all, a big part of how Deep Blue beat Garry Kasparov in their 1997 chess match. But it is probably a wildly inefficient way to build a machine with human-level cognition. Computing power may indeed be increasing exponentially, but expanding the capabilities of something like Watson toward that end might involve a processing problem that escalates even faster.

Eventually, of course, if nothing constrains the increase and application of the computing power, then even that horrific hypothetical level of brute force needed to simulate human intelligence would become available. But that would not represent a scalable solution except in a universe of unlimited resources and boneheaded stubbornness.

This, I suspect, is why strong optimists about AI such as futurist Ray Kurzweil (who foresees a computer passing the Turing test in 2029 and has made a wager to that effect with Mitch Kapor) and others who are more reserved or pessimistic (whom I’m tempted to call “neurorealists”) may argue right past each other. Kurzweil believes in exponentially accelerating technological growth that will overrun all obstacles. To those optimists, objections about the scale of certain approaches to AI are irrelevant because the passage of time will put any needed amount of computing power within reach. To the neurorealists, piling on computational resources without any clear regard for what might be a biologically guided way of deploying them makes it preposterous to think that anyone will bother with such a project. And because we currently have only the faintest glimmers about how such higher cognitive abilities emerge from our brains, the day when we can translate those mechanisms into something suitable for AI seems remote, too.

Both sides have a point. That’s why I’m leery of following those skyrocketing curves of technological growth to any imminent arrival of machine sentience in the absence of real breakthroughs in understanding how minds arise from brains. At the same time, as Watson amply demonstrates, we can look forward very soon to computers that can at least seem perfectly intelligent within narrow scopes.


18 Responses to Not-So-Elementary Watson: What IBM’s Jeopardy! Computer Means for Turing Tests and the Future of Artificial Intelligence

  1. Pingback: How IBM’s Watson Computer Excels at Jeopardy! | Retort

  2. Linus Kan says:

    I am not so sure doctors will refuse to sign on because of their egos. They have toned down their standards so much as it is.

    My parents were doctors and I grew up in a medical environment. As a child, I overheard an ObGyn brushing off the then-newfangled ultrasound technology for checking on a fetus: “Fingers! That’s what they are for.”

    Today, a monthly ultrasound is the default. Doctors’ egos are toned down by the financial incentives they are offered, like all our egos. I can see them using Watson.

  3. Nathan Cook says:

    There are plenty of people who think strong AI will be achieved relatively quickly by artificial neural systems or the like. Then there are those who think that the properties of neural networks can be captured in a more efficiently computable manner by structures such as Bayesian networks, but have no problem with borrowing the tricks that biological brains use. Then there are those who think that borrowing is unnecessary and a correct overarching theory of intelligence is what’s required. And so on.

    In other words a continuum exists between those who believe that even modelling every ion channel of each neuron in a simulated brain falls short of what is necessary, and those who disparage any approach that doesn’t include explicit symbolic representation at the lowest levels. To a large extent this difference in approach is independent of an assessment of how hard it is to achieve strong AI – your ‘neurorealists’ are simply people who happen to believe both that AI requires extensive reproduction of the internal structure of the human brain and that this will take longer to do than the 25 years or so that Kurzweil allows.

  4. Anonymous coward says:


  5. Pingback: A Really Expensive Way To Win A Game Show, Ctd |

  6. A. T. Murray says:

    Jeopardy will never be the same after IBM Watson vanquishes the human race. People watching Jeopardy will feel as if they are seeing substandard intellects compete at a level beneath the new standard of championship play. Attention and news coverage will shift to Jeopardy being played by non-human players such as IBM Watson from the USA, Mao-Mind from China, Berlioz from France, Sputnik Two from Russia, Ubergehirn from Germany, and so on. Human beings will be relegated to sports events, and barred from the highest of intellectual competitions.

  7. Steve says:

    Think about the possible implications of technology like this for the job market, especially knowledge workers (corporate drones in cubicles).

    See this article in Fortune Magazine:

    Will IBM’s Watson put your job in jeopardy?

  8. David says:

    It is amazing that technology can do this: imagine coming back with a complex answer in less than three seconds. I guess it will not really be too long before computers have a capacity close to humans’, and though we think it might be a long time, with technology rapidly speeding up that time might come sooner rather than later.


  9. Pingback: IBM Watson-Jeopardy Challenge party « Decrepit Old Fool

  10. IBM’s Watson can answer Trivial Pursuit-type questions put to it very effectively, but it cannot yet ask questions in a thoughtful way that permits Watson to create new knowledge. However, this could be achieved in the foreseeable future and will probably require additional sensory and effector abilities so that it can actively acquire new data. The HAL 9000 computer in the film “2001: A Space Odyssey” probably still serves as the most recognizable paradigm for an intelligent machine.

    Watson got its name from one of the early leaders of IBM, Thomas Watson, who worked at and then eventually led IBM over a 42-year period. It was Thomas Watson who advanced the IBM motto “Think.” Incidentally, HAL 9000 got its name by taking the letters in IBM and shifting each back by one place in the English alphabet.

    Watson is rightly viewed as a major advancement in “artificial intelligence” (AI for short). Watson will certainly spur better interaction and retrieval of information from future computers by humans.

    Interestingly, the Watson affair forces us to review the definition of “intelligence” itself. This remains a somewhat ambiguous term. I am inclined to accept the broader definition: “The ability of a system to adapt to a changing environment in a way that ensures its survival and propagation.” This might first appear to be restricted to biological systems, but I think otherwise. As computers advance in their cognitive abilities, it is going to become exceedingly difficult to view them as possessing “artificial intelligence.” Perhaps “synthetic intelligence” may be a more apt phrase, as this type of intelligence is an offspring of the human imagination that will ultimately exceed its creator.

  11. Barry says:

    I found Watson’s performance unimpressive. Receiving the questions in a text file? Algorithms no more sophisticated than AutoSpell on my blackberry? Is this really groundbreaking?

    Network enough Intel processors together and you’ll get a big enough supercomputer. That’s not the next big thing in computing. The next step is VR (voice recognition). Watson proved he couldn’t distinguish homonym or synonym usage, even with sophisticated algorithms.

  12. Janis says:

    I keep thinking that until Watson is commanded to play Jeopardy and it replies, “I don’t feel like it,” and then refuses, we don’t have anything special on our hands.

  13. Bill says:

    Two thoughts spring to mind upon reading this. First, it is very likely that the NSA/Homeland Security branch(es) of government will be early adopters (if they aren’t already). Many times it has been stated that the trouble with intercepting phone calls/email etc is the sheer scale. Watson can play on that scale (and probably upward).

    Second, it seems that Watson is very dependent upon having the correct info. Suppose it digests enough disinformation that it can no longer discern “right” from “wrong.” Can it be proofed against false information? Won’t it ultimately have to depend upon humans to do this screening? Doesn’t this paint a vivid picture of Watson as an agenda-driven tool, serving whomever controls the input data stream? From what I saw, it did very little inferring, mostly just regurgitation. This brings to mind many dystopian scenarios (let’s not go there).

    PS The Science talk podcast pointed me to this blog which has now been bookmarked.

  14. Pingback: Can an AI Pass The Social Media Turing Test?|The Essence of Being Me

  15. Greg Snow says:

    Has IBM ever had Watson have a conversation with itself?
    For example: Two Chatbots engaged in conversation:

  16. Greg G says:

    I remember a video of Ray Kurzweil giving a talk, not too long ago. He actually believes in multiple approaches to AGI (artificial general intelligence), not just brute computing power. He believes that where one type excels in a certain area, it will compensate for deficiencies in the other, and vice versa. Or the analogy of biological brain functions could be put into a computational form. There may be a composite type of AGI that eventuates.
    So I think it is wrong to say Ray Kurzweil only believes in brute-force AI.
