What attributes guide the deployment of visual attention and how do they do it?

by Jeremy M Wolfe, Todd S Horowitz
Nature Reviews Neuroscience (2004)

Abstract

As you drive into the centre of town, cars and trucks approach from several directions, and pedestrians swarm into the intersection. The wind blows a newspaper into the gutter and a pigeon does something unexpected on your windshield. This would be a demanding and stressful situation, but you would probably make it to the other side of town without mishap. Why is this situation taxing, and how do you cope?

Available from www.ncbi.nlm.nih.gov

PERSPECTIVES | OPINION
NATURE REVIEWS | NEUROSCIENCE, VOLUME 5, JUNE 2004. © 2004 Nature Publishing Group

The world presents the visual system with an embarrassment of riches. Given a brain of any reasonable size, it is impossible to process everything everywhere at one time¹. The human visual system copes with this problem in a number of ways. Rather than having high-resolution processing at all locations, the best resolution is confined to the fovea, with massive losses in acuity occurring only a few degrees into the periphery. There are restrictions in the wavelengths of light that are processed, the spatial and temporal frequencies that can be detected, and so forth. All of these ‘front-end’ reductions in the amount of information fail to solve the problem. To deal with the still-overwhelming excess of input, the visual system has attentional mechanisms for selecting a small subset of possible stimuli for more extensive processing while relegating the rest to only limited analysis.

Even though William James famously declared that ‘Everyone knows what attention is’², there is no single, satisfying definition of attention. The term covers a diverse set of selective processes in the nervous system. We can attend to a specific task, attend to tactile stimuli in preference to auditory, attend to a specific visible stimulus that is 2° to the left of fixation, and so on. This article is restricted to consideration of visual attention. Even within vision, there is good evidence that attention has its effects in diverse ways. Attention to a stimulus might enhance the signal produced by that stimulus³,⁴. It might more precisely tune the visual system to a stimulus attribute, excluding other input as noise³. Attention might restrict processing to one part of the visual field⁵ or to an object⁶, or it might restrict processing to a window in time⁷.

Faced with this welter of possibilities, we will use an operational definition of one aspect of attention in this paper. We are concerned with the deployment of attention in visual search tasks. It is possible to discuss the role of attention in these tasks while remaining agnostic about distinctions between noise reduction, stimulus enhancement and so forth.

In a typical visual search task, an observer looks for a target item among distracting items. In the laboratory, this might be a search for a big red vertical line in a display containing lines of other colours, sizes and orientations. However, visual search is no mere laboratory curiosity. From the search for socks in the laundry to the search for weapons in carry-on luggage, our environment abounds with search tasks. Indeed, these processes of attentional selection, revealed by visual search experiments, are presumably the processes that are used whenever anything in the world becomes the current object of visual attention.

The starting point for any understanding of the deployment of attention in visual search is the observation that some search tasks are easy and efficient while others are not. Consider FIG. 1a. If you are asked to find the red target or the tilted target or the big target, it is intuitively clear that the number of distracting items does not make much difference. The colour, orientation or size attributes that define the targets can efficiently guide attention to the target. On the other hand, among these ‘5’s there is a ‘2’ target. Once it has been found, there is no difficulty in discriminating a 2 from a 5. However, attention cannot be guided by the spatial position information that differentiates those characters. The more 5s that are present, the more difficult the search task will be⁸.

The purpose of this article is to review the status of these guiding attributes. What properties can guide attention and what cannot? For about 25 years, the answer to that question has been framed in terms of Treisman’s highly influential feature integration theory⁹. Treisman followed Neisser¹⁰ in proposing a two-stage architecture for human vision (FIG. 2a) in which a set of basic features was generated in an initial, parallel, ‘preattentive’ stage. Other processes, like those that bound features to objects and permitted object recognition, were restricted to one or at most a few objects at a time. Consequently, attention was required to select a subset of the input for this more advanced processing. Later models, such as guided search¹¹,¹², kept the two-stage architecture but noted that the preattentive stage could guide the deployment of attention to select appropriate objects for the second stage. Therefore, a preattentive stage that could process colour and orientation could efficiently guide attention to a target that was defined by the combination of colour and orientation (for example, a red vertical item) even if preattentive stages could not bind colour to orientation in parallel at all locations.
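The idea that separately processed attributes can guide attention to a conjunction target can be illustrated with a minimal Python sketch. This is not the guided search model itself: the `Item` class, the `guidance_score` and `guided_search` functions and the five-item display are hypothetical, and guidance is reduced here to a count of matching categorical attributes. Colour and orientation are scored independently, and the two are only compared jointly once an item has been selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    colour: str        # categorical colour label, e.g. "red"
    orientation: str   # categorical orientation label, e.g. "vertical"


def guidance_score(item, target):
    """Count matching attributes; each attribute is checked on its own."""
    colour_match = int(item.colour == target.colour)
    orientation_match = int(item.orientation == target.orientation)
    return colour_match + orientation_match


def guided_search(display, target):
    """Return how many items are attended before the target is selected.

    Returns None if the target is not in the display.
    """
    # Visit items from highest to lowest guidance score.
    # sorted() is stable, so ties keep their display order.
    order = sorted(range(len(display)),
                   key=lambda i: -guidance_score(display[i], target))
    for n_attended, i in enumerate(order, start=1):
        # Only here, for the attended item, are colour and
        # orientation compared jointly (as a bound object).
        if display[i] == target:
            return n_attended
    return None


# Hypothetical display: a red vertical target among distractors
# that each share at most one attribute with it.
target = Item("red", "vertical")
display = [
    Item("green", "vertical"),
    Item("red", "horizontal"),
    Item("green", "horizontal"),
    Item("red", "vertical"),
    Item("green", "vertical"),
]

guided_steps = guided_search(display, target)
unguided_steps = display.index(target) + 1  # attend in display order
print("guided:", guided_steps, "unguided:", unguided_steps)
```

In this toy display only the target matches on both attributes, so it receives the highest score and is attended first, whereas attending items in display order would reach it fourth. No item is discarded by the scoring step; the scores only change the order in which items are selected.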
The original account was appealing. Simple features such as size and motion were extracted preattentively. More complex properties required attention. However, the accumulation of information about guiding attributes over the past 20 years makes it clear that this two-stage, linear approach will not work. Several lines of objection have been raised¹³,¹⁴, but the core problem for us is that there are multiple examples of ‘features’ that are available early in visual processing and also in attentive vision, but that are not available to guide the deployment of attention. At the same time, there are properties of guiding attributes that are not reflected in attentive vision. This makes it difficult to envision the guiding representation as a stage in a linear sequence of visual processes, like a filter (even a tunable filter) between early vision and the attentional bottleneck.

As an example, consider intersections. In FIG. 1b, it is not easy to find the two horizontal pairs of triangles. In FIG. 1c, it is quite easy because early visual processes can handle occlusion information¹⁵. Interpreting occlusion requires that the early visual system successfully interprets intersections. Clearly, later object recognition processes can use intersection information. However, as shown in FIG. 1d, intersection does not serve as a source of guidance⁸. The linear model would have to explain how intersection information could be present, then absent, then present again.

Figure 1 | Easy and difficult examples of visual search. a | It is easy to find the red, tilted or big ‘5’. It is not easy to find the ‘2’ among the ‘5’s. b,c | It is difficult to find the horizontal pairs of triangles in b, but in c it is easy because the early visual system can use intersection information to infer that the blue items occlude pink rectangles. d | In this panel, search for the ‘plus’ is inefficient because the intersection information here does not guide attention.

It might be better to think of a ‘guiding representation’ as a control device, sitting to one side of the main pathway from early vision to object recognition (FIG. 2b). Its contents are abstracted from the main pathway and it, in turn, controls access to the attentional bottleneck. However, it would not, itself, be part of the pathway.

Departure from the linear model has been a feature of several recent theoretical approaches to the guidance of attention. Hochstein and Ahissar¹⁶ offer a ‘reverse hierarchy’ model in which properties that are abstracted late in visual processing feed back onto early stages. In an approach that more closely resembles the architecture of FIG. 2b, DiLollo and colleagues¹³ propose that ‘Initial processing is performed by a set of input filters whose functional characteristics are programmable under the control of prefrontal cortex.’

For our purposes, there are two important points to be made about a guidance control module, wherever it is located in the brain. First, as the intersection example illustrates, it does not have access to all of the information that is available in the visual pathway that runs from early vision through the bottleneck to object recognition. Second, as DiLollo et al. note, when the control module exerts its control over access to the bottleneck, it is not acting as a filter in the simple physical sense of that term. The problem with filters is that they remove information. Consider the following: as we discuss below, guidance by attributes such as colour and orientation seems to be coarse and categorical. Attention is guided to ‘red’ and ‘steep’, not to 640 nm or 23° left of vertical. Suppose that a target is known to be categorically ‘red’. Filtering for ‘red’ would pass what was red and reject what was not. However, imagine a task in which observers must determine whether a red object has a green spot on it, and not a black or a blue one. Introspection will tell you that this is a straightforward task, but a filter that eliminated the ‘not-red’ would make it impossible.

Rather than altering the stimulus, as a filter might, the hypothetical control module guides selection like a security screener at an airport. Based on a rather abstract representation of the notion of ‘threat’, the screener selects some individuals for more attention than others. Although attending to an object or location might have perceptual consequences¹⁷, guidance itself should not.

Conceiving of guidance as a control module also avoids a potential pitfall in models of the reverse hierarchy¹⁶ variety. It is reasonable to assume that attention can be guided by some ‘late’ information (see, for example, Torralba’s theoretical work on guidance by scene properties¹⁸). If that information fed back onto early visual processes and acted as a filter, one could imagine odd recursive problems where feedback about a scene reduced the ability to see the scene. Torralba’s model, for example, generates images where only the ground plane is visible during a search for people, but we are not meant to suppose that this is what is seen. As with the search for ‘red’, it seems more plausible that late information could inform the guidance of attention by altering the representation in a guiding module placed outside the main pathway to object recognition.

In the remainder of this article, we discuss the attributes that are abstracted from early vision that can guide attention. In keeping with the hypothesis that guidance is separate from the main pathway to object recognition, we avoid the use of the term ‘preattentive’ and its associated theoretical implications. Attributes will be discussed in terms of their ability to guide the deployment of attention.

Identifying ‘guiding’ attributes

One of the most productive ways to study the differences between visual search tasks is to measure reaction time (RT), the time that is required to say that a target is present or absent, as a function of the set size (the number of items in the display). The slope of the RT × set size function indexes the cost of adding an item to the search display. So, varying the set size in the colour search task in FIG. 1 will produce little or no change in RT. The slope will be near zero and we can label such a search as efficient. By contrast, in the search for a 2 among 5s, RT will increase at a rate of about 20–40 ms per item for trials ...
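The slope of the RT × set size function described under ‘Identifying guiding attributes’ can be estimated with an ordinary least-squares line fit. The Python sketch below is only an illustration: the set sizes and mean RTs are invented, not data from any experiment, and the function names `fit_line` and `slope_ms_per_item` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def slope_ms_per_item(set_sizes, mean_rts_ms):
    """Cost in ms of adding one item to the search display."""
    slope, _ = fit_line(set_sizes, mean_rts_ms)
    return slope


# Invented mean RTs (ms) at each set size, for illustration only.
set_sizes = [4, 8, 16, 32]
efficient_rt = [454, 458, 466, 482]      # e.g. a colour search
inefficient_rt = [520, 640, 880, 1360]   # e.g. a 2 among 5s

for label, rts in [("efficient", efficient_rt),
                   ("inefficient", inefficient_rt)]:
    print(label, slope_ms_per_item(set_sizes, rts), "ms per item")
```

With these invented numbers the first search yields a slope close to zero and the second a slope within the 20–40 ms per item range mentioned in the text. The intercept returned by `fit_line` is not used here; it would capture the portion of RT that does not depend on set size.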

