Journal of Experimental Psychology: Human Perception and Performance
1989, Vol. 15, No. 3, 419-433
Copyright 1989 by the American Psychological Association, Inc. 0096-1523/89/$00.75

Guided Search: An Alternative to the Feature Integration Model for Visual Search

Jeremy M. Wolfe, Kyle R. Cave, and Susan L. Franzel
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology

Subjects searched sets of items for targets defined by conjunctions of color and form, color and orientation, or color and size. Set size was varied and reaction times (RT) were measured. For many unpracticed subjects, the slopes of the resulting RT x Set Size functions are too shallow to be consistent with Treisman's feature integration model, which proposes serial, self-terminating search for conjunctions. Searches for triple conjunctions (Color x Size x Form) are easier than searches for standard conjunctions and can be independent of set size. A guided search model similar to Hoffman's (1979) two-stage model can account for these data. In the model, parallel processes use information about simple features to guide attention in the search for conjunctions. Triple conjunctions are found more efficiently than standard conjunctions because three parallel processes can guide attention more effectively than two.

Searches for a target among a number of distractor items are easier for some stimuli than for others. For example, targets defined by a unique color or a unique orientation are found easily (Treisman & Gelade, 1980). If we measure the time required to determine that a target is present, we find that the reaction time (RT) is short and nearly independent of the number of distractor items. The target, if present, appears to be found in "parallel" with the visual system examining all items at once. Other searches are not so effortless.
A T can be found among a field of Ls, but the time required to find that T will increase markedly as the number of distracting Ls increases (Julesz & Bergen, 1983). The slope of the function relating RT to number of distractors gives an estimate of the cost in search time of each additional distractor. In a T versus L task, additional items seem to be processed at a rate of about 40-60 ms apiece (e.g., Julesz & Bergen, 1983). If the T is located by a "serial," self-terminating search (Donders, 1868/1969; Sternberg, 1969), a 2:1 ratio in slopes is predicted between RT x Set Size functions for blank trials containing only distractors and trials with a target present. On blank trials, the subject must examine each item in order to confirm that no target is present. At 40 ms per item, this yields a slope of 40 ms/item. On target trials, the subject must examine an average of half of the items before finding the target, yielding a slope of 20 ms/item. (The terms serial and parallel must be used with caution.)¹

We thank Amy Shorter, Charles Pokorny, and Art Figel for help in data collection and analysis, and Anne Treisman, David Irwin, James Pomerantz, Karen Yu, Nancy Kanwisher, and Jeff Schall for useful comments on drafts of this article. This research was supported by National Institutes of Health Grant No. EY05087, the Whitaker Health Sciences Fund, and the Educational Foundation of America. We thank IBM for the use of YODA graphics hardware and software.
Correspondence concerning this article should be addressed to Jeremy M. Wolfe, Department of Brain and Cognitive Sciences, E10-137, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139.

Treisman's feature integration model (Treisman & Gelade, 1980; Treisman, 1986), perhaps the leading model of visual search, seeks to explain the distinction between serial and parallel searches with a two-stage model.
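The 2:1 slope prediction for serial, self-terminating search described above can be checked with a minimal sketch. It assumes a fixed cost of 40 ms per examined item (the figure used in the text) and a hypothetical 400 ms intercept; the function names and the intercept are illustrative assumptions, not part of the original study.

```python
import random


def expected_items(set_size, target_present):
    """Expected number of items examined in a serial, self-terminating search."""
    if not target_present:
        return float(set_size)  # blank trial: every item must be rejected
    # Target is equally likely to be the 1st, 2nd, ..., Nth item examined.
    return (set_size + 1) / 2


def predicted_rt(set_size, target_present, ms_per_item=40.0, base_ms=400.0):
    """Predicted RT in ms; base_ms is a hypothetical intercept (assumption)."""
    return base_ms + ms_per_item * expected_items(set_size, target_present)


def slope(target_present, small=8, large=32):
    """Slope (ms/item) of the predicted RT x Set Size function."""
    rise = predicted_rt(large, target_present) - predicted_rt(small, target_present)
    return rise / (large - small)


def simulated_items(set_size, trials=20000, seed=0):
    """Monte Carlo check of the target-trial expectation."""
    rng = random.Random(seed)
    return sum(rng.randint(1, set_size) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("target-trial slope (ms/item):", slope(True))
    print("blank-trial slope (ms/item):", slope(False))
    print("simulated mean items examined, N = 32:", simulated_items(32))
```

Under these assumptions the blank-trial slope equals the per-item cost and the target-trial slope is half of it, which is the 2:1 ratio that the observed conjunction data are compared against.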
A fairly limited, "preattentive" (Neisser, 1967), parallel stage of processing is followed by a more sophisticated, serial stage. Treisman holds that only basic features such as color, size, and orientation can support parallel search, whereas all other stimuli require a serial search. In particular, she has argued that serial search is required for targets defined by conjunctions of basic features (e.g., a red X among green Xs and red Os). Treisman has presented an extensive set of experiments showing fairly steep, linear RT x Set Size functions with 2:1 ratios between slopes for target and blank trials (Treisman & Gelade, 1980; Treisman & Paterson, 1984). Julesz's texton model (Julesz, 1981; Julesz, 1984; Julesz & Bergen, 1983) shares many important features with the feature integration model.

¹ The terms serial and parallel must be used with some care. Townsend (1971, 1976) has shown that results such as those described above do not by themselves prove that an underlying search process is serial or parallel. For example, it may be that all searches are parallel and that the differences between searches lie in capacity limits on different parallel processes: a very large capacity for processes involved in identifying color, and more limited capacity for processes involved in more complex identifications (e.g., T vs. L). Whether the underlying distinction is between serial and parallel processes or between capacity limited and unlimited processes, there remains an interesting, qualitative difference between effortless and effortful searches. For the sake of convenience, we will use serial to refer to searches that produce RT x Set Size functions with a substantial positive slope, and parallel to refer to functions with little or no slope. We recognize that these are labels and not absolute commitments to a particular view of the nature of the underlying mechanisms.

It is a curious feature of these models that the parallel processes seem to have very little influence on the subsequent serial processes. In the standard feature integration model, the parallel processes can identify targets on the basis of a single feature. However, if they do not find a target, none of the information that they have gathered is used by the serial processes, even if that information might be useful. Consider a search for a red X among green Xs and red Os. The target is defined by a conjunction; therefore, it cannot be located by the parallel processes. Nevertheless, a parallel process for color can differentiate between green and red items. Because no green item can possibly be a red X, it would seem sensible for the parallel process to inform the serial process of the locations of all green items so that the serial process would not waste time and effort examining those items. Indeed, there is ample evidence that information from parallel processing of color can be used to restrict serial searches to items of a single color in a multicolored array (Bundesen & Pedersen, 1983; Egeth, Virzi, & Garbart, 1984; Farmer & Taylor, 1980; Green & Anderson, 1956; Smith, 1962).

In this article we present data from a series of visual search experiments suggesting that serial visual search can be guided by information from any of a number of parallel processes. In the first series of experiments, unpracticed subjects searched for conjunctions of color and form or color and orientation. In general, the slopes of the RT x Set Size functions are quite shallow. For many subjects, the slopes are virtually flat, or at least as "flat" as published data for feature searches.

There have been several other published reports of shallow slopes for conjunction searches. For example, Nakayama and Silverman (1986) found that searches for a number of conjunctions involving motion, stereoscopic depth, or both can produce very shallow RT x Set Size slopes (see also Steinman, 1987, and McLeod, Driver, & Crisp, 1988). It has been possible to regard each of these previous cases as an exception to the general feature integration rule that conjunctions require serial search.
For example, Nakayama and Silverman's data suggest that depth may behave in special ways as a feature. However, the results presented in this article will show that very shallow slopes can be obtained using the same classes of conjunctions (e.g., Color x Form) used by Treisman (Treisman & Gelade, 1980) in formulating the feature integration model.

This raises the question of why our results differ from previously published results for similar conjunction searches. The second set of experiments addresses that issue. The results of Experiment 7 will show that the difference is largely attributable to differences in the stimuli. It is not due to learning (Experiments 5 and 6) or to a general ability of our subjects to do all searches in parallel (Experiment 4).

Regardless of the explanation of the differences between our data and previously published results, it is important to realize that the simple existence of our data requires a modification of the feature integration model. That model holds that conjunctions (at least of color and form) require serial search. Our data show that in some cases this is not so.

In our modification of the feature integration model, we propose that the parallel processes guide the "spotlight of attention" toward likely targets. Thus, we call it "guided search." This is not an entirely new idea. Hoffman (1978, 1979) proposed a two-stage model in which a parallel first stage delivers likely targets to a slower, serial, second stage. Although the basic architecture of our proposal is similar to that of Hoffman's, his model is based on work from a somewhat different search task and does not deal explicitly with searches for conjunctions. An advantage of the guided search model is its ability to explain many of the previously published problematical results as examples of, and not exceptions to, the general feature integration rule.

The guided search model makes testable predictions.
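The guidance idea proposed above can be pictured with a minimal sketch. It is not the authors' implementation: the vote-counting rule, the Gaussian noise term (`noise_sd`), and all function names are illustrative assumptions. Each parallel feature process "votes" for items that share the target's value on its feature, and a serial stage then examines items in descending order of noisy votes. The display follows the red X among green Xs and red Os example from the text.

```python
import random

TARGET = ("red", "X")
DISTRACTORS = [("green", "X"), ("red", "O")]


def make_display(set_size, rng):
    """Color x Form conjunction display containing exactly one target."""
    items = [rng.choice(DISTRACTORS) for _ in range(set_size - 1)]
    items.append(TARGET)
    rng.shuffle(items)
    return items


def items_examined(items, target, guided, noise_sd, rng):
    """Number of items the serial stage inspects before reaching the target."""
    def priority(item):
        if not guided:
            return rng.random()  # unguided: arbitrary scan order
        # One vote per feature (color, form) shared with the target.
        votes = sum(a == b for a, b in zip(item, target))
        return votes + rng.gauss(0.0, noise_sd)  # noisy guidance (assumption)

    order = sorted(items, key=priority, reverse=True)
    return order.index(target) + 1


def mean_examined(set_size, guided, noise_sd, trials=2000, seed=0):
    """Average number of items examined per target trial."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        display = make_display(set_size, rng)
        total += items_examined(display, TARGET, guided, noise_sd, rng)
    return total / trials


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, mean_examined(n, False, 0.5), mean_examined(n, True, 0.5))
```

In this sketch, noiseless guidance always brings the target to the front of the queue, and noisy guidance still reduces the number of items examined relative to an unguided scan. That is the qualitative pattern a shallow conjunction slope would reflect.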
One such prediction is that triple conjunctions (Quinlan & Humphreys, 1987) should be easier to find than standard conjunctions. If the parallel processes can guide subsequent serial search, then three parallel sources of guidance ought to be better than two. The standard feature integration model would predict serial search for such stimuli. In the third set of experiments, subjects searched for a triple conjunction of color, form, and size. These searches produce shallower slopes than simple Color x Form conjunctions. Indeed, in one condition, search for triple conjunctions is independent of set size.

To summarize, this article makes three main points:

1. In our experiments, results from naive, unpracticed subjects searching for conjunctions of Color x Form, Color x Orientation, and Color x Size are inconsistent with serial, self-terminating search.
2. Searches for triple conjunctions are easier than for simple conjunctions, a fact not predicted by the standard feature integration model.
3. A modification of that model to allow the parallel processes to guide serial search can explain these and other problematical results.

Experiment 1: Conjunctions of Color and Form

Method

Subjects. Twenty subjects were tested. They were drawn from the Massachusetts Institute of Technology (MIT) undergraduate subject pool and were paid for their participation. All wore their best optical correction, if they required any. All were naive as to the purposes and method of the experiment. None had been subjects in previous visual search experiments.

Apparatus and procedure. Stimuli were presented on a standard television monitor that was part of a modified "Sub-Roc 3-D" video game. Displays were controlled by an IBM PC-XT with IBM-YODA graphics. Stimuli were saturated red and green Xs and Os on a black background. (CIE, International Commission on Color, x, y coordinates: red, .62, .36; green, .34, .57).
Subjects viewed an 11.3° by 11.3° field with a small central fixation point. Individual items fit within an 0.85° by 0.85° square. They could be placed at any of 36 locations in a slightly irregular 6 by 6 array. On each trial, items were presented at 8, 16, or 32 randomly chosen loci within the array. On target trials, one of these loci contained a target item. Set size, positions of target and distractors, and presence or absence of a target were random across trials. Subjects responded by pressing one of two keys: A yes key if a target was detected and a no key if it was not. Reaction times were measured from stimulus onset. The stimulus remained visible until the subject responded and feedback was given on each trial. Targets were present on 50% of trials. All experiments in this article were variations of this visual search paradigm. In Experiment 1, each subject was run in one session of 260 trials. For the first 40 trials, subjects did a very simple search in order to
Experiment 2: Conjunctions of Color and Orientation

The results from one experiment present an insufficient case for calling for a modification of a successful model. To bolster the case for a change in the feature integration model, we repeated the experiment using a slightly different conjunction (color and orientation) and different ranges of set sizes.

Method

In this case, subjects searched for a green, horizontal line among red horizontals and green verticals. A total of 22 subjects were tested. Some of these subjects had been in previous visual search experiments. Two of the authors served as subjects; the other 20 subjects were drawn from the MIT undergraduate subject pool. Subjects were divided into three groups, each of which was tested on a different group of set sizes: Group A, 3, 6, 9, 18, and 36; Group B, 1, 2, 4, 8, 16, and 32; Group C, 1, 2, 6, 12, and 24. We used more than three set sizes per subject in order to better examine the linearity of the Set Size x RT function. Subjects received 20 practice trials followed by 100 experimental trials per set size. (Groups A and C thus received 520 trials, and Group B received 620.) In all other respects the methods were identical to those of the previous experiment.

Results

Figures 1A and 1B show average RTs for target and blank trials for each group of subjects. In general, the results replicate those from Experiment 1. In all three versions of the experiment, RT shows a strong linear trend upward as set size increased, both for target trials [Group A, F(1, 35) = 65.3; Group B, F(1, 35) = 182.6; Group C, F(1, 20) = 62.3; p < .001 in all cases] and for blank trials [Group A, F(1, 35) = 19.7; Group B, F(1, 35) = 99.9; Group C, F(1, 20) = 54.5; p < .001 in all cases]. Also for all three versions, slopes for blank (negative) trials are steeper than slopes for target (positive) trials [Group A, F(1, 35) = 11.5; Group B, F(1, 35) = 40.2; Group C, F(1, 20) = 10.3; p < .005 in all cases].
As in Experiment 1, the RT x Set Size functions are substantially shallower than would be predicted if subjects were undertaking a simple, serial self-terminating search in which attention moved every 40 ms. Average slopes for the three groups are given in Table 2.

Shallow as they are, the slopes for target trials are misleadingly high because, in this experiment, the target trial functions are not linear. In at least two of the three groups, the target slope decreases for larger set sizes, resulting in significant quadratic trends: Group B, F(1, 35) = 20.1, p < .001; Group C, F(1, 20) = 6.6, p < .025. The quadratic trend is not present in Group A (F < 1), perhaps because the smaller set sizes were not included in that version or perhaps because the slopes for this set of subjects are so shallow that any nonlinearity is hidden. Given the nonlinear component in the target trial data, we did not attempt to test the hypothesis that the blank and target trial slopes were in a 2:1 ratio.

This negative acceleration of the RT function is apparent in Figure 1A. Table 3 shows slopes computed separately for the three lowest set sizes and the three highest set sizes. Clearly, slopes are steeper for smaller set sizes. For the higher set sizes, the slopes are very shallow, averaging 3-6 ms/item.

[Figure 1 appears here. Panel 1A: Target Trials; Panel 1B: Blank Trials. Reaction time is plotted against set size for Groups A, B, and C. T: green horizontal; D: red horizontal, green vertical.]

Figure 1. Average RTs as a function of set size in a search for a conjunction of color and orientation. (There are three groups of subjects [A, B, and C]. Target items [T] and distractor items [D] are identified on the figure. 1A shows target trial data only; 1B shows blank trials. Note that RT increases quite slowly with set size and that functions for Groups B and C appear nonlinear. See Table 2 for slopes of these functions.)

For Group A and Set Sizes 3, 6, 9, 18, and 36, error rates are 4.8%, 4.6%, 6.2%, 7.8%, and 8.9%, respectively. For Group B and Set Sizes 1, 2, 4, 8, 16, and 32, error rates are 1.4%, 2.8%, 1.0%, 1.2%, 0.4%, and 0.8%, respectively. For Group C and Set Sizes 1, 2, 6, 12, and 24, error rates are 5.2%, 7.5%, 7.0%, 6.5%, and 10.2%, respectively. There is some evidence of a speed-accuracy trade-off, in that Version B has the longest RTs, the steepest slopes, and the lowest error rates. An ANOVA shows that effects of set size on errors are not significant for all three versions (all ps > .1).
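The split-slope comparison summarized in Table 3 (slopes for the three lowest versus the three highest set sizes) can be illustrated with a short sketch. The set sizes are those of Group B; the RT values are synthetic, generated from a hypothetical logarithmic function chosen only because it is negatively accelerated. They are not data from the experiment, and `ls_slope` is an illustrative helper name.

```python
import math


def ls_slope(set_sizes, rts):
    """Least-squares slope (ms/item) of RT regressed on set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx


# Group B set sizes, as listed in the Method section.
set_sizes = [1, 2, 4, 8, 16, 32]

# SYNTHETIC RTs (assumption): a negatively accelerated function of set size.
rts = [450.0 + 60.0 * math.log2(n) for n in set_sizes]

low_slope = ls_slope(set_sizes[:3], rts[:3])     # three lowest set sizes
high_slope = ls_slope(set_sizes[-3:], rts[-3:])  # three highest set sizes

if __name__ == "__main__":
    print("slope, three lowest set sizes (ms/item):", round(low_slope, 1))
    print("slope, three highest set sizes (ms/item):", round(high_slope, 1))
```

For any negatively accelerated function of this kind, the slope over the low set sizes exceeds the slope over the high set sizes, so a single linear fit across all set sizes overstates the per-item cost at large set sizes.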