Psychological Review
1989, Vol. 96, No. 3, 433-458
Copyright 1989 by the American Psychological Association, Inc. 0033-295X/89/$00.75

Visual Search and Stimulus Similarity

John Duncan
MRC Applied Psychology Unit, Cambridge, England

Glyn W. Humphreys
Birkbeck College, University of London, London, England

A new theory of search and visual attention is presented. Results support neither a distinction between serial and parallel search nor between search for features and conjunctions. For all search materials, instead, difficulty increases with increased similarity of targets to nontargets and decreased similarity between nontargets, producing a continuum of search efficiency. A parallel stage of perceptual grouping and description is followed by competitive interaction between inputs, guiding selective access to awareness and action. An input gains weight to the extent that it matches an internal description of that information needed in current behavior (hence the effect of target-nontarget similarity). Perceptual grouping encourages input weights to change together (allowing "spreading suppression" of similar nontargets). The theory accounts for harmful effects of nontargets resembling any possible target, the importance of local nontarget grouping, and many other findings.

The Efficiency of Visual Selection

It is common knowledge that we can pay attention (at any one time) to only a small amount of the information present in a visual scene. Experimentally, it is easy to confirm that people can take up and report only a small amount of the information contained in a brief visual display (Helmholtz, cited in Warren & Warren, 1968). Such a limitation imposes a strong requirement for selection: Ideally, we should confine attention to that information needed to guide current behavior, and again it is easy to confirm that people can use many different selection criteria (location, color, movement, etc.)
to choose which information to see in a briefly glimpsed scene (e.g., Helmholtz, cited in Warren & Warren, 1968; von Wright, 1970). This article deals with the efficiency of selection.

In visual search experiments, subjects are asked to detect particular target stimuli presented among irrelevant nontargets. Results depend on the combination of targets and nontargets used. With some combinations, the number of nontargets in a display has little if any effect. Obviously, they are rejected without access to those rate-limiting stages of processing responsible for our limited ability to pay attention to several stimuli at once. The experience is that attention is drawn directly to the target, implying an efficient prior rejection of nontargets (Duncan, 1980b, 1985; Hoffman, 1978; Shiffrin & Schneider, 1977). In other cases, increasing the number of nontargets substantially increases the time taken to find the target. The experience is that we must pay attention to several nontargets in turn before the target is "found," implying that the efficiency of nontarget rejection is reduced. Here, we seek to understand selection in general by investigating boundary conditions on efficient nontarget rejection in visual search.

Financial support was provided by the MRC Applied Psychology Unit, where the work was carried out. The second author's research is also supported by grants from the Medical Research Council, Economic and Social Research Council, and Science and Engineering Research Council.
We are grateful to Claus Bundesen, Howard Egeth, Harold Pashler, Richard Shiffrin, and Anne Treisman, all of whom commented extensively on earlier drafts.
Correspondence concerning this article should be addressed to John Duncan, MRC Applied Psychology Unit, 15 Chaucer Road, Cambridge CB2 2EF, England.
Feature Integration Theory

Our point of departure is Treisman's feature integration theory (Treisman & Gelade, 1980; Treisman & Souther, 1985). According to this theory, input from a visual display is processed in two successive stages. The first stage consists of a set of spatiotopically organized "maps" of the visual field, each coding the presence of a particular, elementary stimulus attribute or "feature." Thus, one map might code where redness occurs, one where 45°-tilted lines occur, and so on. This stage works in parallel across the visual field but is limited in that it produces no useful information about the conjunction of elementary features. Thus, activity in separate maps might show that the field contains redness, greenness, a diagonal line, and a closed loop, but it cannot show that the line is red and the loop green. Useful conjunction information only becomes available with processing at the second stage. Attention is focused on a particular area of the field. Outputs from those maps with activity in this particular area are then combined to produce the percept of a whole object (e.g., a green loop). If features are to be accurately conjoined, attention must be focused serially on one object after another. It is this serial process that is responsible for our limited ability to see a whole scene at a glance.

The difficulty of visual search is thus determined by whether a target is unique in some elementary feature or only in its conjunction of features. As an example of feature search, the target might be a blue shape presented among a mixture of reds and
greens. Net activity in the blueness map is sufficient to show whether a target is present, and because activity in this map develops in parallel across the visual field, there should be little effect of the number of items present. In conjunction search, on the other hand, the target might be a red O presented among mixed blue Os and red Xs. Because display items can only be classified as targets or nontargets with focused attention, the target must be found by scanning serially through the display, and the number of items will have a large effect. Although later we consider some exceptions, results supporting this distinction between feature and conjunction search have now been reported many times, using color, form, size, and other stimulus attributes (e.g., Treisman, 1982; Treisman & Gelade, 1980; Treisman, Sykes, & Gelade, 1977).

Recently, the theory has been modified to allow the possibility that feature search can be serial when targets and nontargets are closely similar (Treisman & Gormican, 1988). Suppose that targets and nontargets differ slightly in color. Then, each nontarget might have some tendency to excite the target map, making it hard to decide whether this map contains enough net activity to indicate that a target is really present. The more nontargets are present, furthermore, the smaller will be the proportional increase in activity produced by a target, and the harder will be the decision. When this happens, Treisman and Gormican (1988) suggest that attention is focused serially on one clump of items after another. The size of the clump is chosen such that, within one clump, net activity in the target map will reliably indicate whether a target is present. The more discriminable the targets and nontargets, the larger can be the clumps. The original version of the theory then emerges as a special case. With high enough discriminability, the whole display can be treated as a single clump.
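The clump argument can be made concrete with a toy calculation. The sketch below is not Treisman and Gormican's model; it assumes, purely for illustration, that activity in the target map sums linearly over items, that a target contributes 1.0 unit, and that each nontarget contributes a smaller amount (`nontarget_activation`). The decision `criterion` and all function names are likewise hypothetical.

```python
import math


def proportional_increase(n_items: int, nontarget_activation: float) -> float:
    """Relative rise in target-map activity when one of n_items is the target.

    Hypothetical additive model: a target adds 1.0 unit of activity and each
    nontarget adds `nontarget_activation` units (0 < value < 1).
    """
    absent = n_items * nontarget_activation
    present = (n_items - 1) * nontarget_activation + 1.0
    return (present - absent) / absent


def largest_clump(display_size: int, nontarget_activation: float,
                  criterion: float) -> int:
    """Largest clump (at least one item) whose proportional increase meets the criterion."""
    size = 1
    for n in range(1, display_size + 1):
        if proportional_increase(n, nontarget_activation) >= criterion:
            size = n
    return size


def clumps_to_scan(display_size: int, nontarget_activation: float,
                   criterion: float) -> int:
    """Number of serial attentional steps needed to cover the whole display."""
    clump = largest_clump(display_size, nontarget_activation, criterion)
    return math.ceil(display_size / clump)


# Illustrative values only: similar nontargets excite the target map strongly,
# dissimilar nontargets barely at all.
similar_steps = clumps_to_scan(12, nontarget_activation=0.5, criterion=0.25)
dissimilar_steps = clumps_to_scan(12, nontarget_activation=0.05, criterion=0.25)
```

Under these assumed values, a 12-item display with similar nontargets is covered in several clumps, whereas with dissimilar nontargets the whole display fits in one clump, which corresponds to the special case of the original theory.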
Feature integration theory is consistent with a range of psychological phenomena beyond visual search (Treisman & Gelade, 1980; Treisman & Schmidt, 1982; Treisman & Souther, 1985). It is supported by physiological evidence for early analysis of different stimulus attributes in different brain areas (Maunsell & Newsome, 1987). Results of work with connectionist models of vision also suggest that serial processing may be a good solution to the problem of correctly integrating an object's different attributes (Feldman, 1985).

Overview

Like feature integration theory, the present work deals with how search efficiency is determined by the nature of relevant (target) and irrelevant (nontarget) stimulus materials. Although other variables such as practice are important in search (Schneider & Shiffrin, 1977), stimulus factors are our main concern here.

We begin with an assessment of feature integration theory, in particular its account of letter search. A series of four experiments shows very large variations in search efficiency across stimulus materials, variations that are inconsistent with feature integration theory whatever the postulate concerning elementary features of simple shapes. We then present a new account, different from feature integration theory in several important respects. First, the dichotomy between serial and parallel search has no real place in our account, which is based on a continuum of search efficiency. Second, our approach is based not on a distinction between different stimulus attributes, but more abstractly on stimulus relations (similarities) that in principle can be specified for any attribute. Thus, we argue that very similar stimulus principles control search difficulty whatever the search materials, from simple color patches to complex feature conjunctions.
In particular, search efficiency decreases with (a) increasing similarity between targets and nontargets (which we call T-N similarity), and (b) decreasing similarity between nontargets themselves (N-N similarity), the two interacting to scale one another's effects. We try to show that these principles are consistent both with the body of the search literature and with the apparent contrast between feature and conjunction search itself.

We go on to develop a theory of how, in search and other tasks, attention is directed to behaviorally relevant information in the visual field. This theory deals with similarities between possible targets and nontargets in search, with local effects of similarity within a display, and with a variety of other findings holding across a range of different search materials.

Feature Integration Theory and Letter Search

Prior Evidence

Feature integration theory has been applied to letter search by considering the conjunction of a shape's parts. There have been many visual search experiments using simple shapes such as letters and digits. In some tasks there is very little effect of the number of nontargets in a display, for example, in search for a C among 4s (Egeth, Jonides, & Wall, 1972) or for a T or F among Os (Shiffrin & Gardner, 1972), whereas in other tasks the effect is substantial (Kleiss & Lane, 1986). Does the difference depend on whether targets possess some unique (shape) feature? Despite some positive findings, the literature as a whole is rather puzzling.

In fact, feature integration theory has been applied to letter search in two ways. The first, called by Duncan (1987) the case of within-object conjunctions, deals with the spatial arrangement of strokes within a letter.
According to several accounts, the elementary features of letters include lines of particular length and orientation, intersections (line crossings), line terminators, and a few other features (e.g., Bergen & Julesz, 1983; Treisman & Paterson, 1984; Treisman & Souther, 1985). Suppose then that two letters share exactly the same features, differing only in their spatial arrangement. Obvious candidates are pairs like L and T, which contain different arrangements of the same strokes. Individual feature maps will not be able to separate these letters; only when the outputs of different maps are put together with serial attention will the distinction be made. Correspondingly, Beck and Ambler (1973) reported a large effect of display size in search for an L among nontarget Ts, contrasting with a much smaller effect when the target (a tilted T) had strokes of a unique orientation. Similar results were reported by Bergen and Julesz (1983), contrasting search for a T or a + (with its unique intersection) among nontarget Ls.

The second case concerns across-object conjunctions. Here,
the target can be formed by recombining strokes from different nontargets (e.g., search for R among Ps and Qs); again, the target is unique only in its conjunction of strokes. Treisman and Gelade (1980) and Duncan (1979) found large effects of display size in such tasks. If the target had a unique stroke (e.g., R among Ps and Bs), on the other hand, the effect of display size was rather smaller.

Of course, the interpretation of such results in terms of feature integration theory depends on assumptions concerning what elementary letter features are coded at the first, parallel processing stage. To deal with within-object conjunctions, for example, the theory must assume that the position of strokes within a letter is not coded. We shall consider such issues later. For the moment, we may refer to these stimuli as stroke conjunctions rather than feature conjunctions.

Other results complicate the picture. Consider first the case of within-object conjunctions. Humphreys, Riddoch, and Quinlan (1985) studied search for an inverted T among upright Ts. Despite the resemblance of this to the within-object conjunction tasks of Beck and Ambler (1973) and Bergen and Julesz (1983), there was little effect of display size, search times increasing by only 3 ms/item when arrays had a regular spatial arrangement. What can we say about these apparently conflicting results? A first point to note is the difficulty of comparing effects across experiments. For various reasons, even unlimited-capacity parallel models predict some drop in performance with increasing display size (Duncan, 1980a; Eriksen & Spencer, 1969). In reaction time (RT) studies, effects up to 5 or 6 ms/item are comparable with those usually given by feature search (Treisman & Souther, 1985), at least when the target is present. Effects as great as 20 to 30 ms/item are typical of conjunction search.
Beck and Ambler (1973) and Bergen and Julesz (1983), however, measured accuracy rather than RT in studies with limited exposure duration. Little is known about the scale of display size effects in such experiments. Second, Humphreys, Quinlan, and Riddoch (in press) showed that a crucial variable in these studies is letter size or, more accurately, the ratio of size to retinal eccentricity. Using the same task as before, they obtained display size effects of 14 and 2 ms/item, respectively, for size/eccentricity ratios of 1/6 and 1/3. These ratios may be compared with about 1/8 for Beck and Ambler (1973) and up to 1/9 for Bergen and Julesz (1983). It seems likely that the results of these authors' investigations were in part dependent on their use of relatively small letters.

Questions may also be raised over Treisman and Gelade's (1980) study of across-object conjunctions. The smallest effects of display size (4 and 7 ms/item, respectively, for target-present and target-absent displays) were obtained when nontargets were homogeneous (e.g., search for R among Ps). With heterogeneous nontargets, the effect was always much bigger, whether the target had a unique stroke (12 and 37 ms/item) or not (23 and 46 ms/item).¹

A study of similar tasks by Kleiss and Lane (1986) is also instructive. Only heterogeneous nontargets were used. Following Shiffrin and Gardner (1972), Kleiss and Lane (1986) measured the accuracy of target detection in displays of constant size, presented either all at once or two at a time. The technique is useful because unlimited-capacity parallel models predict no effect of presentation mode. In fact, there was a large advantage for presentation two at a time in both feature and conjunction tasks.

Altogether, then, there are several aspects of letter search data that feature integration theory does not explain. One important variable is letter size.
With large enough letters, there can be little effect of display size even when the target is unique only in its within-object conjunction of strokes. A second important variable is nontarget homogeneity. With heterogeneous nontargets, there can be large effects of display size, and large departures from unlimited-capacity parallel search, even if the target has a unique stroke. The four experiments that follow develop these puzzles for feature integration theory, and taken together, they show that the theory cannot explain the large variations in search efficiency seen across different letter search tasks.

Experiment 1

Experiment 1 was designed to investigate effects of letter size and nontarget homogeneity on both feature and conjunction search. Using an RT task, we replicated Beck and Ambler's (1973) comparison between search for Ls and tilted Ts among nontarget Ts that were either upright or rotated 90° clockwise. We used two extremes of letter size (size/eccentricity ratios of 1/12 and 1/3) and nontargets that were either homogeneous (upright in one block of trials, sideways in another) or heterogeneous (both upright and sideways mixed in each display).

Method

Tasks. Experiment 1 was run on-line on a Cambridge Electronic Design laboratory computer system, controlling a Hewlett-Packard X-Y display (1332A) with P24 phosphor. Displays were viewed from a chin rest, at a distance of about 65 cm. On each trial, the subject fixated a dot in the center of the screen, pressed a foot switch, and saw an immediate 180-ms display of 2, 4, or 6 letters. The response was to be made, as quickly as possible, by pressing a key with the right hand if a specified target letter was present or with the left hand if it was absent. An interval of 1,000 ms preceded onset of the fixation point for the next trial.

With the modification noted later, letters appeared on the perimeter of an imaginary circle of radius 2°24', centered on fixation.
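As a rough check on scale, the visual angles quoted above can be converted to distances on the screen. The sketch below assumes a flat display perpendicular to the line of sight and takes the viewing distance as exactly 65 cm (the text says "about 65 cm"); the function names are illustrative and not part of the original apparatus description.

```python
import math


def to_degrees(degrees: int, arc_minutes: int) -> float:
    """Convert an angle given as degrees and minutes of arc to decimal degrees."""
    return degrees + arc_minutes / 60.0


def eccentricity_cm(angle_deg: float, viewing_distance_cm: float) -> float:
    """Distance from fixation on a flat screen that subtends angle_deg at the eye."""
    return viewing_distance_cm * math.tan(math.radians(angle_deg))


# Radius of the imaginary circle: 2 degrees 24 minutes, viewed from 65 cm (assumed exact).
radius_deg = to_degrees(2, 24)
radius_cm = eccentricity_cm(radius_deg, viewing_distance_cm=65.0)
```

Under these assumptions the circle of letters has a radius of roughly 2.7 cm on the screen.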
Starting at 12 o'clock, there were eight possible letter positions, evenly spaced round the circle. A (randomly selected) arc of adjacent positions was used for each display, equating the distance between adjacent characters across display sizes.

Three factors varied between blocks. The target was either an upright L or a T tilted 45° clockwise. Nontargets were either homogeneous (in which case they were either all upright Ts or all Ts rotated 90° clockwise) or heterogeneous (in which case each display contained, as nearly as possible, an equal number of Ts in these two orientations, randomly arranged). The two strokes of each letter were equal in length. They measured either 12' arc or 48' arc, giving letter size/eccentricity ratios of 1/12 or 1/3.

¹ Values have been estimated from Treisman and Gelade's (1980) Figure 6. Estimates are based only on the comparison of display sizes 1 and 15, the values available for all conditions.

A possible difficulty with regular nontarget displays is that supraletter cues might show whether a target is present. Consider the case of search for an L among Ts. Suppose, for example, that the horizontal lines of
all letters could be grouped together, and the shape of the resulting group could be determined. In a homogeneous display with letters arranged around the perimeter of a circle, this shape would be a smooth arc when the target was absent, but distorted when the target was present. To eliminate such cues, each nontarget was shifted slightly so that one of its strokes, horizontal or vertical (randomly selected), fell in the position that the corresponding stroke of an L would occupy in the same display location. The result was a display of rather irregular appearance, in which, when the target was an L, any target stroke fell in a position possible for a nontarget stroke.

Table 1
Experiment 1: Reaction Times (in Milliseconds) as a Function of Display Size

                      Small letters (1/12)                Large letters (1/3)
Condition             2     4     6     Slope (ms/item)   2     4     6     Slope (ms/item)
Target = L
  Homogeneous
    Present           432   432   444   3                 388   383   402   4
    Absent            448   460   464   4                 413   413   410   -1
  Heterogeneous
    Present           420   416   430   3                 383   395   397   4
    Absent            448   452   475   7                 414   425   416   1
Target = tilted T
  Homogeneous
    Present           446   461   470   6                 400   392   398   -1
    Absent            470   478   490   5                 404   409   404   0
  Heterogeneous
    Present           492   505   523   8                 424   427   445   5
    Absent            530   530   547   4                 444   464   450   2

Design. Each subject served in six sessions of about 1 hr each, on different days. Type of nontargets, homogeneous or heterogeneous, was fixed for any one session and alternated between sessions, with the order counterbalanced across subjects. Each session was divided into four blocks, one for each combination of letter size and target. There were always two blocks at one letter size followed by two at the other, with the same order of targets in each pair. With these constraints, the order of blocks was counterbalanced across subjects, although fixed for any one. Each block was further divided into two sub-blocks, each of 24 practice followed by 96 experimental trials.
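The trial counts implied by this design can be tallied directly. The short sketch below uses only the numbers stated in the Design paragraph (six sessions, four blocks per session, two sub-blocks per block, 24 practice and 96 experimental trials per sub-block); the constant and function names are illustrative.

```python
SESSIONS = 6
BLOCKS_PER_SESSION = 4        # letter size (2) x target (2)
SUB_BLOCKS_PER_BLOCK = 2
PRACTICE_PER_SUB_BLOCK = 24
EXPERIMENTAL_PER_SUB_BLOCK = 96


def trials_per_session() -> int:
    """All trials (practice plus experimental) run in one session."""
    per_sub_block = PRACTICE_PER_SUB_BLOCK + EXPERIMENTAL_PER_SUB_BLOCK
    return BLOCKS_PER_SESSION * SUB_BLOCKS_PER_BLOCK * per_sub_block


def experimental_trials_per_subject() -> int:
    """Experimental trials contributed by one subject over all sessions."""
    return (SESSIONS * BLOCKS_PER_SESSION * SUB_BLOCKS_PER_BLOCK
            * EXPERIMENTAL_PER_SUB_BLOCK)


session_total = trials_per_session()                   # 960
subject_total = experimental_trials_per_subject()      # 4608
```

On this count each session of about 1 hr contains 960 trials, of which 768 are experimental.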
When nontargets were heterogeneous, the two sub-blocks were identical, but when nontargets were homogeneous, one sub-block was devoted to the upright and one to the sideways T. Within each experimental run of 96 trials, there were equal numbers of trials with and without a target at each display size. Otherwise, the order of trials was random, as was the arc of letter positions chosen for each display and the position of the target (if present) in this arc. At the end of every 24 practice and 96 experimental trials, the subject was shown mean reaction time and error rate for the run.

Subjects. All of the experiments in this series used subjects from the paid panel of the Applied Psychology Unit. Here, they were 4 women, between 28 and 35 years of age.

Results and Discussion

Table 1 shows mean RTs in each condition, as well as slopes of best-fitting linear functions relating RT to display size. Data are from experimental trials on the last 2 days of practice. Trials with RTs greater than 1,500 ms have been excluded. There were four important results. First, slopes were all in the range normally taken to suggest parallel search (Treisman & Souther, 1985), with a maximum of 6 ms/item (averaged across present and absent trials). Second, slopes were very similar for the two targets, unlike the results of Beck and Ambler (1973). Third, slopes were slightly greater with small than with large letters, although the effect was much smaller than the one reported by Humphreys et al. (in press). Fourth, slopes were little affected by nontarget homogeneity, although homogeneous nontargets gave slightly quicker responding overall.

Analysis of variance (ANOVA) confirmed these conclusions. There was a significant main effect of display size, F(2, 6) = 27.8, p < .001, which interacted with letter size, F(2, 6) = 9.5, p < .02, but not target type, F(2, 6) = 0.2, or nontarget homogeneity, F(2, 6) = 2.4.
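The slopes reported in Table 1 can be recovered from the condition means with an ordinary least-squares fit of RT on display size. The sketch below is one way to do this, not necessarily the authors' procedure; the function name is illustrative, and the two sets of means are copied from Table 1 (small letters, target present).

```python
def least_squares_slope(display_sizes, mean_rts):
    """Slope (ms/item) of the best-fitting line relating mean RT to display size."""
    n = len(display_sizes)
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_rts) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(display_sizes, mean_rts))
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    return sxy / sxx


DISPLAY_SIZES = (2, 4, 6)

# Small letters, target present (means from Table 1).
l_homogeneous = least_squares_slope(DISPLAY_SIZES, (432, 432, 444))
tilted_t_heterogeneous = least_squares_slope(DISPLAY_SIZES, (492, 505, 523))
```

With three equally spaced display sizes the fitted slope reduces to the difference between the RTs at display sizes 6 and 2, divided by 4. The two examples give 3.0 and 7.75 ms/item, matching the tabled values of 3 and 8 after rounding.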
Table 2
Experiment 1: Error Proportions

                      Small letters (1/12)    Large letters (1/3)
Condition             2      4      6         2      4      6
Target = L
  Homogeneous
    Present           .023   .047   .086      .008   .031   .078
    Absent            .031   .016   .039      .008   .008   .016
  Heterogeneous
    Present           .031   .047   .094      .024   .000   .024
    Absent            .024   .024   .016      .039   .008   .016
Target = tilted T
  Homogeneous
    Present           .039   .047   .094      .023   .023   .023
    Absent            .016   .024   .055      .016   .024   .000
  Heterogeneous
    Present           .039   .125   .149      .024   .024   .031
    Absent            .031   .032   .063      .016   .047   .000

There was also a significant but small three-way interaction between display size, letter size, and target presence, F(2, 6) = 7.0, p < .05, which we neglect. Finally, there were significant main effects of nontarget homogeneity, F(1, 3) = 10.9, p < .05, and letter size, F(1, 3) = 91.5, p < .005, and a four-way interaction, probably spurious, between nontarget
homogeneity, target type, letter size, and target presence, F(1, 3) = 26.0, p < .02, for which MSe was more than 10 times smaller than any other in the analysis.

Error data appear in Table 2. They suggest only one modification to our conclusions. When letters were small, error rates increased with increasing display size. This suggests that RT results may underestimate the true interaction between display size and letter size.

Experiment 2

Experiment 1 left us with two questions. First, we confirmed the finding of Humphreys et al. (in press) that smaller letters produce a greater effect of display size. Perhaps because of the brief exposure of stimulus displays, however, the result was reflected partly in error rates rather than RTs. Experiment 2 examined the effect further, using displays that remained visible until the response. This also allowed an examination of RT effects early in practice, data we have not presented for Experiment 1 because high error rates with small letters made RT data uninterpretable.

Experiment 1 also showed no effect of target type, in disagreement with the results of Beck and Ambler (1973). Instead, the effect of display size for both types of target depended on letter size. These results are also reexamined in Experiment 2.

Table 3
Experiment 2a: Reaction Times (in Milliseconds) as a Function of Display Size

                      Small letters (1/6)                 Large letters (1/3)
Condition/session                       Slope (ms/item)                     Slope (ms/item)
Target = L
523 565 556 488 509 519 487 505 513 500 526 529 537 515 496 540 560 513 514 497 497
516 517 524 499 468 521 485 483 513 491 499 502 492 483 512 483 478 490 477 484 491 489 476 485 483 479 489
Target = tilted T
  Present
    1                 500   531   539   10                458   455   463   1
    2                 504   508   525   5                 457   464   461   1
    3                 470   491   497   7                 446   453   463   4
    M                 491   510   520                     454   457   462   2
  Absent
    1                 495   499   511                     473   463   477   1
    2                 511   507   519                     465   474   458   -2
    3                 476   488   477                     467   464   443   -6
    M                 494   498   502                     468   467   459   -2

Method

Tasks.
There were the following changes from Experiment 1. Nontargets were always homogeneous upright Ts. Letter sizes were 12' and 24'; eccentricity (with the same small jitter as before) was 1°12' in Experiment 2a, giving size/eccentricity ratios of 1/6 and 1/3, and 2°24' in Experiment 2b, giving ratios of 1/12 and 1/6. Displays remained visible until the response. A right-hand response was required when all display items were the same (target absent), a left-hand response when one item was different (target present).

Design. Each subject served in three similar sessions. The four blocks of each session, one for each combination of letter size and target, were counterbalanced as before. Each block had a run of 24 practice trials and then two runs each of 72 experimental trials.

Subjects. Each of Experiments 2a and 2b had 4 subjects, between 19 and 32 years of age. There were 7 women and 1 man.

Results

Reaction times from Experiment 2a are shown in Table 3. The table shows separate results for the three sessions of practice, as well as the mean across sessions. As before, all of the slopes were in the range normally taken to suggest parallel search. An ANOVA showed a significant effect of display size, F(2, 6) = 10.4, p < .02, which interacted with session, F(4, 12) = 3.4, p < .05, but not with target type, F(2, 6) = 1.4, or letter size, F(2, 6) = 3.4. The only other significant effects were