Quantifying the time course of visual object processing using ERPs: it’s time to up the game

Guillaume A. Rousselet1* and Cyril R. Pernet2

1 Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
2 Brain Research Imaging Centre, SINAPSE Collaboration, University of Edinburgh, Edinburgh, UK

Perspective Article
Published: 23 May 2011
doi: 10.3389/fpsyg.2011.00107
Frontiers in Psychology | www.frontiersin.org | May 2011 | Volume 2 | Article 107

Edited by: Rufin Vanrullen, Centre de Recherche Cerveau et Cognition, France
Reviewed by: Eugenio F. Rodriguez, Max Planck Institute for Brain Research, Germany; Niko Busch, Charité – Universitätsmedizin Berlin, Germany
*Correspondence: Guillaume A. Rousselet, Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, 58 Hillhead Street, Glasgow, G12 8QB, UK. e-mail: guillaume.rousselet@glasgow.ac.uk

Hundreds of studies have investigated the early ERPs to faces and objects using scalp and intracranial recordings. The vast majority of these studies have used uncontrolled stimuli, inappropriate designs, peak measurements, poor figures, and poor inferential and descriptive group statistics. These problems, together with a tendency to discuss any effect p < 0.05 rather than to report effect sizes, have led to a research field very much qualitative in nature, despite its quantitative inspirations, and in which predictions do not go beyond condition A > condition B. Here we describe the main limitations of face and object ERP research and suggest alternative strategies to move forward. The problems plague intracranial and surface ERP studies, but also studies using more advanced techniques (e.g., source space analyses and measurements of network dynamics), as well as many behavioral, fMRI, TMS, and LFP studies. In essence, it is time to stop amassing binary results and start using single-trial analyses to build models of visual perception.

Keywords: faces, N170, robust statistics, single-trial analyses, mechanisms

Visual cognition depends on fast and progressive transformations of retinal inputs into higher-order representations useful for decision-making (Rousselet et al., 2004a; DiCarlo and Cox, 2007; Schyns et al., 2009a). Hence, a theory of visual cognition must specify the information content of brain activity from retinal input to decision-making, and the operations performed on this information, that is, the mechanisms. This theory must also specify how information content and mechanisms develop during childhood and deteriorate with age. Because of the temporal resolution of EEG and MEG, ERP research is well suited to identify the cascade of processes that lead to decision-making (Schyns, 2010). ERP research has matured its techniques and theories since the first reports of larger ERPs to faces compared to objects. Progress has, however, been inhomogeneous: most recent ERP studies use outdated experimental designs and statistical techniques, and poor interpretation frameworks. The field shows its immaturity by its incapacity to make precise predictions about the timing and magnitude of expected effects. This incapacity stems from the use of group statistics and categorical designs, from reporting effects as significant or not with no consideration for effect sizes, and from a reluctance to model the results for future hypothesis testing. In sum, most ERP studies of visual cognition are plagued by problems that need to be addressed urgently.

Use controlled stimuli

The use of uncontrolled stimuli makes ERP differences among image categories difficult to interpret because it is unclear whether the effects are due to low-level, physical differences or high-level, semantic differences (VanRullen and Thorpe, 2001). For instance, there have been speculations about the sensitivity of the P1 component to object categorical information, mostly from studies using uncontrolled images (Itier and Taylor, 2004). Low-level image properties, such as luminance, contrast, and amplitude spectrum, can be manipulated using published toolboxes (Knebel et al., 2008; Willenbockel et al., 2010). Low-level image properties should be equated across image categories in studies seeking to reveal high-level differences (Rousselet et al., 2005, 2008a; Honey et al., 2008; Gaspar and Rousselet, 2009). Alternatively, low-level properties can be manipulated parametrically to reveal the influence of low-level factors, such as contrast, on high-level cognition (Macé et al., 2005). Low-level and high-level properties can also be explicitly modeled together (Rousselet et al., 2008b). For other research questions, it might be essential to control local information as well, for instance in studies of facial expression processing (Schyns et al., 2007). Although images cannot be completely equated without losing meaning, and there is no optimal procedure to control stimuli, the problem can no longer be ignored.

Use a consistent framework to interpret task effects

Instead of controlling physical differences among images, an alternative strategy consists of measuring ERP modulations due to task differences while keeping stimuli constant (VanRullen and Thorpe, 2001; Rousselet et al., 2007). More generally, task manipulations are essential to understand the nature of ERP differences, one of the most enduring debates in the field (Carmel and Bentin, 2002; Rossion et al., 2002). However, the interpretation of task effects and their comparison across studies is complicated by the use of inconsistent terms: for instance, the N170 and the M170 to faces have been described as sensitive, selective, or specific responses to faces (Carmel and Bentin, 2002; Liu et al., 2002; Itier and Taylor, 2004; Joyce and Rossion, 2005); the intracranial N200 has been
described as a specific response from a face module (Puce et al., 1999). Clear operational definitions of specific, selective, and preferential responses have been described, providing a useful framework to interpret task effects and compare them across studies (Pernet et al., 2007). A specific response is a brain response for which activity is exclusively observed in the context of an interaction between a category (information) and a task (process) (Fodor, 2001). Concretely, if the N170 were face specific, one should observe the N170 only for face stimuli in a given task, and no evoked activity (no difference from baseline) for other stimuli and tasks. A selective response is defined as a category by task interaction, in which the target condition is higher than the control conditions, which themselves are higher than baseline. For ERPs, that means that a stronger component should be observed for one category (e.g., faces) relative to others (e.g., cars) but only for a given task (e.g., categorization vs. discrimination). Finally, a preferential response is task-independent, such that brain responses are stronger for a given category compared to all the others. Preferential activity reflects some specialization for the considered category; however, it does not capture the interaction with the task. The point is that the criterion for category selectivity used in most publications is not sufficient to ascribe functional specialization. Based on these definitions, most categorical ERP effects reported so far seem to be preferential responses, including the N170, the M170, and the N200.

Use robust statistics

ERP researchers, like most psychologists and neuroscientists, tend to have a misguided understanding of basic statistical procedures.
The most important problem is that means, variances, t-tests, ANOVAs, correlations, and linear regressions are not robust to deviations from the distributional assumptions they rely on, which can lead to substantial errors in descriptive and inferential statistics (Wilcox, 2005). Although there is no one-size-fits-all procedure, alternative techniques have been available for more than a decade and should no longer be ignored.

Using mean and variance can lead to distorted data description and poor statistical power. When data are skewed, or contain outliers, or both, the mean is a poor measure of central tendency and the variance a poor measure of dispersion. As a consequence, confidence intervals relying on mean and variance tend to be too large, and t-tests and ANOVAs tend to lack power, which means that null results from these tests are not convincing evidence of a lack of effect. Many robust alternatives to mean and variance exist, such as trimmed means and winsorized variance. Such robust measures of central tendency and dispersion behave appropriately both under normality and when normality assumptions are violated. In particular, Wilcox (2005) has shown that the 20% trimmed mean performs well in many situations. Robust estimators have been used to derive robust t-tests and ANOVAs, some of them relying on bootstrap procedures. These modern statistical techniques are available in the R environment (Wilcox, 2005) and several Matlab toolboxes (Maris and Oostenveld, 2007; Litvak et al., 2011; Pernet et al., 2011).

Contrary to t-tests and ANOVAs, correlations and linear regressions tend to be biased toward false positives, which means that when a significant effect is found, its effect size might be artificially inflated, and it remains unclear whether there is a true effect or whether the data suffer, for instance, from heteroscedasticity, i.e., variance inhomogeneity.
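To make the robust estimators of central tendency and dispersion mentioned above concrete, here is a minimal Python sketch of a 20% trimmed mean, a 20% winsorized variance, and the standard error of the trimmed mean derived from the winsorized variance. The function names and the amplitude values are hypothetical and chosen for illustration only; the sketch uses the Python standard library and is not the R or Matlab code cited in the text.

```python
import math
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean of the values left after cutting `proportion` of them from each tail."""
    ordered = sorted(values)
    g = math.floor(proportion * len(ordered))  # observations dropped per tail
    return statistics.fmean(ordered[g:len(ordered) - g])


def winsorized_variance(values, proportion=0.2):
    """Sample variance after pulling each tail in to the nearest retained value."""
    ordered = sorted(values)
    n = len(ordered)
    g = math.floor(proportion * n)
    low, high = ordered[g], ordered[n - g - 1]
    clamped = [min(max(v, low), high) for v in ordered]
    return statistics.variance(clamped)


def trimmed_mean_se(values, proportion=0.2):
    """Standard error of the trimmed mean, based on the winsorized variance."""
    n = len(values)
    return math.sqrt(winsorized_variance(values, proportion)) / (
        (1 - 2 * proportion) * math.sqrt(n)
    )


# Hypothetical single-subject peak amplitudes (microvolts) with one outlier.
amplitudes = [-4.1, -3.8, -3.6, -3.5, -3.3, -3.2, -3.0, -2.9, -2.7, 9.5]

print("mean:", statistics.fmean(amplitudes))
print("20% trimmed mean:", trimmed_mean(amplitudes))
print("variance:", statistics.variance(amplitudes))
print("20% winsorized variance:", winsorized_variance(amplitudes))
print("SE of trimmed mean:", trimmed_mean_se(amplitudes))
```

With these made-up values, the single outlier pulls the ordinary mean well away from the bulk of the data and inflates the variance, whereas the trimmed mean and winsorized variance stay close to what the nine typical observations suggest.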
Robust correlation and linear regression techniques are available in the R environment, for instance bootstrap tests under heteroscedasticity, skipped estimators, and the percentage bend correlation (Wilcox, 2005).

Another important problem in ERP research is the use of ineffective multiple comparison corrections (MCCs). In ERP studies focusing on peaks, it is important to control for the number of linear contrasts to maintain the false positive error rate at the nominal level. Bonferroni correction tends to be too conservative, but many other options exist, depending on the experimental design and the estimators tested (Wilcox, 2005). However, most of these MCCs, developed to deal with psychology data, are not appropriate for ERP studies in which tests are performed at many time points, electrodes, or temporal frequencies. Indeed, ERP effects are correlated across time, space, and frequency, and these correlations need to be taken into account to provide efficient statistical tests. To take into account the temporal structure of ERP effects, a popular MCC consists of dismissing all effects that are significant for less than a certain number of time points, e.g., 15 consecutive significant t-tests (Rousselet et al., 2004b). This MCC and other ad hoc techniques should be abandoned because of poor control of false positive and false negative errors. Data-driven approaches provide a better control of the false positive error rate, without sacrificing power, by taking into account the correlations inherent to ERP data. These MCCs rely on permutation and bootstrap techniques and are available in Matlab toolboxes (Maris and Oostenveld, 2007; Litvak et al., 2011; Pernet et al., 2011).

Use optimized averaging

In addition to non-robust statistics, low statistical power can result from the choice of electrodes entered into group analyses. In group analyses, ERPs are typically measured at the same electrodes in all subjects.
However, these electrodes will not necessarily pick up functionally equivalent signals because even minor differences in brain fissuration or skull and scalp inhomogeneity can lead to different scalp projections. A potentially more fruitful way to do group statistics is to optimize electrodes independently in each subject, for instance by selecting the electrodes most sensitive to image and task parameters (Foxe and Simpson, 2002). Hence, this kind of optimized averaging tends to average signals that reflect common processing across subjects, whereas using the same electrodes in all subjects may lead to averaging signals reflecting different processes. Statistical circularity can be avoided by selecting the electrodes using an independent dataset (Liu et al., 2002) or an orthogonal condition (Kriegeskorte et al., 2009). Moreover, there is no or minimal circularity when the selected electrodes correspond to electrodes extensively reported in the literature, and when they reveal large and reliable effects in highly expected time windows (Rousselet et al., 2010).

Group averaging can also be optimized by using independent components (Delorme et al., 2007) or by projecting data in a common source space (Gross et al., 2007). In source space, different locations can be studied to reveal their information content over time (Smith et al., 2009). Equivalent independent components are more difficult to cluster, although progress has been made in this direction (Onton et al., 2005; Gramann et al., 2010). These techniques have the potential to help make more meaningful comparisons across subjects and to increase statistical power.
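As an illustration of subject-specific electrode selection without circularity, the following Python sketch picks, for one simulated subject, the electrode with the largest condition difference on half of the trials and then measures the effect at that electrode on the other half. The split-half scheme, the function names, and the simulated amplitudes are assumptions made for this example; they stand in for the independent dataset mentioned above and do not reproduce the procedure of any cited study.

```python
import random
import statistics


def mean_difference(cond_a, cond_b, electrode):
    """Mean amplitude difference (A minus B) at one electrode."""
    a = [trial[electrode] for trial in cond_a]
    b = [trial[electrode] for trial in cond_b]
    return statistics.fmean(a) - statistics.fmean(b)


def select_and_measure(cond_a, cond_b):
    """Pick an electrode on even-indexed trials, measure the effect on odd ones.

    cond_a, cond_b: lists of trials; each trial holds one amplitude per electrode.
    """
    select_a, test_a = cond_a[0::2], cond_a[1::2]
    select_b, test_b = cond_b[0::2], cond_b[1::2]
    n_electrodes = len(cond_a[0])
    best = max(
        range(n_electrodes),
        key=lambda e: abs(mean_difference(select_a, select_b, e)),
    )
    # The effect is estimated only from trials that played no role in selection.
    return best, mean_difference(test_a, test_b, best)


def simulate_trials(rng, n_trials, effect_per_electrode):
    """Hypothetical single-trial amplitudes: true effect plus unit Gaussian noise."""
    return [
        [rng.gauss(effect, 1.0) for effect in effect_per_electrode]
        for _ in range(n_trials)
    ]


rng = random.Random(1)
faces = simulate_trials(rng, 200, [0.0, 0.0, -2.0])    # effect at electrode 2 only
objects = simulate_trials(rng, 200, [0.0, 0.0, 0.0])
electrode, effect = select_and_measure(faces, objects)
print("selected electrode:", electrode, "held-out effect:", round(effect, 2))
```

Because selection and measurement use disjoint trials, the held-out effect is not inflated by the selection step. In a group analysis, one such held-out value per subject would then enter the group statistics.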
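The data-driven multiple comparison corrections discussed in the section on robust statistics can also be sketched briefly. The Python example below is one simple member of that family, a sign-flip permutation test that uses the maximum absolute t value across time points to set a single corrected threshold. It is a hedged illustration with hypothetical function names and simulated difference waves, not the implementation found in the cited toolboxes, which offer other variants such as cluster-based statistics.

```python
import math
import random


def t_values(diffs):
    """One-sample t statistic at each time point.

    diffs: one list per subject holding the condition difference at each time point.
    """
    n = len(diffs)
    result = []
    for samples in zip(*diffs):
        mean = sum(samples) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in samples) / (n - 1))
        result.append(mean / (sd / math.sqrt(n)))
    return result


def max_stat_threshold(diffs, n_perm=1000, alpha=0.05, seed=0):
    """Critical |t| from the permutation distribution of the maximum |t| over time."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        # Under the null, the sign of each subject's difference wave is arbitrary.
        signs = [rng.choice((-1.0, 1.0)) for _ in diffs]
        flipped = [[s * v for v in wave] for s, wave in zip(signs, diffs)]
        maxima.append(max(abs(t) for t in t_values(flipped)))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]


# Simulated difference waves: 16 subjects, 40 time points, effect at points 20-29.
rng = random.Random(2)
n_subjects, n_times = 16, 40
effect_window = range(20, 30)
diffs = [
    [rng.gauss(-1.5 if t in effect_window else 0.0, 1.0) for t in range(n_times)]
    for _ in range(n_subjects)
]

observed = t_values(diffs)
threshold = max_stat_threshold(diffs)
significant = [t for t, value in enumerate(observed) if abs(value) > threshold]
print("corrected threshold:", round(threshold, 2))
print("significant time points:", significant)
```

Flipping the sign of a whole difference wave keeps the temporal correlations of each subject's data intact, which is why a single threshold derived from the maximum statistic controls the false positive rate across all time points at once.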