A Beginner's Guide to Partial Least Squares Analysis

Michael Haenlein
Department of Electronic Commerce
Otto Beisheim Graduate School of Management

Andreas M. Kaplan
Department of Media Management
University of Cologne

UNDERSTANDING STATISTICS, 3(4), 283–297. Copyright © 2004, Lawrence Erlbaum Associates, Inc. Requests for reprints should be sent to Michael Haenlein at e-mail: email@example.com

Since the introduction of covariance-based structural equation modeling (SEM) by Jöreskog in 1973, this technique has been received with considerable interest among empirical researchers. However, the predominance of LISREL, certainly the most well-known tool to perform this kind of analysis, has meant that not all researchers are aware of alternative techniques for SEM, such as partial least squares (PLS) analysis. Therefore, the objective of this article is to provide an easily comprehensible introduction to this technique, which is particularly suited to situations in which constructs are measured by a very large number of indicators and where maximum likelihood covariance-based SEM tools reach their limit. Because this article is intended as a general introduction, it avoids mathematical details as far as possible and instead focuses on a presentation of PLS that can be understood without an in-depth knowledge of SEM.

Keywords: partial least squares, structural equation modeling, PLS, LISREL, SEM

First-generation techniques, such as regression-based approaches (e.g., multiple regression analysis, discriminant analysis, logistic regression, analysis of variance) and factor or cluster analysis, belong to the core set of statistical instruments that can be used to either identify or confirm theoretical hypotheses based on the analysis of empirical data. Many researchers in various disciplines have applied one of these
methods to generate findings that have significantly shaped the way we see the world today, such as Spearman's (1904) work on general intelligence for psychology (factor analysis), Hofstede's (1983) publication on cross-cultural differences for sociology (factor and cluster analysis), and Altman's (1968) article on forecasting corporate bankruptcy for management research (discriminant analysis).

However, a common factor for all these methods is that they share three limitations, namely, (a) the postulation of a simple model structure (at least in the case of regression-based approaches), (b) the assumption that all variables can be considered as observable, and (c) the conjecture that all variables are measured without error, which may limit their applicability in some research situations.

Where the first assumption, the postulation of a simple model structure (i.e., one dependent and several independent variables), is concerned, Jacoby (1978) stated that "we live in a complex, multivariate world [and that] studying the impact of one or two variables in isolation, would seem ... relatively artificial and inconsequential" (p. 91). Although model building always implies omitting some aspect of reality (Shugan, 2002), this assumption of regression-based approaches may be too limiting for an analysis of more complex and more realistic situations. This becomes especially obvious when, for example, one wants to investigate the potential effect of mediating or moderating variables (for a detailed definition of these two terms, see Baron & Kenny, 1986) on the relationship between one or more dependent and independent variables, which may result in some dependent variables influencing other dependent variables.

With respect to the second limitation, the assumption that all variables can be considered as observable, McDonald (1996) stressed that a variable can be called observable "if and only if its value can be obtained by means of a real-world sampling experiment" (p. 239).
Therefore, any variable that does not correspond directly to anything observable must be considered as unobservable (Dijkstra, 1983). This definition makes it obvious that only a handful of relevant variables, such as age and gender, can be considered as observable, whereas "the effects and properties of molecules, processes, genes, viruses, and bacteria are usually observed only indirectly" (S. Wold, 1993, p. 138).

Regarding the conjecture of variables measured without error, one has to bear in mind that each observation of the real world is accompanied by a certain measurement error, which may comprise two parts (Bagozzi, Yi, & Phillips, 1991): (a) random error (e.g., caused by the order of items in a questionnaire or respondent fatigue; Heeler & Ray, 1972) and (b) systematic error, such as method variance (i.e., variance attributable to the measurement method rather than the construct of interest; Bagozzi et al., 1991). Because the observed score of an item is therefore always the sum of three parts, namely, the true score of the variable, random error, and systematic error (Churchill, 1979), first-generation techniques are, strictly speaking, only applicable when there is neither a systematic nor a random error component, a rare situation in reality.
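The decomposition of an observed score into true score, random error, and systematic error can be made concrete with a small simulation. The following is a minimal sketch and not part of the original article: the function names, the sample size, and all numeric values are illustrative assumptions, and systematic error is modeled, for simplicity, as an independent normally distributed method component. Under these assumptions, regressing an outcome on the error-laden observed score instead of the true score should shrink the estimated slope by roughly the factor var(true score) / var(observed score).

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, true_slope=0.8, sd_random=0.7, sd_method=0.5, seed=1):
    """Return (slope using true scores, slope using observed scores).

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Unobservable true score of the variable (variance 1 by assumption).
    true_score = [rng.gauss(0, 1) for _ in range(n)]

    # Outcome depends on the true score plus unrelated noise.
    outcome = [true_slope * t + rng.gauss(0, 0.5) for t in true_score]

    # Observed score = true score + random error + systematic error.
    # Assumption: the systematic (method) component is drawn independently
    # of the true score; real method variance may behave differently.
    observed = [
        t + rng.gauss(0, sd_random) + rng.gauss(0, sd_method)
        for t in true_score
    ]

    return ols_slope(true_score, outcome), ols_slope(observed, outcome)


if __name__ == "__main__":
    slope_true, slope_observed = simulate()
    print(f"slope with true scores:     {slope_true:.3f}")
    print(f"slope with observed scores: {slope_observed:.3f}")
```

With the values assumed here, the observed score has variance 1 + 0.49 + 0.25 = 1.74, so the slope estimated from observed scores should land near 0.8 / 1.74, or about 0.46, rather than near 0.8. This is the sense in which a technique that treats every variable as error-free can understate a relationship.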
To overcome these limitations of first-generation techniques, more and more authors started using structural equation modeling (SEM) as an alternative. Compared to regression-based approaches, which analyze only one layer of linkages between independent and dependent variables at a time, SEM, as a second-generation technique, allows the simultaneous modeling of relationships among multiple independent and dependent constructs (Gefen, Straub, & Boudreau, 2000). Therefore, one no longer differentiates between dependent and independent variables but distinguishes between the exogenous and endogenous latent variables, the former being variables which are not explained by the postulated model (i.e. act always as independent variables) and the latter being variables that are explained by the relationships contained in the model. (Diamantopoulos, 1994, p. 108)

Additionally, SEM enables the researcher to construct unobservable variables measured by indicators (also called items, manifest variables, or observed measures) as well as to explicitly model measurement error for the observed variables (Chin, 1998a). It thereby overcomes the limitations of first-generation techniques described earlier and gives the researcher the flexibility to "statistically test a priori substantive/theoretical and measurement assumptions against empirical data (i.e. confirmatory analysis)" (Chin, 1998a, p. vii).

In general, there are two approaches to estimating the parameters of an SEM, namely, the covariance-based approach and the variance-based (or components-based) approach. Covariance-based SEM, in particular, has received high prominence during the last few decades and, "to many social science researchers, the covariance-based procedure is tautologically synonymous with the term SEM" (Chin, 1998b, p. 295).
Although there are several different tools that can be used to perform this kind of analysis, such as EQS, AMOS, SEPATH, and COSAN, the LISREL program developed by Jöreskog in 1975 became the most popular one and, consequently, the term LISREL is sometimes used as a synonym for covariance-based SEM.

The focus of this article is to give an introduction to the other side of the coin, variance-based SEM, and to present partial least squares (PLS) analysis as one technique from this group in more detail. In contrast to articles already published in this area (e.g., Cassel, Hackl, & Westlund, 1999; Dijkstra, 1983; Garthwaite, 1994), our focus is on an easily understandable presentation of this topic, accessible to beginners without extensive knowledge of statistics in general or SEM in particular. Additionally, we try to answer the question of under which circumstances a researcher might prefer variance-based over covariance-based SEM, given the specific assumptions and limitations of each of these methods.

For this purpose, our article is structured as follows: In the next section, we provide a short introduction to theories, SEM, and the measurement of unobservable