Simple Linear Regression

Badi H. Baltagi

Book Chapter

Springer Berlin Heidelberg (2011), pp. 49-72

DOI: 10.1007/978-3-642-20059-5_3

Abstract

In this chapter, we study extensively the estimation of a linear relationship between two variables, $Y_i$ and $X_i$, of the form

$$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, 2, \ldots, n \tag{3.1}$$

where $Y_i$ denotes the $i$-th observation on the dependent variable $Y$, which could be consumption, investment or output, and $X_i$ denotes the $i$-th observation on the independent variable $X$, which could be disposable income, the interest rate or an input. These observations could be collected on firms or households at a given point in time, in which case we call the data a cross-section. Alternatively, these observations may be collected over time for a specific industry or country, in which case we call the data a time-series. $n$ is the number of observations, which could be the number of firms or households in a cross-section, or the number of years if the observations are collected annually.

$\alpha$ and $\beta$ are the intercept and slope of this simple linear relationship between $Y$ and $X$. They are assumed to be unknown parameters to be estimated from the data. A plot of the data, i.e., $Y$ versus $X$, would be very illustrative, showing what type of relationship exists empirically between these two variables. For example, if $Y$ is consumption and $X$ is disposable income, then we would expect a positive relationship between these variables, and the data may look like Figure 3.1 when plotted for a random sample of households. If $\alpha$ and $\beta$ were known, one could draw the straight line $(\alpha + \beta X)$ as shown in Figure 3.1. It is clear that not all the observations $(X_i, Y_i)$ lie on the straight line $(\alpha + \beta X)$. In fact, equation (3.1) states that the difference between each $Y_i$ and the corresponding $(\alpha + \beta X_i)$ is due to a random error $u_i$.
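To make equation (3.1) concrete, the following is a minimal sketch that treats it as a data-generating process for a small cross-section of households. The parameter values, the income figures, the normal distribution for the errors and the function name `simulate` are all illustrative assumptions; none of them come from the chapter.

```python
import random


def simulate(alpha, beta, x_values, sigma, seed=0):
    """Generate Y_i = alpha + beta * X_i + u_i for each X_i.

    Assumption: the errors u_i are drawn independently from N(0, sigma^2).
    The chapter excerpt does not specify a distribution.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, sigma) for _ in x_values]
    y = [alpha + beta * x + u_i for x, u_i in zip(x_values, u)]
    return y, u


# Hypothetical "true" parameters and disposable incomes (illustration only).
ALPHA, BETA = 10.0, 0.8
income = [20.0, 35.0, 50.0, 65.0, 80.0]

consumption, errors = simulate(ALPHA, BETA, income, sigma=3.0)

# Each observation sits off the line alpha + beta * X by exactly its error u_i.
for x, y, u_i in zip(income, consumption, errors):
    on_line = ALPHA + BETA * x
    print(f"X={x:5.1f}  Y={y:6.2f}  alpha+beta*X={on_line:6.2f}  u={u_i:+.2f}")
```

In a simulation the errors are visible because the parameters were chosen by hand; with real data only the pairs $(X_i, Y_i)$ would be observed.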
This error may be due to (i) the omission of relevant factors that could influence consumption, other than disposable income, like real wealth or varying tastes, or unforeseen events that induce households to consume more or less, (ii) measurement error, which could be the result of households not reporting their consumption or income accurately, or (iii) wrong choice of a linear relationship between consumption and income, when the true relationship may be nonlinear. These different causes of the error term will have different effects on the distribution of this error. In what follows, we consider only disturbances that satisfy some restrictive assumptions. In later chapters we relax these assumptions to account for more general kinds of error terms.

In real life, $\alpha$ and $\beta$ are not known, and have to be estimated from the observed data $\{(X_i, Y_i)$ for $i = 1, 2, \ldots, n\}$. This also means that the true line $(\alpha + \beta X)$ as well as the true disturbances (the $u_i$'s) are unobservable. In this case, $\alpha$ and $\beta$ could be estimated by the best fitting line through the data. Different researchers may draw different lines through the same data. What makes one line better than another? One measure of misfit is the amount of error from the observed $Y_i$ to the guessed line. Let us call the latter $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$, where the hat ($\hat{\ }$) denotes a guess on the appropriate parameter or variable. Each observation $(X_i, Y_i)$ will have a corresponding observable error attached to it, which we will call $e_i = Y_i - \hat{Y}_i$, see Figure 3.2. In other words, the guessed value $\hat{Y}_i$ corresponding to each $X_i$ is obtained from the guessed line.
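The observable errors described above can be computed directly for any guessed line. The sketch below compares two hypothetical guesses on a made-up data set. Summarising the misfit by the sum of squared errors is an assumption made here for illustration, since the excerpt does not name a specific criterion; the data and the function names are likewise invented.

```python
def residuals(alpha_hat, beta_hat, xs, ys):
    """Observable errors e_i = Y_i - (alpha_hat + beta_hat * X_i)."""
    return [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]


def sum_squared(es):
    """One possible summary of misfit (an assumption, not the excerpt's stated rule)."""
    return sum(e * e for e in es)


# Made-up observations on disposable income (X) and consumption (Y).
xs = [20.0, 35.0, 50.0, 65.0, 80.0]
ys = [27.0, 37.0, 52.0, 60.0, 75.0]

# Two hypothetical guesses (alpha_hat, beta_hat) for the same data.
guesses = {"line A": (10.0, 0.8), "line B": (20.0, 0.6)}

misfit = {}
for name, (a_hat, b_hat) in guesses.items():
    e = residuals(a_hat, b_hat, xs, ys)
    misfit[name] = sum_squared(e)
    print(name, [round(v, 2) for v in e], round(misfit[name], 2))

# Under this criterion, the line with the smaller total is the better fit.
best = min(misfit, key=misfit.get)
print("smaller misfit:", best)
```

Unlike the true disturbances $u_i$, these $e_i$ depend only on the data and the guessed parameters, which is why they can be computed and compared across candidate lines.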

Cite

Baltagi, B. H. (2011). Simple Linear Regression. In Econometrics (pp. 49–72). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-20059-5_3
