September 2008 Dr. Robert Gebotys
Example: Quadratic Linear Model
Gebotys and Roberts (1989) were interested in examining the effect of one variable (i.e., ‘age’) on the “seriousness rating of the crime” (y); however, they wanted to fit a quadratic curve to the data. The variables to be examined within this example are ‘age’ and ‘age squared’ (i.e., ‘agesq’). The model is:

E(y|x) = B0 + B1(age) + B2(age²)

Complete the following steps for linear regression in SPSS in order to follow the output explanation given in the SPSS Output Explanation section later in this example.
1. Pull up the “Crime” data set (or enter the data yourself; a syntax sketch for entering the data follows this step).
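If you prefer to enter the data by syntax rather than through the Data Editor, one way is sketched below, using the age and seriousness values from the data table at the end of this example; the variable names age and serious match those used in the analysis.

    DATA LIST FREE / age serious.
    BEGIN DATA
    20 21
    25 28
    26 27
    25 26
    30 33
    34 36
    40 31
    40 35
    40 41
    80 95
    END DATA.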
Before proceeding to any analysis, the new variable ‘agesq’ needs to be created (an equivalent syntax sketch appears after these steps). To create ‘agesq’, do the following:
1. Click Transform on the main menu bar.
2. Click Compute on the Transform menu. This opens the Compute Variable dialogue box.
3. Type ‘agesq’ in the Target Variable text box,
located at the top left corner of the Compute
Variable dialogue box. This is to specify the new
variable that you are creating, ‘agesq’.
4. To specify the numeric expression of ‘agesq’ (that
is, the squaring of ‘age’), click “age” in the variable
source list and click the arrow button to the right
of the variable source list box. Then click the “**”
button on the lower right-hand side of the on-
screen key pad followed by clicking the ‘2’ button
on the keypad. As a result of the above clicks, the
expression “age**2” will be entered into the
Numeric Expression box.
5. Click the ‘OK’ command pushbutton of the dialogue box. You will immediately see that a new variable, ‘agesq’, has been entered into the Data Editor. (Note that the variable ‘agesq’ is set to two decimal places. If you would like to re-set this variable to zero decimal places, to be congruent with the other variables, do the following: (i) double-click the ‘agesq’ variable name to open the variable definition dialogue box; (ii) in the section called ‘Change Settings’, click the Type button; (iii) change Decimal Places from “2” to “0”; (iv) click the Continue button; and finally (v) click the ‘OK’ button. Your ‘agesq’ variable should now contain no decimal places.)
6. At this stage, you may save the data matrix on a
diskette, for example, under the file name
“crimeage2.sav”.
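The point-and-click steps above can also be carried out with a short piece of syntax run from a Syntax window; a minimal sketch (assuming the data set is already open) is:

    COMPUTE agesq = age**2.
    EXECUTE.
    * Optional: display agesq with zero decimal places, as in step 5.
    FORMATS agesq (F8.0).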
Fitting a Quadratic Regression Curve on the Scatterplot
1. Start by creating a scatterplot for the set of data on
age and crime seriousness.
2. To fit a quadratic regression curve, start by clicking on the scatterplot as it appears in the Results pane of the SPSS Viewer window so that a thin black line surrounds the scatterplot. Next, click Edit on the menu bar and select SPSS Chart Object from the Edit menu, followed by clicking Open on the submenu. This series of clicks will open an SPSS Chart Editor window. The scatterplot you have produced will appear in the Chart Editor window.
3. Click Chart on the Chart Editor window menu bar, followed by clicking ‘Options…’ on the Chart menu. When the ‘Scatterplot Options’ dialogue box is activated, click the check box to the left of ‘Total’ in the ‘Fit Line’ box. Then click the ‘Fit Options…’ button in the ‘Fit Line’ box. This will open the ‘Scatterplot Options: Fit Line’ dialogue box.
4. Click the ‘Quadratic regression’ box. Click the
‘Continue’ pushbutton. When the ‘Scatterplot
Options’ dialogue box reappears, click the ‘OK’
command pushbutton. After a few seconds, you
should see that a curve has already been fit to the
scatterplot. A copy of the scatterplot is reproduced
below. At this stage you can edit, save, and print
the scatterplot.
[Figure: Scatterplot of “Serious” vs “Age”, quadratic line fit. X axis: AGE (10 to 90); Y axis: SERIOUS (20 to 100).]
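The steps above add the quadratic fit through the Chart Editor. A rough syntax alternative (it produces its own chart and fit summary rather than editing the chart above, so treat it as a sketch) is to draw the scatterplot with GRAPH and fit a quadratic curve of seriousness on age with CURVEFIT:

    GRAPH
      /SCATTERPLOT(BIVAR)=age WITH serious.
    CURVEFIT
      /VARIABLES=serious WITH age
      /MODEL=QUADRATIC
      /PLOT FIT.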
Specifying the Regression Procedure for a Polynomial
of Degree 2
Follow the Regression Procedure in “Example: The Linear Model with Normal Error”. The one exception is that both ‘age’ and ‘agesq’ should be defined as the independent variables in this model. This is done by clicking both ‘age’ and ‘agesq’ in the variable source list and then clicking the arrow button to the left of the ‘Independent(s):’ text box. Both variables should now appear in the ‘Independent(s):’ list.
After completing the procedures as specified in “Example: Linear Model with Normal Error” (i.e., designate the ‘Statistics’, ‘Plots’, and ‘Save’ selections for this analysis), click the ‘OK’ command pushbutton in the Linear Regression dialogue box. This will instruct SPSS to produce a set of output similar to that discussed in the Output Explanation section below.
Alternate Method to Perform Linear Regression
This is performed by utilizing the Run command in an SPSS Syntax window. The selections you have made in the Linear Regression dialogue boxes are actually commands for the regression procedure. You can click the Paste command pushbutton in the Linear Regression dialogue box to paste this underlying command syntax into a Syntax window (i.e., a Syntax window is opened when Paste is clicked, and the selections you have made are reproduced in this window as command syntax, the SPSS programming language).
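For a model with ‘serious’ as the dependent variable, ‘age’ and ‘agesq’ as the independent variables, and the usual Statistics, Plots, and Save selections, the pasted syntax will resemble the sketch below; the exact subcommands depend on which boxes you checked.

    REGRESSION
      /STATISTICS COEFF OUTS CI R ANOVA
      /DEPENDENT serious
      /METHOD=ENTER age agesq
      /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
      /CASEWISE PLOT(ZRESID) ALL
      /SCATTERPLOT=(*ZRESID ,*ZPRED)
      /SAVE PRED RESID ZPRED ZRESID COOK LEVER MCIN ICIN.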
There is another way of specifying and running the
regression procedure for fitting the quadratic model. If
you have already saved the syntax commands in a file
(e.g., crime.sps), as outlined in “Example: Linear Model
with Normal Error”, you can specify and run the
regression procedure by following the steps described
below:
1. Insert the diskette containing the relevant file (e.g.,
crime.sps) into drive A.
2. Click File on the main menu bar.
3. Click Open on the File menu to open the Open File dialogue box.
4. Check if the current position of the drive (i.e. the
“Look in” text box) is drive A. If not, change it to
drive A by clicking on the arrow to the right of the
text box and selecting “3.5 Floppy [A:]”.
5. Check if the File type text box contains “SPSS
syntax files (*.sps)”. If not, change it by clicking on
the arrow to the right of the text box and selecting
that option using a single click.
6. The relevant file (e.g., crime.sps) should now be listed in the large text box in the centre of the Open File dialogue box. Select the file with a single click and then open it by clicking the Open command pushbutton. This will open a window titled “crime – SPSS Syntax Editor”.
Sometimes the only ‘problem’ is the solution to the ‘problem’… that’s it! Focus on entering the syntax correctly… then there is no ‘problem’, only ‘solutions’, or in SPSS lingo, ‘output’.
7. Go to the end of the line “/METHOD=ENTER
age” and click once to move your cursor/pointer to
that location. Add a space after ‘age’ and then
type ‘agesq’. Your command syntax should now
look like that listed below:
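(The exact contents of crime.sps depend on how it was originally pasted and saved; the relevant lines should now resemble this sketch.)

    REGRESSION
      /DEPENDENT serious
      /METHOD=ENTER age agesq.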
If you want to save or print this syntax window,
you may do so now.
8. Click Run on the Syntax Editor menu bar, and then click All on the Run menu. This will instruct the computer to perform the required regression procedure.
Quadratic Linear Model
SPSS Output Explanation
The REGRESSION command fits linear models by least squares. The METHOD subcommand tells SPSS which variables are in the model. The DEPENDENT subcommand tells SPSS which is the y variable. The ENTER keyword defines the x variables. In this case we have asked SPSS to fit the model

E(y|x) = B0 + B1x + B2x²
The output, using SPSS windows or syntax, is
interpreted as follows:
Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       AGESQ, AGE (a)      (none)              Enter

a. All requested variables entered.
b. Dependent Variable: SERIOUS
ANOVA (b)

Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    3896.297         2    1948.149      139.434   .000 (a)
Residual      97.803           7    13.972
Total         3994.100         9

a. Predictors: (Constant), AGESQ, AGE
b. Dependent Variable: SERIOUS
Model Summary (b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .988 (a)  .976       .969                3.74                         2.081

a. Predictors: (Constant), AGESQ, AGE
b. Dependent Variable: SERIOUS
Coefficients (a)

Model 1       B           Std. Error   Beta    t       Sig.   95% CI for B: Lower   Upper
(Constant)    20.683      8.823                2.344   .052   -.180                 41.545
AGE           -8.49E-02   .410         -.069   -.207   .842   -1.055                .885
AGESQ         1.262E-02   .004         1.055   3.174   .016   .003                  .022

(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; the last two columns give the 95% confidence interval for B.)
a. Dependent Variable: SERIOUS
In order to determine if the model is adequate we examine the ANOVA table. Note the degrees of freedom and the F-statistic value,

F = 139.434,

which has an F distribution with 2 (number of parameters [3] minus the intercept B0 [1]: 3 - 1 = 2) and 7 (number of observations minus number of parameters: 10 - 3 = 7) degrees of freedom. We reject

H0: B1 = B2 = 0
in favour of
Ha: at least one of B1, B2 is not 0

with a p-value less than .0001, the Sig. value on the output. The Regression row refers to the model and the Residual row refers to the error component. The mean square of the residual is equal to s², our estimate of sigma²; note that s is also printed in the Std. Error of the Estimate column:

s² = MSE = 13.972
s = 3.74
In the same area we also have R² printed (R Square), where

R² = SSM/SST = 3896/3994 = .97551

In other words, 97.551% of the variance in seriousness is accounted for by the model (‘age’, ‘agesq’).
In the Coefficients section, the Model column lists ‘(Constant)’, ‘age’, and ‘agesq’; these refer to the variables associated with the parameters B0, B1, and B2 in the model. The column labeled B gives the least squares estimates (b0 = 20.683, b1 = -.085, b2 = .0126) of B0, B1, and B2. The fitted equation is therefore

E(y|x) = 20.683 - .085x + .0126x²
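As a quick check of the fitted equation, you can compute the predicted values yourself from the rounded coefficients. For the first case (age 20) this gives 20.683 - .085(20) + .0126(400) = 24.02, which agrees with the saved predicted value pre_1 = 24.035 in the data table up to coefficient rounding. A minimal syntax sketch (yhat is a name introduced here only for illustration):

    * Reproduce the fitted values from the rounded coefficients (compare with pre_1).
    COMPUTE yhat = 20.683 - .085*age + .0126*agesq.
    EXECUTE.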
The Std. Error column gives the standard error of each of the parameter estimates, for example

s(b0) = 8.823
s(b1) = .410
s(b2) = .004

The t column gives the corresponding t statistic for testing the hypotheses

H0: B1 = 0 versus Ha: B1 ≠ 0,  t = b1/s(b1) = -.207
H0: B2 = 0 versus Ha: B2 ≠ 0,  t = b2/s(b2) = 3.174

For B1 the t statistic has the value -.207, and for B2 the t statistic has the value 3.174. The Sig. column gives the observed significance level (p-value) for each test. In this case we have p = .842 for B1 (not significant, so we cannot reject H0) and p = .016 for B2 (significant, so we reject H0; there is strong evidence against H0). Both tests have 7 degrees of freedom.
The Casewise Diagnostics table gives a Std. Residual column with standardized values between ±3 standard deviations, which is reasonable. The Durbin-Watson statistic is about 2, which indicates essentially zero autocorrelation in the residuals. The leverage (LEVER) and Cook’s distance (COOK D) values for the 10th observation are relatively large (h10 = .8977, D = 405.069), indicating that this is an influential observation. If we compare h10 to 2 × (mean leverage), where the mean leverage is .2 (found in the Residuals Statistics table), we get 2 × .2 = .4; since h10 = .898 exceeds .4, our suspicion that the 10th observation (a person 80 years old with a high seriousness rating) is influential is confirmed. Notice that the residual for this observation is small, so it is not an outlier.
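Because the Save selections added the leverage and Cook’s distance values to the data matrix (as lev_1 and coo_1; see the data table at the end of this example), you can also flag high-leverage cases directly with a short piece of syntax. A sketch using the 2 × (mean leverage) = .4 cut-off:

    * List cases whose centered leverage exceeds twice the mean leverage (2 x .2 = .4).
    TEMPORARY.
    SELECT IF (lev_1 > 0.4).
    LIST VARIABLES=age serious lev_1 coo_1.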
Casewise Diagnostics (a)

Case Number   Std. Residual   SERIOUS   Predicted Value   Residual
1             -.812           21        24.04             -3.04
2             .414            28        26.45             1.55
3             -.003           27        27.01             -1.08E-02
4             -.121           26        26.45             -.45
5             .937            33        29.50             3.50
6             .965            36        32.39             3.61
7             -1.736          31        37.49             -6.49
8             -.666           35        37.49             -2.49
9             .939            41        37.49             3.51
10            .082            95        94.69             .31

a. Dependent Variable: SERIOUS
Residuals Statistics (a)

                                    Minimum   Maximum   Mean        Std. Deviation   N
Predicted Value                     24.04     94.69     37.30       20.81            10
Std. Predicted Value                -.638     2.758     .000        1.000            10
Standard Error of Predicted Value   1.28      3.73      1.93        .72              10
Adjusted Predicted Value            -35.46    39.73     24.58       21.72            10
Residual                            -6.49     3.61      -7.11E-16   3.30             10
Std. Residual                       -1.736    .965      .000        .882             10
Stud. Residual                      -2.013    1.690     .127        1.156            10
Deleted Residual                    -8.73     130.46    12.72       41.60            10
Stud. Deleted Residual              -2.872    2.034     .077        1.400            10
Mahal. Distance                     .148      8.079     1.800       2.357            10
Cook's Distance                     .000      405.070   40.618      128.055          10
Centered Leverage Value             .016      .898      .200        .262             10

a. Dependent Variable: SERIOUS
Look, we got studentized and other residual statistics without asking for them. Bonus!
The histogram of residuals looks reasonable, although with 10 observations this is difficult to judge.

[Figure: Histogram of the regression standardized residuals. Dependent Variable: SERIOUS. Std. Dev = .88, Mean = 0.00, N = 10.00.]

The probability plot has improved, from the previous “Example: Linear Model with Normal Error”, in the sense that the residuals more closely approximate a normal distribution. The large bulge present in the
normal probability plot of residuals in the “Example:
Linear Model with Normal Error” is no longer present
in the polynomial of degree 2 model.
[Figure: Normal P-P plot of the regression standardized residuals. Dependent Variable: SERIOUS. X axis: Observed Cum Prob; Y axis: Expected Cum Prob.]
The plot of predicted y versus the residuals (e) displays a reasonable band shape as well.
[Figure: Scatterplot of the regression standardized residuals versus the regression standardized predicted values. Dependent Variable: SERIOUS.]
Yes! A reasonable band shape for the residuals and predicted values.
In a future example we will learn how to compare the polynomial model discussed in this example with the linear model from the previous example, utilizing the ANOVA technique. Although the polynomial model has a higher R² than the line, we do not know whether this improvement is statistically significant; comparing these types of nested models is the topic of that future example.
The data table following the analysis appears below.
Now let’s see how the additional variables in the data matrix compare to the syntax commands for this analysis (see the sketch after the data table).
age serious agesq pre_1 res_1 zpr_1 zre_1 coo_1 lev_1 lmci_1 umci_1 lici_1 uici_1
20 21 400 24.035 -3.035 -0.638 -0.812 0.315 0.344 18.148 29.923 13.416 34.655
25 28 625 26.452 1.548 -0.521 0.414 0.016 0.084 22.658 30.246 16.833 36.070
26 27 676 27.011 -0.011 -0.495 -0.003 0.000 0.058 23.496 30.526 17.499 36.523
25 26 625 26.452 -0.452 -0.521 -0.121 0.001 0.084 22.658 30.246 16.833 36.070
30 33 900 29.499 3.501 -0.375 0.937 0.044 0.016 26.483 32.516 20.160 38.839
34 36 1156 32.392 3.608 -0.236 0.965 0.062 0.046 29.010 35.774 22.928 41.856
40 31 1600 37.488 -6.488 0.009 -1.736 0.466 0.156 33.013 41.964 27.581 47.395
40 35 1600 37.488 -2.488 0.009 -0.666 0.068 0.156 33.013 41.964 27.581 47.395
40 41 1600 37.488 3.512 0.009 0.939 0.136 0.156 33.013 41.964 27.581 47.395
80 95 6400 94.694 0.306 2.758 0.082 405.070 0.898 85.866 103.522 82.201 107.186
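The extra columns above (pre_1, res_1, zpr_1, zre_1, coo_1, lev_1, lmci_1, umci_1, lici_1, uici_1) are the default names SPSS gives to the quantities requested on the Save dialogue. A sketch of the corresponding /SAVE subcommand, with the default names noted in the comments:

    * PRED -> pre_1 (predicted value), RESID -> res_1 (residual).
    * ZPRED -> zpr_1 and ZRESID -> zre_1 (standardized versions).
    * COOK -> coo_1 (Cook's distance), LEVER -> lev_1 (centered leverage).
    * MCIN -> lmci_1/umci_1 (95% CI for the mean), ICIN -> lici_1/uici_1 (95% CI for an individual).
    REGRESSION
      /DEPENDENT serious
      /METHOD=ENTER age agesq
      /SAVE PRED RESID ZPRED ZRESID COOK LEVER MCIN ICIN.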