Comparing Different Formulas for Creating Summated Scales in SPSS

Example: We have 3 variables, ‘I like to hang around with my friends during my spare time’ (friends), ‘I like to drink alcohol during my spare time’ (alcohol) and ‘I like to smoke cigarettes during my spare time’ (smoke).

Let’s assume that the variable ‘friends’ has a missing value, ‘alcohol’ has a value of 1 and ‘smoke’ has a value of 2.

COMPUTE rockrollplus = (friends + alcohol + smoke)/3.

Computes the scale only if the respondent has answered every question. In this case the result is a system-missing value, because the value for ‘friends’ is missing.

COMPUTE rockrollsum = sum(friends,alcohol,smoke)/3.

Computes the sum even when some values are missing, but always divides it by 3. In this case the result is 1, because (1+2)/3 = 1.

COMPUTE rockrollmean = mean(friends,alcohol,smoke).

Computes the mean even when some values are missing, dividing the sum by the number of variables with non-missing values. In this case the mean is 1.5, because (1+2)/2 = 1.5.

The formula using the mean function is usually the best, except when there are many missing values. For example, if we only had one non-missing value such as ‘smoke = 2’, the mean would be 2, even though two of the three variables have missing values. A solution would be a formula like this:

COMPUTE rockrollmean2 = mean.2(friends,alcohol,smoke).

This formula computes the mean only when at least two of the variables have non-missing values; otherwise the result is system-missing.
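
The four formulas can be compared side by side on a small test data set. The following is a minimal sketch, not part of the original material: it assumes that the code 9 stands for a missing answer (declared with MISSING VALUES), and a second case in which only ‘smoke’ is answered has been added to show the effect of mean.2.

* Two illustrative cases; the missing-value code 9 is an assumption.
DATA LIST FREE / friends alcohol smoke.
BEGIN DATA
9 1 2
9 9 2
END DATA.
MISSING VALUES friends alcohol smoke (9).
COMPUTE rockrollplus = (friends + alcohol + smoke)/3.
COMPUTE rockrollsum = sum(friends,alcohol,smoke)/3.
COMPUTE rockrollmean = mean(friends,alcohol,smoke).
COMPUTE rockrollmean2 = mean.2(friends,alcohol,smoke).
LIST.

For the first case, the listing should show a system-missing value (a period) for rockrollplus, 1 for rockrollsum and 1.5 for both mean variables. For the second case, rockrollmean is 2 while rockrollmean2 is system-missing, because only one of the three variables has a valid value.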

Choosing the Correct Statistical Algorithm

This course covers only a very limited number of statistical tests and methods. Still, it is important to realize that the type of statistics you can use depends on the type (or scale, if you will) of the dependent variable. Recall the distinction between categorical and continuous variables from the reading material for day 1.

To simplify that discussion for the methods learnt during this course, if you have

  • one variable, which is continuous: choose Analyze/Descriptive Statistics/Descriptives
  • one variable, which is categorical: choose Analyze/Descriptive Statistics/Frequencies
  • two variables, which are categorical: choose Analyze/Descriptive Statistics/Crosstabs
  • two variables, which are continuous: choose Analyze/Correlate/Bivariate and a scatterplot (Graphs/Legacy Dialogs/Scatter/Simple Scatter)
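
Each of these menu choices has a syntax counterpart. A rough sketch, assuming hypothetical variable names (contvar1 and contvar2 for continuous variables, catvar1 and catvar2 for categorical ones):

* One continuous variable.
DESCRIPTIVES VARIABLES=contvar1.
* One categorical variable.
FREQUENCIES VARIABLES=catvar1.
* Two categorical variables, with counts and row percentages.
CROSSTABS /TABLES=catvar1 BY catvar2 /CELLS=COUNT ROW.
* Two continuous variables.
CORRELATIONS /VARIABLES=contvar1 contvar2.
GRAPH /SCATTERPLOT(BIVAR)=contvar1 WITH contvar2.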

Session 4: Scatterplots and Correlations

Making a Scatterplot

A scatterplot gives a visual picture of the relationship between
(usually) two continuous variables. One can also use Likert-scaled
variables in a scatterplot. Pick two variables that can be seen as an
explanatory (independent) variable and a response (dependent)
variable. Draw a scatterplot (Graphs/Legacy Dialogs/Scatter/Simple
Scatter) so that the dependent variable goes on the Y axis and the
independent variable goes on the X axis. Interpret the resulting
scatterplot.

Add a regression line to the finished figure: open the Chart Editor by double-clicking the chart, select the points in the figure by clicking them, and choose Chart/Add Chart Element/Fit Line at Total/Fit Method: Linear. What conclusions can you draw from the scatterplot?

Computing Correlations

If it appears that there is a linear dependency between your chosen
variables, compute the correlation between them
(Analyze/Correlate/Bivariate). What does the correlation coefficient
tell you? Is the correlation between the variables statistically
significant?

Correlation and Statistical Significance

Go back to (or redo) the correlation matrix made on Day 3, and write a short
analysis of its correlation coefficients and their statistical significance.

Interpreting Correlation

Draw a scatterplot with a regression line and compute the correlation
coefficient for the variables “Country’s cultural life undermined or
enriched by immigrants” (imueclt) and “Year of birth” (yrbrn). How do
you interpret the correlation coefficient?
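
In syntax, this exercise could look roughly as follows. The sketch treats year of birth as the explanatory variable (X axis) and the immigration item as the response (Y axis), which is an assumption about how the exercise is meant to be set up. The regression line still has to be added in the Chart Editor.

* The first-named variable goes on the X axis, the second on the Y axis.
GRAPH /SCATTERPLOT(BIVAR)=yrbrn WITH imueclt.
* Pearson correlation with two-tailed significance.
CORRELATIONS /VARIABLES=yrbrn imueclt /PRINT=TWOTAIL NOSIG.

If you recoded faulty birth years into a new variable earlier, use that variable instead of yrbrn.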

Preparing Graphs and Tables for Reports

Pick a graph or a table you made before, and make it fit for
publishing. Use the following guidelines (adapted from Maarit
Valtari’s SPSS-Guide, especially for crosstabulations):

  • Only display row, column OR cell percentages in your tables, but
    don’t use the %-character. Usually there’s no need to display cell
    frequencies.
  • Display the figures from which the percentages are calculated,
    i.e. row and column sums and their total sum. If the reader is
    interested in cell frequencies, they can be calculated from the
    displayed percentages.
  • A good way to make sure that the percentages are displayed
    correctly is to attempt to explain the central conclusion of the
    table. An example from a study regarding reading as a hobby: “88% of
    adults in Narujärvi, and 32% of adults in Uppojoki read at least 3
    novels each year.” (It would be considerably more awkward to state the
    percentage of people reading novels coming from Uppojoki.)
  • Avoid displaying too many decimals; they make your tables and
    graphs less readable.
  • Display the sum of percentages, that is a row or column of 100s,
    either in the right or bottom margin depending on the direction in
    which the percentages are calculated.
  • The meaning of a variable’s categories must be explained
    clearly. Usually the variable value (i.e. the category “number”) is
    not enough, but each category should have a description in the
    table. This is easily done by giving the variable value labels in the
    Variable View. If the category descriptions don’t fit in the table,
    further explanations can be placed in the footnotes.
  • The table title should be informational, and additional
    descriptions can be written in the footnotes if needed. The reader
    should be able to understand the table’s contents without looking
    for explanations in the report’s text proper. The title does not
    need to be smart or snappy, this being a scientific report. An
    example of a title: The distribution of Finnish women aged 18 to 35
    years in income categories in the year 1997.
  • The table source is written in the lower margin of the table.
  • Repeating the figures and details from the table excessively in
    the report’s text is a sure way of making the reader lose
    interest.
  • A table needs to have a number in addition to a title.
  • When dealing with a table of percentages, the number of
    observations should be clearly displayed.
  • The best tip: Show your tables and graphs to a friend who’s not
    afraid to speak out, and ask her to tell you what the tables are
    about, and what conclusions one can make from the figures.

For information on how to tidy up the charts and graphs in your final report, consult these examples.

Session 3: Variable Transformations and Summated Scales

Variable Transformations

Examine the basic statistics and distribution of the variable Year of birth
(yrbrn). Does the distribution make sense? In case you notice
any clear mistakes or irregularities, code them into missing values
(Transform/Recode/Into Different Variables). Save the command
into the Syntax Window using the “Paste” button. Inspect the basic
statistics of the new variable and check that the distribution looks
OK. Write a short interpretation of the basic statistics.

It’s useful to transform the variable “Year of birth” into an
age variable. This is done by subtracting the variable in question
from the year when the material was collected (in this case, 2008)
(Transform/Compute).
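
One way to write these two steps in syntax is sketched below. The limits 1900 and 2008 and the new variable names yrbrn2 and age are assumptions for illustration; choose the limits based on what you actually see in the distribution.

* Copy plausible birth years and set everything else to system-missing.
RECODE yrbrn (1900 THRU 2008=COPY) (ELSE=SYSMIS) INTO yrbrn2.
* Age in the survey year 2008.
COMPUTE age = 2008 - yrbrn2.
DESCRIPTIVES VARIABLES=yrbrn2 age.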

Categorizing Continuous Variables

Examine the frequency distribution of the new age variable. Into what
kind of age categories should the variable be divided? Make a
categorized age variable (Transform/Recode/Into Different Variables)
and paste the command into the Syntax Window. Remember to enter value
labels for the new variable.
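
The pasted command might look something like the sketch below. It assumes that the age variable is called age, that it holds whole years and that it has no user-defined missing codes. The age bands and the name agecat are hypothetical; pick the bands from your own frequency distribution.

* Hypothetical age bands; adjust to your own frequency distribution.
RECODE age (LOWEST THRU 29=1) (30 THRU 44=2) (45 THRU 64=3) (65 THRU HIGHEST=4) INTO agecat.
VARIABLE LABELS agecat 'Age in categories'.
VALUE LABELS agecat 1 '29 or younger' 2 '30-44' 3 '45-64' 4 '65 or older'.
FREQUENCIES VARIABLES=agecat.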

Reverse Coding a Likert Scale Variable

A common reason for recoding a variable is to reverse code a 5-point
Likert scale. Often some questions in a survey are reverse worded, so
the scale needs to be reversed before the point values can be compared
or used in a “summated” scale.

Pick one variable, for example “Nowadays customers and consumers are in a
better position to protect their interests” (question E22, variable name
‘cmprcti’), and reverse code it so that 5 becomes “agree strongly” and 1
becomes “disagree strongly”. After reverse coding it, you might compare it
to questions E20 and E21.

Remember to paste the command into the Syntax Window. By modifying the
syntax, you can easily reverse code multiple variables. Remember to
update the value labels of the resulting variable.

Alternatively, you can paste this into your Syntax Window:


* Reverse code a 5-point item; oldvar and reversed_var are placeholders.
RECODE
  oldvar
  (1=5)  (2=4)  (3=3)  (4=2)  (5=1)  (MISSING=SYSMIS) INTO  reversed_var .
EXECUTE .

  • Recoding Variables in SPSS
  • Entering value labels for a variable: Help/Tutorial/Using the Data
    Editor/Defining Data/Adding Value Labels for Numeric Variables
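
Two variations on the RECODE command above, sketched under the assumption that the items are coded 1 to 5: several variables can be reversed in one command, and a 1 to 5 scale can also be reversed arithmetically, since 6 minus the old value maps 1 to 5, 2 to 4 and so on. The names oldvar1, oldvar2, rev1, rev2 and rev3 and the wording of the middle value labels are placeholders.

* Reverse two items at once.
RECODE oldvar1 oldvar2 (1=5) (2=4) (3=3) (4=2) (5=1) (ELSE=SYSMIS) INTO rev1 rev2.
* Arithmetic alternative for one item; only valid if all non-missing values are 1 to 5.
COMPUTE rev3 = 6 - oldvar1.
VALUE LABELS rev1 rev2 rev3
  1 'Disagree strongly' 2 'Disagree' 3 'Neither agree nor disagree'
  4 'Agree' 5 'Agree strongly'.
EXECUTE.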

Creating a Summated Scale

A summated scale is built from individual items that are supposed
to describe the same phenomenon. Check if any of the variables chosen
for the summated scale need to be reversed (see previous exercise).

Before building the summated scale, one should check that the items
chosen correlate positively
(Analyze/Correlate/Bivariate). What do the correlations look
like? (We return to the question of interpreting correlations on Day 4.)
What is Cronbach’s alpha for the planned summated
scale (Analyze/Scale/Reliability Analysis)? Should one of the
items be removed (Alpha if Item Deleted) so that the alpha
would be higher? Would the removal make sense substance-wise, or
weaken the validity of the scale? Write down your interpretations.
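
Both checks can also be run from the Syntax Window. A minimal sketch, where var1, var2 and var3 are placeholders for the items and 'Planned scale' is an arbitrary scale name:

* Correlations between the items of the planned scale.
CORRELATIONS /VARIABLES=var1 var2 var3.
* Cronbach's alpha; SUMMARY=TOTAL adds the item-total table with alpha if item deleted.
RELIABILITY
  /VARIABLES=var1 var2 var3
  /SCALE('Planned scale') ALL
  /MODEL=ALPHA
  /SUMMARY=TOTAL.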

Compute the summated scale (Transform/Compute) with the
following formula: (x1+x2+x3)/3, where x1, x2 and x3 stand for
variable names, and the sum is divided by the number of variables. You
may also enter the equation directly into the Syntax Window if you
wish:

COMPUTE newvar = ( var1 + var2 + var3 ) / 3 .
EXECUTE .

Variable Transformation II

Conduct a variable transformation, where you’ll compute the amount of time
people spend reading newspapers on issues other than politics and current
affairs. Use variables “Newspaper reading, total time on average weekday”
(nwsptot) and “Newspaper reading, politics/current affairs on average weekday”
(nwsppol). Remember to enter value labels for the new variable.
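
A possible starting point in syntax is shown below. It assumes that both variables use the same time categories, so that their difference is expressed in category steps rather than in minutes; check the coding in the Variable View before interpreting or labelling the result. The name nwspoth is made up for this example.

* Newspaper reading on topics other than politics and current affairs.
COMPUTE nwspoth = nwsptot - nwsppol.
VARIABLE LABELS nwspoth 'Newspaper reading, other than politics/current affairs'.
FREQUENCIES VARIABLES=nwspoth.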

Comparing Different Formulas for Building a Summated Scale

We can apply different ways to build a summated scale depending on
the situation and the number of missing values. Compare scales built
using the following formulas, when some of the variables contain
missing values: (x1+x2+x3)/3, sum(x1,x2,x3)/3, mean(x1,x2,x3). Write
down your interpretations.
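
One way to set up the comparison is sketched below. Here x1, x2 and x3 stand for the items of your scale, and the new variable names are arbitrary. The helper variable nanswered, which is not part of the exercise, counts how many items each respondent answered.

* x1, x2 and x3 stand for the items of your scale.
COMPUTE scaleplus = (x1 + x2 + x3)/3.
COMPUTE scalesum = sum(x1,x2,x3)/3.
COMPUTE scalemean = mean(x1,x2,x3).
* Number of items each respondent actually answered.
COMPUTE nanswered = nvalid(x1,x2,x3).
DESCRIPTIVES VARIABLES=scaleplus scalesum scalemean.
FREQUENCIES VARIABLES=nanswered.

The N column of the Descriptives table shows how many respondents each formula retains, and the means show how the treatment of missing values shifts the scale.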