Session 1: Getting familiar with SPSS

User Interface

Create a new folder on your network drive Z. You can use it during
this course to save your SPSS-related work such as graphs, tables and
your interpretations. Open SPSS (the installed version is PASW 18). Explore
and familiarize yourself with the views and menus of SPSS. The Data
Editor window has two views, which you can switch between by clicking
on the Data View and Variable View tabs in the bottom left corner of
the window. What’s the difference between these two views? Also find
out what the Syntax and Output windows are for.

  • SPSS Windows, Menus and Toolbars.
  • On SPSS Data vs. Variable view, see SPSS Tutorial Help/Tutorial/Using the Data Editor/Using the Data Editor

Entering Data

Do you remember the difference between continuous and categorical
variables? Create variables in SPSS’s Variable
View based on the background questions in the sample questionnaire.

Give the variables both a name and a label. Keep the name to at most
8 characters and the label to at most 255 characters. (Newer versions
of SPSS accept longer names and labels, but these limits keep your data
compatible with older versions.) Descriptive and well-thought-out
variable names and labels make the data much more accessible to others,
and even to yourself if you come back to data you haven’t looked at
for a while.

If the variable doesn’t need decimal notation, change the number of
decimals to zero from the default setting of two digits. If the
variable is categorized, enter the category labels in the column
marked “Values” by clicking the cell first, and then clicking the grey
dotted box. The variable type is mostly self-explanatory: the type is
set to “string” when the variable contents are non-numeric, such as
answers to open-ended questions.

Enter the questionnaire data from all six respondents into SPSS. You need
to use the Data View for this. Enter the values one observational unit
(i.e., one row in SPSS Data View) at a time. Save the data in your course folder
(File/Save As).

  • On entering data, see SPSS tutorial Help/Tutorial/Using the Data Editor/Entering Numeric Data and Help/Tutorial/Using the Data Editor/Defining Data
  • An explanation on the differences of discrete and continuous variables, levels of measurement, and continuous and categorical variables

Examining Basic Statistics

Recall the possible levels of measurement for variables, and what kind
of statistics can be calculated from variables of different levels of
measurement.

Open the course data, and save it in your course
folder. From now on, we will use this data for all
exercises. Familiarize yourself with the variables by examining the
variables and questions. While looking through
the questions, think about the variable classes. Which of the variables are
categorical, and which are continuous?

Pick a couple of variables from the data, for example Gender (gndr), Father’s highest level of education (edulvlf) and Hours you spend studying, how many an average term-time week (stdhrsw), and examine some of
their descriptive statistics such as mean, standard deviation, minimum and
maximum (Analyze/Descriptive Statistics/Frequencies or Analyze/Descriptive
Statistics/Descriptives).

Consider how the different statistics can be interpreted, and write a
short summary of your interpretations.
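
As a rough illustration of what the Descriptives output contains, the same four statistics can be computed by hand. The sketch below is Python, not SPSS, and the six values are invented for the example (they are not course data). It uses the sample standard deviation (dividing by n - 1).

```python
import statistics

# Hypothetical weekly study hours for six respondents (invented values).
study_hours = [10, 25, 15, 40, 20, 10]

mean = statistics.mean(study_hours)   # arithmetic mean
sd = statistics.stdev(study_hours)    # sample standard deviation (n - 1)
minimum = min(study_hours)
maximum = max(study_hours)

print(f"mean={mean:.2f} sd={sd:.2f} min={minimum} max={maximum}")
# prints: mean=20.00 sd=11.40 min=10 max=40
```

Note that a mean like this is only meaningful for a continuous variable such as stdhrsw; for a categorical variable such as gndr, look at the frequencies instead.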

Hint: When using the dialog window instead of the Syntax window to choose variables in SPSS, sometimes it’s rather frustrating trying to find the variable you’re looking for. If you know the letter the variable description begins with, you can press that letter on your keyboard to cycle through the variables. To see the full variable description, hover the mouse pointer over the variable in question. E.g. when trying to find the variable ‘stdhrsw’ you would keep pressing ‘h’ until you see a description that begins like ‘Hours you spend studying…’.

  • Descriptive statistics: http://www.socialresearchmethods.net/kb/statdesc.htm
  • Levels of measurement

Graphs and Histograms

Bar graphs can be used to examine the distributions of discrete
variables. Pick one discrete variable and make a bar graph
(Graphs/Legacy Dialogs/Bar/Simple). Move your variable to the slot labeled
“Category Axis”. Examine the bar frequencies and write a short
analysis based on your findings.

A histogram can be used to display distributions of continuous
variables. Pick one variable, and draw a histogram
(Graphs/Legacy Dialogs/Histogram). Move your variable to the slot labeled
“Variable”. Examine the resulting histogram. Does the
variable appear to be normally distributed?

Course schedule

Group 1

Session 1 Monday October 31 10 am Aleksandria K 130
Session 2 Friday Nov 4 10 am Aleks K 130
Session 3 Monday Nov 7 10 am Aleks K 130
Session 4 Friday Nov 11 10 am Aleks K 130

Support session Friday Nov 18 10 am Aleks K 130
Report deadline Tuesday Nov 22 4 pm email
Report presentation Friday Nov 25 10 am Aleks K 130
Deadline for rewrites Friday Dec 2 4 pm email

Group 2

Session 1 Monday October 31 4 pm Aleksandria K 131
Session 2 Wednesday Nov 2 4 pm Aleks K 131
Session 3 Monday Nov 7 4 pm Aleks K 131
Session 4 Wednesday Nov 9 4 pm Aleks K 131

Support session Wednesday Nov 16 4 pm Aleks K 133
Report deadline Saturday Nov 19 4 pm email
Report presentation Wednesday Nov 23 4 pm Aleks K 131
Deadline for rewrites Friday Dec 2 4 pm email

Session 2: Recoding and Crosstabulating

Recoding variables

Examine the frequency distributions of different categorized variables, and
pick a variable to recategorize
(Transform/Recode/Into Different Variables). Avoid raising
the level of abstraction too high (i.e. using too few categories) so that you
don’t lose too much information. Save the command into the Syntax
window by clicking the “Paste” button, and make notes to yourself of
what kind of conversion you did. Remember to define value labels
for the new variable.

Cross-tabulation

Crosstabs (aka contingency tables) are used to examine two categorized
variables. You can also
use a continuous variable, but it needs to be categorized
first. (Categorizing a continuous variable will be covered on Day Three.)

Pick two variables that might be dependent on each other in a way that
one could be used to explain the other.
Create a crosstabulation (Analyze/Descriptive Statistics/Crosstabs)
where the independent variable is the
row variable, and the dependent variable is the column variable. Select
the percentages along the direction of the independent variable.

How do you interpret the resulting table? If there are a lot of cells with
zero count, reconsider the way the variable was recategorized. Think
of alternative ways to recategorize the variable without losing too much
information. Write down your interpretation from the final version of
the table.
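
Because the independent variable is on the rows, "percentages along the direction of the independent variable" means row percentages: each cell is divided by its row total. The Python sketch below only illustrates that arithmetic. The counts, category names and the helper row_percentages are invented for the example.

```python
# Hypothetical counts: rows = independent variable (gender),
# columns = dependent variable (a recoded three-category item).
table = {
    "Male":   {"Low": 20, "Medium": 50, "High": 30},
    "Female": {"Low": 10, "Medium": 40, "High": 50},
}

def row_percentages(counts):
    """Return each cell as a percentage of its row total."""
    result = {}
    for row, cells in counts.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

percentages = row_percentages(table)
for row, cells in percentages.items():
    print(row, {col: round(p, 1) for col, p in cells.items()})
```

With row percentages you compare the categories of the independent variable with each other, for example the share of "High" answers among men (30 %) and among women (50 %).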

Clustered Bar Graphs

Draw a bar graph based on the crosstabulation you did in the previous
exercise (Graphs/Legacy Dialogs/Bar/Clustered). The independent variable
goes into the slot labelled “Category Axis” and the dependent
variable needs to be inserted in the field marked “Define Clusters
by”. Examine the frequencies and write down your interpretation of
the resulting graph.

The Chi-square Test

Conduct the Chi-square test on the crosstabulation from today’s second exercise,
and think about its interpretation. You may also use some alternate
variables. Write a short analysis based on your interpretations.
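
To see what the test statistic measures, it can help to compute it once by hand: each observed count is compared with the count expected if the two variables were independent (row total × column total / N). The Python sketch below is an illustration only. The observed counts and the function name chi_square are invented, and it returns the statistic and the degrees of freedom but not the p-value that SPSS reports.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical 2 x 3 crosstabulation (rows: gender, columns: three categories).
observed = [[20, 50, 30],
            [10, 40, 50]]
stat, df = chi_square(observed)
print(round(stat, 2), df)  # prints: 9.44 2
```

With 2 degrees of freedom, the critical value at the 5 % level is about 5.99, so a statistic of 9.44 would indicate a statistically significant association in this invented table.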

Layered Crosstabs

Make a layered crosstab based on the table you made in exercise 2. You
can use a background variable such as gender (gndr), for example. Write a
short summary about your findings. Does the third variable reveal something
about the relationship of the row and column variables?

  • On making and interpreting layered crosstabs, see Help/Tutorial/Crosstabulation Tables/Adding a Layer Variable

Data

The suggested data to use with the exercises is the European Social Survey Round 4. Since the
data file is quite large, we suggest using only a subset of the
data while doing the exercises, for example, only the Finnish data. The data files
are available as free downloads from the ESS website after a quick registration.

How to Obtain a Copy of the Course Data

  1. Download the Finnish data
  2. See some documentation
  3. Go to the European Social Survey homepage.
  4. Click “New User” on the left-hand navigation bar. Note that the registration form uses faulty code, and the form labels are visible only with Internet Explorer 6.0.
  5. Fill in the form, and click “Register”.
  6. Go to the ESS Round 4 data download page, and select the country data you wish to use on this course from the drop-down list. The default subset is for Finland.
  7. Download the SPSS file by clicking on the SPSS icon on the country file row.
  8. At this point, enter the email address you just registered with as your username.
  9. Save the data to your course folder.
  10. Next, download the Variables and questions list and the Variable list which you’ll need to familiarize yourself with the survey data.

Final Report Guidelines

Schedule

The final report will be written in groups of two after the first four group meetings. At the 5th group meeting (support session) the teacher will look through the preliminary reports and give feedback on them. The more work you have put into the report before this meeting, the easier it will be for you to finish the report on your own.

The presentation version of the report has to be submitted to the instructor and the peer evaluators by the date indicated in the Course schedule. The final report groups will present their work and the peer evaluation groups their critique at the 6th group meeting.

Final Report Groups

The report groups will consist of two people from the same practice group. You are free to choose your pair, or if needed, the group instructor will divide you into pairs. The final report groups will be formed during the 4th group meeting. The group instructor will assign a peer evaluator group for each final report group so that every report group has to both write a final report and evaluate another report.

Instructions

The report group has to formulate a research question from the course data, i.e. choose one independent variable, one dependent variable, and possibly a third background variable. Using the methods learnt during the course, the group seeks to answer the research question in the final report. The research questions have to be approved by the group instructor. Email your research questions to the instructor by the deadline announced in the 4th session.

In order to get a passing grade, you have to do the following things in your final report:

  • Explain and justify your choice of variables
  • Present their basic statistics and distributions
  • Explain and justify the variable transformations you have done
  • Answer the research question with meaningful methods
  • Edit the charts and graphs so that they are fit for publication
  • Give your charts and graphs concise and informative headlines
  • Interpret your charts and graphs
  • Write a short summary on your main findings

See the sample charts for guidelines on how you should edit your charts and graphs.

Final Report Length

The ideal length of the final report is 5 pages including charts and graphs. The written part should be concise and to the point.

Peer Evaluation

The job of the peer evaluators is to examine the final report in advance and present their critique in the final meeting. The peer evaluators will go through each of the eight requirements of the final report, and give a short statement on each:

  • Has the choice of variables been explained and justified?
  • Have the basic statistics and distributions been presented?
  • Have the variable transformations been explained and justified?
  • Has the research question been answered with meaningful methods?
  • Have the charts and graphs been edited according to the requirements?
  • Have the charts and graphs been given concise and informative headlines?
  • Have the charts and graphs been interpreted?
  • Has a short summary on the main findings been written?

In addition to giving a short presentation in the final meeting, the evaluators print out and fill in a peer evaluation form, which they will hand in to the instructor at the beginning of the last meeting. (So make a copy of the form if you use it yourself when making your presentation.)

Presentation

Every final report group will give a short oral presentation during the last group meeting. The presentation should cover the research question, a summary of the findings and a chart or graph which demonstrates the findings. Both of the students responsible for the report should participate equally in the presentation. The maximum length of the presentation is 10 minutes, after which the peer evaluators have 5 minutes to present their critique. The possible files needed in the presentation should be brought along on a USB memory stick or saved in a folder accessible with a web browser.

After the presentation and the peer evaluators’ report, the group instructor will decide if the work needs further review before it is submitted for grading. The written report can be revised before submitting the final version even if the group instructor gives you permission to submit it as it is. The revised versions of the final report have to be handed in by the time indicated in the Course schedule. The final versions will be emailed (allowed formats: .doc, .odt, or .pdf) to the following addresses: timo.harmo(at)helsinki.fi and ari.erti(at)helsinki.fi.

Grading

The course will be graded passed – not passed. Half of the grade consists of the quality of the written report, 25% consists of the presentation of the report, and 25% of the grade consists of the quality of the peer evaluation.

General information

Course Description

The aim of the course is to learn the basics of the SPSS for Windows program. SPSS is used on quantitative methods courses, where you’ll need the skills learnt on this course. The course is at its most useful when taken just before the quantitative methods course of your major subject. The course participants should be familiar with ICT basics and basic statistics. Only students of the Faculty of Social Science can participate in the course.

Program

The course consists of four two-hour group meetings, a written final report done in groups of two, a support session for writing the report and the oral presentation of the report in the final meeting. Additionally, every report group (consisting of 2 students each) has to examine critically one final report, and present their criticism in the final meeting.

The course requires a lot of work relative to the number of credits if understanding statistical concepts is not one of your strengths. In addition to participating in the group meetings, you need to study the course materials autonomously.

  • Group meetings – The use of the SPSS program is practiced in the first four lab sessions. Active participation in all of the meetings is obligatory to complete the course.
  • Final report – In addition to class attendance everyone is expected to write a final report, which is a small research project based on the course data. The methods learnt during the course will be applied in the final report. The report will be written in groups of two people after the fourth meeting of the course. There is an obligatory support session for report writing. The final reports will be turned in at the last meeting, when the groups will present their findings and criticisms.
  • Report deadline – The reports or their preliminary versions must be sent to the opponent group and the instructor no later than the date indicated in the Course schedule.
  • Peer evaluation – In addition to presenting their own work, every report group serves as an opponent to one other final report in the final meeting. The quality of the peer evaluation counts as part of the course grade.

Completion and Grading

The completion of the course is worth 2 credit points. The course is graded on a scale of passed / failed. A passing grade requires a passing final report, an acceptable oral report and peer evaluation, and presence at the meetings.

Exercises

The exercises are based on a course material in Finnish by Mia Teräsaho.

User account

Please make sure before the beginning of the course that you have a valid standard user account. If you can log in to the computers at the faculty or faculty library, everything should be in order. For more information on user accounts see the IT department’s web pages. If you have problems with your user account, contact the user account office.

Contact persons

The teacher of the course is Ari Erti. Email ari.erti (at) helsinki.fi
Timo Harmo should be contacted in general matters concerning ICT-studies. Email timo.harmo (at) helsinki.fi

Comparing Different Formulas for Creating Summated Scales in SPSS

Example: We have 3 variables, ‘I like to hang around with my friends during my spare time’ (friends), ‘I like to drink alcohol during my spare time’ (alcohol) and ‘I like to smoke cigarettes during my spare time’ (smoke).

Let’s assume that the variable ‘friends’ has a missing value, the variable ‘alcohol’ has a value of 1 and ‘smoke’ has a value of 2.

COMPUTE rockrollplus = (friends + alcohol + smoke)/3.

Only computes a value if the respondent has answered every question. In this case the result will be system-missing, because the value for ‘friends’ is missing.

COMPUTE rockrollsum = sum(friends,alcohol,smoke)/3.

Counts the sum even with missing values: missing items are left out of the sum, but the divisor stays at 3. In this case the result will be 1, because (1+2)/3 = 1.

COMPUTE rockrollmean = mean(friends,alcohol,smoke).

Counts the mean even with missing values, but divides the sum by the number of variables with non-missing values. In this case the mean would be 1.5, because (1+2)/2 = 1.5.

The formula using the mean function is usually the best, except with a lot of missing values. For example, if we only had one non-missing value such as ‘smoke = 2’, the mean would be 2, even when two variables have missing values. A solution would be a formula like this:

COMPUTE rockrollmean2 = mean.2(friends,alcohol,smoke).

This formula counts the mean only when at least two of the variables have non-missing values.
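
The difference between the four formulas can also be checked outside SPSS. The following Python sketch is only an illustration, not SPSS code: None stands in for a missing value, and the helper names (plain_average, sum_over_items, mean_min_valid) are made up for this example to mimic the behaviour described above.

```python
def plain_average(values):
    """Like (x1 + x2 + x3) / 3: the result is missing if any item is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def sum_over_items(values):
    """Like sum(x1, x2, x3) / 3: missing items are skipped, the divisor stays fixed."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(values)

def mean_min_valid(values, min_valid=1):
    """Like mean(...) or mean.n(...): average of the valid items,
    missing if fewer than min_valid items are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

# friends is missing, alcohol = 1, smoke = 2
respondent = [None, 1, 2]
print(plain_average(respondent))      # prints: None
print(sum_over_items(respondent))     # prints: 1.0
print(mean_min_valid(respondent))     # prints: 1.5
print(mean_min_valid(respondent, 2))  # prints: 1.5

# Only smoke = 2 has been answered
print(mean_min_valid([None, None, 2]))     # prints: 2.0
print(mean_min_valid([None, None, 2], 2))  # prints: None
```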

Choosing the Correct Statistical Algorithm

This course only covers a very limited number of statistical tests and methods. Still, it is important to realize that the type of statistics you can use depends on the type (or scale, if you will) of the dependent variable. Recall the distinction between categorical and continuous variables which was in the reading material for day 1.

To simplify that discussion for the methods learnt during this course, if you have

  • one variable, which is continuous: choose Analyze/Descriptive Statistics/Descriptives
  • one variable, which is categorical: choose Analyze/Descriptive Statistics/Frequencies
  • two variables, which are categorical: choose crosstabs
  • two variables, which are continuous: choose correlation & scatterplots

Session 4: Scatterplots and Correlations

Making a Scatterplot

A scatterplot gives a visual picture of the relationship of (usually)
two continuous variables. One can also use Likert-scaled variables in
a scatterplot. Pick two variables that can be seen as an explanatory
(independent) variable and a response (dependent) variable. Draw a
scatterplot (Graphs/Legacy Dialogs/Scatter/Simple Scatter) so that the
dependent variable goes to the Y axis and the independent variable
goes to the X axis. Interpret the resulting scatterplot.

Add a regression line to the finished figure: double-click the chart to
open the Chart Editor, click the points in the figure to select them, and
then choose Chart/Add Chart Element/Fit Line at Total/Fit Method: Linear.
What conclusions can you draw from the scatterplot?

Computing Correlations

If it appears that there is a linear dependency between your chosen
variables, compute the correlation between the variables
(Analyze/Correlate/Bivariate). What does the correlation
coefficient tell you? Is the correlation between the variables
statistically significant?
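
The coefficient SPSS reports by default is Pearson's r: the sum of the products of the deviations from the means, scaled by the spread of both variables so that it always lies between -1 and 1. The Python sketch below shows the arithmetic on invented values. The function name pearson_r and the data are assumptions for the example, and it does not compute the significance level.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values of an independent (x) and a dependent (y) variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # prints: 0.775
```

A value near 0 means there is no linear relationship, while values near -1 or 1 mean the points lie close to a straight line. The sign tells the direction of the relationship.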

Correlation and Statistical Significance

Go back to (or redo) the correlation matrix made on Day 3, and write a short
analysis on its correlation coefficients and their statistical significance.

Interpreting Correlation

Draw a scatterplot with a regression line and compute the correlation
coefficient for the
variables “Country’s cultural life undermined or enriched by immigrants”
(imueclt) and “Year of birth” (yrbrn). How do you interpret the correlation
coefficient?

Preparing Graphs and Tables for Reports

Pick a graph or a table you made before, and make it fit for
publishing. Use the following guidelines (adapted from Maarit
Valtari’s SPSS-Guide, especially for crosstabulations):

  • Only display row, column OR cell percentages in your tables, but
    don’t use the %-character. Usually there’s no need to display cell
    frequencies.
  • Display the figures, where the percentages are counted from,
    i.e. row and column sums and their total sum. If the reader is
    interested in cell frequencies, he/she can count them utilizing the
    displayed percentages.
  • A good way to make sure that the percentages are displayed
    correctly is to attempt to explain the central conclusion of the
    table. An example from a study regarding reading as a hobby: “88% of
    adults in Narujärvi, and 32% of adults in Uppojoki read at least 3
    novels each year.” (It would be considerably more awkward to state the
    percentage of people reading novels coming from Uppojoki.)
  • Avoid displaying too many decimals; they make your tables and graphs
    less readable.
  • Display the sum of percentages, that is a row or column of 100s,
    either in the right or bottom marginal depending on the direction of
    the counted percentages.
  • The meaning of a variable’s categories must be explicated
    clearly. Usually the variable value (i.e. the category “number”) is
    not enough, but each category should have a description in the
    table. This is easily done by giving the variable value labels in the
    Variable View. If the category descriptions don’t fit in the table,
    further explications can be situated in the footnotes.
  • The table title should be informational, and additional
    descriptions can be written in the footnotes if needed. The reader
    should be able to understand the table’s contents without looking for
    explanations from the report’s text proper. The title does not
    need to be smart nor snappy, this being a scientific report. An
    example of a title: The distribution of Finnish women aged 18 to 35
    years in income categories in the year 1997.
  • The table source is written in the lower margin of the table.
  • Repeating the figures and details from the table excessively in
    the report’s text is a sure way of making the reader lose
    interest.
  • A table needs to have a number in addition to a title.
  • When dealing with a table of percentages, the number of
    observations should be clearly displayed.
  • The best tip: Show your tables and graphs to a friend who’s not
    afraid to speak out, and ask her to tell you what the tables are
    about, and what conclusions one can make from the figures.

For information on how to tidy up the charts and graphs in your final report, consult these examples.

Session 3: Variable Transformations and Summated Scales

Variable Transformations

Examine the basic statistics and distribution of the variable Year of birth
(yrbrn). Does the distribution make sense? In case you notice
any clear mistakes or irregularities, code them into missing values
(Transform/Recode/Into Different Variables). Save the command
into the Syntax Window using the “Paste” button. Inspect the basic
statistics of the new variable and check that the distribution looks
ok. Write a short interpretation on the basic statistics.

It’s useful to transform the variable “Year of Birth” into an
age variable. This is done by subtracting the variable in question
from the year when the material was collected (in this case, 2008)
(Transform/Compute).
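
The two steps above (setting implausible birth years to missing, then subtracting from the survey year) amount to the following logic. This Python sketch is only an illustration: the function name, the cutoff year 1900 and the example code 9999 are assumptions made for the example, so check the actual distribution of yrbrn before deciding which values to treat as errors.

```python
SURVEY_YEAR = 2008  # year the material was collected

def age_from_birth_year(yrbrn, earliest=1900):
    """Return age in the survey year, or None for missing or implausible birth years.

    The cutoff `earliest` is a hypothetical choice for this example.
    """
    if yrbrn is None or not (earliest <= yrbrn <= SURVEY_YEAR):
        return None
    return SURVEY_YEAR - yrbrn

print(age_from_birth_year(1978))  # prints: 30
print(age_from_birth_year(9999))  # prints: None
```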

Categorizing Continuous Variables

Examine the frequency distribution of the new age variable. Into what
kind of age categories should the variable be divided? Make a
categorized age variable (Transform/Recode/Into Different Variables)
and paste the command into the Syntax Window. Remember
to enter value labels for the new variable.
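
Categorizing means mapping ranges of the continuous variable to category codes and giving each code a label. The Python sketch below shows one possible scheme. The cut points (30, 50, 65), the codes and the labels are arbitrary choices for the example, not a recommendation; choose your own categories based on the frequency distribution.

```python
# Hypothetical category codes and their value labels.
AGE_LABELS = {1: "Under 30", 2: "30-49", 3: "50-64", 4: "65 or over"}

def age_category(age):
    """Map an age in years to a category code; missing stays missing."""
    if age is None:
        return None
    if age < 30:
        return 1
    if age < 50:
        return 2
    if age < 65:
        return 3
    return 4

for age in (18, 30, 64, 80):
    code = age_category(age)
    print(age, code, AGE_LABELS[code])
```

Every possible age should fall into exactly one category, so check the boundary values (here 29/30, 49/50 and 64/65) after recoding.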

Reverse Coding a Likert Scale Variable

A common reason for recoding a variable is to reverse code a 5-point
Likert scale. Often some questions in a survey are reverse worded, so
the scale needs to be reversed before the point values can be compared
or used in a “summated” scale.

Pick one variable, for example “Nowadays customers and consumers are in a
better position to protect their interests” (question E22, variable name
‘cmprcti’), and reverse code it so that 5 becomes “agree strongly” and 1
becomes “disagree strongly”. After reverse coding it, you might compare it
to questions E20 and E21.

Remember to paste the command into the Syntax Window. By modifying the
syntax, you can easily reverse code multiple variables. Remember to
update the value labels of the resulting variable.

Alternatively, you can paste this into your Syntax Window:


* Reverse a 5-point scale; oldvar and reversed_var are placeholder names.
RECODE oldvar
  (1=5) (2=4) (3=3) (4=2) (5=1) (MISSING=SYSMIS) INTO reversed_var.
EXECUTE.
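
The value pairs in the RECODE command follow a simple rule: on a scale from 1 to 5, the reversed value is 6 minus the original value. The Python sketch below illustrates the rule. The function name reverse_code is made up for the example, and None stands in for a missing value.

```python
def reverse_code(value, scale_min=1, scale_max=5):
    """Reverse a Likert item: the lowest value becomes the highest and vice versa."""
    if value is None:
        return None  # missing stays missing
    return scale_min + scale_max - value

print([reverse_code(v) for v in (1, 2, 3, 4, 5)])  # prints: [5, 4, 3, 2, 1]
```

The same rule works for other scale lengths, for example a scale from 0 to 10 reverses as 10 minus the original value.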

  • Recoding Variables in SPSS
  • Entering value labels for a variable: Help/Tutorial/Using the Data
    Editor/Defining Data/Adding Value Labels for Numeric Variables

Creating a Summated Scale

A summated scale is built from individual items that are supposed
to describe the same phenomenon. Check if any of the variables chosen
for the summated scale need to be reversed (see previous exercise).

Before building the summated scale, one should check that the items
chosen correlate positively
(Analyze/Correlate/Bivariate). What do the correlations look
like? (We return to the question of interpreting correlations on Day 4.)
What is the Cronbach’s Alpha for the planned summated
scale (Analyze/Scale/Reliability analysis)? Should one of the
items be removed (Alpha if item deleted) so that the alpha
would be higher? Would the removal make sense substance-wise, or
weaken the validity of the scale? Write down your interpretations.
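
Cronbach's alpha compares the variances of the individual items with the variance of their sum: alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The Python sketch below illustrates the formula on invented answers from five respondents to three items. The data and the function name cronbach_alpha are assumptions for the example, and it expects complete cases only.

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # one tuple of scores per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers of five respondents to three items (1-5 scale).
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # prints: 0.959
```

The more the items vary together, the larger the variance of the total score is relative to the item variances, and the closer alpha gets to 1.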

Compute the summated scale (Transform/Compute) with the
following formula: (x1+x2+x3)/3, where x1, x2 and x3 stand for
variable names, and the sum is divided by the number of variables. You
may also enter the equation directly into the Syntax Window if you
wish:

COMPUTE newvar = ( var1 + var2 + var3 ) / 3 .
EXECUTE .

Variable Transformation II

Conduct a variable transformation, where you’ll compute the amount of time
people spend reading newspapers on issues other than politics and current
affairs. Use variables “Newspaper reading, total time on average weekday”
(nwsptot) and “Newspaper reading, politics/current affairs on average weekday”
(nwsppol). Remember to enter value labels for the new variable.

Comparing Different Formulas for Building a Summated Scale

We can build a summated scale in different ways depending on
the situation and the number of missing values. Compare scales built
using the following formulas, when some of the variables contain
missing values: (x1+x2+x3)/3, sum(x1,x2,x3)/3, mean(x1,x2,x3). Write
down your interpretations.