Identifying Patterns

16 February, 2016 - 12:58

Data analysis is about identifying, describing, and explaining patterns. Univariate analysis is the most basic form of analysis that quantitative researchers conduct. In this form, researchers describe patterns across just one variable. Univariate analysis includes frequency distributions and measures of central tendency. A frequency distribution is a way of summarizing the distribution of responses on a single survey question. Let’s look at the frequency distribution for just one variable from my older worker survey. We’ll analyze the item mentioned first in the codebook excerpt given earlier, on respondents’ self-reported financial security.

Table 8.3 Frequency Distribution of Older Workers’ Financial Security

In general, how financially secure would you say you are?

Label                                       Value   Frequency   Percentage
Not at all secure                           1       46          25.6
Between not at all and moderately secure    2       43          23.9
Moderately secure                           3       76          42.2
Between moderately and very secure          4       11          6.1
Very secure                                 5       4           2.2

Total valid cases = 180; no response = 3

 
 

As you can see in the frequency distribution on self-reported financial security, more respondents reported feeling “moderately secure” than any other response category. We also learn from this single frequency distribution that fewer than 10% of respondents reported being in one of the two most secure categories.
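If you'd like to see where the percentages in a frequency distribution come from, here is a minimal sketch in Python. The counts are copied from Table 8.3; the variable and function names are my own choices for illustration, not part of any survey software.

```python
# Frequencies copied from Table 8.3 (response value -> number of respondents).
counts = {1: 46, 2: 43, 3: 76, 4: 11, 5: 4}

# Valid cases only; the 3 non-responses are not part of the distribution.
total_valid = sum(counts.values())


def percentage(value):
    """Share of valid cases that gave this response, rounded to one decimal."""
    return round(100 * counts[value] / total_valid, 1)


for value in sorted(counts):
    print(value, counts[value], percentage(value))

# Combined share of the two most secure categories (values 4 and 5).
top_two_share = round(100 * (counts[4] + counts[5]) / total_valid, 1)
print("Two most secure categories:", top_two_share)
```

Each percentage is simply a category's frequency divided by the number of valid cases, which is why the non-responses are left out of the total.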

Another form of univariate analysis that survey researchers can conduct on single variables is measures of central tendency. Measures of central tendency tell us what the most common, or average, response is on a question. They can be calculated for variables at any of the levels of measurement we learned about in "Defining and Measuring Concepts", from nominal to ratio. There are three kinds of measures of central tendency: modes, medians, and means. The mode is the most common response given to a question, and it is most appropriate for nominal-level variables. The median is the middle point in a distribution of responses, and it is the appropriate measure of central tendency for ordinal-level variables. Finally, the measure of central tendency used for interval- and ratio-level variables is the mean. To obtain a mean, one must add the values of all responses on a given variable and then divide that sum by the total number of responses.
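To make the three measures concrete, here is a short sketch using Python's built-in statistics module. All three lists of responses are hypothetical, invented only for illustration; none of them come from the older worker survey.

```python
import statistics

# Hypothetical responses, invented for illustration only.
favorite_color = ["blue", "green", "blue", "red", "blue"]  # nominal level
security_rating = [1, 3, 2, 3, 5, 3, 2]                    # ordinal level (coded 1-5)
hours_worked = [40, 35, 20, 45, 40]                        # ratio level

# Mode: the most common response.
color_mode = statistics.mode(favorite_color)

# Median: the middle value once the responses are sorted.
rating_median = statistics.median(security_rating)

# Mean: add up all values, then divide by the number of responses.
hours_mean = sum(hours_worked) / len(hours_worked)

print(color_mode, rating_median, hours_mean)
```

Notice that the mean is computed by hand here to mirror the description above; statistics.mean(hours_worked) would give the same answer.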

In the previous example of older workers’ self-reported levels of financial security, the appropriate measure of central tendency would be the median, as this is an ordinal-level variable. If we were to list all responses to the financial security question in order and then choose the middle point in that list, we’d have our median. In "Figure 8.5", the value of each response to the financial security question is noted, and the middle point within that range of responses is highlighted. To find the middle point, we simply divide the number of valid cases by two. The number of valid cases, 180, divided by 2 is 90, so we’re looking for the 90th value on our distribution to discover the median. (Strictly speaking, with an even number of cases the median falls between the 90th and 91st values, but both of those values are 3 here.) As you’ll see in "Figure 8.5", that value is 3; thus the median on our financial security question is 3, or “moderately secure.”

Figure 8.5 Distribution of Responses and Median Value on Workers’ Financial Security (Missing in original) 
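Since the figure is not available, the same search for the middle value can be sketched in a few lines of Python. The counts come from Table 8.3; the helper function value_at_position is a hypothetical name of my own, written just for this illustration.

```python
# Frequencies copied from Table 8.3 (response value -> number of respondents).
counts = {1: 46, 2: 43, 3: 76, 4: 11, 5: 4}


def value_at_position(counts, position):
    """Return the response value found at a 1-based position
    when all responses are listed in order from lowest to highest."""
    running_total = 0
    for value in sorted(counts):
        running_total += counts[value]
        if position <= running_total:
            return value
    raise ValueError("position exceeds the number of valid cases")


n = sum(counts.values())   # number of valid cases
middle = n // 2            # half of the valid cases

median_value = value_at_position(counts, middle)
next_value = value_at_position(counts, middle + 1)

print(middle, median_value, next_value)
```

Values 1 and 2 together account for the first 89 responses, so the 90th response is the first one coded 3. Because the number of valid cases is even, the sketch also checks the 91st response; it is coded 3 as well, so the median is 3 either way.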

As you can see, we can learn a lot about our respondents simply by conducting univariate analysis of measures on our survey. We can learn even more, of course, when we begin to examine relationships among variables. Either we can analyze the relationships between two variables, called bivariate analysis, or we can examine relationships among more than two variables. This latter type of analysis is known as multivariate analysis.

Bivariate analysis allows us to assess covariation between two variables. This means we can find out whether changes in one variable occur together with changes in another. If two variables do not covary, they are said to be independent. This means simply that there is no relationship between the two variables in question. To learn whether a relationship exists between two variables, a researcher may cross-tabulate the two variables and present their relationship in a contingency table. A contingency table shows how variation on one variable may be contingent on variation on the other. Let’s take a look at a contingency table. In "Table 8.4", I have cross-tabulated two questions from my older worker survey: respondents’ reported gender and their self-rated financial security.

Table 8.4 Financial Security Among Men and Women Workers Age 62 and Up
 

                                     Men       Women
Not financially secure (%)           44.1      51.8
Moderately financially secure (%)    48.9      39.2
Financially secure (%)               7.0       9.0
Total                                N = 43    N = 135

 

You’ll see in "Table 8.4" that I collapsed a couple of the financial security response categories (recall that there were five categories presented in "Table 8.3"; here there are just three). Researchers sometimes collapse response categories on items such as this in order to make it easier to read results in a table. You’ll also see that I placed the variable “gender” in the table’s columns and “financial security” in its rows. Typically, dependent variables (those whose values are contingent on other values) are placed in rows, while independent variables are placed in columns. This makes comparing across categories of our independent variable simple. Reading across the top row of our table, we can see that around 44% of men in the sample reported that they are not financially secure while almost 52% of women reported the same. In other words, a larger share of women than men reported that they are not financially secure. You’ll also see in the table that I reported the total number of respondents for each category of the independent variable in the table’s bottom row. This is also standard practice in a bivariate table, as is including a table heading describing what is presented in the table.
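To show how a contingency table with column percentages is put together, here is a minimal sketch in Python. The ten responses below are hypothetical, invented for illustration; they are not records from the older worker survey, so the percentages will not match Table 8.4. The function name column_percent is likewise my own.

```python
from collections import Counter

# Hypothetical (gender, financial security) pairs, invented for illustration.
responses = [
    ("Men", "Not secure"), ("Men", "Moderately secure"),
    ("Men", "Moderately secure"), ("Men", "Secure"),
    ("Women", "Not secure"), ("Women", "Not secure"),
    ("Women", "Not secure"), ("Women", "Moderately secure"),
    ("Women", "Moderately secure"), ("Women", "Secure"),
]

# Count each combination, and count respondents per column (independent variable).
cell_counts = Counter(responses)
column_totals = Counter(gender for gender, _ in responses)


def column_percent(gender, security):
    """Percentage of respondents of this gender who gave this response."""
    return round(100 * cell_counts[(gender, security)] / column_totals[gender], 1)


# Dependent variable in rows, independent variable in columns.
for security in ["Not secure", "Moderately secure", "Secure"]:
    print(security, column_percent("Men", security), column_percent("Women", security))

print("Total", column_totals["Men"], column_totals["Women"])
```

Each cell is divided by its own column total, so every column sums to 100%. That is what lets us compare men and women directly even when the two groups differ in size.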

Researchers interested in simultaneously analyzing relationships among more than two variables conduct multivariate analysis. If I hypothesized that financial security declines for women as they age but increases for men as they age, I might consider adding age to the preceding analysis. To do so would require multivariate, rather than bivariate, analysis. We won’t go into detail here about how to conduct multivariate analysis of quantitative survey items, but we will return to multivariate analysis in "Reading and Understanding Social Research", where we’ll discuss strategies for reading and understanding tables that present multivariate statistics. If you are interested in learning more about the analysis of quantitative survey data, I recommend checking out your campus’s offerings in statistics classes. The quantitative data analysis skills you will gain in a statistics class could serve you quite well should you find yourself seeking employment one day.

KEY TAKEAWAYS

  • While survey researchers should always aim to obtain the highest response rate possible, some recent research argues that high return rates on surveys may be less important than we once thought.
  • Several computer programs are designed to assist survey researchers with analyzing their data, including SPSS, MicroCase, and Excel.
  • Data analysis is about identifying, describing, and explaining patterns.
  • Contingency tables show how, or whether, one variable covaries with another.

EXERCISES

  1. Codebooks can range from relatively simple to quite complex. For an excellent example of a more complex codebook, check out the coding for the General Social Survey (GSS): http://publicdata.norc.org:41000/gss/documents//BOOK/GSS_Codebook.pdf.
  2. The GSS allows researchers to cross-tabulate GSS variables directly from its website. Interested? Check out http://www.norc.uchicago.edu/GSS+Website/Data+Analysis.