## Analyze Quantitative Data

Quantitative data analysis is helpful in evaluation because it provides quantifiable, easy-to-understand results. Quantitative data can be analyzed in a variety of ways. In this section, you will learn about the most common quantitative analysis procedures used in small program evaluation. You will also find a list of resources to assist you in your own evaluation efforts.

### Quantitative Analysis in Evaluation

Before you begin your analysis, you must identify the level of measurement associated with the quantitative data. The level of measurement can influence the type of analysis you can use. There are four levels of measurement:

- Nominal
- Ordinal
- Interval
- Ratio (scale)

Nominal data – data has no logical order; it is basic classification data

- Example: Male or Female
  - There is no order associated with male or female
  - Each category is assigned an arbitrary value (male = 0, female = 1)

Ordinal data – data has a logical order, but the differences between values are not constant

- Example: T-shirt size (small, medium, large)
- Example: Military rank (from Private to General)

Interval data – data is continuous and has a logical order, data has standardized differences between values, but no natural zero

- Example: Fahrenheit degrees
  - Remember that ratios are meaningless for interval data.
  - You cannot say, for example, that one day is twice as hot as another day.
- Example: Items measured on a Likert scale – rank your satisfaction on a scale of 1-5.
  - 1 = Very Dissatisfied
  - 2 = Dissatisfied
  - 3 = Neutral
  - 4 = Satisfied
  - 5 = Very Satisfied

Ratio data – data is continuous, ordered, has standardized differences between values, and a natural zero

- Example: height, weight, age, length
  - Having an absolute zero enables you to meaningfully say that one measure is twice as long as another.
  - For example, 10 inches is twice as long as 5 inches.
  - This ratio holds true regardless of which scale the object is measured in (e.g., meters or yards).
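The difference between interval and ratio data can be checked with a little arithmetic. The sketch below is only an illustration: the temperatures (80°F and 40°F) are values chosen for this example, and the helper function names are hypothetical. It computes the same ratio in two different units for each kind of data.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32) * 5 / 9


def inches_to_meters(inches):
    """Convert a length in inches to meters."""
    return inches * 0.0254


# Interval data: the zero point is arbitrary, so the ratio depends on the scale.
ratio_fahrenheit = 80 / 40
ratio_celsius = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio data: the zero point is natural, so the ratio survives a change of units.
ratio_inches = 10 / 5
ratio_meters = inches_to_meters(10) / inches_to_meters(5)

print(f"80F vs 40F: {ratio_fahrenheit:.1f}x in Fahrenheit, {ratio_celsius:.1f}x in Celsius")
print(f"10 in vs 5 in: {ratio_inches:.1f}x in inches, {ratio_meters:.1f}x in meters")
```

The temperature ratio changes when the unit changes, which is why "twice as hot" has no fixed meaning, while the length ratio stays at 2 in both units.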

Once you have identified your levels of measurement, you can begin using some of the quantitative data analysis procedures outlined below. Because small programs usually have small samples, the quantitative methods at your disposal are limited. However, there are several procedures you can use to determine what story your data is telling. Below you will learn about:

- Data tabulation (frequency distributions & percent distributions)
- Descriptives
- Data disaggregation
- Moderate and advanced analytical methods

To demonstrate each procedure, we will use the example summer program student survey data presented in the “Enter, Organize, & Clean Data” section.

**The first thing you should do with your data is tabulate your results for the different variables in your data set**. This process will give you a comprehensive picture of what your data looks like and assist you in identifying patterns. The best way to do this is to construct frequency and percent distributions.

*A frequency distribution* is an organized tabulation of the number of individuals or scores located in each category (see the table below).

- This will help you determine:
  - If scores are entered correctly
  - If scores are high or low
  - How many are in each category
  - The spread of the scores

From the table, you can see that 15 of the students surveyed who participated in the summer program reported being satisfied with the experience.

*A percent distribution* displays the proportion of participants who are represented within each category (see below). From the table, you can see that 75% of the students surveyed (n = 20) who participated in the summer program reported being satisfied with the experience.
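Both distributions are simple counts, so they are easy to build by hand or in a few lines of code. The sketch below uses hypothetical responses on a 4-point scale (1 = Very Dissatisfied to 4 = Very Satisfied). These are not the actual survey records; they were constructed only so that the totals agree with the figures quoted in the text (n = 20, 15 students satisfied). The function names are also illustrative.

```python
from collections import Counter

# Hypothetical responses, NOT the actual survey data. Constructed so the
# totals match the summary figures quoted in the text (n = 20, 75% satisfied).
responses = [1] * 2 + [2] * 3 + [3] * 12 + [4] * 3


def frequency_distribution(values):
    """Count how many responses fall in each category."""
    return dict(sorted(Counter(values).items()))


def percent_distribution(values):
    """Express each category's count as a percentage of all responses."""
    total = len(values)
    counts = frequency_distribution(values)
    return {category: 100 * count / total for category, count in counts.items()}


freq = frequency_distribution(responses)
pct = percent_distribution(responses)

# Categories 3 and 4 are treated as "satisfied" on this assumed 4-point scale.
satisfied_count = freq[3] + freq[4]
satisfied_pct = pct[3] + pct[4]

for category in freq:
    print(f"{category}: n = {freq[category]:2d} ({pct[category]:.0f}%)")
print(f"Satisfied: n = {satisfied_count} ({satisfied_pct:.0f}%)")
```

Printing the counts next to the percentages makes it easy to spot data-entry errors, such as a category value that should not exist.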

Descriptives are calculations that are used to “describe” the data set. The most common descriptives used are:

- Mean – the numerical average of scores for a particular variable
- Minimum and maximum values – the lowest and highest values for a particular variable
- Median – the numerical middle point or score that cuts the distribution in half for a particular variable
  - Calculate by:
    - Listing the scores in order and counting the number of scores
    - If the number of scores is odd, the median is the middle score, which splits the distribution in half
    - If the number of scores is even, calculate the mean of the middle two scores
- Mode – the most common score or value for a particular variable
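The steps for finding the median can be written out directly. This is a minimal sketch of that procedure; the function name `median` is our own choice, and Python's built-in `statistics.median` does the same job.

```python
def median(scores):
    """Return the middle score, following the list-and-count procedure."""
    ordered = sorted(scores)          # list the scores in order
    count = len(ordered)              # count the number of scores
    if count == 0:
        raise ValueError("median requires at least one score")
    middle = count // 2
    if count % 2 == 1:
        # Odd number of scores: the middle score splits the distribution.
        return ordered[middle]
    # Even number of scores: take the mean of the middle two scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3, 1, 2]))      # odd number of scores
print(median([4, 1, 3, 2]))   # even number of scores
```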

Depending on the level of measurement, you may not be able to run descriptives for all variables in your dataset.

- A meaningful mean can only be calculated from interval and ratio data
- Meaningful minimum and maximum values can only be identified for ordinal, interval, and ratio data, because nominal categories have no order
- A meaningful median can only be calculated from ordinal, interval, and ratio data
- The mode can be calculated for all levels of measurement
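One way to keep these rules in view while analyzing a dataset is a small lookup table. The sketch below covers only the mean, median, and mode, and its names (`MEANINGFUL_DESCRIPTIVES`, `is_meaningful`) are hypothetical helpers, not part of any library.

```python
# Which measures of central tendency are meaningful at each level of measurement.
MEANINGFUL_DESCRIPTIVES = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}


def is_meaningful(descriptive, level):
    """Return True if the descriptive is meaningful for the given level."""
    return descriptive in MEANINGFUL_DESCRIPTIVES[level]


print(is_meaningful("mean", "ordinal"))    # T-shirt sizes have no meaningful average
print(is_meaningful("median", "ordinal"))  # but they do have a middle value
```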

In the table, you can see that the average satisfaction level of the students surveyed who participated in the summer program (n = 20) was 2.8, with scores ranging from 1 (Very Dissatisfied) to 4 (Very Satisfied). The mode (most commonly occurring value) is 3, a report of satisfaction.
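Python's standard `statistics` module can produce these descriptives. The scores below are hypothetical. They are not the actual survey records, but were constructed to reproduce the summary values quoted above (n = 20, mean 2.8, mode 3, range 1 to 4).

```python
import statistics

# Hypothetical scores, NOT the actual survey data. Constructed to reproduce
# the summary values quoted in the text.
scores = [1] * 2 + [2] * 3 + [3] * 12 + [4] * 3

summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "minimum": min(scores),
    "maximum": max(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```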

Using data from our example, let’s explore the participant demographics (gender and ethnicity) within each program city. By looking at the table below, you can clearly see that the demographic makeup of each program city is different.

From the table above, you can see that:

- Females are overrepresented in the New York program, and males are overrepresented in the Boston program
- Over 70% of the White sample is in the Boston program while only 14% of the Black sample is represented in that program
- Asian and Latino/a participants are evenly distributed across both program cities
- The entire Native American sample (n = 2) is in the Boston program
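A disaggregation table like this is a cross-tabulation: a count of participants for every combination of two variables. The sketch below shows one way to build it. The eight records are invented purely for illustration and do not reproduce the survey data, and `crosstab` and `row_percentages` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical records for illustration only; NOT the survey data.
participants = [
    {"city": "Boston", "gender": "Male"},
    {"city": "Boston", "gender": "Male"},
    {"city": "Boston", "gender": "Male"},
    {"city": "Boston", "gender": "Female"},
    {"city": "New York", "gender": "Male"},
    {"city": "New York", "gender": "Female"},
    {"city": "New York", "gender": "Female"},
    {"city": "New York", "gender": "Female"},
]


def crosstab(records, row_key, col_key):
    """Count records for every combination of two variables."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


gender_by_city = crosstab(participants, "gender", "city")
gender_by_city_pct = row_percentages(gender_by_city)

for gender, cells in gender_by_city.items():
    print(gender, cells, gender_by_city_pct[gender])
```

Row percentages answer questions such as "what share of the female sample is in each city," which is the form the observations above take.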

You can also disaggregate the data by subcategories within a variable. This allows you to take a deeper look at the units that make up that category. From our sample data, 25% of students reported being dissatisfied with the summer program experience. In the table below, we explore this subcategory of participants in more depth.

From this table, you can see that:

- All of the students who were dissatisfied with the program were students of color.
- There were 6 students of color in the Boston program, and all but one of them were dissatisfied with their experience.

From these results, it may be inferred that the Boston program is not meeting the needs of its students of color. This result is masked when you report only that the average satisfaction level of all participants in the program is 2.8 (on a 4-point scale) and that 75% of the students sampled were satisfied with their experience.
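The masking effect is easy to reproduce. The sketch below compares an overall satisfaction rate with the same rate broken out by subgroup. The ten records are hypothetical and are not the survey data. The field names, the `disaggregate` helper, and the rule that a score of 3 or 4 counts as satisfied are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records for illustration only; NOT the survey data.
# Assumes a 4-point scale on which a score of 3 or 4 counts as satisfied.
students = [
    {"city": "Boston", "student_of_color": True, "satisfaction": 2},
    {"city": "Boston", "student_of_color": True, "satisfaction": 1},
    {"city": "Boston", "student_of_color": True, "satisfaction": 2},
    {"city": "Boston", "student_of_color": True, "satisfaction": 3},
    {"city": "Boston", "student_of_color": False, "satisfaction": 3},
    {"city": "Boston", "student_of_color": False, "satisfaction": 4},
    {"city": "New York", "student_of_color": True, "satisfaction": 3},
    {"city": "New York", "student_of_color": True, "satisfaction": 4},
    {"city": "New York", "student_of_color": False, "satisfaction": 3},
    {"city": "New York", "student_of_color": False, "satisfaction": 3},
]


def percent_satisfied(records):
    """Percentage of records with a satisfaction score of 3 or higher."""
    satisfied = sum(1 for r in records if r["satisfaction"] >= 3)
    return 100 * satisfied / len(records)


def disaggregate(records, *keys):
    """Compute the satisfaction rate separately for each subgroup."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r)
    return {group: percent_satisfied(members) for group, members in groups.items()}


overall = percent_satisfied(students)
by_group = disaggregate(students, "city", "student_of_color")

print(f"Overall: {overall:.0f}% satisfied")
for (city, of_color), rate in by_group.items():
    print(f"{city}, student of color = {of_color}: {rate:.0f}% satisfied")
```

In this invented sample the overall rate looks healthy, while one subgroup has a much lower rate that only appears once the data is split.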

In addition to the basic methods described above there are a variety of more complicated analytical procedures that you can perform with your data. These include:

- Correlation
- Regression
- Analysis of variance

These types of analyses generally require computer software (e.g., SPSS, SAS, STATA, MINITAB) and a solid understanding of statistics to interpret the results. We provide basic descriptions of each method below, but we encourage you to seek additional information (e.g., StatSoft Electronic Statistics Textbook and Hyperstat Online Statistics Textbook) and training before using these procedures.

For more information on quantitative data analysis, see the following sources: http://archive.gao.gov/t2pbat6/146957.pdf; http://learningstore.uwex.edu/assets/pdfs/G3658-6.pdf.

### Correlation

**A correlation is a statistical calculation that describes the nature of the relationship between two variables (e.g., strong and negative, weak and positive, statistically significant).**

An important thing to remember when using correlations is that a correlation does not explain causation. A correlation merely indicates that a relationship or pattern exists, but it does not mean that one variable is the cause of the other.

For example, you might see a strong positive correlation between participation in the summer program and students’ grades the following school year; however, the correlation will not tell you if the summer program is the reason why students’ grades were higher.
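As a rough illustration, the sketch below computes the Pearson correlation coefficient, one common correlation measure, from its definition. The weeks-attended and GPA values are hypothetical, and `pearson_r` is our own helper name. The coefficient ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship). Testing whether it is statistically significant requires additional steps that statistical software handles.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # How the two variables move together around their means.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own.
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data for illustration only.
weeks_attended = [1, 2, 3, 4, 5, 6]
gpa = [2.4, 2.9, 2.7, 3.2, 3.1, 3.6]

r = pearson_r(weeks_attended, gpa)
print(f"r = {r:.2f}")
```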

### Analysis of Variance

**An analysis of variance (ANOVA) is used to determine whether the difference in means (averages) for two or more groups is statistically significant.**

For example, an analysis of variance will help you determine if the high school grades of those students who participated in the summer program are significantly different from the grades of students who did not participate in the program.
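At its core, a one-way ANOVA compares how much the group means differ from each other with how much scores vary inside each group. The sketch below computes that ratio, the F statistic, for hypothetical GPA values. The data and the function name `one_way_anova_f` are assumptions for this example. Deciding whether the F value is statistically significant requires comparing it against the F distribution, which is where statistical software or published tables come in.

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across two or more groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more scores than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
    if ss_within == 0:
        raise ValueError("F is undefined when there is no variation within groups")

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical GPAs for illustration only.
participants_gpa = [3.1, 3.4, 2.9, 3.6, 3.3]
non_participants_gpa = [2.8, 3.0, 2.6, 3.2, 2.9]

f_value = one_way_anova_f(participants_gpa, non_participants_gpa)
print(f"F = {f_value:.2f}")
```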

### Regression

**Regression is an extension of correlation and is used to determine whether one variable is a predictor of another variable.**

A regression can be used to determine how strong the relationship is between your intervention and your outcome variables. More importantly, a regression will tell you whether a variable (e.g., participation in your program) is a statistically significant predictor of the outcome variable (e.g., GPA or SAT scores). A variable can have a positive or negative influence, and the strength of the effect can be weak or strong.

For example, a regression would help you determine if the length of participation (number of weeks) in the summer program is actually a predictor of students’ high school grades the following year. As with correlation, causation cannot be inferred from regression.
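The simplest form of regression fits a straight line through the data by least squares. The sketch below estimates the slope and intercept for hypothetical values of weeks attended and GPA. The data and the function name `fit_line` are assumptions for this example. It does not test statistical significance, which requires standard errors and a t-test that statistical software provides.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    spread_x = sum((a - mean_x) ** 2 for a in x)
    if spread_x == 0:
        raise ValueError("cannot fit a line when x is constant")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data for illustration only.
weeks_attended = [1, 2, 3, 4, 5, 6]
gpa = [2.4, 2.9, 2.7, 3.2, 3.1, 3.6]

slope, intercept = fit_line(weeks_attended, gpa)
print(f"Predicted GPA = {intercept:.2f} + {slope:.2f} * weeks attended")
```

The slope is the predicted change in the outcome for each additional week of participation. A positive slope indicates a positive influence and a negative slope a negative one.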