Enter, Organize, & Clean Data
The information you collected is likely in a raw format. Some examples of raw data are:
- Completed hardcopy surveys
- Field notes
- Audio recordings of interviews or focus groups
- Video recordings of observations
Typically, raw data is not very useful. Before you can begin any type of analysis, you will need to enter, organize, and clean your data.
Entering and Organizing Data
Whether you have collected quantitative or qualitative data, enter it in a logical format that can be easily understood and analyzed. This section assumes you are using a computer, since most people manage their data electronically.
- For quantitative data, use Microsoft Excel or another spreadsheet software package to enter your data into an electronic format.
- For qualitative data, type up all the data into a word processing program such as Microsoft Word.
Before you begin entering data, develop a system to organize it.
- For example, if you administered a survey to parents and students, it is usually best to create one spreadsheet for the parent survey and another for the student survey.
- Also make sure that each participant’s responses are assigned to a unique participant ID and responses are organized by survey item/question. Below is an example of what your data spreadsheet may look like.
You will notice that some of the students’ responses are entered as numerical values. This is done so that the computer can read and analyze the data. Develop a code book that lists each variable/question name, all of the answer options for each question, and the numerical code you have assigned to each answer option. The figure below shows the code book entries for the data shown in the sample spreadsheet above. The code book is your data key/legend, so keep it in a safe place where you can access it easily and frequently.
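As a rough illustration of how a code book can drive data entry, the sketch below stores the code book as a Python dictionary and uses it to turn written answers into numeric codes. Only the Q5 codes (no = 0, yes = 1) and the value 99 come from this guide. The Q6 wording, its 1 to 4 scale, the reading of 99 as the "not applicable" code, the participant rows, and the names `CODE_BOOK` and `encode_response` are assumptions made for the example.

```python
# Hypothetical code book: variable name -> question text and answer codes.
# Q5 codes follow the guide; the Q6 wording and 1-4 scale are assumed,
# and 99 is assumed to be the code for "not applicable".
CODE_BOOK = {
    "Q5": {
        "question": "Did you participate in the summer program?",
        "codes": {"no": 0, "yes": 1},
    },
    "Q6": {
        "question": "How satisfied were you with the summer program?",
        "codes": {
            "very dissatisfied": 1,
            "dissatisfied": 2,
            "satisfied": 3,
            "very satisfied": 4,
            "not applicable": 99,
        },
    },
}


def encode_response(variable, answer):
    """Translate a written answer into its numeric code from the code book."""
    codes = CODE_BOOK[variable]["codes"]
    key = answer.strip().lower()
    if key not in codes:
        raise ValueError(f"{answer!r} is not a listed option for {variable}")
    return codes[key]


# Made-up answers: one entry per participant, keyed by a unique participant ID.
raw_answers = {
    "001": {"Q5": "No", "Q6": "Not applicable"},
    "002": {"Q5": "Yes", "Q6": "Satisfied"},
}

# Build the coded rows that would go into the spreadsheet.
coded_rows = {
    pid: {var: encode_response(var, ans) for var, ans in answers.items()}
    for pid, answers in raw_answers.items()
}
print(coded_rows)
```

Rejecting any answer that is not listed in the code book means a typo is caught at entry time instead of during cleaning.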
You would use a similar organizational approach for qualitative data.
- Create a file for each interview, observation site, focus group, etc.
- Within each file, organize the data by question, time intervals, and/or topic (depending on what method makes the most sense).
**Remember that the process of entering qualitative data can be very tedious and time-consuming, so it is important that you plan accordingly.
**No matter what type of data or method you use, be sure to always go back and review for errors.
Cleaning Data
Checking data for errors is commonly called “cleaning.” Cleaning data is critical because “dirty” data can severely distort your results.
The three most commonly used cleaning methods are (United Way of America, 1996):
- Spot-checking
- Eye-balling
- Logic checks
The best practice is to use all three approaches so that you are sure you have caught all possible errors.
**Like entering and organizing data, cleaning data can also be time-consuming, so be sure to plan accordingly.
Spot-checking
This technique involves comparing the raw data to the electronically entered data to check for data-entry and coding errors.
To spot-check quantitative survey data, you would randomly select several participants’ completed paper surveys and compare them to the data on the electronic spreadsheet.
For qualitative data, you would use this approach to check whether participants’ words were transcribed accurately and attributed to the right individual.
If you find an error in your first round of spot-checking, randomly check a second sample of the raw data. If you continue to find errors, and it is clear that the first one was not an isolated incident, you will need to go over all of the raw data to ensure that each record was entered correctly.
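The spot-checking procedure for quantitative data can be sketched in Python as follows. This is a minimal illustration, not a prescribed tool. It assumes the values on the sampled paper surveys have been re-read into a dictionary (`raw`), and that both dictionaries map a participant ID to that participant's coded answers. The function name `spot_check` and all data values are made up for the example.

```python
import random


def spot_check(raw_records, entered_records, sample_size, seed=None):
    """Compare a random sample of raw records with their entered versions.

    Both arguments map participant ID -> {variable: value}. Returns a list of
    (participant_id, variable, raw_value, entered_value) for every mismatch.
    """
    rng = random.Random(seed)
    ids = sorted(raw_records)
    # Never ask for more records than exist.
    sampled = rng.sample(ids, min(sample_size, len(ids)))
    mismatches = []
    for pid in sampled:
        entered = entered_records.get(pid, {})
        for variable, raw_value in raw_records[pid].items():
            entered_value = entered.get(variable)
            if entered_value != raw_value:
                mismatches.append((pid, variable, raw_value, entered_value))
    return mismatches


# Hypothetical data: values re-read from the paper surveys, and the values
# that were typed into the spreadsheet.
raw = {
    "001": {"Q5": 0, "Q6": 99},
    "002": {"Q5": 1, "Q6": 3},
    "003": {"Q5": 1, "Q6": 4},
}
entered = {
    "001": {"Q5": 0, "Q6": 99},
    "002": {"Q5": 1, "Q6": 3},
    "003": {"Q5": 1, "Q6": 2},  # entry error: the paper survey says 4
}

errors = spot_check(raw, entered, sample_size=3, seed=1)
print(errors)
```

If the returned list is not empty, the guidance above applies: draw another random sample, and review every record if errors keep appearing.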
Eye-balling
This technique involves reviewing the data for errors that may have resulted from a data-entry or coding mistake.
For example, question 5 from the sample code book above reads: Did you participate in the summer program? Participants can only respond to this question with a “no” or “yes.” “No” responses are assigned a value of 0, while “yes” responses are assigned a value of 1. Therefore, any number other than a 0 or 1 in the “Q5” column on the sample spreadsheet would be an obvious error.
If you find such errors, you will need to go back to the original survey and enter that participant’s answer correctly.
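When the data set is too large to scan by eye, the same idea can be automated as a check of each column against its allowed codes. The sketch below is one possible way to do that in Python. The Q5 codes (0 and 1) come from the example above. The Q6 codes, the sample rows, and the names `ALLOWED_CODES` and `find_invalid_codes` are assumptions for illustration.

```python
# Allowed codes per column. The Q5 codes come from the guide's example;
# the Q6 scale (1-4 plus 99) is assumed.
ALLOWED_CODES = {
    "Q5": {0, 1},
    "Q6": {1, 2, 3, 4, 99},
}


def find_invalid_codes(rows, allowed=ALLOWED_CODES):
    """Return (participant_id, variable, value) for each value not in the code book."""
    problems = []
    for pid, answers in rows.items():
        for variable, value in answers.items():
            if variable in allowed and value not in allowed[variable]:
                problems.append((pid, variable, value))
    return problems


# Hypothetical entered data.
rows = {
    "001": {"Q5": 0, "Q6": 99},
    "002": {"Q5": 11, "Q6": 3},  # 11 is not a valid Q5 code
}
print(find_invalid_codes(rows))
```

Each flagged entry still has to be corrected by hand against the original survey. The check only tells you where to look.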
Logic checks
This technique involves a careful review of the electronically entered data to make sure that the answers to the different questions “make sense.”
For example, if participant 001 on the sample spreadsheet indicated that they did not attend the summer program in question 5, it would be illogical for this participant to have provided a satisfaction rating in their response to question 6. The only logical response for this participant would be “99” or “not applicable.”
As with the other types of errors, if you find one you will need to go back to the original raw data for that participant and enter the correct data instead.
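The Q5/Q6 example can be expressed as a simple rule in code. The sketch below is a hypothetical Python version of that single rule. It assumes 99 is the code for "not applicable" and that rows are stored as a dictionary keyed by participant ID. The function name and data values are invented for the example.

```python
NOT_APPLICABLE = 99  # assumed to be the code for "not applicable"


def find_skip_logic_errors(rows):
    """Flag participants with Q5 == 0 (did not attend) whose Q6 is anything
    other than the "not applicable" code, including a missing Q6."""
    flagged = []
    for pid, answers in rows.items():
        if answers.get("Q5") == 0 and answers.get("Q6") != NOT_APPLICABLE:
            flagged.append(pid)
    return flagged


# Hypothetical entered data.
rows = {
    "001": {"Q5": 0, "Q6": 4},   # illogical: did not attend but rated the program
    "002": {"Q5": 1, "Q6": 3},
    "003": {"Q5": 0, "Q6": 99},
}
print(find_skip_logic_errors(rows))
```

A real survey would likely need one such rule for every pair of questions where one answer constrains another.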