ProcessingScripts/

In many cases, the Input Data Files you obtain at the beginning of your project will not be formatted and organized in such a way that you can use them for the analysis that generates the results you present in your report.

The work of preparing your Input Data Files for analysis is referred to as the data processing phase of your project. The changes you make during this phase may include dropping variables and/or observations, generating new variables, combining data from multiple sources, performing simulations, and many other operations. In some cases, it may be convenient to temporarily save one or more Intermediate Data Files during processing, and then use those files again at a later point in the processing.
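As a rough illustration of the kinds of changes listed above, the following Python sketch drops observations, drops a variable, generates a new variable, and combines data from two sources. The function name, the field names (id, height_m, weight_kg, notes, region), and the idea of holding each observation as a dictionary are all assumptions made for this example; they are not part of the guidelines, and the same steps can be written in Stata, R, SPSS, or any other software.

```python
def process(people, regions):
    """Apply a few typical processing steps to a list of observations.

    Each observation is a dict; the field names are hypothetical.
    """
    # Drop observations with a missing weight.
    kept = [dict(p) for p in people if p["weight_kg"] is not None]

    # Drop a variable that the analysis does not need.
    kept = [{k: v for k, v in p.items() if k != "notes"} for p in kept]

    # Generate a new variable from existing ones.
    for p in kept:
        p["bmi"] = p["weight_kg"] / (p["height_m"] ** 2)

    # Combine with a second source, matching on the shared "id" key.
    region_by_id = {r["id"]: r["region"] for r in regions}
    for p in kept:
        p["region"] = region_by_id.get(p["id"])

    return kept
```

Because the function copies each observation before changing it, the original input records are left untouched, which mirrors the idea that processing creates new files rather than overwriting the Input Data Files.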

You will write one or more processing scripts containing commands that execute all the necessary steps of data processing, and then save one or more Analysis Data Files containing the data you will use when you conduct the analysis for your project.

All your processing scripts should be saved in the ProcessingScripts/ folder.

Guidelines for Writing Processing Scripts

When you write your processing scripts, be sure to follow the general guidelines that apply to all scripts.

Additional guidelines that apply particularly to processing scripts are presented below.

Tasks your processing scripts should accomplish

The tasks that the commands in your processing scripts should accomplish fall into four categories:

● Opening the Input Data Files

Before any commands that start processing the data in an Input Data File, you must write a command that opens the Input Data File.

● Executing the necessary steps of processing

Commands that modify the Input Data Files in all the ways necessary to create the Analysis Data Files will typically make up the majority of your processing scripts.

● Saving and opening Intermediate Data Files

If processing your data involves any Intermediate Data Files, your processing scripts must include commands that save them after they have been created, and then open them at whatever later point they are used again.

● Saving Analysis Data Files

After the commands that process the data as necessary to prepare it for analysis, you need to write a command or commands that save the processed data in one or more Analysis Data Files. These are the data files you will use when you implement the procedures that generate the results you present in your report.
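The four categories of tasks above can be sketched as a single processing script. This is a minimal example in Python using only the standard library, not a prescribed implementation. The folder names (InputData, IntermediateData, AnalysisData), the file names, and the column names (id, income, household_size) are hypothetical and chosen only for illustration.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Open a CSV data file and return its rows as a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def write_rows(path, rows, fieldnames):
    """Save rows to a CSV data file, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def run_processing(project_dir):
    project_dir = Path(project_dir)
    # Hypothetical folder and file names.
    input_file = project_dir / "InputData" / "survey.csv"
    intermediate_file = project_dir / "IntermediateData" / "survey_clean.csv"
    analysis_file = project_dir / "AnalysisData" / "analysis.csv"

    # Task 1: open the Input Data File.
    rows = read_rows(input_file)

    # Task 2: process the data (drop observations with missing income).
    cleaned = [r for r in rows if r["income"].strip() != ""]

    # Task 3: save an Intermediate Data File, then open it again later.
    write_rows(intermediate_file, cleaned, ["id", "income", "household_size"])
    cleaned = read_rows(intermediate_file)

    # Task 2, continued: generate a new variable and keep only what is needed.
    analysis_rows = [
        {
            "id": r["id"],
            "income_per_person": float(r["income"]) / int(r["household_size"]),
        }
        for r in cleaned
    ]

    # Task 4: save the Analysis Data File.
    write_rows(analysis_file, analysis_rows, ["id", "income_per_person"])
    return analysis_file
```

Calling run_processing with the path to the project folder reads the input file, writes the intermediate and analysis files, and returns the path of the Analysis Data File. The script contains no commands that analyze the data.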

Don't put commands that analyze your data in the processing scripts

Your processing scripts should contain only commands that modify your Input Data Files as necessary to create your Analysis Data Files.

Your processing scripts should not contain any commands that execute any parts of the data analysis you do for your project: all commands that generate results that you present in your report should be in your analysis scripts.

Naming Your Processing Scripts

If you write all the commands necessary to transform your Input Data Files into your Analysis Data Files in a single processing script, give it the name Processing.yyy.

If you write the commands for processing your data in two or more processing scripts, try to give them meaningful names.

For example, you could choose names that indicate the order in which the scripts should be executed, such as Processing_1.yyy, Processing_2.yyy, Processing_3.yyy.

Or you could give them names that reflect the tasks they accomplish, such as CleaningFedData.yyy, CleaningIMFData.yyy, MergingFedandIMFData.yyy.

NOTE: The extension .yyy stands for the filename extension that the software you are using assigns to scripts. For example, replace .yyy with .do if you are using Stata, with .R if you are using R, and with .sps if you are using SPSS.
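If you use numbered names such as Processing_1.yyy, Processing_2.yyy, Processing_3.yyy, a short master script can run them in the intended order. The sketch below assumes the processing scripts are Python files, so .yyy becomes .py; the function names ordered_scripts and run_all are hypothetical. It sorts by the number in the filename rather than alphabetically, so that Processing_10.py runs after Processing_2.py.

```python
import re
import subprocess
import sys
from pathlib import Path


def ordered_scripts(folder):
    """Return the Processing_<n>.py scripts in a folder, sorted by <n>."""
    numbered = []
    for path in Path(folder).iterdir():
        match = re.fullmatch(r"Processing_(\d+)\.py", path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def run_all(folder):
    """Run each processing script in order, stopping if one fails."""
    for script in ordered_scripts(folder):
        # check=True raises an error if the script exits with a failure.
        subprocess.run([sys.executable, str(script)], check=True)
```

For example, run_all("ProcessingScripts") would execute every numbered script in the ProcessingScripts/ folder, one after another.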