ProcessingScripts/

In many cases, the Input Data Files you obtain at the beginning of your project will not be formatted and organized in a way that lets you use them directly for the analysis that generates the results you present in your report.

The changes you make to your Input Data Files to prepare the data for analysis make up the data processing phase of your project. These changes may include dropping variables and/or observations, generating new variables, combining data from multiple sources, performing simulations, and many other operations. In some cases, it may be convenient during data processing to temporarily save one or more Intermediate Data Files, and then use those files again at a later point in the processing.

You will write one or more processing scripts containing commands that execute all the necessary steps of data processing, and then save one or more Analysis Data Files containing the data you will use when you conduct the analysis for your project.

All your processing scripts should be saved in the ProcessingScripts/ folder.

  • What if my Input Data Files do not need to be processed?

    If your Input Data Files are organized and formatted in such a way that they do not need to be processed in any way before you perform the analysis for your project, you do not need to write any processing scripts.

    If this is the case, you may omit the ProcessingScripts/ folder from your documentation.

Guidelines for Writing Processing Scripts

When you write your processing scripts, be sure to follow the general guidelines that apply to all scripts.

Additional guidelines that apply particularly to processing scripts are presented below.

Tasks your processing scripts should accomplish

The tasks that should be accomplished by the commands in your processing scripts fall into four categories:

● Opening the Input Data Files

Before any commands that start processing the data in an Input Data File, you must write a command that opens the Input Data File.

    When you write a command that opens an Input Data File, you must specify where the file is stored, and you should do so using a relative directory path.

    Assuming that

    • You have adopted the convention of keeping your Project/ folder set as the working directory at all times, and
    • The Input Data File is stored in the InputData/ folder

    the relative directory path should begin in your Project/ folder and lead to the InputData/ folder:

    >Data/InputData/
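
As a minimal sketch of such a command, assuming the scripts are written in Python, the Input Data File is a CSV file, and Project/ is the working directory, the file could be opened like this. The function name `open_input_data` and the file name `survey.csv` are hypothetical and used only for illustration.

```python
import csv
from pathlib import Path

# Relative path that begins in Project/ (assumed to be the working directory)
# and leads to the InputData/ folder.
INPUT_DIR = Path("Data") / "InputData"


def open_input_data(filename):
    """Read a CSV Input Data File from Data/InputData/ as a list of dicts."""
    with open(INPUT_DIR / filename, newline="") as f:
        return list(csv.DictReader(f))


# Example call (hypothetical file name):
# rows = open_input_data("survey.csv")
```

Because the path is relative, the same command works on any computer where Project/ is set as the working directory.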


● Executing the necessary steps of processing

Commands that modify the Input Data Files in all the ways necessary to create the Analysis Data Files will typically make up the majority of your processing scripts.
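
To illustrate what such commands might look like, here is a small Python sketch that drops observations, drops a variable, and generates a new variable. The variable names (`id`, `income`, `income_thousands`) and the function name `process` are assumptions made for this example; the steps your own data requires will differ.

```python
def process(rows):
    """Return processed observations from a list of dicts read from an Input Data File."""
    processed = []
    for row in rows:
        # Drop observations with a missing income value.
        if row["income"] == "":
            continue
        # Keep only the variables needed for analysis (drops everything else).
        new_row = {"id": row["id"], "income": float(row["income"])}
        # Generate a new variable: income measured in thousands.
        new_row["income_thousands"] = new_row["income"] / 1000
        processed.append(new_row)
    return processed
```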

● Saving and opening Intermediate Data Files

If processing your data involves any Intermediate Data Files, your processing scripts must include commands that save them after they have been created, and then open them at whatever later point they are used again.

    When you write a command that saves an Intermediate Data File after it has been created, or opens an Intermediate Data File for use at a later point in processing, you must specify where the file should be saved or where it should be opened from, and you should do so using a relative directory path.

    Assuming that

    • You have adopted the convention of keeping your Project/ folder set as the working directory at all times, and
    • You save your Intermediate Data Files in the IntermediateData/ folder

    then the relative directory path should begin in your Project/ folder and lead to the IntermediateData/ folder:

    >Data/IntermediateData/

    You will need to specify the relative directory path both when the file is initially saved, and again later when it is opened for further processing.
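
A minimal Python sketch of this save-then-reopen pattern follows, assuming CSV files and Project/ as the working directory. The function names and any file name passed to them are hypothetical.

```python
import csv
from pathlib import Path

# Relative path from Project/ (assumed working directory) to IntermediateData/.
INTERMEDIATE_DIR = Path("Data") / "IntermediateData"


def save_intermediate(rows, filename):
    """Save a non-empty list of dicts as a CSV Intermediate Data File."""
    INTERMEDIATE_DIR.mkdir(parents=True, exist_ok=True)
    with open(INTERMEDIATE_DIR / filename, "w", newline="") as f:
        # Column names are taken from the first observation.
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def open_intermediate(filename):
    """Reopen an Intermediate Data File at a later point in processing."""
    with open(INTERMEDIATE_DIR / filename, newline="") as f:
        return list(csv.DictReader(f))
```

Both functions build the path from the same relative directory, so the location is specified once and used consistently for saving and reopening. Note that values read back from a CSV file are strings.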


● Saving Analysis Data Files

After the commands that process the data as necessary to prepare it for analysis, you need to write one or more commands that save the processed data in one or more Analysis Data Files. These are the data files you will use when you implement the procedures that generate the results you present in your report.

    ● In the commands that save your Analysis Data Files, you will need to specify names for the files. Follow these guidelines for naming Analysis Data Files.

    ● When you write a command that saves an Analysis Data File, you must specify where the file should be stored, and you should do so using a relative directory path.

    Assuming you have adopted the convention of keeping your Project/ folder set as the working directory at all times, the relative directory path should begin in your Project/ folder and lead to the AnalysisData/ folder:

    >Data/AnalysisData/
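
Putting the pieces together, a very small processing script might look like the following Python sketch. It assumes CSV files, Project/ as the working directory, and hypothetical file and variable names (`survey.csv`, `analysis.csv`, `id`, `income`); it is an illustration, not a prescribed layout.

```python
import csv
from pathlib import Path

# Relative paths from Project/ (assumed to be the working directory).
DATA_DIR = Path("Data")
INPUT_FILE = DATA_DIR / "InputData" / "survey.csv"          # hypothetical name
ANALYSIS_FILE = DATA_DIR / "AnalysisData" / "analysis.csv"  # hypothetical name


def run_processing():
    """Open the Input Data File, process it, and save the Analysis Data File."""
    # 1. Open the Input Data File.
    with open(INPUT_FILE, newline="") as f:
        rows = list(csv.DictReader(f))

    # 2. Process: drop observations with a missing income value.
    kept = [row for row in rows if row["income"] != ""]

    # 3. Save the Analysis Data File, keeping only the id and income variables.
    ANALYSIS_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(ANALYSIS_FILE, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["id", "income"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)  # number of observations saved
```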


Don't put commands that analyze your data in the processing scripts

Your processing scripts should contain only commands that modify your Input Data Files as necessary to create your Analysis Data Files.

Your processing scripts should not contain any commands that execute any parts of the data analysis you do for your project: all commands that generate results that you present in your report should be in your analysis scripts.

Naming Your Processing Scripts

If you write all the commands necessary to transform your Input Data Files into your Analysis Data Files in a single processing script, give it the name Processing.yyy.

If you write the commands for processing your data in two or more processing scripts, try to give them meaningful names.

For example, you could choose names that indicate the order in which the scripts should be executed, such as Processing_1.yyy, Processing_2.yyy, Processing_3.yyy.

Or you could give them names that reflect the tasks they accomplish, such as CleaningFedData.yyy, CleaningIMFData.yyy, MergingFedandIMFData.yyy.

NOTE: The extension .yyy represents the filename extension assigned to scripts by the software you are using. (For example, if you are using Stata, .yyy would be replaced by .do; if you are using R, .yyy would be replaced by .R; and if you are using SPSS, .yyy would be replaced by .sps.)
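
When scripts are named to indicate their execution order, as in Processing_1.yyy, Processing_2.yyy, Processing_3.yyy, that order can also be read back from the names. The Python sketch below assumes the scripts are Python files (so .yyy is .py) and that the caller supplies the location of the folder holding them; the function names `ordered_scripts` and `run_all` are hypothetical.

```python
import re
import runpy
from pathlib import Path


def ordered_scripts(folder):
    """Return the Processing_<n>.py scripts in folder, sorted by the number n."""
    pattern = re.compile(r"Processing_(\d+)\.py")
    numbered = []
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so that Processing_10 runs after Processing_2.
    return [path for _, path in sorted(numbered)]


def run_all(folder):
    """Execute each numbered processing script in order."""
    for script in ordered_scripts(folder):
        runpy.run_path(str(script), run_name="__main__")
```

Sorting on the extracted number rather than on the file name avoids the alphabetical ordering problem in which Processing_10 would come before Processing_2.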