TIER Protocol 4.0
This is a Beta version of the 4.0 TIER Protocol. Your feedback is welcome! Email comments to: email@example.com
The TIER Protocol specifies the contents and organization of reproduction documentation for a project involving computations with statistical data.
Documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.
The tree in the left margin of this page illustrates the default hierarchy of folders, subfolders, and files specified by the TIER Protocol. Clicking on any of the components of the hierarchy takes you to a page with relevant details.
On this page, you can read about:
- Flexibility and Adaptability of the TIER Protocol
- Standards of Reproducibility
- Alternative Workflows: Copy-and-Paste versus Dynamic Documents
Flexibility and Adaptability of the TIER Protocol
Although the TIER Protocol is highly structured and detailed, it is intended to serve as a flexible framework that can be applied to a wide variety of contexts and projects.
The presentation of the TIER Protocol on this website specifies a particular set of files that should be included in the documentation for a project and a particular structure for organizing them, and suggests a number of particular conventions to follow as you work. It might therefore appear to be rigid, overly didactic, and perhaps more complicated than necessary in some situations.
In fact, however, the Protocol is intended to serve as a highly flexible framework that can be adapted to a wide variety of contexts and projects.
The Protocol is written with a high degree of detail simply for the sake of clarity. Particularly as students are learning to construct reproduction documentation, the specificity of the Protocol helps them understand concretely what they are expected to produce.
As students gain experience and develop an understanding of the fundamental principles and purposes that underlie the Protocol, they should certainly feel free to modify the contents and organization of their documentation in ways that suit their particular needs.
Is the TIER Protocol only for documenting complete research papers?
No. The TIER Protocol describes the documentation you would prepare if you were doing a project that led to a complete research paper.
But for less extensive projects, such as a simple homework assignment or a lab exercise, more limited documentation may be sufficient.
At one end of the spectrum, suppose the project is a simple homework assignment in which the instructor gives you a data file, asks you to conduct one simple statistical test, and then asks you to answer a few questions about the results of the test. In this case, the report would just consist of a document in which you answer the questions. The documentation you prepare with the report could be as simple as one Project Folder containing a .pdf version of your answers to the questions, a copy of the data file, and a script with commands that open the data file and execute the test. If the instructor wanted you to produce somewhat more robust documentation, they might also ask you to turn in a log file that captures the output generated when you run the script.
For more involved homework problems and exercises, you will need to construct additional components of documentation to ensure that your computations are reproducible.
In general, whatever the nature of the project, you should follow the guidelines of the Protocol where they make sense--in particular, where they help you achieve the goal of reproducibility--but omit any components that are not useful for your purposes.
Do I have to use a specific kind of software?
No. You can conduct a quantitative project and create excellent reproduction documentation with many kinds of software.
The one requirement is that you use a type of software for which you can write scripts.
Scripts are documents containing commands written in the syntax of whatever software you are using. When you run a script, the software reads and executes the commands in the order in which they appear. Scripts are fundamental to reproducibility: you reproduce your work by running the scripts you wrote.
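As a rough illustration of what a script looks like, here is a minimal sketch in Python. The data values and variable names are hypothetical and are embedded in the script only to keep the sketch self-contained; in a real project the data would be read from a file in the project documentation.

```python
import statistics

# Hypothetical data, embedded here only so the sketch runs on its own.
scores = [72, 85, 90, 64, 78]

# The commands below are executed from top to bottom, in the order written.
n = len(scores)                       # number of observations
mean_score = statistics.mean(scores)  # sample mean
sd_score = statistics.stdev(scores)   # sample standard deviation

print("n =", n)
print("mean =", mean_score)
print("sd =", round(sd_score, 2))
```

Running the same script again on the same data repeats exactly the same computations, which is what makes scripted work reproducible.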
It is possible to write scripts for almost every kind of statistical software. Some prominent examples include Matlab, R, SAS, SPSS, and Stata, and there are many others. Any of these can be used to conduct a research project and create reproduction documentation following the TIER Protocol.
The most prominent example of a statistical package for which it is not possible to write scripts is Microsoft Excel. Since you cannot write scripts for Excel, the TIER Protocol cannot be applied to document projects you do with Excel.
Standards of Reproducibility
Generally speaking, documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.
More specifically, the TIER Protocol was designed to achieve several particular standards of reproducibility:
● Sufficiency: The documentation for a project should contain everything necessary to reproduce the results.
The documentation for a project should contain all the data, scripts, and supplementary information necessary to enable any user (with access to and basic proficiency in the software used) to reproduce all the results presented in the report without undue difficulty.
The user should not have to seek assistance from the person who conducted the study.
● Soup-to-nuts: The documentation should include scripts that reproduce all the computations required for both (i) the steps of processing necessary to prepare the data for analysis, and (ii) the analysis or procedures performed with the processed data that generate the results.
More generally, the soup-to-nuts standard implies that the researcher must not conduct any part of the data processing or analysis "by hand"--e.g., using drop-down menus or interactively typing commands in a GUI, or using a non-scripted tool (such as Microsoft Excel). Every computation that modifies the data or produces a result must be executed by a command written in one of the scripts.
Our use of the term soup-to-nuts is based on the expression "from soup to nuts", which means "[f]rom the very beginning to the very end" [source: from soup to nuts. (n.d.) Farlex Dictionary of Idioms. (2015). Retrieved July 14 2021 from https://idioms.thefreedictionary.com/from+soup+to+nuts].
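To make the soup-to-nuts standard concrete, the sketch below scripts both stages: (i) a processing step that drops observations with a missing value, and (ii) an analysis step that computes a result from the processed data. It is written in Python purely for illustration; the data, the column names, and the function names `process` and `analyze` are hypothetical.

```python
import csv
import io

# Hypothetical raw data; the second observation has a missing income value.
RAW_CSV = "id,income\n1,52000\n2,\n3,61000\n"

def process(raw_text):
    """Processing step: parse the raw data and drop rows with missing income."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [
        {"id": int(row["id"]), "income": float(row["income"])}
        for row in rows
        if row["income"].strip() != ""
    ]

def analyze(rows):
    """Analysis step: compute the mean income from the processed data."""
    incomes = [row["income"] for row in rows]
    return sum(incomes) / len(incomes)

processed = process(RAW_CSV)
print("observations kept:", len(processed))
print("mean income:", analyze(processed))
```

Because the missing observation is dropped by a command rather than deleted by hand, anyone who runs the script starts from the same raw data and arrives at the same result.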
● Portability: Any user should be able to run the scripts on their own computer or workspace.
Provided the necessary software is installed, any user should be able to copy the Project/ folder, with the hierarchy of subfolders and files it contains intact, onto their own computer or workspace, and then run the scripts that reproduce the project locally.
Three practices are key to ensuring portability of your documentation:
- Using relative directory paths. Portability implies that file locations specified in scripts must be expressed in terms of relative directory paths. (If scripts contain absolute directory paths starting at the root directory of a particular computer, they will not run on any other computer.)
- Choosing the folder where relative directory paths begin. The relative directory paths must begin either in the Project/ folder or one of its subfolders. A simple convention is to write all relative directory paths starting from the Project/ folder.
- Designating the folder where the relative directory path begins as the working directory. Whenever a script is executed, whatever folder was chosen as the starting place for relative directory paths (either the Project/ folder or one of its subfolders) must be designated as the working directory for the software that runs the script.
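The three practices above can be sketched in Python as follows. The folder and file names (`Data/survey.csv`) and the helper functions are hypothetical, and the sketch assumes the working directory has already been set to the Project/ folder.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

def is_relative(path_text):
    """Return True if the path is not absolute under POSIX or Windows rules."""
    return not (
        PurePosixPath(path_text).is_absolute()
        or PureWindowsPath(path_text).is_absolute()
    )

# A relative path written from the (assumed) Project/ folder.
INPUT_FILE = Path("Data") / "survey.csv"

def resolve_from_working_directory(relative_path):
    """Show where a relative path points, given the current working directory."""
    return Path.cwd() / relative_path

print(is_relative("Data/survey.csv"))                       # relative: portable
print(is_relative("/home/alice/Project/Data/survey.csv"))   # absolute: not portable
print(resolve_from_working_directory(INPUT_FILE))
```

The relative path resolves to a different absolute location on each user's computer, but it always points to the same file inside that user's copy of the Project/ folder, provided the working directory is set as described.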
● (Almost) one-click reproducibility: Reproducing all the computations required to generate the results of a study should require (almost) nothing other than executing a single Master Script.
We say "almost" because there is one preliminary step the user must take before running the master script, namely setting the working directory to the designated folder.
Although reproducing a project usually entails running multiple scripts, the documentation should also include one Master Script that calls the other scripts in the correct order.
The documentation should also clearly indicate which folder (the Project/ folder or one of its subfolders) is the one where relative directory paths specified in the scripts begin.
Given these prerequisites, once a user has copied the Project/ folder onto their computer or workspace, it should be possible to execute the complete reproduction by:
- Launching the software, and setting the working directory to the folder where relative directory paths specified in the scripts begin,
- Running the Master Script.
The one click that runs the Master Script is then (almost) all the user needs to do to reproduce the project.
Just one preliminary step is required, namely setting the working directory.
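A Master Script can be very short. The sketch below shows one possible shape in Python, using the standard library's `runpy.run_path` to run each script in turn. The script names and the `Scripts/` folder are hypothetical, and the working directory is assumed to be the folder where the relative paths begin.

```python
import runpy
from pathlib import Path

# Hypothetical scripts, listed in the order they must run.
# Paths are relative to the working directory (assumed to be Project/).
SCRIPTS_IN_ORDER = [
    Path("Scripts") / "01_process_data.py",
    Path("Scripts") / "02_analyze.py",
]

def run_all(scripts=SCRIPTS_IN_ORDER):
    """Run each script in order and return the list of scripts completed."""
    completed = []
    for script in scripts:
        if not script.is_file():
            # A missing script usually means the working directory is wrong.
            raise FileNotFoundError(
                f"{script} not found; check that the working directory is set correctly"
            )
        runpy.run_path(str(script), run_name="__main__")
        completed.append(script)
    return completed
```

With the working directory set, calling `run_all()` is the single step that reproduces the project. Other packages offer comparable commands for running one script from another, such as `source()` in R and `do` in Stata.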
Alternative Workflows: Copy-and-Paste versus Dynamic Documents
The TIER Protocol applies to projects that are conducted with a copy-and-paste workflow, in which results are saved in output files as they are produced, and then copied and pasted into the manuscript of the report.
The particulars of the TIER Protocol do not apply to a dynamic documents workflow, in which both the text of the paper and commands for the computations that generate the results are combined in a single file written in a markup language.
Nonetheless, all four standards of reproducibility the TIER Protocol is intended to ensure--sufficiency, soup-to-nuts, portability, and (almost) one-click reproducibility--can be achieved with a dynamic documents workflow. For an example showing this is true, see this demo project conducted with R and R Markdown. (Link to demo coming soon.)
When a project is conducted with a copy-and-paste workflow, documentation that meets the specifications of the TIER Protocol ensures that the results presented in the report are reproducible. Running the scripts not only executes the data processing and analysis that produce the results, but also saves the results in an output folder.
However, the manuscript of the report is not reproducible. The author composes it using their choice of word-processing or type-setting software, then copies the results saved in the output folder and pastes them into the report at the appropriate points. This copying and pasting is done interactively, and no script that reproduces this process is created.
With the dynamic documents workflow, both the results and the entire manuscript of the report are reproducible.
The dynamic documents workflow entails writing a markup script that includes the text of the report, with commands that produce the results interspersed throughout. When the script is sent to a compiler, it is rendered as a formatted report, with the results produced by the interspersed commands inserted at the appropriate points.
Running the markup script then reproduces both the results and the entire report in which they appear.
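The idea can be illustrated with a loose analogy in Python. This is not R Markdown or a Stata dynamic document: it uses the standard library's `string.Template` only to show report text and computed results being combined when the document is rendered. The data and the wording of the report are hypothetical.

```python
import statistics
from string import Template

# Hypothetical data for the illustration.
SCORES = [72, 85, 90]

# The text of the report, with placeholders where results belong.
REPORT_TEMPLATE = Template("We observed $n students. The mean score was $mean.")

def render_report(scores):
    """Compute the results and insert them into the report text."""
    return REPORT_TEMPLATE.substitute(
        n=len(scores),
        mean=f"{statistics.mean(scores):.1f}",
    )

print(render_report(SCORES))
```

Because the numbers in the rendered sentence are computed at the moment the report is produced, changing the data and rendering again updates the report without any copying and pasting.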
Two popular types of software for creating dynamic documents are R with R Markdown, and Stata with the -dyndoc- command.