TIER Protocol 3.0 | Project TIER | Teaching Integrity in Empirical Research

The Specifications of the TIER Protocol give a complete description of the replication documentation that should be preserved with your study when you have finished the project.

This documentation includes:

The data used for the project.
Command files, containing code written for your software, that clean and prepare the data as necessary, and then execute the procedures that generate the results reported in your study.
Various forms of supporting information to help a user understand and make use of your documentation.

Overview of the Documentation

All the documentation for your study, as well as a copy of the final paper, should be stored in one folder, which we will call the main project folder.

You should give this folder an appropriate name, such as “Economics 203 Research Paper-Group B,” or “J. Smith Senior Thesis,” or “Farm Size and Energy Efficiency.”

The following figure illustrates the files that should be included in the documentation, and how they should be organized in various folders and sub-folders.

[Figure: Diagram of the Project TIER folder structure. Note that this structure may vary somewhat depending on the nature of a particular project.]

The components of the documentation shown in the illustration above are described in detail below.
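If you are starting a project from scratch, a short script can create an empty skeleton of this structure. The following Python sketch is an illustration only: it assumes that the four folders described in this section (Original Data, with its Metadata sub-folder; Command Files; Analysis Data; and Documents) sit directly inside the main project folder, which may not match every project. The function name `create_project_skeleton` is hypothetical.

```python
from pathlib import Path

# Assumed layout, based on the folders described in this document;
# a particular project may need a different arrangement.
SUBFOLDERS = [
    "Original Data",
    "Original Data/Metadata",
    "Command Files",
    "Analysis Data",
    "Documents",
]


def create_project_skeleton(main_folder: str) -> list[Path]:
    """Create the main project folder and its sub-folders if they are missing.

    Returns the paths of the sub-folders, in the order listed above.
    """
    root = Path(main_folder)
    created = []
    for name in SUBFOLDERS:
        path = root / name
        # parents=True also creates the main project folder itself;
        # exist_ok=True makes the function safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, `create_project_skeleton("J. Smith Senior Thesis")` would create that main project folder in the current directory along with its sub-folders. Because `exist_ok=True` is passed, running it again on an existing project leaves existing folders and files untouched.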

The Original Data Folder

The Original Data Folder contains:

Your original data files
Importable data files (if necessary)
A sub-folder named Metadata

Original Data Files

The data files you initially obtain for your project and from which you extract the data you use are called original data files. In some cases, all the data used for a study come from a single original data file; in other cases, data are taken from multiple original data files.

A copy of every original data file from which you extract any of the data used in your study should be stored in the Original Data folder.

Your original data files serve as a record of the data you began the project with. The copies you keep in your Original Data folder should therefore be identical to the ones you started with, before you made any changes to them.

The work you do for your project will involve manipulating the original data files in many ways: extracting the data they contain, cleaning and processing the data as necessary for your study, and then conducting your analyses. But your Original Data folder should contain copies of your original data files that you did not manipulate or modify in any way.
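The protocol does not prescribe how to confirm that these copies remain unmodified. One option, offered here as an assumption rather than as part of the protocol, is to record a cryptographic checksum of each original data file when you first obtain it and compare against it later. In this Python sketch the function names are hypothetical, and the check covers only files directly inside the folder, so the Metadata sub-folder is skipped.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(folder: str) -> dict[str, str]:
    """Map each file directly inside `folder` (not sub-folders) to its digest."""
    return {
        p.name: sha256_of(p)
        for p in sorted(Path(folder).iterdir())
        if p.is_file()
    }


def changed_files(folder: str, recorded: dict[str, str]) -> list[str]:
    """List recorded files that are now missing or whose contents differ."""
    current = record_checksums(folder)
    return [name for name, h in recorded.items() if current.get(name) != h]
```

You might save the dictionary returned by `record_checksums` in a text file when the data are first downloaded; an empty list from `changed_files` later indicates that none of the recorded files has changed since then.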

Importable Data Files

Occasionally, an original data file is in a format that cannot be read by the statistical software you are using for your project. In those cases, you need to create a modified version of the original data file that is in a format your software can read. This modified version of the original data file is called an importable data file.

When you need to create an importable version of an original data file, you should make only the minimal modifications necessary for your software to read it. Do not do any other cleaning or processing at this point.

When you create an importable version of an original data file, you should keep both versions (importable and original) in the Original Data folder. (The original and importable versions of a data file should be given different names.)
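As an illustration of a minimal modification, suppose (hypothetically) that a data provider places a few lines of notes above the column headers of a CSV file, and that your software cannot read the file with those lines present. The Python sketch below writes an importable copy under a different name with those lines removed and makes no other change. The function name, the UTF-8 encoding, and the scenario itself are assumptions.

```python
from pathlib import Path


def make_importable(original: str, importable: str, header_lines_to_drop: int) -> None:
    """Write a copy of `original` without its first N lines.

    Dropping those lines is the only change; the original file is only
    read, never overwritten.
    """
    # The importable version must have a different name from the original.
    if Path(original).resolve() == Path(importable).resolve():
        raise ValueError("The importable file must have a different name.")
    # newline="" preserves the original line endings exactly.
    with open(original, "r", encoding="utf-8", newline="") as src:
        lines = src.readlines()
    with open(importable, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(lines[header_lines_to_drop:])
```

Because line endings are preserved, the importable file differs from the original only in the dropped lines, which is the kind of change that Section 2 of the Read Me file would then describe.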

If all of your original data files are in formats that your software can read, you do not need to create any importable data files.

The Metadata Sub-folder

The Metadata sub-folder contains:

A document called your Metadata Guide
Supplementary documents with additional metadata (if necessary)

The Metadata Guide

For each of your original data files, the Metadata Guide provides the kind of information typically found in a codebook accompanying a dataset, such as variable definitions and coding, sampling methods, and anything else a user would need to know to work with and interpret the data appropriately.

You, the author of the paper, compose the Metadata Guide.

The Metadata Guide should be organized into one or more sections, each providing information about one of your original data files. For each original data file, the Metadata Guide should include:

A bibliographic citation for the original data file

This citation should be in a format consistent with the editorial style (e.g., APA or Chicago) used in the main paper or report on the study.

A digital object identifier (DOI) for the data file (if one has been assigned)

If a DOI is included in the bibliographic citation, it need not be repeated.

The date on which the author first downloaded, or obtained in some other way, the original data file

If this date is included in the bibliographic citation, it need not be repeated.

A verbal explanation of how an interested reader can obtain a copy of the original data file

In many cases, this explanation will give the URL of a webpage from which the data can be accessed, along with instructions for downloading from that webpage a file identical to the original data file used in the study. In all cases, this explanation should be complete and precise enough to allow an independent researcher to locate and obtain an exact copy of the original data file without any additional information or assistance.

Whatever additional information an independent researcher would need to understand and use the data in the original data file

The particular information required can vary a great deal depending on the nature of the original data file in question. Often, it is similar to the kind of information found in a codebook or users’ guide for a data set, such as variable names and definitions, coding schemes and units of measurement, details of the sampling method and weight variables, and descriptions of how any imputed variables were constructed. In some cases, it is also necessary to include information about the file structure (e.g., the delimiters used to separate variables, or, in rectangular files without delimiters, the columns in which the variables are stored). Any other unique or idiosyncratic aspects of the data that an independent user would need to understand should be explained as well.
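To see why the column positions of a file without delimiters need to be recorded, consider how a program reads one record from such a file. In this Python sketch the layout (variable names and column ranges) is entirely hypothetical; in practice it would be taken from the Metadata Guide or the data provider's documentation.

```python
# Hypothetical layout, as it might be recorded in a Metadata Guide:
# variable name -> (first column, last column), 1-based and inclusive.
LAYOUT = {
    "state": (1, 2),
    "year": (3, 6),
    "income": (7, 12),
}


def parse_fixed_width(line: str, layout: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Split one record of a delimiter-free rectangular file into fields."""
    record = {}
    for name, (first, last) in layout.items():
        # Convert 1-based inclusive columns to a 0-based Python slice,
        # then strip the padding spaces used to fill each field.
        record[name] = line[first - 1:last].strip()
    return record
```

For example, `parse_fixed_width("CA2019 52000", LAYOUT)` returns `{"state": "CA", "year": "2019", "income": "52000"}`. Without the documented column ranges, a user would have no reliable way to split such a line into variables.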

Supplementary documents with additional metadata

In many cases, some or all of the information about an original data file that should be included in the Metadata Guide is available in an existing, publicly accessible document, such as a codebook or user’s guide that is provided with the original data file. In these cases, it is not necessary to include that information in the Metadata Guide. Instead, you may simply put a note in the Metadata Guide indicating that the information is available in an existing document.

When you put a note in the Metadata Guide indicating that certain parts of the information that should be provided there are available in an existing document, you should preserve a copy of the existing document in the Metadata sub-folder (along with the Metadata Guide that you compose yourself).

The Command Files Folder

The Command Files folder contains one or more files containing code written in the syntax of the statistical software you use for the study. The code in these command files should execute all the data processing and analysis necessary to replicate the study and reproduce the reported results.

The best way to construct and organize your command files may vary depending on the nature of the project. In many cases, however, the steps can be grouped into three phases, with one or more command files executing the steps in each phase.

Processing the data

The command files for this phase transform original (and/or importable) data files into the analysis data files, which contain the fully cleaned and processed data that are used to generate the results reported in the paper.
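A processing command file might look roughly like the following Python sketch. The variable names and the missing-value code `-99` are hypothetical; the point is that the original file is only read, and all cleaning happens in code that writes a separate analysis data file.

```python
import csv

# Hypothetical values for illustration only.
MISSING_CODE = "-99"  # code the original file uses for a missing value
KEEP = ["id", "farm_size", "energy_use"]  # variables used in the study


def process(original_path: str, analysis_path: str) -> int:
    """Write an analysis data file built from an original (or importable) file.

    Keeps only the variables in KEEP, replaces the missing-value code with
    an empty field, and only ever reads `original_path`.
    Returns the number of observations written.
    """
    count = 0
    with open(original_path, newline="", encoding="utf-8") as src, \
            open(analysis_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=KEEP)
        writer.writeheader()
        for row in reader:
            writer.writerow({k: "" if row[k] == MISSING_CODE else row[k] for k in KEEP})
            count += 1
    return count
```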

Constructing the Data Appendix

The command files for this phase generate the descriptive statistics, tables, and figures presented in the Data Appendix, a document that serves as a codebook for the analysis data files.

Generating the results

Using the data in the analysis data files, the command files for this phase conduct the procedures that generate the results reported in the paper. Each command that generates any of the results reported in the paper should be preceded by a comment indicating which results it produces (e.g., by table or figure number, or the page on which a numerical result appears).
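The commenting convention for this phase can be illustrated with a short Python sketch. The data, variable names, and table and page references below are hypothetical, and the data are inlined only so that the example runs on its own.

```python
# All data, variable names, and table/page references here are hypothetical.
import statistics

# In a real command file these values would be read from the analysis data file.
farm_size = [10.0, 20.0, 30.0, 40.0]
energy_use = [200.0, 380.0, 610.0, 790.0]

# Table 2, row 1: mean farm size
mean_size = statistics.mean(farm_size)

# Table 2, row 2: mean energy use
mean_energy = statistics.mean(energy_use)

# Page 12 (text): OLS slope from regressing energy use on farm size
dx = [x - mean_size for x in farm_size]
dy = [y - mean_energy for y in energy_use]
slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print(mean_size, mean_energy, slope)  # prints: 25.0 495.0 20.0
```

Someone checking a reported number can search the command file for its table or page reference and find the exact command that produced it.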

Even if you choose to deviate from this scheme, it may provide a useful framework from which to begin thinking about the most effective way to organize your command files in your particular situation.
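One way to make the order of the phases explicit is a short master script that runs each command file in turn. This Python sketch assumes, hypothetically, that the command files are Python scripts named after their phase and that the main project folder is used as the working directory; projects using other statistical software would run their command files differently.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical command-file names, listed in the order they must be run.
PHASES = ["1_process_data.py", "2_data_appendix.py", "3_results.py"]


def run_all(command_dir: str, working_dir: str, phases: list[str] = PHASES) -> None:
    """Run each command file in order, stopping at the first failure."""
    for name in phases:
        # Resolve to an absolute path so the script is found regardless of cwd.
        script = (Path(command_dir) / name).resolve()
        # check=True raises CalledProcessError if a script exits with an error,
        # so later phases never run on incomplete output.
        subprocess.run([sys.executable, str(script)], cwd=working_dir, check=True)
```

Because each script path is made absolute before it is run, the command files can live in the Command Files folder while relative paths inside them are interpreted from the working directory.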

The Analysis Data Folder

The fully cleaned and processed data files that you use to generate the results reported in your paper are called analysis data files. They are typically constructed by cleaning, processing and combining data extracted from one or more original data files.

A copy of every analysis data file used for a study should be preserved in the Analysis Data Folder.

It is commonly the case that a single cleaned and processed data file is used to generate all the results reported in a paper, so that just one analysis data file needs to be stored in the Analysis Data folder. But when results are generated from more than one analysis data file, they should all be included.

The Documents Folder

The Documents Folder contains:

A copy of your final paper
Your Data Appendix
Your Read Me file

The Final Paper

A full-text electronic copy of the complete paper or final report on the project.

Saving the electronic version of your paper in .pdf format helps prevent the document from being changed accidentally or corrupted in some other way.

The Data Appendix

The Data Appendix is a document that serves as a codebook for the analysis data files. It is composed by the author of the paper.

For every analysis data file, there is one corresponding section of the Data Appendix. Each section is divided into sub-sections, each of which provides information about one of the variables in the analysis data file.

Some of the information is the same for all variables; other parts of the information depend on whether the variable is quantitative or categorical.

For every variable

The name of the variable and a complete definition (including, as appropriate, its coding and/or units of measurement, the wording of the survey question on which it is based, or any adjustments made for inflation or purchasing power parity (PPP)).
The name(s) of the original data file(s) from which the variable was extracted (or from which the variables used to construct it were extracted), and the names of those variables in the original data files.
The number of observations with valid values for the variable, and the number of observations with missing values.

For quantitative variables

Basic summary statistics, including the mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.
A histogram.

For categorical variables

A frequency table.
A bar chart illustrating the frequency distribution.
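The numerical parts of a Data Appendix entry can be computed with Python's standard library, as in the sketch below. It assumes, for illustration, that a variable arrives as a list in which `None` marks a missing value; the histograms and bar charts are omitted because they would require a plotting library. Statistical packages define percentiles in slightly different ways, so the choice of `method="inclusive"` here is an assumption, and other software may report slightly different quartiles.

```python
import statistics
from collections import Counter


def describe_quantitative(values):
    """Summary statistics for one quantitative variable; None marks missing."""
    valid = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(valid, n=4, method="inclusive")
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "mean": statistics.mean(valid),
        "sd": statistics.stdev(valid),  # sample standard deviation
        "min": min(valid),
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": max(valid),
    }


def describe_categorical(values):
    """Valid/missing counts and a frequency table for one categorical variable."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        # Sorted by category so the table prints in a stable order.
        "frequencies": dict(sorted(Counter(valid).items())),
    }
```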

The Read Me file

The Read Me file is a document that describes the files included in the replication documentation, and explains how they can be used to replicate the study and reproduce the results. You, the author of the study, compose this document.

The Read Me file consists of three main sections:

1: The contents of the replication documentation

This section briefly describes all the files included in the replication documentation, and outlines the structure of the folders in which they are stored.
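The outline of the folder structure for this section can be generated rather than typed by hand. The following Python sketch (the function name is hypothetical) walks the main project folder and returns an indented list of folders and files; the brief description of each file still has to be written by the author.

```python
import os


def folder_tree(root: str) -> list[str]:
    """Return an indented listing of every folder and file under `root`."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit sub-folders alphabetically.
        dirnames.sort()
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        indent = "    " * depth
        lines.append(f"{indent}{os.path.basename(os.path.normpath(dirpath))}/")
        # Files in each folder are listed before its sub-folders.
        for name in sorted(filenames):
            lines.append(f"{indent}    {name}")
    return lines
```

Printing `"\n".join(folder_tree("J. Smith Senior Thesis"))` would give an outline that can be pasted into the Read Me file and then annotated.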

2: Modifications made to importable data files (if necessary)

As explained above in the description of the Original Data folder, whenever an original data file is in a format that cannot be read by the software you are using, you need to create a second version, called an importable data file, that your software can read.

Section 2 of the Read Me file documents the modifications you made to original data files to create importable versions. For each original data file you had to modify, you should give a verbal explanation of all the changes you made to create the importable version. That explanation should be written in complete and grammatically correct sentences; it should give the names of both the original and importable versions of the data file; and it should be precise enough to enable someone else to make the same changes to the original data file and end up with an importable data file identical to the one you created.

If all of your original data files are in formats that your software is able to read, so that it is not necessary to create importable versions of any of them, you may simply omit section 2 from the Read Me file. (In that case, you should call the last section of the Read Me file section 2 instead of section 3.)

3: Instructions for replicating the study

This section gives instructions for using the replication documentation to replicate the data processing and analysis conducted for the study and reproduce the reported results.

These instructions should:

State what statistical software (including the version number and any required add-ons) is needed to run the command files.
Explain which files included in the replication documentation need to be copied onto the replicator’s computer, the structure of folders and sub-folders into which they should be copied, and the folder in which each file should be saved.
Indicate which of the folders should be set as the working directory when the statistical software that executes the command files is run.
Indicate the order in which the command files need to be run to carry out the replication. And, for each command file, indicate what other files it uses (e.g., what data files it opens and what other command files it calls) and what output it produces (e.g., new data files it saves, old data files it deletes, and new directories it creates).
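A command file can also check for itself that it has been started from the intended working directory. In this Python sketch, the convention that the main project folder is the working directory, and the list of required sub-folders, are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical convention: the main project folder is the working directory,
# so these sub-folders must be visible from it.
REQUIRED = ["Original Data", "Command Files", "Analysis Data", "Documents"]


def check_working_directory(cwd: Optional[str] = None) -> Path:
    """Raise a clear error if the command file is run from the wrong folder."""
    here = Path(cwd) if cwd is not None else Path.cwd()
    missing = [name for name in REQUIRED if not (here / name).is_dir()]
    if missing:
        raise RuntimeError(
            f"Working directory {here} lacks sub-folder(s): {', '.join(missing)}. "
            "Set the main project folder as the working directory."
        )
    return here
```

Calling `check_working_directory()` as the first step of each command file turns a confusing file-not-found error later in the run into an immediate, explicit message.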