The Data Appendix

The Data Appendix serves as a codebook for your Analysis Data Files.

It gives a complete definition and/or coding scheme, basic summary statistics, and a visualization of the distribution of every variable in your Analysis Data Files.

You should write the Data Appendix as soon as you have constructed your Analysis Data Files.

You should not begin analyzing your data until you have written the Data Appendix.

What's the Difference Between the Data Appendix and the Codebooks?

The Data Appendix and codebooks are similar in that they provide the same kinds of information (variable definitions, coding schemes, descriptive statistics, etc.) about data files.

The main difference is that we use the term codebook to refer to a document that provides this information about an Input Data File; we use the term Data Appendix to refer to a document that provides this information about the Analysis Data Files.

Contents of the Data Appendix

The Data Appendix should be organized in sections, with one section for each Analysis Data File.

The section for each Analysis Data File should be organized in subsections, with one subsection for each variable in the Analysis Data File.

Information about the Analysis Data Files

The section for each Analysis Data File should begin with a statement of the unit of observation; that is, it should explain what kind of object each row of the data file represents.

  • For example:

    • If a data file contains two variables, inflation2019 (rate of inflation in 2019) and unemployment2019 (fraction of the labor force without employment in 2019), and each row in the data file represents a particular country, the unit of observation is "country".
    • If a data file contains two variables, inflationMEX (rate of inflation in Mexico) and unemploymentMEX (fraction of the labor force without employment in Mexico), and each row in the data file represents a particular year, the unit of observation is "year".
    • If a data file contains two variables, inflation (rate of inflation) and unemployment (fraction of the labor force without employment), and each row in the data file represents a particular country in a particular year, the unit of observation is "country-year".
    • If each row of a data set represents the answers given by a single individual to a set of survey questions, the unit of observation is "survey respondent".
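One way to confirm a stated unit of observation is to check that the variables identifying it pick out each row exactly once. The following is a minimal sketch under stated assumptions, not part of the original guidance: the rows, the identifier names country and year, and the helper duplicate_keys are all hypothetical, and the numeric values are made up for illustration.

```python
from collections import Counter

# Made-up rows for illustration only; the values are not real statistics.
rows = [
    {"country": "MEX", "year": 2018, "inflation": 1.0, "unemployment": 2.0},
    {"country": "MEX", "year": 2019, "inflation": 1.5, "unemployment": 2.5},
    {"country": "CAN", "year": 2018, "inflation": 0.5, "unemployment": 3.0},
    {"country": "CAN", "year": 2019, "inflation": 0.7, "unemployment": 3.5},
]


def duplicate_keys(rows, key_fields):
    """Return the key combinations that appear in more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


# An empty list means every row is a distinct country-year.
print(duplicate_keys(rows, ["country", "year"]))  # prints []
# A non-empty list means "country" alone does not identify a row.
print(duplicate_keys(rows, ["country"]))  # prints [('CAN',), ('MEX',)]
```

If the check on the intended identifiers returns any duplicates, either the stated unit of observation is wrong or the data file contains repeated rows.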

Information about the variables in the Analysis Data Files

In the subsection for each variable in an Analysis Data File, some of the information provided is the same for all variables; the rest depends on whether the variable is quantitative or categorical.

  • Information provided about every variable

    For every variable in your Analysis Data Files (whether quantitative or categorical), the Data Appendix should provide the following information:

    • The name of the variable and a complete definition, including details such as units of measurement or the exact wording of a survey question the variable was based on.
    • The names of the variable or variables in the Input Data Files that were used to construct the variable, and an explanation of the steps of processing by which the variable was constructed from the variable(s) in the Input Data Files.
    • The number of missing observations for the variable and the total number of observations. These numbers should be reported in the form n(m), where n is the total number of observations in the Analysis Data File, and m is the number of observations for which the value is missing.
  • Additional information for quantitative variables
    • Basic summary statistics, including the mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.
    • A histogram.
  • Additional information for categorical variables
    • A frequency table.
    • A bar chart illustrating the frequency distribution.
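The numerical items listed above could be produced along the following lines. This is only a sketch using Python's standard library, under several assumptions that are not in the original text: missing values are coded as None, the function names are hypothetical, and percentiles use the "inclusive" method of statistics.quantiles (other software may use a different percentile definition and report slightly different values). The histogram and bar chart are left to whatever plotting tools your software provides.

```python
import statistics
from collections import Counter


def missing_report(values):
    """Format observation counts as n(m): n total observations, m missing."""
    n = len(values)
    m = sum(1 for v in values if v is None)
    return f"{n}({m})"


def quantitative_summary(values):
    """Summary statistics for a quantitative variable, ignoring missing values."""
    observed = sorted(v for v in values if v is not None)
    # With n=4, quantiles returns the three quartile cut points.
    p25, median, p75 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),  # sample standard deviation
        "min": observed[0],
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": observed[-1],
    }


def frequency_table(values):
    """Counts of each category, most frequent first, ignoring missing values."""
    observed = [v for v in values if v is not None]
    return dict(Counter(observed).most_common())
```

For instance, missing_report([1, 2, 3, 4, 5, None]) returns "6(1)": six observations in total, one of them missing.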

Writing the Data Appendix

The Data Appendix will include text that you type, as well as tables, figures, and other descriptive statistics that are generated by your Data Appendix Script.

You may use any word processing or typesetting software you like (e.g., Microsoft Word, Google Docs, or LaTeX) to write the text of the Data Appendix.

The commands in the Data Appendix Script that create the tables, figures, and descriptive statistics for your Data Appendix also store them in files that are saved in the DataAppendixOutput/ folder. If you adopt a copy-and-paste workflow, you copy output from these files and paste it into the Data Appendix at the appropriate points.
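As an illustration of how a Data Appendix Script might store its output, the sketch below writes one variable's summary statistics to a text file in the DataAppendixOutput/ folder. The function name save_summary, the file naming pattern, and the plain-text layout are assumptions made for this example; your own script may organize its output differently.

```python
from pathlib import Path


def save_summary(variable_name, summary, output_dir="DataAppendixOutput"):
    """Write one 'statistic: value' line per entry and return the file path."""
    out_dir = Path(output_dir)
    # Create the output folder if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming pattern: one file per variable.
    out_path = out_dir / f"{variable_name}_summary.txt"
    lines = [f"{stat}: {value}" for stat, value in summary.items()]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```

Calling save_summary("inflation", {"mean": 2.5, "min": 1.0}) would create DataAppendixOutput/inflation_summary.txt, whose contents can then be copied into the corresponding subsection of the Data Appendix.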

The copy of the Data Appendix you save in your AnalysisData/ folder should be in .pdf format.

Naming the Data Appendix

Give your Data Appendix the name DataAppendix.pdf.