
The Data Appendix

The Data Appendix serves as a codebook for your Analysis Data Files.

It gives a complete definition and/or coding scheme, basic summary statistics, and a visualization of the distribution of every variable in your Analysis Data Files.

You should write the Data Appendix as soon as you have constructed your Analysis Data Files.

You should not begin analyzing your data until you have written the Data Appendix.

What's the Difference Between the Data Appendix and the Codebooks?

The Data Appendix and codebooks are similar in that they provide the same kinds of information (variable definitions, coding schemes, descriptive statistics, etc.) about data files.

The main difference is that we use the term codebook to refer to a document that provides this information about an Input Data File; we use the term Data Appendix to refer to a document that provides this information about the Analysis Data Files.

Contents of the Data Appendix

The Data Appendix should be organized in sections, with one section for each Analysis Data File.

The section for each Analysis Data File should be organized in subsections, with one subsection for each variable in the Analysis Data File.

Information about the Analysis Data Files

The section for each Analysis Data File should begin with a statement of what the unit of observation is--that is, it should explain what kind of object each row of the data file represents.

  • For example:

    • If a data file contains two variables, inflation2019 (rate of inflation in 2019) and unemployment2019 (fraction of the labor force without employment in 2019), and each row in the data file represents a particular country, the unit of observation is "country".
    • If a data file contains two variables, inflationMEX (rate of inflation in Mexico) and unemploymentMEX (fraction of the labor force without employment in Mexico), and each row in the data file represents a particular year, the unit of observation is "year".
    • If a data file contains two variables, inflation (rate of inflation) and unemployment (fraction of the labor force without employment), and each row in the data file represents a particular country in a particular year, the unit of observation is "country-year".
    • If each row of a data set represents the answers given by a single individual to a set of survey questions, the unit of observation is "survey respondent".
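One way to confirm a stated unit of observation is to check that the fields defining it identify each row uniquely. The following is a minimal sketch using only the Python standard library; the function name `duplicate_keys`, the field names, and all values are hypothetical and made up for illustration.

```python
from collections import Counter

def duplicate_keys(rows, key_fields):
    """Return the key values that appear in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Made-up rows for illustration only (not real statistics).
rows = [
    {"country": "A", "year": 2018, "inflation": 1.0, "unemployment": 5.0},
    {"country": "A", "year": 2019, "inflation": 2.0, "unemployment": 6.0},
    {"country": "B", "year": 2018, "inflation": 3.0, "unemployment": 7.0},
    {"country": "B", "year": 2019, "inflation": 4.0, "unemployment": 8.0},
]

# No duplicates: "country-year" is a valid unit of observation for these rows.
print(duplicate_keys(rows, ["country", "year"]))

# Each country appears twice, so "country" alone does not identify a row.
print(duplicate_keys(rows, ["country"]))
```

If the first call returns an empty list, each row corresponds to exactly one country in one year, which is consistent with a "country-year" unit of observation.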

Information about the variables in the Analysis Data Files

In the subsection for each variable in an Analysis Data File, some of the information provided is the same for every variable; the rest depends on whether the variable is quantitative or categorical.

  • Information provided about every variable

    For every variable in your Analysis Data Files (whether quantitative or categorical), the Data Appendix should provide the following information:

    • The name of the variable and a complete definition, including details such as units of measurement or the exact wording of a survey question the variable was based on.
    • The names of the variable or variables in the Input Data Files that were used to construct the variable, and an explanation of the steps of processing by which the variable was constructed from the variable(s) in the Input Data Files.
    • The number of missing observations for the variable and the total number of observations. These numbers should be reported in the form m:n, where m is the number of observations in the Analysis Data File for which the value of the variable is missing, and n is the total number of observations in the Analysis Data File.
  • Additional information for quantitative variables
    • Basic summary statistics, including the mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum.
    • A histogram.
  • Additional information for categorical variables
    • A frequency table.
    • A bar chart illustrating the frequency distribution.
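The numeric parts of the list above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a prescribed implementation: the function names are hypothetical, missing values are assumed to be stored as `None`, the data are made up, and the "inclusive" quartile method is only one of several percentile conventions (other software may report slightly different quartiles). Histograms and bar charts would need a plotting tool and are not shown.

```python
from collections import Counter
import statistics

def missing_report(values):
    """Return 'm:n': m missing observations out of n total observations."""
    m = sum(1 for v in values if v is None)
    return f"{m}:{len(values)}"

def quantitative_summary(values):
    """Summary statistics over the non-missing values of a quantitative variable."""
    data = sorted(v for v in values if v is not None)
    # Quartiles by the "inclusive" method (an assumption; conventions differ).
    p25, median, p75 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": data[-1],
    }

def frequency_table(values):
    """Counts of each non-missing category of a categorical variable."""
    return Counter(v for v in values if v is not None)

# Made-up values for illustration; None marks a missing observation.
growth = [2.0, 4.0, None, 6.0, 8.0]
region = ["north", "south", "north", None]

print(missing_report(growth))  # prints 1:5
print(quantitative_summary(growth))
print(frequency_table(region))
```

Reporting the missing count alongside the total makes it clear that the summary statistics are computed over the non-missing observations only.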

Writing the Data Appendix

You may use any word processing or typesetting software you like (e.g., Microsoft Word, Google Docs, or LaTeX) to write the Data Appendix.

Much of the information you present in the Data Appendix will be stored in files that are generated when you run your Data Appendix Scripts and saved in the DataAppendixOutput/ folder. If you adopt a copy-and-paste workflow, you will copy output from these files and paste it into the appropriate places in your Data Appendix.
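As a rough sketch of how a Data Appendix Script might save output for later copying, the Python example below writes one text file per variable into a DataAppendixOutput/ folder. The function name `save_variable_output` and the one-file-per-variable naming scheme are assumptions made for illustration; only the folder name comes from the text above. The demonstration runs in a temporary directory so that no real project folder is touched.

```python
from pathlib import Path
import tempfile

def save_variable_output(project_root, variable_name, text):
    """Write one variable's output to DataAppendixOutput/<variable_name>.txt."""
    output_dir = Path(project_root) / "DataAppendixOutput"
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / f"{variable_name}.txt"  # hypothetical naming scheme
    out_path.write_text(text, encoding="utf-8")
    return out_path

# Demonstration with a made-up variable name and made-up content.
with tempfile.TemporaryDirectory() as project_root:
    saved = save_variable_output(project_root, "inflation", "missing:total = 1:5\n")
    relative_path = saved.relative_to(project_root).as_posix()
    content = saved.read_text(encoding="utf-8")

print(relative_path)
print(content, end="")
```

In a real project, `project_root` would be the project's top-level folder rather than a temporary directory.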

The copy of the Data Appendix you save in your AnalysisData/ folder should be in .pdf format.

Naming the Data Appendix

Give your Data Appendix the name DataAppendix.pdf.