A codebook gives complete definitions and coding schemes for the variables in a dataset, along with any additional information a user would need to understand the structure and contents of the data.

Your Metadata/ folder should include codebooks for all of your Input Data Files.

  • Why should I save copies of the codebooks?

    In most cases, copies of the codebooks for your Input Data Files will be publicly available: you or any interested user can obtain them from the data producer or distributor. Saving copies with the documentation for your project might therefore appear redundant.

    But in fact, saving copies of the codebooks will be very useful. While you are working on your project, there will likely be numerous occasions on which you will have questions about your data: Were the dollar values for a certain variable adjusted for inflation? What were the categories respondents were allowed to choose from when asked to rate their approval of an elected official? What countries do the numerical codes in the dataset represent?

    If you have the codebook close at hand--stored in your Metadata/ folder--it will be easy for you to open it up, find the answer to your question, and continue working with confidence that you understand what you are doing.

    If you don't have the codebook stored with your documentation--and so have to find your way to the data distributor's website or wherever the codebook is publicly available--you might hesitate to go to that trouble, and instead guess at the answer to your question or sweep it under the rug. That path won't lead anywhere good. When you guess at or evade your questions rather than resolving them, you understand less about what you are doing, and any findings of your study will be less credible.

Codebooks for Existing Input Data Files

If your Input Data Files consist of existing datasets, codebooks should be available from the producer or distributor of the data.

If the codebook for an Input Data File is available as a PDF or in some other format that you can save in your documentation, you should simply store a copy of it in your Metadata/ folder.

If the codebook for an Input Data File is available only in a format that cannot be easily stored as a single document, you should indicate in your Data Sources Guide how a user can access the information. For example, if the information is available only through a searchable online interface, the Data Sources Guide should include a note giving the URL and/or an explanation of how to reach the interface.

Codebooks for Input Data Files You Create Yourself

If you create an Input Data File yourself (e.g., with a survey, experiment, or web scraping), you need to write a codebook for it yourself.

When you write the codebook for an Input Data File that you create yourself, you should follow the guidelines for preparing a Data Appendix.

  • Does that mean the terms "Data Appendix" and "codebook" mean the same thing?


    The Data Appendix and codebooks are similar in that they provide the same kinds of information (variable definitions, coding schemes, descriptive statistics, etc.) about data files.

    The difference is that we use the term codebook to refer to a document that provides this information about an Input Data File, and we use the term Data Appendix to refer to a document that provides this information about an Analysis Data File.

    So when you write a codebook for an Input Data File that you created yourself, you provide the same information, and organize it in the same way, as if you were writing a Data Appendix for your Analysis Data Files. But since it contains information about an Input Data File, it is called a codebook.
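
As a rough illustration of the kinds of information mentioned above (variable definitions, coding schemes, descriptive statistics), the following Python sketch drafts a plain-text codebook for a small self-created Input Data File. It is only a starting point under stated assumptions: the function name build_codebook, the variable names, the definitions, and the sample survey records are all hypothetical, and the guidelines for preparing a Data Appendix remain the authority on what a finished codebook should contain.

```python
import statistics
from collections import Counter


def build_codebook(rows, definitions, coding_schemes=None):
    """Return plain-text codebook lines for a list of dict records.

    rows           -- the data, one dict per observation (None = missing)
    definitions    -- {variable name: written definition}, supplied by you
    coding_schemes -- {variable name: {code: label}} for categorical variables
    """
    coding_schemes = coding_schemes or {}
    lines = []
    for name, definition in definitions.items():
        values = [row[name] for row in rows if row.get(name) is not None]
        missing = len(rows) - len(values)
        lines.append(f"Variable: {name}")
        lines.append(f"  Definition: {definition}")
        lines.append(f"  Observations: {len(values)} (missing: {missing})")
        if name in coding_schemes:
            # Categorical variable: list each code, its label, and its frequency.
            counts = Counter(values)
            lines.append("  Coding scheme:")
            for code, label in coding_schemes[name].items():
                lines.append(f"    {code} = {label} (n = {counts.get(code, 0)})")
        elif values:
            # Numeric variable: report simple descriptive statistics.
            lines.append(f"  Mean: {statistics.mean(values):.2f}")
            lines.append(f"  Min: {min(values)}, Max: {max(values)}")
        lines.append("")
    return lines


# Hypothetical survey responses, invented for this illustration.
rows = [
    {"age": 34, "approval": 1},
    {"age": 51, "approval": 3},
    {"age": None, "approval": 2},
    {"age": 27, "approval": 1},
]
definitions = {
    "age": "Respondent's age in years at the time of the survey",
    "approval": "Respondent's rating of an elected official",
}
coding_schemes = {
    "approval": {1: "Approve", 2: "Disapprove", 3: "No opinion"},
}

codebook_lines = build_codebook(rows, definitions, coding_schemes)
print("\n".join(codebook_lines))
```

The printed text could be saved as a file in your Metadata/ folder and then edited by hand. The definitions and coding schemes are passed in explicitly because they cannot be inferred from the data; only the counts and summary statistics are computed.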