Up to now, most of Project TIER's attention has been focused on teaching transparent and reproducible methods to students conducting substantial research projects, like a semester-long class project or a senior thesis.
But for a variety of reasons--notably time and resource constraints--instructors are not always able to assign complete research projects to their students.
For instructors who cannot assign complete research papers, but nonetheless want to give their students some training in transparent and reproducible research methods, Project TIER is developing a series of shorter exercises.
The common thread in these exercises is that each of them will guide students through all the major steps of a complete empirical research project, but with enough structure and guidance that they can be completed in a short time--as little as one lab session, or at most about two weeks.
We call these soup-to-nuts exercises because they take students through the entire process of research with statistical data, from the very beginning when they first access the original data, through cleaning and processing the data to prepare them for analysis, to the very end when they generate the results that they present in a written report. Throughout each exercise, there will be an emphasis on adopting a transparent workflow and constructing replication documentation that ensures all the work done for the exercise can be independently reproduced.
The first of these exercises is available for download below. More are in the works, and will be posted as they are completed. The goal is to develop a suite of soup-to-nuts exercises that differ in discipline, subject matter, data sources, and computational methods, but that all introduce students to fundamental principles and practices of transparent research without requiring projects that take a semester or more to complete.
Download the first soup-to-nuts exercise:
This exercise uses data from Wechsler (2001; citation below) to compare patterns of alcohol consumption between college students who live in alcohol-free housing and those who do not.
It is written for Stata users, but could easily be adapted to any other programmable statistical software.
The analysis conducted in this exercise consists only of constructing and comparing a series of bar graphs, and for the most part requires only basic proficiency with Stata.
For many students, the greatest computational challenges will be:
- Using only Stata (in particular, no Excel) to process the data, and interacting with Stata by writing commands in do-files (rather than typing one command at a time in Stata's command window, or using drop-down menus).
- Paying attention to which folder is designated as Stata's working directory.
- Storing all do-files, data files and associated research documents in a hierarchy of folders and sub-folders with a well-defined, fixed structure.
- Using relative directory paths whenever writing a command in which it is necessary to specify the location of a folder--either to open a file that is stored in it, or to save a new file there.
Understanding how to handle those challenges is a critical part of learning to conduct transparent and reproducible research, and so they are central to the purpose of this exercise.
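The last two points, a fixed folder hierarchy and relative paths, can be illustrated with a minimal sketch. The exercise itself uses Stata do-files; the Python below is only a language-neutral illustration, and the folder names, the file names, and the `drinks` column are hypothetical rather than taken from the exercise.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical folder layout; the exercise defines its own structure.
SUBFOLDERS = ["original-data", "command-files", "analysis-data", "output"]


def build_project(root: Path) -> None:
    """Create the fixed folder hierarchy under the project root."""
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)


def process(command_dir: Path) -> Path:
    """Read raw data and save a cleaned copy, using only relative paths.

    Every location is written relative to command_dir (the folder the
    script is assumed to run from), so nothing depends on where the
    project folder sits on a particular computer.
    """
    raw = command_dir / ".." / "original-data" / "raw.csv"
    clean = command_dir / ".." / "analysis-data" / "clean.csv"
    with raw.open(newline="") as src, clean.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["drinks"] != "":  # drop rows with a missing value
                writer.writerow(row)
    return clean.resolve()


def demo() -> list:
    """Build a throwaway project, run the processing step, return the output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "project"
        build_project(root)
        raw_file = root / "original-data" / "raw.csv"
        raw_file.write_text("id,drinks\n1,3\n2,\n3,0\n")
        out = process(root / "command-files")
        return out.read_text().splitlines()
```

Because the original data file is only read and the cleaned copy is written to a separate folder, the whole hierarchy can be copied to another machine and rerun from the command-files folder without editing any path.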
The exercise is available here. The folder that is downloaded from that link contains:
- A Cover Sheet that gives some background information about the exercise.
- Notes to the Instructor, with a few useful tips to be aware of when assigning this exercise to a class.
- The exercise itself--the instructions students follow.
- A folder containing a "Sample Solution"--complete examples of the report students write for the exercise, along with all the replication documentation they are supposed to construct.
- A Read Me file that explains all of the above.
Citation for the data used for this exercise:
Wechsler, Henry. Harvard School of Public Health College Alcohol Study, 2001. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2008-02-05. https://doi.org/10.3886/ICPSR04291.v2
The purpose of this assignment is to introduce students to creating a data appendix and using descriptive statistics to make arguments about the empirical world. The assignment assesses the following learning outcomes:
- Create subsets of data using logical operators.
- Create a frequency table.
- Create variable visualizations using ggplot.
- Use R to calculate central tendency.
- Create a data appendix to provide basic information about variables.
- Use bar charts, histograms, or boxplots to discuss the dispersion of a variable.
- Identify variables' levels of measurement.
This exercise uses RStudio and R Markdown to achieve these outcomes.
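Three of the outcomes above (subsetting with logical operators, building a frequency table, and calculating central tendency) can be sketched in a few lines. The exercise itself is carried out in R; the Python below is only an illustration of the ideas, and the records, variable names, and values are made up rather than drawn from the exercise's data.

```python
from collections import Counter
from statistics import mean, median

# Made-up records for illustration only; not data from the exercise.
records = [
    {"housing": "dorm", "hours": 10},
    {"housing": "apartment", "hours": 14},
    {"housing": "dorm", "hours": 6},
    {"housing": "family", "hours": 20},
    {"housing": "dorm", "hours": 8},
]

# Subset with logical operators: dorm residents reporting at least 8 hours.
subset = [r for r in records if r["housing"] == "dorm" and r["hours"] >= 8]

# Frequency table for a nominal variable.
freq = Counter(r["housing"] for r in records)

# Central tendency for a ratio-level variable.
hours = [r["hours"] for r in records]
avg = mean(hours)
mid = median(hours)
```

Which summary is appropriate depends on the level of measurement: counts suit a nominal variable such as the hypothetical `housing`, while a mean or median is meaningful for a numeric variable such as `hours`.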
The .zip file includes:
- instructions for students
- data, scripts, .Rmd files, and an .Rproj file
- sample solution