5 Data Management

6 Raw versus Processed

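Keep raw data immutable: place original datasets in data/raw/ and never edit them by hand.
Do all cleaning and transformation in scripts that read from data/raw/ and write to data/processed/, so that every processed file can be regenerated from the originals.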
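
As a rough illustration of this workflow, the following minimal sketch reads one raw CSV and writes a cleaned copy without modifying the original. The file layout, the column name participant_id, the exclusion rule, and the function names clean_rows and process_file are all hypothetical; adapt them to the actual dataset.

```python
import csv
from pathlib import Path


def clean_rows(rows):
    """Yield cleaned copies of the rows; the input rows are not modified."""
    for row in rows:
        # Normalise column names and trim whitespace from every value.
        cleaned = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Hypothetical rule: drop rows that have no participant identifier.
        if cleaned.get("participant_id"):
            yield cleaned


def process_file(raw_path, processed_path):
    """Read one raw CSV and write a cleaned copy to processed_path.

    The raw file is only ever opened for reading.
    Returns the number of rows written.
    """
    raw_path, processed_path = Path(raw_path), Path(processed_path)
    with raw_path.open(newline="", encoding="utf-8") as src:
        cleaned = list(clean_rows(csv.DictReader(src)))
    if not cleaned:
        raise ValueError(f"no usable rows in {raw_path}")

    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with processed_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(cleaned[0].keys()))
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)
```

A call such as process_file("data/raw/survey.csv", "data/processed/survey_clean.csv") would then regenerate the processed file from scratch each time it is run, which is the property that makes hand edits to the raw data unnecessary.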

Include a README in data/raw/ that explains where to obtain the data, especially if it cannot be shared in the repository.
For example:

data/raw/README.md
This study uses proprietary survey data.
Download the raw CSV from https://osf.io/xxxxx and place it in this directory.

7 Codebooks and Documentation

Maintain a codebook describing variables, recoding decisions, and exclusions.
A simple codebook.md in data/processed/ can document:

  • Variable names and descriptions.
  • How missing values are handled.
  • Which subjects or trials were excluded and why.
  • Construction of composite scores.

Clear data documentation lets others trace how each processed variable was derived from the raw data, which is essential for reproducibility and transparency.
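
One possible way to start such a codebook is to generate a skeleton from the processed file and then fill in the descriptions by hand. The sketch below is only an assumption about how this might be done: the function name draft_codebook, the Markdown layout, and the set of tokens treated as missing are illustrative choices, not a required format.

```python
import csv
from pathlib import Path


def draft_codebook(csv_path, missing_tokens=("", "NA")):
    """Return Markdown text listing each column and its count of missing values."""
    csv_path = Path(csv_path)
    with csv_path.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        columns = reader.fieldnames or []
        missing = {name: 0 for name in columns}
        n_rows = 0
        for row in reader:
            n_rows += 1
            for name in columns:
                # Count a value as missing if it matches one of the tokens.
                if (row.get(name) or "").strip() in missing_tokens:
                    missing[name] += 1

    lines = [
        f"# Codebook for {csv_path.name}",
        "",
        f"Rows: {n_rows}",
        "",
        "| Variable | Description | Missing |",
        "| --- | --- | --- |",
    ]
    for name in columns:
        # Descriptions are left as TODO placeholders to be written by hand.
        lines.append(f"| {name} | TODO | {missing[name]} |")
    lines += ["", "## Exclusions", "", "TODO", "", "## Composite scores", "", "TODO", ""]
    return "\n".join(lines)
```

The returned text could be written to data/processed/codebook.md and edited from there. The generated table covers only variable names and missing-value counts; recoding decisions, exclusions, and composite-score construction still have to be documented manually in the placeholder sections.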