5 Data Management
6 Raw versus Processed
Keep raw data immutable. Place original datasets in data/raw/ and never edit them by hand.
Use scripts to clean and transform data into data/processed/.
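Here is a minimal sketch of such a script, using only Python's standard csv module. The function name clean, the file names, and the cleaning rule (strip whitespace, drop any row with an empty field) are hypothetical choices for illustration. The point it shows is the direction of data flow: the raw file is only ever read, and everything in data/processed/ is regenerated by code.

```python
import csv
from pathlib import Path


def clean(raw_path: Path, out_path: Path) -> int:
    """Read raw_path (never modified), strip whitespace from every field,
    drop rows with any empty field, and write the result to out_path.

    Returns the number of data rows written. The cleaning rule here is a
    hypothetical example; replace it with your own documented steps.
    """
    # Guard against accidentally overwriting the raw file.
    if out_path.resolve() == raw_path.resolve():
        raise ValueError("refusing to overwrite raw data")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Missing trailing fields come back as None; treat them as empty.
            cleaned = {k: (v or "").strip() for k, v in row.items()}
            if all(cleaned.values()):
                writer.writerow(cleaned)
                kept += 1
    return kept
```

From the project root you might call clean(Path("data/raw/survey.csv"), Path("data/processed/survey_clean.csv")). Because the processed file is rebuilt from scratch on every run, deleting data/processed/ and rerunning the script should reproduce it.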
Include a README in data/raw/ that explains where to obtain the data, especially if it cannot be shared in the repository.
For example, data/raw/README.md might contain:
    This study uses proprietary survey data.
    Download the raw CSV from https://osf.io/xxxxx and place it in this directory.
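The following safeguard goes beyond the practice described above and is offered as an assumption, not a requirement. When data must be downloaded separately, you could record a SHA-256 checksum of each raw file (for instance in data/raw/README.md) and verify it before any processing. That catches both accidental edits and a different version of the download. The names sha256_of and verify_raw are hypothetical; hashlib and pathlib are Python standard library modules.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks
    so large raw datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_raw(path: Path, expected: str) -> None:
    """Raise if the raw file is missing or no longer matches the digest
    recorded when it was first obtained (hypothetical workflow)."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; see data/raw/README.md")
    actual = sha256_of(path)
    if actual != expected:
        raise ValueError(f"{path} has changed: expected {expected}, got {actual}")
```

Calling verify_raw at the top of the cleaning script makes the pipeline stop early, rather than silently producing different processed data.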
7 Codebooks and Documentation
Maintain a codebook describing variables, recoding decisions, and exclusions.
A simple codebook.md in data/processed/ can document:
- Variable names and descriptions.
- How missing values are handled.
- Which subjects or trials were excluded and why.
- Construction of composite scores.
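A codebook can be written entirely by hand. If you would rather generate it next to the cleaning code, so that documented exclusions and composites stay in sync with what the script actually does, a minimal sketch might look like this. The function write_codebook, its parameters, and the Markdown layout are hypothetical. Each section corresponds to one item in the list above.

```python
from pathlib import Path


def write_codebook(
    path: Path,
    variables: dict[str, str],
    missing_rule: str,
    exclusions: list[tuple[str, str]],
    composites: dict[str, str],
) -> None:
    """Write a minimal Markdown codebook.

    variables:    variable name -> description
    missing_rule: how missing values are handled (free text)
    exclusions:   (subject or trial id, reason) pairs
    composites:   composite score name -> how it is constructed
    """
    lines = ["# Codebook", "", "## Variables", ""]
    lines += [f"- `{name}`: {desc}" for name, desc in variables.items()]
    lines += ["", "## Missing values", "", missing_rule, "", "## Exclusions", ""]
    # Say explicitly when nothing was excluded, rather than leaving a blank.
    lines += [f"- {sid}: {reason}" for sid, reason in exclusions] or ["- None"]
    lines += ["", "## Composite scores", ""]
    lines += [f"- `{name}` = {rule}" for name, rule in composites.items()]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")
```

A call such as write_codebook(Path("data/processed/codebook.md"), ...) at the end of the cleaning script would regenerate the codebook each time the processed data are rebuilt. Any variable names you pass in are your own; none are implied by this guide.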
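Clear data documentation is crucial for reproducibility and transparency: it lets others, and your future self, see exactly how the processed data were derived from the raw files.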