2 Repository Structure

2.1 Recommended Structure

A clean directory structure helps collaborators understand where to find code, data, and outputs.
A typical research repository might look like this:

project/
├── README.md
├── .gitignore
├── data/
│   ├── raw/         # original datasets (not tracked)
│   └── processed/   # cleaned datasets (optional)
├── analysis/
│   ├── scripts/     # analysis scripts
│   ├── notebooks/   # exploratory notebooks
│   ├── models/      # model objects
│   └── output/      # ignored intermediate outputs
├── figures/
│   └── paper/       # final paper figures
├── results/         # ignored, generated tables & intermediate results
├── src/             # reusable functions or packages
├── paper/
│   ├── manuscript.qmd
│   └── references.bib
└── docs/
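The layout can be created in one step instead of by hand. The following is a minimal sketch in Python, using a hypothetical helper name `scaffold_project`. It creates the directories from the tree above and writes a .gitignore that excludes the paths the tree marks as untracked (data/raw/, analysis/output/ and results/). Existing .gitignore and README.md files are left alone, so rerunning it is safe. Adjust both lists to your own project.

```python
from __future__ import annotations

from pathlib import Path

# Directories from the recommended layout (hypothetical scaffold; adjust as needed).
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "analysis/scripts",
    "analysis/notebooks",
    "analysis/models",
    "analysis/output",
    "figures/paper",
    "results",
    "src",
    "paper",
    "docs",
]

# Paths the layout keeps out of Git: raw data and generated outputs.
IGNORED = [
    "data/raw/",
    "analysis/output/",
    "results/",
]


def scaffold_project(root: str | Path) -> Path:
    """Create the directory tree, a .gitignore and a README under `root`."""
    root = Path(root)
    for rel in DIRECTORIES:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        (root / rel).mkdir(parents=True, exist_ok=True)

    gitignore = root / ".gitignore"
    if not gitignore.exists():  # never overwrite an existing .gitignore
        gitignore.write_text("\n".join(IGNORED) + "\n")

    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(f"# {root.name}\n")
    return root
```

Calling `scaffold_project("project")` from the parent folder produces the skeleton; the manuscript and bibliography files in paper/ still need to be added by hand.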

The important ideas are:

- Raw data stays out of Git. Place original data in data/raw/ and do not track it; instead, include a README with instructions for obtaining the data.
- Processed data may be tracked if it is small, anonymised and needed to reproduce results.
- Analysis scripts live in analysis/scripts/, and reusable functions or packages live in src/. Number the scripts (e.g. 01_clean.R, 02_model.R) so the run order of the pipeline is explicit.
- Intermediate outputs, models and caches go in analysis/output/ or results/, which are ignored via .gitignore.
- The manuscript lives in paper/ and the final figures in figures/paper/. Track the final figures, but not every intermediate plot.
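Numbered scripts make the run order explicit, so a small driver can execute the whole pipeline in sequence. The sketch below is a hypothetical Python driver, not part of the recommended layout itself. It assumes the scripts live in analysis/scripts/, are named with a numeric prefix followed by an underscore (such as 01_clean.R), and that `Rscript` is on the PATH. It sorts steps by the numeric value of the prefix and stops at the first script that exits with an error.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path

# Matches numbered pipeline scripts such as 01_clean.R or 02_model.R.
STEP_PATTERN = re.compile(r"^(\d+)_.+\.R$")


def pipeline_steps(script_dir: Path) -> list[Path]:
    """Return the numbered scripts in script_dir, ordered by numeric prefix."""
    steps = []
    for path in script_dir.iterdir():
        match = STEP_PATTERN.match(path.name)
        if match and path.is_file():
            steps.append((int(match.group(1)), path.name, path))
    # Sort on the integer prefix, so 10_ follows 9_ even without zero-padding;
    # the file name breaks ties between scripts sharing a number.
    steps.sort()
    return [path for _, _, path in steps]


def run_pipeline(script_dir: Path, dry_run: bool = False) -> list[Path]:
    """Run each step with Rscript in order; dry_run only lists the steps."""
    steps = pipeline_steps(script_dir)
    for step in steps:
        print(f"Running {step.name}")
        if not dry_run:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(["Rscript", str(step)], check=True)
    return steps
```

For example, `run_pipeline(Path("analysis/scripts"), dry_run=True)` prints the order in which the steps would run without executing them; unnumbered helper files such as a shared functions script are skipped.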

Separating raw inputs, code and generated outputs in this way supports reproducibility and keeps the repository uncluttered.