2 Repository Structure
3 Recommended Structure
A clean directory structure helps collaborators understand where to find code, data, and outputs.
A typical research repository might look like this:
project/
├── README.md
├── .gitignore
├── data/
│   ├── raw/          # original datasets (not tracked)
│   └── processed/    # cleaned datasets (optional)
├── analysis/
│   ├── scripts/      # analysis scripts
│   ├── notebooks/    # exploratory notebooks
│   ├── models/       # model objects
│   └── output/       # ignored intermediate outputs
├── figures/
│   └── paper/        # final paper figures
├── results/          # ignored, generated tables & intermediate results
├── src/              # reusable functions or packages
├── paper/
│   ├── manuscript.qmd
│   └── references.bib
└── docs/
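As a starting point, the layout above can be created with a short script. The following is a minimal sketch in Python; the function name `scaffold` and the `LAYOUT` and `FILES` lists are illustrative names, not part of any tool, and the paths simply mirror the tree.

```python
from pathlib import Path

# Directories taken from the tree above (hypothetical helper, not a standard tool).
LAYOUT = [
    "data/raw",
    "data/processed",
    "analysis/scripts",
    "analysis/notebooks",
    "analysis/models",
    "analysis/output",
    "figures/paper",
    "results",
    "src",
    "paper",
    "docs",
]

# Files that the tree lists explicitly; they are created empty.
FILES = ["README.md", ".gitignore", "paper/manuscript.qmd", "paper/references.bib"]


def scaffold(root):
    """Create the directory skeleton under `root` without overwriting existing files."""
    root = Path(root)
    for rel in LAYOUT:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in FILES:
        # touch() creates the file if missing and leaves existing content alone.
        (root / rel).touch(exist_ok=True)
    return root
```

Calling `scaffold("project")` from the parent directory would create the skeleton. Running it a second time should leave existing files untouched.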
The important ideas are:
- Raw data stays out of Git. Place original data in data/raw/ and do not track it. Include a README with instructions for obtaining the data.
- Processed data may be tracked if it is small, anonymised and needed to reproduce results.
- Analysis scripts and code live in analysis/scripts/ and src/. Use numbered scripts (e.g. 01_clean.R, 02_model.R) for an ordered pipeline.
- Intermediate outputs, models and caches go in analysis/output/ or results/ and are ignored via .gitignore.
- Manuscript and figures live in paper/ and figures/. Track final figures but not every intermediate plot.
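The ignore rules implied by these points can be written into .gitignore by hand or, as sketched below, by a small helper. This is only an illustration: the function name `ensure_gitignore` and the `IGNORE_RULES` list are assumptions, and the three patterns cover just the directories that the layout marks as untracked or ignored.

```python
from pathlib import Path

# Directory patterns for the untracked locations named above.
# A trailing "/" tells Git to match directories only.
IGNORE_RULES = ["data/raw/", "analysis/output/", "results/"]


def ensure_gitignore(root):
    """Add any missing rules to <root>/.gitignore and return the rules that were added."""
    path = Path(root) / ".gitignore"
    text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [rule for rule in IGNORE_RULES if rule not in existing]
    if missing:
        # Keep existing content and make sure new rules start on their own line.
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(missing) + "\n")
    return missing
```

If the README with download instructions is meant to live inside data/raw/ itself, ignoring the whole directory would hide it as well. In that case a pair of patterns such as `data/raw/*` followed by `!data/raw/README.md` is one way to keep that single file tracked.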
This structure supports reproducibility and keeps the repository uncluttered.
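The numbered-script convention (01_clean.R, 02_model.R) makes the pipeline order recoverable from file names alone. A minimal sketch of how that order could be read back, assuming scripts are named with a numeric prefix, an underscore and an .R extension; `ordered_scripts` and `NUMBERED` are hypothetical names used only for this example:

```python
import re
from pathlib import Path

# Matches names such as "01_clean.R": digits, an underscore, then the rest of the name.
NUMBERED = re.compile(r"^(\d+)_.+\.R$")


def ordered_scripts(script_dir):
    """Return the numbered .R scripts in `script_dir`, sorted by numeric prefix."""
    found = []
    for path in Path(script_dir).iterdir():
        match = NUMBERED.match(path.name)
        if match and path.is_file():
            # Sort on the integer value so that 10_ comes after 2_ or 02_.
            found.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(found)]
```

Files without a numeric prefix, such as shared helpers, are skipped, so they can sit alongside the pipeline steps without affecting the order.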