# Contributing to MaQI
Shared infrastructure repo for the Master program at École Polytechnique (lead: Charles-Albert Lehalle). It hosts the provider catalog, the Wasabi S3 layout, Colab notebooks, and architecture decisions. No data lives in the repo: all data is stored on Wasabi S3 (eu-central-1).
## Truth pointers
| Concern | Source of truth |
|---|---|
| Repo layout & onboarding | README.md |
| Provider catalog | docs/providers/catalog.yaml |
| Compute vendor landscape | docs/compute/vendors.yaml |
| Compute cost model + egress policy | docs/compute/cost-model.md |
| Wasabi S3 state (sizes, freshness) | docs/wasabi/state.md |
| Data anomalies | docs/wasabi/anomalies.md |
| CAL reference docs (read-only mirror) | docs/cal/ |
| Architecture decisions | docs/adr/INDEX.md |
| Operational workflows | docs/workflows.md |
| Bilingual convention (EN/FR) | docs/language-convention.md |
| Colab access guide | docs/colab-setup.md |
| Issues & anomaly tracker | GitHub Issues |
## Commit protocol
Commits follow Conventional Commits strictly. Allowed types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`. Allowed scopes: `cal`, `providers`, `compute`, `wasabi`, `adr`, `notebooks`, `scripts`, `colab`, `gitignore`.
**Subject style.** Imperative present tense, no trailing period, at most 72 characters. Example: `docs(providers): add Babbl fiche with sales contact`.
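Most of these rules are mechanical enough to check locally before pushing. The sketch below is a minimal illustration, not the repo's own tooling. It assumes the scope is mandatory (the protocol is "strict" and its example includes one), and `check_subject` is a hypothetical helper name. A regex cannot verify the imperative mood, so that part still relies on the author.

```python
import re

# Allowed values, copied from the commit protocol above.
TYPES = ("feat", "fix", "docs", "chore", "refactor", "test")
SCOPES = ("cal", "providers", "compute", "wasabi", "adr",
          "notebooks", "scripts", "colab", "gitignore")

# type(scope): description  (assumption: the scope is mandatory)
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"\((?P<scope>" + "|".join(SCOPES) + r")\)"
    r": (?P<desc>\S.*)$"
)


def check_subject(subject: str) -> list[str]:
    """Return the rule violations for a commit subject (empty list if valid)."""
    problems = []
    if len(subject) > 72:
        problems.append(f"subject is {len(subject)} chars (max 72)")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if not SUBJECT_RE.match(subject):
        problems.append("subject is not 'type(scope): description' "
                        "with an allowed type and scope")
    return problems


if __name__ == "__main__":
    print(check_subject("docs(providers): add Babbl fiche with sales contact"))  # []
    print(check_subject("Update stuff."))  # two violations
```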
## Branch protocol
Branch names are human-readable slugs that describe the work, not the workflow that produced it:
- ✅ `feat/databento-fiches`, `fix/wasabi-state-sync`, `docs/adr-005-storage-strategy`
- ❌ Auto-generated names with hex IDs, numeric workflow suffixes, or internal tooling vocabulary
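The naming rule can be approximated with a small local check. This is a hypothetical sketch that goes beyond what the protocol states. It assumes that branch prefixes mirror the commit types and that slugs are lowercase kebab-case. The hex-ID test (a segment of 7+ hex characters containing a digit) and the numeric-suffix test are heuristics for what "auto-generated" looks like. `check_branch` is an illustrative name.

```python
import re

# Assumption: branch prefixes mirror the Conventional Commit types.
PREFIXES = ("feat", "fix", "docs", "chore", "refactor", "test")

# prefix/lowercase-kebab-slug, e.g. docs/adr-005-storage-strategy
BRANCH_RE = re.compile(r"^(" + "|".join(PREFIXES) + r")/[a-z0-9]+(-[a-z0-9]+)*$")
# Heuristic for hex IDs: 7+ hex characters that include at least one digit.
HEX_ID_RE = re.compile(r"(?=[0-9a-f]*[0-9])[0-9a-f]{7,}")
# Heuristic for numeric workflow suffixes such as "-42" at the end of the slug.
NUMERIC_SUFFIX_RE = re.compile(r"-\d+$")


def check_branch(name: str) -> bool:
    """Return True if the branch name looks human-readable under the rules above."""
    if not BRANCH_RE.match(name):
        return False
    slug = name.split("/", 1)[1]
    if NUMERIC_SUFFIX_RE.search(slug):
        return False
    # Test each dash-separated segment on its own, so ordinary words are not flagged.
    return not any(HEX_ID_RE.fullmatch(part) for part in slug.split("-"))


if __name__ == "__main__":
    for name in ("docs/adr-005-storage-strategy", "feat/run-3f9a2c1b"):
        print(name, check_branch(name))
```

Long all-digit segments such as dates (e.g. `20240101`) would also be flagged by the hex heuristic. That trade-off may or may not suit the team.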
Squash-merge feature branches into `main` by default.
## Pull request protocol
`main` is the shared branch that every contributor reads from. For non-trivial work, use a feature branch and open a PR so that CAL, Wissal, or Emmanuel can review it before merge.
## Hooks
Run once after cloning:
```bash
./scripts/install-hooks.sh
```
This installs a pre-push hook that rejects commits containing internal-only vocabulary in their subjects, bodies, or diffs. The hook keeps the repo's public surface clean across contributors.
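The real hook is installed by `scripts/install-hooks.sh`, and its word list is not reproduced here. The sketch below only illustrates the core idea: scan text for forbidden terms. The entries in `FORBIDDEN` are placeholders, not the repo's actual list, and `find_forbidden` is a hypothetical helper. Git passes a pre-push hook the refs being pushed on standard input. A hook built on this idea would then gather the commit messages (`git log`) and patches (`git diff`) for that range and run them through a check like this one.

```python
import re

# Placeholder terms for illustration only; the real list lives in the repo's hook.
FORBIDDEN = ("internal-codename", "scratchpad-id")


def find_forbidden(text: str, terms=FORBIDDEN) -> list[str]:
    """Return the forbidden terms found in text (case-insensitive, whole words)."""
    hits = []
    for term in terms:
        # \b on both sides avoids matching the term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


if __name__ == "__main__":
    print(find_forbidden("docs(cal): mention Internal-Codename here"))
```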
## Issue protocol
File anomalies, missing data, and tasks as GitHub Issues. Prefix the title with a scope (e.g. `gdelt:`, `ravenpack:`, `infra:`) and link the relevant provider fiche or Wasabi bucket state.
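Scoped titles are easy to filter or group by script. The sketch below is a minimal, hypothetical parser. It assumes a scope is a lowercase token followed by `": "`, which matches the examples above but is not a documented rule. The sample titles are illustrative, not real issues.

```python
import re

# "<scope>: <summary>", with a lowercase scope such as gdelt, ravenpack, infra (assumed format).
TITLE_RE = re.compile(r"^(?P<scope>[a-z0-9][a-z0-9_-]*): (?P<summary>\S.*)$")


def parse_issue_title(title: str):
    """Split a scoped issue title into (scope, summary); return None if it has no scope."""
    m = TITLE_RE.match(title)
    return (m["scope"], m["summary"]) if m else None


if __name__ == "__main__":
    print(parse_issue_title("ravenpack: example summary"))
    print(parse_issue_title("Example without scope"))
```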
## What NOT to do
- Do not redistribute raw provider content. Paid feeds (S&P Global, Databento, RavenPack, CausalityLink) are under academic license: store only summaries, schemas, and validation snippets.
- Do not enable MFA on your Wasabi user: it blocks S3 API access from Colab. Use a strong password and rotate your access keys instead.
- Do not push large files (> 10 MB); data belongs on Wasabi.
- Do not edit `docs/cal/*.md` by hand: the source of truth is CAL's Google Drive. Edit there, then re-sync via `scripts/sync-cal-docs.sh`.
- Do not commit credentials (`rclone.conf`, `.env`, API keys). These files are gitignored and distributed out-of-band.
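The size and credential rules can be checked against staged files before committing. The sketch below is hypothetical and not part of the repo's hooks. It takes "10 MB" as 10,000,000 bytes, and it only recognizes credentials by the file names listed above, so API keys pasted into other files go undetected. The input could come from `git diff --cached --name-only`.

```python
from pathlib import Path

MAX_BYTES = 10_000_000  # assumption: "10 MB" read as 10,000,000 bytes
# Credential file names from the rule above; keys inside other files are not detected.
CREDENTIAL_NAMES = {"rclone.conf", ".env"}


def check_paths(paths) -> list[str]:
    """Return problems for files that should not be committed to the repo."""
    problems = []
    for p in map(Path, paths):
        if p.name in CREDENTIAL_NAMES:
            problems.append(f"{p}: credential file, keep it out of git")
        elif p.is_file() and p.stat().st_size > MAX_BYTES:
            size_mb = p.stat().st_size / 1e6
            problems.append(f"{p}: {size_mb:.1f} MB, upload it to Wasabi instead")
    return problems


if __name__ == "__main__":
    for problem in check_paths([".env", "config/rclone.conf", "README.md"]):
        print(problem)
```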
## Contributors
The team roster and roles are in `README.md`. For more contribution context, see `docs/workflows.md` and `docs/adr/INDEX.md`.