Operational workflows
Day-to-day workflows for maintaining MaQI infrastructure. For user-facing
onboarding (Colab, local setup, quick data access), see the top-level
README.md.
Open a notebook in Colab
- Go to https://colab.research.google.com
- GitHub tab → tick "Include private repos" (authorize Colab if asked)
- Paste the full notebook URL from this repo, e.g.
  https://github.com/eserie/MaQI/blob/main/notebooks/maqi-data-demo.ipynb
- Run the install cell and paste your Wasabi credentials when prompted (they are read with getpass, so they are never saved).
See colab-setup.md for troubleshooting (MFA, rclone fallback).
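Colab can also open a GitHub notebook directly from a URL of the form https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>, which skips the GitHub tab. The sketch below rewrites a GitHub blob URL into that form. The helper name colab_url is hypothetical, and because the repo is private, Colab will most likely still ask you to authorize GitHub access.

```python
from urllib.parse import urlparse

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github/"


def colab_url(github_url: str) -> str:
    """Turn github.com/<owner>/<repo>/blob/<branch>/<path>.ipynb into a Colab URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {github_url}")
    parts = parsed.path.strip("/").split("/")
    # Expect at least owner, repo, "blob", branch and a notebook file name.
    if len(parts) < 5 or parts[2] != "blob" or not parts[-1].endswith(".ipynb"):
        raise ValueError(f"not a notebook blob URL: {github_url}")
    return COLAB_GITHUB_PREFIX + "/".join(parts)


print(colab_url(
    "https://github.com/eserie/MaQI/blob/main/notebooks/maqi-data-demo.ipynb"
))
```

The branch segment is kept as-is, so the same helper works for notebooks on feature branches.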
Read the data locally
# Install rclone, then configure a remote (one-off)
mkdir -p ~/.config/rclone
cp rclone.conf.example ~/.config/rclone/rclone.conf
# Edit the file with your Wasabi access key and secret key.
./test-connection.sh # smoke test
rclone lsd maqi:maqi-gdelt/
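If you would rather not edit the config by hand, the remote section can be generated. This is a minimal sketch assuming the remote is named maqi (as in the lsd command above) and uses rclone's S3 backend with provider = Wasabi. The option names are rclone's standard S3 options, but the actual contents of rclone.conf.example may differ, and the helper render_rclone_remote is hypothetical.

```python
import configparser
import io


def render_rclone_remote(name: str, access_key: str, secret_key: str,
                         region: str = "eu-central-1") -> str:
    """Return an rclone.conf section for a Wasabi S3 remote (hypothetical helper)."""
    # interpolation=None so a '%' in a secret is written verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config[name] = {
        "type": "s3",
        "provider": "Wasabi",
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "region": region,
        # Wasabi regional endpoint, the same host used in the Python snippet below.
        "endpoint": f"s3.{region}.wasabisys.com",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

Call it with keys read from the environment (for example os.environ["AWS_ACCESS_KEY_ID"], an assumed variable name) and write the result to ~/.config/rclone/rclone.conf, then run ./test-connection.sh as usual.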
From Python:
import os

import polars as pl

# Credentials come from the environment, never from the source code
# (assumes the standard AWS_* variable names).
storage_options = {
    "endpoint_url": "https://s3.eu-central-1.wasabisys.com",
    "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
    "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    "region": "eu-central-1",
}
df = pl.read_csv(
    "s3://maqi-ravenpack/RavenPackEdge_NEWS_COMP_FULL_2024.zip",
    storage_options=storage_options,
)
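The object above is a .zip archive. Polars decompresses some stream formats on the fly, but a zip archive may need to be unpacked before it can be parsed. A minimal sketch, assuming the archive holds exactly one CSV member; the helper csv_bytes_from_zip is hypothetical, and fetching the raw bytes from Wasabi is left out so the example runs offline.

```python
import io
import zipfile


def csv_bytes_from_zip(archive: bytes) -> bytes:
    """Return the contents of the single CSV member of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if len(members) != 1:
            raise ValueError(f"expected exactly one CSV member, found {members}")
        return zf.read(members[0])


# Offline demo: build a tiny archive in memory, then unpack it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("news.csv", "entity,score\nACME,0.5\n")

print(csv_bytes_from_zip(buffer.getvalue()).decode())
```

With real data, pass the result to pl.read_csv(io.BytesIO(csv_bytes)). For a large yearly file, unpacking in memory can be heavy; copying the archive locally with rclone copy and extracting it to disk is the safer route.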
Update the provider catalog
- Edit docs/providers/catalog.yaml (the typed source of truth).
- If the change came from a conversation with CAL, also update the corresponding section in docs/cal/datasets-pipeline.md, or re-run scripts/sync-cal-docs.sh after CAL has edited his gdrive docx.
- Regenerate docs/providers/README.md and docs/providers/cartography.md from the YAML (a manual projection for now, so keep them consistent).
- If a provider changes status, open or close the matching GitHub Issue.
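Because the Markdown files are a manual projection of the YAML, a small check can flag providers that one of them has missed. This sketch assumes a hypothetical catalog schema (a top-level providers list whose entries have a name field) and takes the already-parsed YAML, for example from yaml.safe_load, so the check itself needs only the standard library. Adjust the field names to the real catalog.yaml; the demo values are toy data.

```python
from typing import Any


def missing_providers(catalog: dict[str, Any], markdown: str) -> list[str]:
    """List provider names from the parsed catalog that never appear in `markdown`.

    Assumes a hypothetical schema: {"providers": [{"name": ...}, ...]}.
    """
    names = [entry["name"] for entry in catalog.get("providers", [])]
    return [name for name in names if name not in markdown]


catalog = {"providers": [{"name": "RavenPack"}, {"name": "GDELT"}]}
readme = "| Provider | Status |\n|---|---|\n| RavenPack | active |\n"
print(missing_providers(catalog, readme))  # → ['GDELT']
```

Run it once against docs/providers/README.md and once against docs/providers/cartography.md after each catalog edit. Plain substring matching is deliberately crude: it will not notice a provider that is mentioned but described inconsistently.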
Re-sync CAL's Google Docs
# When CAL updates his gdrive doc, pull it into the repo.
./scripts/sync-cal-docs.sh
# Review the diff; merge by hand where vault-specific formatting needs to be preserved.
git add docs/cal/
git commit -m "docs(cal): sync from gdrive YYYY-MM-DD"
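The commit message above carries a date placeholder. A small sketch that fills it with today's date (or another date you pass, such as the day of the gdrive edit) and builds the git invocation; the helper name cal_sync_commit is hypothetical, and which date to stamp is the maintainer's call.

```python
import datetime
from typing import Optional


def cal_sync_commit(day: Optional[datetime.date] = None) -> list[str]:
    """Build the `git commit` argv for a CAL docs sync (hypothetical helper)."""
    day = day or datetime.date.today()
    # isoformat() yields YYYY-MM-DD, matching the placeholder in the message.
    return ["git", "commit", "-m", f"docs(cal): sync from gdrive {day.isoformat()}"]


print(cal_sync_commit(datetime.date(2024, 3, 5)))
```

After git add docs/cal/, run it with subprocess.run(cal_sync_commit(), check=True).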
Snapshot Wasabi state
./scripts/wasabi-state.sh
# Updates docs/wasabi/state.md with current bucket sizes and object counts.
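The internals of wasabi-state.sh are not shown here, but the same figures can be obtained with rclone size --json, which prints a path's object count and total bytes as JSON. The sketch below renders such outputs as a Markdown table. The table layout and helper names are assumptions, not the actual format of docs/wasabi/state.md, and the demo numbers are toy values.

```python
import json


def human_bytes(n: int) -> str:
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    if n < 1024:
        return f"{n} B"
    size = float(n)
    for unit in ("KiB", "MiB", "GiB", "TiB"):
        size /= 1024
        # Stop at the first unit that keeps the number below 1024 (TiB caps it).
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
    raise AssertionError("unreachable")


def state_table(sizes: dict[str, str]) -> str:
    """Render {bucket: `rclone size --json` output} as a Markdown table."""
    lines = ["| Bucket | Objects | Size |", "|---|---:|---:|"]
    for bucket, raw in sorted(sizes.items()):
        stats = json.loads(raw)
        lines.append(f"| {bucket} | {stats['count']} | {human_bytes(stats['bytes'])} |")
    return "\n".join(lines) + "\n"


print(state_table({
    "maqi-gdelt": '{"count": 3, "bytes": 1536}',
    "maqi-ravenpack": '{"count": 1, "bytes": 500}',
}))
```

Each raw value would come from something like subprocess.run(["rclone", "size", "--json", f"maqi:{bucket}"], capture_output=True, text=True, check=True).stdout, reusing the maqi remote configured earlier.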