Operational workflows

Day-to-day workflows for maintaining MaQI infrastructure. For user-facing onboarding (Colab, local setup, quick data access), see the top-level README.md.

Open a notebook in Colab

  1. Go to https://colab.research.google.com
  2. GitHub tab → tick Include private repos (authorize Colab if asked)
  3. Paste the full notebook URL from this repo, e.g. https://github.com/eserie/MaQI/blob/main/notebooks/maqi-data-demo.ipynb
  4. Run the install cell and paste your Wasabi credentials when prompted (they are read with getpass and never saved).

See colab-setup.md for troubleshooting (MFA, rclone fallback).
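As a rough illustration of the credential prompt in step 4, the sketch below reads both keys with getpass and keeps them only in the process environment. The helper name `prompt_wasabi_credentials` and the environment variable names are assumptions for illustration, not taken from the notebook's actual install cell.

```python
import getpass
import os
from typing import Callable


def prompt_wasabi_credentials(
    ask: Callable[[str], str] = getpass.getpass,
) -> dict[str, str]:
    """Prompt for Wasabi keys without echoing them and export them for this session.

    Hypothetical helper: the environment variable names are an assumption.
    """
    creds = {
        "AWS_ACCESS_KEY_ID": ask("Wasabi access key: ").strip(),
        "AWS_SECRET_ACCESS_KEY": ask("Wasabi secret key: ").strip(),
    }
    if not all(creds.values()):
        raise ValueError("both the access key and the secret key are required")
    # Kept in memory only: nothing is written to the notebook or to disk.
    os.environ.update(creds)
    return creds
```

The `ask` parameter exists so the prompt can be replaced in tests; in a notebook, calling `prompt_wasabi_credentials()` with no arguments uses getpass.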

Read the data locally

# One-off setup: with rclone installed, create a remote from the example config.
mkdir -p ~/.config/rclone
cp rclone.conf.example ~/.config/rclone/rclone.conf
# Edit the file with your Wasabi access key and secret key.

./test-connection.sh    # smoke test
rclone lsd maqi:maqi-gdelt/

From Python:

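import os

import polars as pl

# Never hardcode credentials: read them from the environment.
# The variable names below are the standard AWS ones, assumed here.
storage_options = {
    "endpoint_url": "https://s3.eu-central-1.wasabisys.com",
    "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
    "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    "region": "eu-central-1",
}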

df = pl.read_csv(
    "s3://maqi-ravenpack/RavenPackEdge_NEWS_COMP_FULL_2024.zip",
    storage_options=storage_options,
)
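The object in the example above is a .zip archive. Polars documents automatic decompression for gzip, zlib and zstd input, and zip archives do not appear to be covered, so the archive may need unpacking before it is parsed. The sketch below uses only the standard library. `first_csv_member` is a hypothetical helper name, it assumes the archive contains at least one .csv member, and how the archive bytes are downloaded (rclone, an S3 client, ...) is left open.

```python
import io
import zipfile


def first_csv_member(zip_bytes: bytes) -> tuple[str, bytes]:
    """Return (name, content) of the first .csv file found in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                return name, archive.read(name)
    raise ValueError("archive contains no .csv member")
```

The returned bytes can then be handed to Polars, for example `pl.read_csv(io.BytesIO(content))`.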

Update the provider catalog

  1. Edit docs/providers/catalog.yaml (the typed source of truth).
  2. If the change came from a conversation with CAL, also update the corresponding section in docs/cal/datasets-pipeline.md or re-run scripts/sync-cal-docs.sh after CAL has edited his gdrive docx.
  3. Update docs/providers/README.md and docs/providers/cartography.md to match the YAML. The projection is manual for now, so check that both files stay consistent with the catalog.
  4. If a provider changes status, open or close the matching GitHub Issue.
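Because the Markdown pages in step 3 are kept in sync by hand, a small check can help catch providers that were added to the catalog but not to the pages. The sketch below assumes the provider names have already been extracted from catalog.yaml (its schema is not shown here) and simply looks for each name in the Markdown text. Both function names are hypothetical.

```python
from pathlib import Path


def missing_providers(names: list[str], doc_text: str) -> list[str]:
    """Return the provider names that never appear in doc_text (case-insensitive)."""
    haystack = doc_text.lower()
    return [name for name in names if name.lower() not in haystack]


def check_docs(names: list[str], doc_paths: list[Path]) -> dict[str, list[str]]:
    """Map each doc path to the providers it does not mention."""
    report = {}
    for path in doc_paths:
        missing = missing_providers(names, path.read_text(encoding="utf-8"))
        if missing:
            report[str(path)] = missing
    return report
```

A call such as `check_docs(names, [Path("docs/providers/README.md"), Path("docs/providers/cartography.md")])` returns an empty dict when every name is mentioned in both files. A substring match is deliberately crude and only flags names that are missing entirely.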

Re-sync CAL's Google Docs

# CAL updated his gdrive doc; pull the changes into the repo.
./scripts/sync-cal-docs.sh
# Review the diff, merge by hand if vault-specific formatting was preserved.
git add docs/cal/
git commit -m "docs(cal): sync from gdrive YYYY-MM-DD"

Snapshot Wasabi state

./scripts/wasabi-state.sh
# Updates docs/wasabi/state.md with current bucket sizes and object counts.
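How scripts/wasabi-state.sh collects its numbers is not shown here. As one possible approach, `rclone size --json` reports an object count and a byte total for a remote path, and the sketch below turns that JSON into a Markdown table row. The function name and the row layout are assumptions, not the script's actual output format.

```python
import json


def state_row(bucket: str, rclone_size_json: str) -> str:
    """Format one bucket's `rclone size --json` output as a Markdown table row."""
    stats = json.loads(rclone_size_json)
    gib = stats["bytes"] / 1024**3  # bytes to GiB
    return f"| {bucket} | {stats['count']} | {gib:.2f} GiB |"
```

The JSON string would come from a command such as `rclone size --json maqi:maqi-gdelt`, reusing the remote name from the local setup above.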