For Wissal — pipeline status & Wasabi buckets
One page. Where each pipeline is, what each bucket holds, what is yours this week.
Pipelines
| Pipeline | State | Next move |
|---|---|---|
| Databento — XNAS ITCH | ✅ ingested. 1.43 TiB on maqi-databento, 3 lots, vendor SHA-256 verified per lot. | none — corpus is canonical. |
| S&P Global — Xpressfeed | 🟡 streaming. SFTP sftp2.spglobal.com → Wasabi via rclone copy. 10.5 MiB landed, ~3.75 TiB expected. Buckets created. | continue the stream; document the Products/ packages once they land. |
| GDELT | ✅ ingested. 47.78 GiB on maqi-gdelt, vendor MD5 and file-size manifests present. | none. |
| CausalityLink | ✅ ingested as a single dated snapshot (2021-08-13). 186.91 GiB on maqi-causalitylink. No vendor checksum — size-only reconciliation. | none. |
| RavenPack | ✅ ingested. 249.21 GiB on maqi-ravenpack, one zip per year. 2020 missing at source. | none. |
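The table uses three reconciliation levels: vendor SHA-256 per lot, vendor MD5 plus file sizes, and size-only. All three fit one check against a manifest. Below is a minimal sketch of that check. It assumes the vendor manifest has already been parsed into a dict of file name → (expected size, expected hex digest or None). The helper names `reconcile` and `file_digest` and that manifest shape are hypothetical; they are not the vendors' own formats.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path, algorithm: str) -> str:
    """Stream a file through hashlib in 1 MiB chunks so multi-GiB lots never sit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def reconcile(
    root: Path,
    manifest: dict[str, tuple[int, Optional[str]]],
    algorithm: str = "sha256",
) -> dict[str, list[str]]:
    """Compare files under `root` with a (hypothetical) parsed vendor manifest.

    A digest of None means size-only reconciliation, for vendors that ship no checksum.
    Size is checked first because it is cheap; the digest is computed only if sizes match.
    """
    report: dict[str, list[str]] = {
        "missing": [], "size_mismatch": [], "digest_mismatch": [], "ok": []
    }
    for name, (size, digest) in sorted(manifest.items()):
        path = root / name
        if not path.is_file():
            report["missing"].append(name)
        elif path.stat().st_size != size:
            report["size_mismatch"].append(name)
        elif digest is not None and file_digest(path, algorithm) != digest.lower():
            report["digest_mismatch"].append(name)
        else:
            report["ok"].append(name)
    return report
```

Pass `algorithm="md5"` for a GDELT-style manifest. Pass `None` digests for a size-only vendor such as CausalityLink.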
Wasabi buckets — operational state
Snapshot 2026-04-14, region eu-central-1 (Frankfurt). Source: docs/wasabi/state.md.
| Bucket | Size | Objects | State |
|---|---|---|---|
| maqi-databento | 1.430 TiB | 3 042 | full, SHA-256 reconciled |
| maqi-spglobal | 10.528 MiB | 6 | preview, stream in flight |
| maqi-ravenpack | 249.214 GiB | 14 | full, size-only reconciliation |
| maqi-causalitylink | 186.909 GiB | 21 860 | snapshot 2021-08-13, frozen |
| maqi-gdelt | 47.781 GiB | 4 658 | full, MD5 reconciled |
Total today ≈ 1.90 TiB across the buckets listed above (the test bucket maqi is not counted). It rises to ≈ 5.7 TiB once the ~3.75 TiB S&P stream has landed.
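The arithmetic behind those totals can be checked quickly. The sketch below re-adds the sizes from the snapshot table in binary units (1 TiB = 1024 GiB). For the projection, it replaces the S&P preview with its expected ~3.75 TiB. Nothing is read from Wasabi; it only re-adds the published figures.

```python
# Binary unit factors, in bytes.
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Sizes copied from the 2026-04-14 snapshot table.
BUCKETS = {
    "maqi-databento": 1.430 * TIB,
    "maqi-spglobal": 10.528 * MIB,
    "maqi-ravenpack": 249.214 * GIB,
    "maqi-causalitylink": 186.909 * GIB,
    "maqi-gdelt": 47.781 * GIB,
}

# "~3.75 TiB expected" for S&P, per the pipelines table.
SPGLOBAL_EXPECTED = 3.75 * TIB

current = sum(BUCKETS.values())
# Swap the S&P preview for its expected final size.
projected = current - BUCKETS["maqi-spglobal"] + SPGLOBAL_EXPECTED

print(f"current   ≈ {current / TIB:.2f} TiB")    # current   ≈ 1.90 TiB
print(f"projected ≈ {projected / TIB:.2f} TiB")  # projected ≈ 5.65 TiB
```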
Three next actions on Wissal's side
- Document the S&P Products/ packages as they land. Inspect the schema of each `.xffmt.zip` (a `<table>.cnt` manifest plus a pipe-delimited `<table>.txt`), and write the output to a per-package fact sheet under docs/providers/.
- Re-run scripts/wasabi-state.sh once the S&P stream is complete and commit the diff. The totals table is auto-generated; do not edit it by hand.
- Validate the use-case scope in scenario-matrix §1. The 5 UCs are a projection of anticipated usage, not a pedagogical contract. Push back if any UC understates real demand.
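The schema inspection for one package can start from a small standard-library script. This is a minimal sketch with several assumptions, all to be verified against a real package:

- Each `<table>.txt` pairs with `<table>.cnt` by base name.
- `.txt` rows are UTF-8, pipe-delimited and unquoted.
- The `.cnt` file holds a single integer row count.

`inspect_package` is a hypothetical helper name, not part of any S&P tooling.

```python
import io
import zipfile
from pathlib import PurePosixPath


def inspect_package(zip_source) -> list[dict]:
    """Summarise every <table>.txt in an .xffmt.zip (path or file-like object).

    Reports the column count of the first non-empty row, the number of non-empty
    rows (a header row, if any, is included), and the row count claimed by the
    matching <table>.cnt under the assumed bare-integer format.
    """
    summaries = []
    with zipfile.ZipFile(zip_source) as zf:
        names = set(zf.namelist())
        for name in sorted(n for n in names if n.endswith(".txt")):
            path = PurePosixPath(name)
            cnt_name = str(path.with_suffix(".cnt"))
            columns, rows = None, 0
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
                for line in text:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue
                    if columns is None:
                        # A trailing '|' would count as an extra empty column.
                        columns = len(line.split("|"))
                    rows += 1
            expected = None
            if cnt_name in names:
                # Assumption: the manifest is a bare integer row count.
                expected = int(zf.read(cnt_name).decode("utf-8").strip())
            summaries.append({
                "table": path.stem,
                "columns": columns,
                "rows": rows,
                "expected_rows": expected,
                "count_ok": None if expected is None else expected == rows,
            })
    return summaries
```

One dict per table is enough to seed the per-package fact sheet. Column names and types still need a manual pass.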
Reading the buckets
The notebook of record is notebooks/maqi-data-demo.ipynb, with end-to-end examples per bucket (Avro, DBN + Zstandard, ZIP + CSV, ZIP + XFFMT).
Engine matrix: docs/wasabi/state.md.
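To spot-check a row of the state table without the auto-generation script, list a bucket directly through Wasabi's S3-compatible API. This is a minimal sketch, not a substitute for scripts/wasabi-state.sh. It assumes three things:

- boto3 is installed.
- Credentials come from boto3's usual chain (environment variables or ~/.aws/credentials).
- The endpoint follows Wasabi's `s3.<region>.wasabisys.com` pattern.

`WASABI_ENDPOINT`, `iter_bucket` and `summarise` are hypothetical names.

```python
from collections.abc import Iterable, Iterator

# Assumption: Wasabi's S3-compatible endpoint for the snapshot's region.
WASABI_ENDPOINT = "https://s3.eu-central-1.wasabisys.com"


def summarise(objects: Iterable[tuple[str, int]]) -> tuple[int, int]:
    """Return (object count, total bytes) for an iterable of (key, size) pairs."""
    count = total = 0
    for _key, size in objects:
        count += 1
        total += size
    return count, total


def iter_bucket(bucket: str, prefix: str = "") -> Iterator[tuple[str, int]]:
    """Yield (key, size) for every object in `bucket`, following pagination."""
    import boto3  # imported lazily so summarise() works without boto3 installed

    s3 = boto3.client("s3", endpoint_url=WASABI_ENDPOINT, region_name="eu-central-1")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

Usage: `n, size = summarise(iter_bucket("maqi-gdelt"))`, then compare `n` and `size / 1024**3` with the Objects and Size columns.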
Anomalies log
Vendor-side defects worth knowing before a student lab opens a notebook:
- maqi-gdelt: 20221110.export.CSV.zip and 20230323.export.CSV.zip missing; 20230322.export.CSV.zip has a divergent MD5.
- maqi-ravenpack: year 2020 missing at source.
- maqi-causalitylink: single snapshot frozen at 2021-08-13; not refreshed.
Detail: docs/wasabi/anomalies.md.
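Gaps like the two missing GDELT exports can be re-detected from a fresh bucket listing. This is a minimal sketch of such a check. It assumes one `YYYYMMDD.export.CSV.zip` object per calendar day, following the naming shown above, and a plain list of object keys as input. `missing_days` is a hypothetical helper. It finds absent files only; the divergent-MD5 case still needs the manifest comparison.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def missing_days(keys, suffix=".export.CSV.zip"):
    """Return YYYYMMDD strings absent between the first and last dated file."""
    days = set()
    for key in keys:
        name = PurePosixPath(key).name  # tolerate prefixes such as "events/"
        if name.endswith(suffix) and name[:8].isdigit():
            days.add(date(int(name[:4]), int(name[4:6]), int(name[6:8])))
    if not days:
        return []
    gaps = []
    day, last = min(days), max(days)
    while day <= last:
        if day not in days:
            gaps.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return gaps
```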
Sources
- Wasabi state: docs/wasabi/state.md
- Sync history: docs/wasabi/sync-history.md
- Anomalies: docs/wasabi/anomalies.md
- Vendor flow diagram: docs/diagrams/m1-vendor-bucket-flow.md