GeoParquet and STAC keep popping up

In 20 years of doing this, I haven’t seen so many job reqs and RFPs call out STAC, COGs, and GeoParquet: three different agencies mentioned them in the last two weeks. We’re piloting STAC + COG on S3 with intake-stac. For those who’ve moved teams off file geodatabases into cloud-native formats, what training or tooling made the transition stick? It feels like a durable shift in how we publish and hire, and a smart place to invest career time.


What stuck for us was standing up stac-fastapi + PgSTAC (stac-utils/pgstac on GitHub) with stac-browser and titiler, then running a pizza-fueled ‘show, don’t tell’ session where folks queried GeoParquet in DuckDB and hit COG tiles in QGIS, plus CI gates with stac-validator/cog-validate. We also kept a read-only FGDB export for a quarter to ease the switch. Are you staying S3-only with intake-stac, or planning a PgSTAC index so search scales?


Using DuckDB’s spatial extension to query GeoParquet was the moment it clicked: we ran a 90-minute lab to convert an FGDB to GeoParquet with ogr2ogr, join it to a STAC search, and preview COGs via titiler. Partition by time/region and keep properties flat, or queries bog down. Are you planning to back intake-stac with PgSTAC like @alyssa_p90 suggested?
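
For anyone following along, the GeoParquet query step in a lab like this might look roughly like the sketch below. The file path, the `geometry` column name, and the bounding box are placeholders rather than details from the post. It also assumes a recent DuckDB whose spatial extension reads GeoParquet geometry columns as GEOMETRY; older versions may need an explicit ST_GeomFromWKB.

```python
def bbox_query(parquet_glob, bbox, geom_col="geometry"):
    """Build a DuckDB SQL string that counts features intersecting a bbox."""
    minx, miny, maxx, maxy = bbox
    return (
        "SELECT count(*) AS n "
        f"FROM read_parquet('{parquet_glob}', hive_partitioning = true) "
        f"WHERE ST_Intersects({geom_col}, "
        f"ST_MakeEnvelope({minx}, {miny}, {maxx}, {maxy}))"
    )


def run_query(sql):
    """Run the query; needs the duckdb package and its spatial extension."""
    import duckdb  # imported lazily so building the SQL has no dependencies

    con = duckdb.connect()
    con.execute("INSTALL spatial")
    con.execute("LOAD spatial")
    return con.execute(sql).fetchall()


# Hypothetical local layout; reading straight from S3 would also need httpfs.
sql = bbox_query("parcels/*/*/*.parquet", (-105.3, 39.6, -104.6, 40.1))
print(sql)
```

Calling `run_query(sql)` against real files returns a single row with the count. Keeping the SQL builder separate makes it easy to paste the same statement into the DuckDB CLI during a lab.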


It finally clicked for us after we put a tiny CI gate on S3: every push runs pystac + stac-validator and spins up a COG preview via TiTiler (developmentseed/titiler on GitHub) so folks see their item on a map instantly. A 60-min hands-on where we queried GeoParquet in Athena and opened the same COG in QGIS beat any slide deck; just watch partitions so Athena costs don’t spike. If you’re seeing “three agencies in two weeks,” set a simple data contract (COG + STAC + partitioned GeoParquet) and enforce it. Want a sample repo to crib from?
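
To give a feel for what a gate like this checks, here is a minimal stdlib-only sketch of a STAC Item shape check. It is a stand-in for illustration, not a replacement for stac-validator (which validates against the full JSON Schemas), and the example item and its URL are made up.

```python
REQUIRED = ("type", "stac_version", "id", "geometry", "properties", "links", "assets")


def check_item(item):
    """Return a list of problems; an empty list means the basic shape is OK."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in item]
    if "type" in item and item["type"] != "Feature":
        problems.append("type must be 'Feature'")
    if item.get("geometry") is not None and "bbox" not in item:
        problems.append("bbox is required when geometry is not null")
    props = item.get("properties") or {}
    has_range = "start_datetime" in props and "end_datetime" in props
    if props.get("datetime") is None and not has_range:
        problems.append("properties needs datetime or start/end_datetime")
    return problems


# Hypothetical item used only to exercise the check.
good = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": {"type": "Point", "coordinates": [-105.0, 39.7]},
    "bbox": [-105.0, 39.7, -105.0, 39.7],
    "properties": {"datetime": "2024-01-01T00:00:00Z"},
    "links": [],
    "assets": {"data": {"href": "https://example.com/scene.tif"}},
}
bad = {k: v for k, v in good.items() if k != "assets"}

print(check_item(good))  # []
print(check_item(bad))   # ['missing field: assets']
```

In a CI job, exiting non-zero whenever the list is non-empty is enough to block the push.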


Making S3 feel native was the unlock: we put CloudFront in front of the bucket and showed folks they could open a COG by URL in QGIS using GDAL’s /vsicurl/ (see the GDAL Virtual File Systems documentation), then wired GeoParquet into Athena so plain SQL worked day one. Caveat: lock a single CRS and geometry column name early or joins get messy. Do you have CloudFront/Athena in your stack, or want a no-AWS alternative?
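
As a small illustration of the /vsicurl/ trick, the helper below turns a URL into the path string GDAL expects. The CloudFront hostname and bucket name are invented for the example, and `to_gdal_path` is a hypothetical helper, not part of GDAL.

```python
from urllib.parse import urlparse


def to_gdal_path(url):
    """Map an http(s) or s3 URL to a GDAL virtual file system path."""
    parts = urlparse(url)
    if parts.scheme in ("http", "https"):
        # /vsicurl/ is followed by the full URL, scheme included.
        return "/vsicurl/" + url
    if parts.scheme == "s3":
        # /vsis3/ is followed by bucket/key and uses AWS credentials.
        return f"/vsis3/{parts.netloc}{parts.path}"
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


print(to_gdal_path("https://d1234example.cloudfront.net/cogs/scene.tif"))
print(to_gdal_path("s3://example-bucket/cogs/scene.tif"))
```

The resulting string can be pasted into QGIS as a raster source or passed to gdalinfo, assuming the file is reachable and the server supports range requests.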


Spinning up stac-browser (radiantearth/stac-browser on GitHub) made it click, but enforce GeoParquet CRS consistency. Want our docker-compose?


What made it stick for our analysts was treating GeoParquet as “tables you can SELECT” with DuckDB (https://duckdb.org), then pointing the same files at STAC as assets. We ran a 60-min hands-on where folks converted a feature class to GeoParquet, ran a spatial filter with duckdb-spatial, and published the item. After that, the FGDB felt like a zip drive. Small caveat: we needed a simple partition/naming convention (tile/year) to keep queries fast. Does your pilot need versioning, or is append-only good enough?
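
One possible shape for a tile/year convention is sketched below. The Hive-style `key=value` folders are an assumption on my part (the post only says tile/year), chosen because engines like DuckDB and Athena can prune on them. The bucket, tile id, and file name are placeholders.

```python
def partition_key(prefix, tile, year, filename="part-0.parquet"):
    """Build a Hive-style object key: <prefix>/tile=<tile>/year=<year>/<file>."""
    return f"{prefix.rstrip('/')}/tile={tile}/year={int(year)}/{filename}"


def parse_partition(key):
    """Recover the partition values from a key built by partition_key."""
    values = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values


key = partition_key("s3://example-bucket/parcels/", "13SDD", 2023)
print(key)
print(parse_partition(key))
```

Writing files only through a helper like this keeps the layout consistent, which matters more than the exact scheme you pick.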


A quick win for us was putting TiTiler in front of the S3 COGs and indexing the STAC in pgstac, so desktop folks could just add a WMTS URL in QGIS/Arc and keep their muscle memory. We ran the same “STAC + COG on S3” setup you describe, but enforced COG overviews on ingest, because without them the WMTS felt laggy. If you try it, TiTiler (developmentseed/titiler on GitHub) plus pgstac kept training near zero for us while we phased out file GDBs.
