Package Development – Data Science
No Clocks
Integration approaches and methods | ArcGIS Architecture Center
ArcGIS Well-Architected.
Map Unit Key Grids and Thematic Maps of Soil Survey Geographic (SSURGO) Data
Life Altering Postgresql Patterns
Use UUID primary keys
UUIDs have downsides
Truly random UUIDs don't sort well (and this has implications for indexes)
They take up more space than sequential ids (space being your cheapest resource)
But I've found those to be far outweighed by the upsides
You don't need to coordinate with the database to produce one.
They are safe to share externally.
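A minimal sketch of the "no coordination" point, using Python's standard uuid module. The new_user record and its fields are hypothetical; the idea is only that the id exists before any insert is sent to the database.

```python
import uuid

# Generate the primary key in application code; no database round trip is needed.
record_id = uuid.uuid4()

# Hypothetical record: the id is known up front, so related rows can reference it
# before anything has been written.
new_user = {"id": str(record_id), "name": "example"}
print(new_user["id"])
```

Note that uuid4 values are fully random, so the index-ordering downside mentioned above still applies to them.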
Give everything created_at and updated_at
It's not a full history, but knowing when a record was created or last changed is a useful breadcrumb when debugging. It's also something you can't retroactively get unless you were recording it.
So just always slap a created_at and updated_at on your tables. You can maintain updated_at automatically with a trigger.
You need to create the trigger for each table, but you only need to create the function once.
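A rough sketch of the idea, with SQLite (through Python's sqlite3 module) standing in for Postgres so that it runs without a server. The widget table and trigger name are made up for illustration. SQLite has no reusable trigger functions, so here the body is written inline per table; in Postgres that body would live in the single shared function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table widget (
    id text primary key,
    name text not null,
    created_at text not null default (strftime('%Y-%m-%d %H:%M:%f', 'now')),
    updated_at text not null default (strftime('%Y-%m-%d %H:%M:%f', 'now'))
);

-- One trigger per table. SQLite does not re-fire it for the inner update
-- because recursive triggers are off by default.
create trigger widget_set_updated_at
after update on widget
for each row
begin
    update widget
    set updated_at = strftime('%Y-%m-%d %H:%M:%f', 'now')
    where id = new.id;
end;
""")

# Insert a row with deliberately old timestamps, then change it.
conn.execute(
    "insert into widget (id, name, created_at, updated_at) values (?, ?, ?, ?)",
    ("w1", "a", "2000-01-01 00:00:00.000", "2000-01-01 00:00:00.000"),
)
conn.execute("update widget set name = 'b' where id = 'w1'")

created_at, updated_at = conn.execute(
    "select created_at, updated_at from widget where id = 'w1'"
).fetchone()
print(created_at, updated_at)
```

The application never sets updated_at itself; only created_at stays at its original value.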
on update restrict on delete restrict
When you make a foreign key constraint on a table, always mark it with on update restrict on delete restrict.
This makes it so that if you try and delete the referenced row you will get an error. Storage is cheap, recovering data is a nightmare. Better to error than do something like cascade.
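As an illustration, here is a small sketch with SQLite standing in for Postgres (the person and pet tables are hypothetical). The constraint syntax is the same in both; SQLite just needs foreign key enforcement switched on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite only enforces foreign keys when asked
conn.executescript("""
create table person (
    id integer primary key,
    name text not null
);
create table pet (
    id integer primary key,
    name text not null,
    owner_id integer not null
        references person (id)
        on update restrict
        on delete restrict
);
insert into person (id, name) values (1, 'Ada');
insert into pet (id, name, owner_id) values (1, 'Rex', 1);
""")

# Deleting a person who still owns a pet is refused instead of cascading.
try:
    conn.execute("delete from person where id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

remaining = conn.execute("select count(*) from person").fetchone()[0]
print(delete_blocked, remaining)
```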
Use schemas
By default, every table in Postgres will go into the "public" schema. This is fine, but you are missing out if you don't take advantage of your ability to make new schemas.
Schemas work as namespaces for tables and for any moderate to large app you are going to have a lot of tables. You can do joins and have relationships between tables in different schemas so there isn't much of a downside.
Enum Tables
There are a lot of ways to make "enums" in sql. One is to use the actual "enum types," another is to use a check constraint.
The pattern introduced to me by Hasura was enum tables.
Have a table with some text value as a primary key and make columns in other tables reference it with a foreign key.
This way you can insert into a table to add more allowed values or attach metadata like a comment to explain what each value means.
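A minimal sketch of an enum table, again with SQLite standing in for Postgres. The pet_kind and pet tables and their values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")
conn.executescript("""
-- The enum table: the text value is the primary key, plus room for metadata.
create table pet_kind (
    value text primary key,
    comment text not null default ''
);
insert into pet_kind (value, comment) values
    ('dog', 'A domestic dog'),
    ('cat', 'A domestic cat');

create table pet (
    id integer primary key,
    name text not null,
    kind text not null
        references pet_kind (value)
        on update restrict
        on delete restrict
);
""")

conn.execute("insert into pet (name, kind) values ('Rex', 'dog')")

# A value that is not in the enum table is rejected by the foreign key.
try:
    conn.execute("insert into pet (name, kind) values ('Bubbles', 'fish')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Allowing a new value is a plain insert, not a schema change.
conn.execute("insert into pet_kind (value, comment) values ('fish', 'Any pet fish')")
conn.execute("insert into pet (name, kind) values ('Bubbles', 'fish')")

pet_count = conn.execute("select count(*) from pet").fetchone()[0]
print(rejected, pet_count)
```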
Mechanically name join tables
Sometimes there are sensible names to give "join tables" - tables which form the basis for "many to many" relationships between data - but often there isn't. In those cases don't hesitate to just concatenate the names of the tables you are joining between.
Represent statuses as a log
It is very tempting to represent the status of something as a single column. You submit some paperwork and it has a status of submitted. Someone starts to look at it, then it transitions to in_review. From there maybe it's rejected or approved.
There are two problems with this
You might actually care about when it was approved, or by whom.
You might receive this information out-of-order.
Webhooks are a prime example of the 2nd situation. There's no way in the laws of physics to be sure you'll get events in exactly the right order.
To handle this you should have a table where each row represents the status of the thing at a given point in time. Instead of overloading created_at or updated_at for this, have an explicit valid_at which says when that information is valid for.
Just having an index on valid_at can work for a while, but eventually your queries will get too slow. There are a lot of ways to handle this, but the one we've found that works the best is to have an explicit latest column with a cheeky unique index and trigger to make sure that only the row with the newest valid_at is the latest one.
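One way this could look, sketched with SQLite standing in for Postgres. The application_status table, the index and trigger names, and the way the trigger recomputes the flag are all assumptions made for illustration, not the author's exact implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table application_status (
    id integer primary key,
    application_id text not null,
    status text not null,
    valid_at text not null,           -- ISO-8601 text, so string order is time order
    latest integer not null default 0
);

-- At most one row per application may be flagged as the latest.
create unique index application_status_latest
    on application_status (application_id)
    where latest = 1;

-- After every insert, recompute which row has the newest valid_at.
create trigger application_status_set_latest
after insert on application_status
for each row
begin
    update application_status
    set latest = 0
    where application_id = new.application_id;

    update application_status
    set latest = 1
    where id = (
        select id
        from application_status
        where application_id = new.application_id
        order by valid_at desc, id desc
        limit 1
    );
end;
""")

# Events arrive out of order: the in_review event shows up after approved.
events = [
    ("a1", "submitted", "2024-01-01T09:00:00Z"),
    ("a1", "approved", "2024-01-03T09:00:00Z"),
    ("a1", "in_review", "2024-01-02T09:00:00Z"),
]
for event in events:
    conn.execute(
        "insert into application_status (application_id, status, valid_at) values (?, ?, ?)",
        event,
    )

current_status = conn.execute(
    "select status from application_status where application_id = 'a1' and latest = 1"
).fetchone()[0]
latest_rows = conn.execute(
    "select count(*) from application_status where latest = 1"
).fetchone()[0]
print(current_status, latest_rows)
```

Reading the current status then only touches the partial index, while the full history stays available in the same table.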
Mark special rows with a system_id
It's not uncommon to end up with "special rows." By this I mean rows in a table that the rest of your system will rely on the presence of to build up behavior.
All rows in an enum table are like this, but you will also end up with rows in tables of otherwise normal "generated during the course of normal use" rows. For these, give them a special system_id.
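A small sketch of what that might look like, with SQLite standing in for Postgres. The team table and the 'UNASSIGNED' identifier are hypothetical; the assumption here is a nullable, unique text column that only the special rows fill in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table team (
    id integer primary key,
    name text not null,
    system_id text unique  -- null for ordinary rows, a stable name for special ones
);
insert into team (name, system_id) values ('Unassigned', 'UNASSIGNED');
insert into team (name) values ('Platform'), ('Data');
""")

# Code looks the special row up by its stable system_id, not by a generated id.
unassigned = conn.execute(
    "select id, name from team where system_id = ?", ("UNASSIGNED",)
).fetchone()
ordinary_count = conn.execute(
    "select count(*) from team where system_id is null"
).fetchone()[0]
print(unassigned, ordinary_count)
```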
Use views sparingly
Views are amazing and terrible.
They are amazing in their ability to wrap up a relatively complex or error-prone query into something that looks basically like a table.
They are terrible in that removing obsolete columns requires a drop and recreation, which can become a nightmare when you build views on views. The query planner also seems to have trouble seeing through them in general.
So do use views, but only as many as you need, and be very wary of building views on views.
JSON Queries
You might have heard that Postgres "supports JSON." This is true, but I had mostly heard it in the context of storing and querying JSON. If you want a table with some blob of info, slap a jsonb column on one of your tables.
That is neat, but I've gotten way more mileage out of using JSON as the result of a query. This has definite downsides like losing type information, needing to realize your results all at once, and the overhead of writing into json.
But the giant upside is that you can get all the information you want from the database in one trip, no cartesian product nightmares or N+1 problems in sight.
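A sketch of the one-trip idea, with SQLite's JSON functions standing in for their Postgres counterparts (such as jsonb_build_object and jsonb_agg). It assumes the SQLite build bundled with Python includes the JSON functions, which is typical for recent versions. The person and pet tables are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table person (
    id integer primary key,
    name text not null
);
create table pet (
    id integer primary key,
    name text not null,
    owner_id integer not null references person (id)
);
insert into person (id, name) values (1, 'Ada'), (2, 'Grace');
insert into pet (id, name, owner_id) values (1, 'Rex', 1), (2, 'Tom', 1);
""")

# One query returns every person with their pets nested inside, so there is
# no join fan-out to de-duplicate and no follow-up query per person.
row = conn.execute("""
select json_group_array(
    json_object(
        'id', person.id,
        'name', person.name,
        -- json(...) keeps the subquery result as nested JSON rather than a string
        'pets', json((
            select json_group_array(json_object('id', pet.id, 'name', pet.name))
            from pet
            where pet.owner_id = person.id
        ))
    )
)
from person
""").fetchone()

people = json.loads(row[0])
by_name = {p["name"]: p for p in people}
print(by_name["Ada"]["pets"], by_name["Grace"]["pets"])
```

The trade-offs named above show up here too: everything comes back as one JSON string that has to be parsed in full, and the column types are gone once it is decoded.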
Import rasters file to PostGIS database using raster2pgsql - Spatial Dev Guru
raster2pgsql is a GIS command-line utility that is installed as part of PostGIS. To get raster2pgsql, install PostGIS; raster2pgsql will be installed with it.
Using 'gdal' CLI algorithms from R
R Bindings to GDAL
API bindings to the Geospatial Data Abstraction Library (GDAL). Implements the GDAL Raster and Vector Data Models. Bindings are implemented with Rcpp modules. Exposed C++ classes and stand-alone functions wrap much of the GDAL API and provide additional functionality. Calling signatures resemble the native C, C++ and Python APIs provided by the GDAL project. Class GDALRaster encapsulates a GDALDataset and its raster band objects. Class GDALVector encapsulates an OGRLayer and the GDALDataset that contains it. Initial bindings are provided to the unified gdal command line interface added in GDAL 3.11. C++ stand-alone functions provide bindings to most GDAL "traditional" raster and vector utilities, including OGR facilities for vector geoprocessing, several algorithms, as well as the Geometry API (GEOS via GDAL headers), the Spatial Reference Systems API, and methods for coordinate transformation. Bindings to the Virtual Systems Interface (VSI) API implement standard file system operations abstracted for URLs, cloud storage services, Zip/GZip/7z/RAR, in-memory files, as well as regular local file systems. This provides a single interface for operating on file system objects that works the same for any storage backend. A custom raster calculator evaluates a user-defined R expression on a layer or stack of layers, with pixel x/y available as variables in the expression. Raster combine() identifies and counts unique pixel combinations across multiple input layers, with optional raster output of the pixel-level combination IDs. Basic plotting capability is provided for raster and vector display. gdalraster leans toward minimalism and the use of simple, lightweight objects for holding raw data. Currently, only minimal S3 class interfaces have been implemented for selected R objects that contain spatial data. gdalraster may be useful in applications that need scalable, low-level I/O, or prefer a direct GDAL API.
TC's GIS and Geography Blog
Viewing the world through geography and GIS.
VRT -- GDAL Virtual Format — GDAL documentation
The VRT driver is a format driver for GDAL that allows a virtual GDAL dataset to be composed from other GDAL datasets with repositioning, and algorithms potentially applied as well as various kinds of metadata altered or added. VRT descriptions of datasets can be saved in an XML format normally given the extension .vrt.
gdalbuildvrt — GDAL documentation
gdalbuildvrt [--help] [--long-usage] [--help-general] [--quiet] [[-strict]|[-non_strict]] [-tile_index <field_name>] [-resolution user|average|common|highest|lowest|same] [-tr <xres> <yres>] [-input_file_list <filename>] [[-separate]|[-pixel-function <function>]] [-pixel-function-arg <NAME>=<VALUE>]... [-allow_projection_difference] [-sd <n>] [-tap] [-te <xmin> <ymin> <xmax> <ymax>] [-addalpha] [-b <band>]... [-hidenodata] [-overwrite] [-srcnodata "<value>[ <value>]..."] [-vrtnodata "<value>[ <value>]..."] [-a_srs <srs_def>] [-r nearest|bilinear|cubic|cubicspline|lanczos|average|mode] [-oo <NAME>=<VALUE>]... [-co <NAME>=<VALUE>]... [-ignore_srcmaskband] [-nodata_max_mask_threshold <threshold>] <vrt_dataset_name> [<src_dataset_name>]...
This program builds a VRT (Virtual Dataset) that is a mosaic of a list of input GDAL datasets. The list of input GDAL datasets can be specified at the end of the command line, put in a text file (one filename per line) for very long lists, or it can be a MapServer tileindex (see the gdaltindex utility). If using a tile index, all entries in the tile index will be added to the VRT.
Before I Sleep: Push the limits of interactive mapping in R with vector tiles
A tutorial on mapping with vector tiles in R
JSON – Data Science
Performance Optimization for Plumber APIs: Serialization – Joe Kirincic
A series of posts about making your Plumber APIs production ready.
Tidyverse style guide
Tidy design principles
OpenResty® - Open source
YeSQL specifications — YeSQL documentation
Cursor agent best practices
A comprehensive guide to working with coding agents, from starting with plans to managing context, customizing workflows, and reviewing code.
Creating new generative art tools in R with grid, ambient, and S7 – Notes from a data witch
There might be a darker undercurrent in this one
Build – Data Science
Data Science
Background Workers in Azure Container Apps with KEDA · Thorsten Hans
This article demonstrates how to build a scalable background worker for processing messages from Azure Service Bus and host it in Azure Container Apps
GrapesJS - Free and Open Source Web Template Editor Framework
Free and Open source Web Template Editor - Next generation tool for building templates without coding
STAC Browser
mapscaping.com
Episodes are evergreen — people keep discovering and sharing them long after publication.
Map Tools - Mapscaping.com
Welcome to StateQuest Interactive, the ultimate US geography puzzle that transforms learning into an engaging map game experience. Whether you're a student
Global Elevation Data Download Tool - January 6, 2026
Global Elevation Data Download Tool is the easiest way to download global elevation data for free
Hillshade, contour lines, digital elevation model (DEM), Terrain RGB, and Terrain 3D for Cesium
Global Satellite maps. Hillshade, contour lines, and digital elevation models data, 3D map with Cesium JS. API or a data package for self-hosting.
Open Topo Data
Open DEM server.
3D Elevation Program