OpenWetlandsMap
Viewer | Mapterhorn
VersaTiles
A completely FLOSS map stack.
Sourdough
The Handbook
Introduction
The GitLab team handbook is the central repository for how we run the company. Printed, it consists of over 2,000 pages of text. As part of our value of being transparent, the handbook is open to the world, and we welcome feedback. Please make a merge request to suggest improvements or add clarifications, and use issues to ask questions.
For a very specific set of internal information we also maintain an Internal Handbook.
GitLab Data · GitLab
This is the main group for the GitLab Data Team.
Enterprise Data Warehouse
Enterprise Data Warehouse Overview
Architectural Overview
The EDW is organized as a series of layers: five consecutive layers through which data progresses, plus one development layer where data is explored and developed. Each layer has a purpose in the overall operation and effectiveness of the EDW. All data within the EDW lands in Landing. All subsequent layers are optional, with the caveat that Tableau should connect only to prod database schemas.
Spatial Parallel Computing by Hierarchical Data Partitioning
Geospatial data computation is parallelized by grid, hierarchy, or raster files. Based on the future (Bengtsson, 2024, doi:10.32614/CRAN.package.future) and mirai (Gao et al., 2025, doi:10.32614/CRAN.package.mirai) parallel back-ends, terra (Hijmans et al., 2025, doi:10.32614/CRAN.package.terra) and sf (Pebesma et al., 2024, doi:10.32614/CRAN.package.sf) functions, as well as convenience functions in the package, can be distributed over multiple threads. The simplest way to parallelize generic geospatial computation is to feed the output of the par_pad_*() functions into the par_grid(), par_hierarchy(), or par_multirasters() functions. Virtually any function accepting terra or sf classes can be used in the three parallelization functions. A common raster-vector overlay operation is provided as the function extract_at(), which uses exactextractr (Baston, 2023, doi:10.32614/CRAN.package.exactextractr) and offers kernel-weight options for summarizing raster values at vector geometries. Other convenience functions for vector-vector operations are also provided, including simple areal interpolation (summarize_aw()) and summation of exponentially decaying weights (summarize_sedc()).
How to Choose Microservice’s Boundaries?
Organisations have been struggling to get their microservices boundaries right. Here's what they should do: instead of size, think about the flow.
How to Determine The Land Value of a Property: Comprehensive Valuation Methods
Learn how to determine the land value of a property using tax records, apps, comparables, and professional methods with CRI Properties' expert guidance.
Complete Guide to Vacant Land Appraisals for Investment — Austin & Austin Appraisal Services
The Complete Guide to Appraising Vacant Land for Investment Purposes
altdoc
Package Development – Data Science
Integration approaches and methods | ArcGIS Architecture Center
ArcGIS Well-Architected.
Map Unit Key Grids and Thematic Maps of Soil Survey Geographic (SSURGO) Data
Life Altering Postgresql Patterns
Use UUID primary keys
UUIDs have downsides
Truly random UUIDs don't sort well (and this has implications for indexes)
They take up more space than sequential ids (space being your cheapest resource)
But I've found those to be far outweighed by the upsides
You don't need to coordinate with the database to produce one.
They are safe to share externally.
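A minimal sketch of the pattern, using Postgres' built-in gen_random_uuid() (available without extensions since Postgres 13) and a hypothetical orders table:

```sql
-- Hypothetical table; the application can also generate the uuid itself
-- and send it in, since no coordination with the database is required.
create table orders (
    id uuid primary key default gen_random_uuid(),
    customer_name text not null
);

-- The id can be known before (or without) a round trip to the database:
insert into orders (id, customer_name)
values (gen_random_uuid(), 'Ada');
```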
Give everything created_at and updated_at
It's not a full history, but knowing when a record was created or last changed is a useful breadcrumb when debugging. It's also something you can't retroactively get unless you were recording it.
So just always slap a created_at and updated_at on your tables. You can maintain updated_at automatically with a trigger.
You need to create the trigger for each table, but you only need to create the function once.
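A sketch of the one-function, one-trigger-per-table setup (the table and names here are hypothetical):

```sql
-- One shared trigger function for the whole database.
create function set_updated_at() returns trigger as $$
begin
    new.updated_at := now();
    return new;
end;
$$ language plpgsql;

create table widgets (
    id bigint generated always as identity primary key,
    name text not null,
    created_at timestamptz not null default now(),
    updated_at timestamptz not null default now()
);

-- One trigger per table, all pointing at the shared function.
create trigger widgets_set_updated_at
    before update on widgets
    for each row
    execute function set_updated_at();
```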
on update restrict on delete restrict
When you make a foreign key constraint on a table, always mark it with on update restrict on delete restrict.
This makes it so that if you try to delete the referenced row you will get an error. Storage is cheap, recovering data is a nightmare. Better to error than do something like cascade.
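A sketch with hypothetical orders and order_items tables:

```sql
create table orders (
    id bigint generated always as identity primary key
);

create table order_items (
    id bigint generated always as identity primary key,
    order_id bigint not null references orders (id)
        on update restrict on delete restrict
);

-- With at least one order_items row pointing at an order,
-- deleting that order now fails with a foreign_key_violation
-- instead of silently taking the items with it.
```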
Use schemas
By default, every table in Postgres will go into the "public" schema. This is fine, but you are missing out if you don't take advantage of your ability to make new schemas.
Schemas work as namespaces for tables and for any moderate to large app you are going to have a lot of tables. You can do joins and have relationships between tables in different schemas so there isn't much of a downside.
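A sketch, with hypothetical billing and support schemas:

```sql
create schema billing;
create schema support;

create table billing.customers (
    id bigint generated always as identity primary key,
    name text not null
);

create table support.tickets (
    id bigint generated always as identity primary key,
    customer_id bigint not null references billing.customers (id),
    subject text not null
);

-- Joins across schemas work exactly like joins within one:
select c.name, t.subject
from billing.customers c
join support.tickets t on t.customer_id = c.id;
```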
Enum Tables
There are a lot of ways to make "enums" in sql. One is to use the actual "enum types," another is to use a check constraint.
The pattern introduced to me by Hasura was enum tables.
Have a table with some text value as a primary key and make columns in other tables reference it with a foreign key.
This way you can insert into a table to add more allowed values or attach metadata like a comment to explain what each value means.
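A sketch of the pattern with a hypothetical order_status enum table:

```sql
create table order_status (
    value text primary key,
    comment text not null
);

insert into order_status (value, comment) values
    ('submitted', 'Received, not yet reviewed'),
    ('in_review', 'Someone is looking at it'),
    ('approved',  'Cleared for processing');

create table orders (
    id bigint generated always as identity primary key,
    status text not null references order_status (value)
);

-- Adding a new allowed value is just an insert, not a DDL migration:
insert into order_status (value, comment)
values ('rejected', 'Did not pass review');
```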
Mechanically name join tables
Sometimes there are sensible names to give "join tables" - tables which form the basis for "many to many" relationships between data - but often there isn't. In those cases don't hesitate to just concatenate the names of the tables you are joining between.
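For instance, a many-to-many link between hypothetical account and role tables can simply be called account_role:

```sql
create table account (
    id bigint generated always as identity primary key
);

create table role (
    id bigint generated always as identity primary key
);

-- Mechanical name: the two table names, concatenated.
create table account_role (
    account_id bigint not null references account (id),
    role_id    bigint not null references role (id),
    primary key (account_id, role_id)
);
```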
Represent statuses as a log
It is very tempting to represent the status of something as a single column. You submit some paperwork and it has a status of submitted. Someone starts to look at it, then it transitions to in_review. From there maybe it's rejected or approved.
There are two problems with this
You might actually care about when it was approved, or by whom.
You might receive this information out-of-order.
Webhooks are a prime example of the 2nd situation. There's no way in the laws of physics to be sure you'll get events in exactly the right order.
To handle this you should have a table where each row represents the status of the thing at a given point in time. Instead of overloading created_at or updated_at for this, have an explicit valid_at column which records the point in time that status information is valid for.
Just having an index on valid_at can work for a while, but eventually your queries will get too slow. There are a lot of ways to handle this, but the one we've found that works the best is to have an explicit latest column with a cheeky unique index and trigger to make sure that only the row with the newest valid_at is the latest one.
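One sketch of the pattern (names are hypothetical, and the trigger logic shown is one way to maintain the invariant, not the only one):

```sql
create table filing_status (
    id bigint generated always as identity primary key,
    filing_id bigint not null,
    status text not null,
    valid_at timestamptz not null,
    latest boolean not null default true
);

-- The cheeky part: a partial unique index allowing at most
-- one "latest" row per filing.
create unique index filing_status_one_latest
    on filing_status (filing_id)
    where latest;

create function maintain_latest_status() returns trigger as $$
begin
    -- Demote any current latest row that is older than the new one.
    update filing_status
       set latest = false
     where filing_id = new.filing_id
       and latest
       and valid_at <= new.valid_at;

    -- The new row is only latest if nothing newer already exists,
    -- which handles out-of-order arrival (e.g. webhooks).
    new.latest := not exists (
        select 1 from filing_status
         where filing_id = new.filing_id
           and valid_at > new.valid_at
    );
    return new;
end;
$$ language plpgsql;

create trigger filing_status_maintain_latest
    before insert on filing_status
    for each row
    execute function maintain_latest_status();
```

Queries for the current status then just filter on latest instead of sorting by valid_at.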
Mark special rows with a system_id
It's not uncommon to end up with "special rows." By this I mean rows in a table that the rest of your system will rely on the presence of to build up behavior.
All rows in an enum table are like this, but you will also end up with special rows mixed into tables of otherwise normal rows generated during the course of normal use. For these, give them a special system_id.
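A sketch: ordinary rows leave system_id null, while special rows get a stable, human-readable identifier that application code can look up (table and names hypothetical):

```sql
create table tax_category (
    id bigint generated always as identity primary key,
    name text not null,
    -- Null for ordinary rows; a stable key for special ones.
    system_id text unique
);

-- A special row the rest of the system depends on:
insert into tax_category (name, system_id)
values ('Uncategorised', 'uncategorised');

-- Code finds it by system_id, never by its generated id:
select id from tax_category where system_id = 'uncategorised';
```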
Use views sparingly
Views are amazing and terrible.
They are amazing in their ability to wrap up a relatively complex or error-prone query into something that looks basically like a table.
They are terrible in that removing obsolete columns requires a drop and recreation, which can become a nightmare when you build views on views. The query planner also seems to have trouble seeing through them in general.
So do use views, but only as many as you need, and be very wary of building views on views.
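A small sketch of the drop-and-recreate pain, with hypothetical tables:

```sql
create table orders (
    id bigint generated always as identity primary key
);

create table order_items (
    order_id bigint not null references orders (id),
    price_cents integer not null
);

create view order_totals as
select o.id, sum(i.price_cents) as total_cents
from orders o
join order_items i on i.order_id = o.id
group by o.id;

-- "create or replace view" can add columns but not remove them;
-- to drop total_cents you must drop order_totals and recreate it,
-- along with every view that was built on top of it.
```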
JSON Queries
You might have heard that Postgres "supports JSON." This is true, but I had mostly heard it in the context of storing and querying JSON. If you want to store some blob of info, slap a jsonb column on one of your tables.
That is neat, but I've gotten way more mileage out of using JSON as the result of a query. This has definite downsides like losing type information, needing to realize your results all at once, and the overhead of writing into json.
But the giant upside is that you can get all the information you want from the database in one trip, no cartesian product nightmares or N+1 problems in sight.
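A sketch of the one-trip shape, with hypothetical author and post tables:

```sql
create table author (
    id bigint generated always as identity primary key,
    name text not null
);

create table post (
    id bigint generated always as identity primary key,
    author_id bigint not null references author (id),
    title text not null
);

-- Each author with a nested array of their posts, in a single query:
select jsonb_build_object(
    'id',    a.id,
    'name',  a.name,
    'posts', coalesce(
        (select jsonb_agg(jsonb_build_object('id', p.id, 'title', p.title)
                          order by p.id)
           from post p
          where p.author_id = a.id),
        '[]'::jsonb)
) as author_json
from author a;
```

No cartesian blow-up from joining authors against posts, and no per-author follow-up queries.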
Import rasters file to PostGIS database using raster2pgsql - Spatial Dev Guru
raster2pgsql is a GIS command-line utility which is installed as part of PostGIS. To install raster2pgsql, you have to install PostGIS, and raster2pgsql will be installed with it.
Using 'gdal' CLI algorithms from R
R Bindings to GDAL
API bindings to the Geospatial Data Abstraction Library (GDAL). Implements the GDAL Raster and Vector Data Models. Bindings are implemented with Rcpp modules. Exposed C++ classes and stand-alone functions wrap much of the GDAL API and provide additional functionality. Calling signatures resemble the native C, C++ and Python APIs provided by the GDAL project.
Class GDALRaster encapsulates a GDALDataset and its raster band objects. Class GDALVector encapsulates an OGRLayer and the GDALDataset that contains it. Initial bindings are provided to the unified gdal command line interface added in GDAL 3.11. C++ stand-alone functions provide bindings to most GDAL "traditional" raster and vector utilities, including OGR facilities for vector geoprocessing, several algorithms, the Geometry API (GEOS via GDAL headers), the Spatial Reference Systems API, and methods for coordinate transformation.
Bindings to the Virtual Systems Interface (VSI) API implement standard file system operations abstracted for URLs, cloud storage services, Zip/GZip/7z/RAR, in-memory files, and regular local file systems. This provides a single interface for operating on file system objects that works the same for any storage backend.
A custom raster calculator evaluates a user-defined R expression on a layer or stack of layers, with pixel x/y available as variables in the expression. Raster combine() identifies and counts unique pixel combinations across multiple input layers, with optional raster output of the pixel-level combination IDs. Basic plotting capability is provided for raster and vector display.
gdalraster leans toward minimalism and the use of simple, lightweight objects for holding raw data. Currently, only minimal S3 class interfaces have been implemented for selected R objects that contain spatial data. gdalraster may be useful in applications that need scalable, low-level I/O, or prefer a direct GDAL API.
TC's GIS and Geography Blog
Viewing the world through geography and GIS.
VRT -- GDAL Virtual Format — GDAL documentation
The VRT driver is a format driver for GDAL that allows a virtual GDAL dataset to be composed from other GDAL datasets with repositioning, and algorithms potentially applied as well as various kinds of metadata altered or added. VRT descriptions of datasets can be saved in an XML format normally given the extension .vrt.
gdalbuildvrt — GDAL documentation
gdalbuildvrt [--help] [--long-usage] [--help-general] [--quiet] [[-strict]|[-non_strict]] [-tile_index <field_name>] [-resolution user|average|common|highest|lowest|same] [-tr <xres> <yres>] [-input_file_list <filename>] [[-separate]|[-pixel-function <function>]] [-pixel-function-arg <NAME>=<VALUE>]... [-allow_projection_difference] [-sd <n>] [-tap] [-te <xmin> <ymin> <xmax> <ymax>] [-addalpha] [-b <band>]... [-hidenodata] [-overwrite] [-srcnodata "<value>[ <value>]..."] [-vrtnodata "<value>[ <value>]..."] [-a_srs <srs_def>] [-r nearest|bilinear|cubic|cubicspline|lanczos|average|mode] [-oo <NAME>=<VALUE>]... [-co <NAME>=<VALUE>]... [-ignore_srcmaskband] [-nodata_max_mask_threshold <threshold>] <vrt_dataset_name> [<src_dataset_name>]...
This program builds a VRT (Virtual Dataset) that is a mosaic of a list of input GDAL datasets. The list of input GDAL datasets can be specified at the end of the command line, put in a text file (one filename per line) for very long lists, or it can be a MapServer tileindex (see the gdaltindex utility). If using a tile index, all entries in the tile index will be added to the VRT.
Before I Sleep: Push the limits of interactive mapping in R with vector tiles
A tutorial on mapping with vector tiles in R
JSON – Data Science
Performance Optimization for Plumber APIs: Serialization – Joe Kirincic
A series of posts about making your Plumber APIs production ready.
Tidyverse style guide
Tidy design principles
OpenResty® - Open source
YeSQL specifications — YeSQL documentation
Cursor agent best practices
A comprehensive guide to working with coding agents, from starting with plans to managing context, customizing workflows, and reviewing code.