Provides an interface to the Mapbox GL JS and MapLibre GL JS interactive mapping libraries to help users create custom interactive maps in R. Users can create interactive globe visualizations; layer sf objects to create filled maps, circle maps, heatmaps, and three-dimensional graphics; and customize map styles and views. The package also includes utilities to use Mapbox and MapLibre maps in Shiny web applications.
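For example, a layered map might look like this (a minimal sketch, assuming the package's maplibre() and add_fill_layer() helpers as shown in its documentation):

library(mapgl)
library(sf)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

maplibre(bounds = nc) |>
  add_fill_layer(
    id = "counties",
    source = nc,
    fill_color = "steelblue",
    fill_opacity = 0.5
  )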
Every map can use its own set of icons for displaying points of interest, highway shields, peaks, etc. In special cases, maps can also include patterns, helping users distinguish between similar polygon features. A set of icons and patterns, together with a file defining which icon should be used for what purpose, is called a sprite.
Geospatial Data Pipeline Management: Modern approaches vs traditional methods - Matt Forrest
Having the right formats is one thing. Getting your data into those formats reliably, at scale, and on schedule is another thing entirely. You can read all you want about GeoParquet and Zarr and COG, but if you can't create a repeatable process to convert your legacy shapefiles to GeoParquet or your NetCDF files to Zarr, the formats alone won't get you far.
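As a minimal sketch of one such repeatable conversion step in R (assuming the sf package and a GDAL build that includes the Parquet driver; the paths are hypothetical):

library(sf)

# One step of a pipeline: shapefile in, GeoParquet out
read_sf("legacy/parcels.shp") |>
  write_sf("converted/parcels.parquet", driver = "Parquet")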
Leveraging Azure Batch and Geospatial Open Source Standards to Map the World | Azure
By: Zoe Statman-Weil & Mark Mathis, Impact Observatory, Inc.
Global decision makers need timely, accurate maps
Land use and land cover (LULC) maps are used by decision makers in governments, civil society, industry, and finance to observe how the world is changing, and to understand and manage the impact of their actions. Historically, LULC maps have been produced using expensive, semi-automated techniques that require significant human input, leading to long delays between the collection of satellite images and the production of maps and limiting users' access to regular, frequent temporal updates. Making the detailed, accurate maps the whole world needs to understand our rapidly changing planet, with timely updates, requires automation. A groundbreaking artificial intelligence-powered 2020 global LULC map was produced for Esri on Microsoft Azure by Impact Observatory, a mission-driven technology company bringing AI algorithms and on-demand data to environmental monitoring and sustainability risk analysis. This map will be used to help decision makers address challenges in climate change mitigation and adaptation, biodiversity preservation, and sustainable development.
The Impact Observatory LULC machine learning (ML) model was trained on an Azure NC12s v2 virtual machine (VM) powered by NVIDIA® Tesla® P100 GPUs using over 5 billion pixels hand-labeled into one of ten classes: trees, water, built area, scrub/shrub, flooded vegetation, bare ground, cropland, grassland, snow/ice, and clouds. The model was then deployed over more than 450,000 Copernicus Sentinel-2 Level-2A 10-meter resolution, surface reflectance corrected images, each 100 km x 100 km in size and totaling 500 terabytes of satellite imagery (1 terabyte = 10^12 bytes) hosted on the Microsoft Planetary Computer. The processing leveraged geospatial open standards, Azure Batch, and other Azure resources to efficiently produce the final dataset at scale and at low cost.
Geospatial Open Standards support distributed processing
The Microsoft Planetary Computer and Impact Observatory (IO) make extensive use of geospatial open standards, specifically Cloud Optimized GeoTIFF (COG) and SpatioTemporal Asset Catalog (STAC). Use of these standards enabled the team to produce the Esri 2020 Land Cover map using distributed processing at scale.
GeoTIFF is a widely used open standard for geospatial data based on the common TIFF image file format, able to support imagery with bands beyond the usual red, green, and blue visible light bands, and containing additional metadata to locate the image on the surface of the Earth. A COG is a regular GeoTIFF file, aimed at being hosted on an HTTP file server, with an internal organization that enables more efficient workflows on the cloud. Not only can COGs be read from the cloud without duplicating the data to a local filesystem, but a portion of the file can be read using HTTP GET range requests, allowing for targeted reading and efficient processing. Azure Blob Storage is an ideal solution for hosting COGs, as it is an unstructured data storage system accessible via HTTP requests. The LULC map was produced using Sentinel-2 COGs hosted on Microsoft's Planetary Computer in Blob Storage, and all prediction rasters produced from the model were saved as COGs to Blob Storage.
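For example, a COG in Blob Storage can be read directly over HTTP from R via GDAL's /vsicurl/ handler (a sketch; the URL and the window coordinates are hypothetical):

library(terra)

r <- rast("/vsicurl/https://example.blob.core.windows.net/imagery/scene.tif")

# Cropping to a small window issues ranged GET requests for just the
# needed tiles rather than downloading the whole file
chip <- crop(r, ext(500000, 505000, 4200000, 4205000))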
The STAC specification is a common language used to index geospatial data for easy search and discovery. IO searched the Planetary Computer's STAC catalog to identify Sentinel-2 imagery for certain locations, times, and cloud coverage. IO applied a community-supported implementation of the STAC interface to create its own STAC catalog on Azure App Services with Azure Database for PostgreSQL as the underlying data store. IO's STAC catalog was used to index data throughout the model deployment pipeline, serving both as a tool for checkpointing pipeline progress and as an index of the final product.
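A search like IO's can be sketched in R with the rstac package (the bbox, date range, and cloud-cover threshold are illustrative, not IO's actual parameters):

library(rstac)

items <- stac("https://planetarycomputer.microsoft.com/api/stac/v1") |>
  stac_search(
    collections = "sentinel-2-l2a",
    bbox = c(36.0, -1.5, 37.0, -0.5),
    datetime = "2020-01-01/2020-12-31"
  ) |>
  ext_query("eo:cloud_cover" < 10) |>
  post_request() |>
  items_sign(sign_fn = sign_planetary_computer())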
COGs and STAC, both easily leveraged in Azure, provide a scalable and highly flexible framework for processing geospatial data.
Azure Batch enabled Impact Observatory to map the globe at record scale & speed
Azure Batch was used by IO to efficiently deploy the model over satellite images in parallel at large scale. IO bundled the ML model and its deployment and processing code into Docker containers, and ran Batch tasks within these containers on a Batch pool of compute nodes.
The data processing pipeline consisted of three primary tasks: 1) Deploying the model over one 100 km x 100 km Sentinel-2 COG by chipping it into hundreds of overlapping 5 km x 5 km smaller images, running those chips through the model, and finally merging the chips back together; 2) Computing a class-weighted mode across all model predictions for a given Sentinel-2 image footprint; and 3) Combining the class-weighted modes produced in #2 for a given Military Grid Reference System (MGRS) zone into one COG. IO relied heavily on Batch's task dependency capabilities, which allowed, for example, the class-weighted mode task (#2) to be scheduled for execution only when the relevant set of model deployment tasks (#1) had completed successfully.
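The chipping in step 1 can be sketched with terra (an illustration of the idea only, not IO's pipeline code; the URL is hypothetical and the overlap between chips is omitted):

library(terra)

r <- rast("/vsicurl/https://example.com/sentinel2_scene.tif")

# Template raster whose 5 km cells define the chip boundaries
tmpl <- rast(ext(r), resolution = 5000, crs = crs(r))

# Writes one GeoTIFF per chip and returns the filenames
chips <- makeTiles(r, tmpl, filename = "chip_.tif")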
While the model was trained on a GPU-enabled VM, the model deployment over the image chips was executed on CPU-based virtual machines, enabling resource-efficient computation at scale. Due to the task-dependent nature of the pipeline, all tasks needed to run on the same pool, and thus the same VM type. RAM and network bandwidth requirements fluctuated across tasks, but high CPU usage ended up being the deciding factor in VM choice. In the end, the data was processed on low-priority Standard Azure D4 v2 virtual machines powered by Intel® Xeon® Scalable processors, with seven task slots allocated per node.
It took over one million core hours to process the data for the entire LULC map. With the scaling flexibility of Batch, IO was able to process over 10% of the Earth's surface per day. The completed Esri 2020 Land Cover map is now freely available on the Esri Living Atlas and the Microsoft Planetary Computer.
For additional information visit https://www.impactobservatory.com/
Storing and querying your geospatial data in Azure
While Azure Maps is known for great use cases around visualizing and interacting with maps and location data, you probably also need secure, reliable storage that offers the flexibility to query that location data. In this blog post, we explore the different options for storing and querying geospatial data in Azure, including Azure Cosmos DB, Azure SQL Database, and Azure Blob Storage. Storing and querying geospatial data in Azure is a powerful and flexible way to manage and analyze large sets of geographic information.
Azure Cosmos DB is a globally distributed, multi-model database that supports document, key-value, graph, and column-family data models. One of the key features of Cosmos DB is its support for geospatial data, which allows you to store and query data in the form of points, lines, and polygons. Cosmos DB also supports spatial indexing and advanced querying capabilities, making it a great choice for applications that require real-time, low-latency access to geospatial data.
Example query:
SELECT f.id
FROM Families f
WHERE ST_DISTANCE(f.location, {"type": "Point", "coordinates": [31.9, -4.8]}) < 30000
For more information, see Geospatial and GeoJSON location data in Azure Cosmos DB.
Another option for storing and querying geospatial data in Azure is Azure SQL Database. SQL Database is a fully managed, relational database service that supports the spatial data types and functions of SQL Server. This allows you to store and query geospatial data using standard SQL syntax, and also includes spatial indexing and querying capabilities. SQL Database is a good choice for applications that require a traditional relational database model and support for SQL-based querying.
For more information, see Spatial Data in Azure SQL Database.
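For example, from R the same kind of distance filter can be pushed to the database over a geography column with DBI (a sketch; the DSN, table, and column names are hypothetical):

library(DBI)

con <- dbConnect(odbc::odbc(), dsn = "my-azure-sql")

# geography::Point(lat, long, SRID); STDistance() returns meters
nearby <- dbGetQuery(con, "
  SELECT id
  FROM Sites
  WHERE location.STDistance(geography::Point(47.6, -122.3, 4326)) < 30000
")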
Finally, Azure Blob Storage can be used to store and query large amounts of unstructured data, including geospatial data. Blob Storage allows you to store data in the form of blobs, which can be accessed via a URL. This makes it a great option for storing large files, such as satellite imagery or shapefiles. While Blob Storage does not include built-in support for spatial querying, it can be used in conjunction with other Azure services, such as Azure Data Lake Storage or Azure Databricks, to perform spatial analysis on the data.
In this sample, we used satellite imagery stored in Azure Blob Storage:
https://samples.azuremaps.com/?sample=tile-layer-options
Lastly, to see a sample that pulls Azure Maps and Azure databases together, see the Microsoft Learn topic Geospatial data processing and analytics - Azure Example Scenarios, which discusses:
Azure Database for PostgreSQL - a fully managed relational database service that's based on the community edition of the open-source PostgreSQL database engine.
PostGIS - an extension for the PostgreSQL database that integrates with GIS servers. PostGIS can run SQL location queries that involve geographic objects.
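For example, sf can push a PostGIS distance query down to the database (a sketch; the connection details and table are hypothetical):

library(sf)

sites <- st_read(
  "PG:host=myserver.postgres.database.azure.com dbname=gis user=me",
  query = "SELECT id, geom FROM sites
           WHERE ST_DWithin(geom::geography,
                            ST_MakePoint(-122.3, 47.6)::geography, 30000)"
)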
In conclusion, Azure offers a variety of options for storing and querying geospatial data, including Azure Cosmos DB, Azure SQL Database, and Azure Blob Storage. Each of these services has its own set of features and capabilities, and choosing the right one will depend on the specific needs of your application. Whether you need low-latency access to real-time data, support for traditional SQL-based querying, or the ability to store and analyze large amounts of unstructured data, Azure has the tools you need to get the job done.
Provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.
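For example (a sketch of the pattern from the package's documentation):

library(dtplyr)
library(dplyr)

mtcars_dt <- lazy_dt(mtcars)

mtcars_dt |>
  filter(wt < 5) |>
  group_by(cyl) |>
  summarise(mpg = mean(mpg)) |>
  as_tibble()  # translation to data.table runs here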
R's sf package ships with self-contained GDAL
executables, including a bare-bones interface to several
GDAL-related utility programs collectively known as the GDAL
utilities. For each of those utilities, this package provides an
R wrapper whose formal arguments closely mirror those of the
GDAL command line interface. The utilities operate on data
stored in files and typically write their output to other
files. Therefore, to process data stored in any of R's more common
spatial formats (i.e. those supported by the sf and terra
packages), first write them to disk, then process them with the
package's wrapper functions before reading the outputted results
back into R. GDAL function arguments introduced in GDAL version
3.5.2 or earlier are supported.
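A minimal round trip with the package's ogr2ogr() wrapper might look like this (a sketch; the EPSG code is arbitrary):

library(sf)
library(gdalUtilities)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
src <- tempfile(fileext = ".gpkg")
dst <- tempfile(fileext = ".gpkg")

write_sf(nc, src)                        # write to disk first
ogr2ogr(src, dst, t_srs = "EPSG:32617")  # reproject with the wrapper
nc_utm <- read_sf(dst)                   # read the output back in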
/vsicurl/ (http/https/ftp files: random access)
/vsicurl/ is a file system handler that allows on-the-fly random reading of files available through HTTP/FTP web protocols, without prior download of the entire file. It requires GDAL to be built against libcurl.
Recognized filenames are of the form /vsicurl/http[s]://path/to/remote/resource or /vsicurl/ftp://path/to/remote/resource, where path/to/remote/resource is the URL of a remote resource.
Example using ogrinfo to read a shapefile on the internet:
ogrinfo -ro -al -so /vsicurl/https://raw.githubusercontent.com/OSGeo/gdal/master/autotest/ogr/data/poly.shp
Options can be passed in the filename with the following syntax: /vsicurl?[option_i=val_i&]*url=http://... where each option name and value (including the value of "url") is URL-encoded; an example follows the option list below.
Currently supported options are:
use_head=yes/no: whether the HTTP HEAD request can be emitted. Defaults to YES. Setting this option overrides the behavior of the CPL_VSIL_CURL_USE_HEAD configuration option.
max_retry=number: defaults to 0. Setting this option overrides the behavior of the GDAL_HTTP_MAX_RETRY configuration option.
retry_delay=number_in_seconds: defaults to 30. Setting this option overrides the behavior of the GDAL_HTTP_RETRY_DELAY configuration option.
retry_codes=ALL or comma-separated list of HTTP error codes. Setting this option overrides the behavior of the GDAL_HTTP_RETRY_CODES configuration option. (GDAL >= 3.10)
list_dir=yes/no: whether an attempt should be made to read the file list of the directory where the file is located. Defaults to YES.
empty_dir=yes/no: whether to disable directory listing and disable logic in drivers to probe for individual side-car files. Defaults to NO.
useragent=value: HTTP UserAgent header
referer=value: HTTP Referer header
cookie=value: HTTP Cookie header
header_file=value: Filename that contains one or several "Header: Value" lines
header.<key>=<value>: HTTP request header of name <key> and value <value>. (GDAL >= 3.11). e.g. header.Accept=application%2Fjson
unsafessl=yes/no
low_speed_time=value
low_speed_limit=value
proxy=value
proxyauth=value
proxyuserpwd=value
pc_url_signing=yes/no: whether to use the URL signing mechanism of Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/docs/concepts/sas/). (GDAL >= 3.5.2). Note that starting with GDAL 3.9, this may also be set with the path-specific option (cf. VSISetPathSpecificOption()) VSICURL_PC_URL_SIGNING set to YES.
pc_collection=name: name of the collection of the dataset for Planetary Computer URL signing. Only used when pc_url_signing=yes. (GDAL >= 3.5.2)
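For example, a filename carrying retry options can be opened directly from R through GDAL (a sketch; the URL is hypothetical and URL-encoded as required):

library(terra)

r <- rast("/vsicurl?max_retry=3&retry_delay=5&url=https%3A%2F%2Fexample.com%2Fdata%2Fdem.tif")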
Partial downloads (requires the HTTP server to support random reading) are done with a 16 KB granularity by default. The chunk size can be configured with the CPL_VSIL_CURL_CHUNK_SIZE configuration option, with a value in bytes. If the driver detects sequential reading, it will progressively increase the chunk size up to 128 times CPL_VSIL_CURL_CHUNK_SIZE (so 2 MB by default) to improve download performance.
In addition, a global least-recently-used cache of 16 MB shared among all downloaded content is used, and content in it may be reused after a file handle has been closed and reopened, during the lifetime of the process or until VSICurlClearCache() is called. The size of this global LRU cache can be modified by setting the configuration option CPL_VSIL_CURL_CACHE_SIZE (in bytes).
When increasing the value of CPL_VSIL_CURL_CHUNK_SIZE to optimize sequential reading, it is recommended to increase CPL_VSIL_CURL_CACHE_SIZE as well to 128 times the value of CPL_VSIL_CURL_CHUNK_SIZE.
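In R, for instance, these configuration options can be supplied as environment variables before GDAL is first used (a sketch following the 128x guideline above):

# 1 MB chunks and a matching 128 x 1 MB cache
Sys.setenv(
  CPL_VSIL_CURL_CHUNK_SIZE = "1048576",
  CPL_VSIL_CURL_CACHE_SIZE = "134217728"
)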
The GDAL_INGESTED_BYTES_AT_OPEN configuration option can be set to impose the number of bytes read in one GET call at file opening (this can help performance when reading Cloud Optimized GeoTIFFs with a large header).
The GDAL_HTTP_PROXY (for both HTTP and HTTPS protocols), GDAL_HTTPS_PROXY (for HTTPS protocol only), GDAL_HTTP_PROXYUSERPWD and GDAL_PROXY_AUTH configuration options can be used to define a proxy server. The syntax to use is the one of Curl CURLOPT_PROXY, CURLOPT_PROXYUSERPWD and CURLOPT_PROXYAUTH options.
The CURL_CA_BUNDLE or SSL_CERT_FILE configuration options can be used to set the path to the Certification Authority (CA) bundle file (if not specified, curl will use a file in a system location).
Additional HTTP headers can be sent by setting the GDAL_HTTP_HEADER_FILE configuration option to point to a filename of a text file with "key: value" HTTP headers.
As an alternative, starting with GDAL 3.6, the GDAL_HTTP_HEADERS configuration option can also be used to specify headers. CPL_CURL_VERBOSE=YES allows one to see them and more, when combined with --debug.
Starting with GDAL 3.10, the Authorization header is no longer automatically forwarded when redirections are followed. That behavior can be configured by setting the CPL_VSIL_CURL_AUTHORIZATION_HEADER_ALLOWED_IF_REDIRECT configuration option.
Starting with GDAL 3.11, a query string can be appended to a given /vsicurl/ filename by taking its value from the VSICURL_QUERY_STRING path-specific option set with VSISetPathSpecificOption(). This can, for example, be used when managing Shared Access Signatures (SAS) on the application side, without including the signature as part of the filename propagated through GDAL.
The GDAL_HTTP_MAX_RETRY (number of attempts) and GDAL_HTTP_RETRY_DELAY (in seconds) configuration options can be set so that request retries are done in case of HTTP errors 429, 502, 503 or 504.
Starting with GDAL 3.6, the following configuration options control the TCP keep-alive functionality (cf https://daniel.haxx.se/blog/2020/02/10/curl-ootw-keepalive-time/ for a detailed explanation):
GDAL_HTTP_TCP_KEEPALIVE = YES/NO. Whether to enable TCP keep-alive. Defaults to NO.
GDAL_HTTP_TCP_KEEPIDLE = integer, in seconds. Keep-alive idle time. Defaults to 60. Only taken into account if GDAL_HTTP_TCP_KEEPALIVE=YES.
GDAL_HTTP_TCP_KEEPINTVL = integer, in seconds. Interval time between keep-alive probes. Defaults to 60. Only taken into account if GDAL_HTTP_TCP_KEEPALIVE=YES.
Starting with GDAL 3.7, the following configuration options control support for SSL client certificates:
GDAL_HTTP_SSLCERT = filename. Filename of the SSL client certificate. Cf https://curl.se/libcurl/c/CURLOPT_SSLCERT.html
GDAL_HTTP_SSLCERTTYPE = string. Format of the SSL certificate: "PEM" or "DER". Cf https://curl.se/libcurl/c/CURLOPT_SSLCERTTYPE.html
GDAL_HTTP_SSLKEY = filename. Private key file for TLS and SSL client certificate. Cf https://curl.se/libcurl/c/CURLOPT_SSLKEY.html
GDAL_HTTP_KEYPASSWD = string. Passphrase to private key. Cf https://curl.se/libcurl/c/CURLOPT_KEYPASSWD.html
More generally options of CPLHTTPFetch() available through configuration options are available. Starting with GDAL 3.7, the above configuration options can also be specified as path-specific options with VSISetPathSpecificOption().
Starting with GDAL 3.11, the following configuration options control the number of HTTP connections:
GDAL_HTTP_MAX_CACHED_CONNECTIONS = integer_number. Maximum amount of connections that libcurl may keep alive in its connection cache after use. Cf https://curl.se/libcurl/c/CURLMOPT_MAXCONNECTS.html
GDAL_HTTP_MAX_TOTAL_CONNECTIONS = integer_number. Maximum number of simultaneously open connections in total. Cf https://curl.se/libcurl/c/CURLMOPT_MAX_TOTAL_CONNECTIONS.html
The file can be cached in RAM by setting the configuration option VSI_CACHE to TRUE. The cache size defaults to 25 MB, but can be modified by setting the configuration option VSI_CACHE_SIZE (in bytes). Content in that cache is discarded when the file handle is closed.
The CPL_VSIL_CURL_NON_CACHED configuration option can be set to values like /vsicurl/http://example.com/foo.tif:/vsicurl/http://example.com/some_directory, so that at file handle closing, all cached content related to the mentioned file(s) is no longer cached. This can help when dealing with resources that can be modified during execution of GDAL related code. Alternatively, VSICurlClearCache() can be used.
/vsicurl/ will try to query directly redirected URLs to Amazon S3 signed URLs during their validity period, so as to minimize round-trips. This behavior can be disabled by setting the configuration option CPL_VSIL_CURL_USE_S3_REDIRECT to NO.
Starting with GDAL 3.12, the GDAL_HTTP_PATH_VERBATIM configuration option can be set to YES so that sequences of /../ or /./ that may exist in the URL's path part are kept unchanged. Otherwise, by default, they are squashed, according to RFC 3986 section 5.2.4.
VSIStatL() will return the size in the st_size member and the file nature - file or directory - in the st_mode member (the latter only reliable with FTP resources for now).
VSIReadDir() should be able to parse the HTML directory listing returned by the most popular web servers, such as Apache and Microsoft IIS.
/vsicurl_streaming/ (http/https/ftp files: streaming)
/vsicurl_streaming/ is a file system handler that allows on-the-fly sequential reading of files streamed through HTTP/FTP web protocols, without prior download of the entire file. It requires GDAL to be built against libcurl.
Although this file handler is able to seek to random offsets in the file, this will not be efficient. If you need efficient random access and the server supports range downloading, you should use the /vsicurl/ file system handler instead.
Recognized filenames are of the form /vsicurl_streaming/http[s]://path/to/remote/resource or /vsicurl_streaming/ftp://path/to/remote/resource, where path/to/remote/resource is the URL of a remote resource.
The GDAL_HTTP_PROXY (for both HTTP and HTTPS protocols), GDAL_HTTPS_PROXY (for HTTPS protocol only), GDAL_HTTP_PROXYUSERPWD and GDAL_PROXY_AUTH configuration options can be used to define a proxy server. The syntax to use is the one of Curl CURLOPT_PROXY, CURLOPT_PROXYUSERPWD and CURLOPT_PROXYAUTH options.
The CURL_CA_BUNDLE or SSL_CERT_FILE configuration options can be used to set the path to the Certification Authority (CA) bundle file (if not specified, curl will use a file in a system location).
The file can be cached in RAM by setting the configuration option VSI_CACHE to TRUE. The cache size defaults to 25 MB, but can be modified by setting the configuration option VSI_CACHE_SIZE (in bytes).
VSIStatL() will return the size in the st_size member and the file nature - file or directory - in the st_mode member (the latter only reliable with FTP resources for now).
Mass Effect Wiki is a comprehensive database for the Mass Effect video game series. The wiki is dedicated to collecting all information related to the franchise, such as classes, characters, races, walkthroughs, assignments and more!
Including function calls in error messages — topic-error-call
Starting with rlang 1.0, abort() includes the erroring function in the message by default:
my_function <- function() {
abort("Can't do that.")
}
my_function()
#> Error in `my_function()`:
#> ! Can't do that.
This works well when abort() is called directly within the failing function. However, when the abort() call is moved into another function (which we call an "error helper"), we need to be explicit about which function abort() is throwing an error for.
There are two main kinds of error helpers:
Simple abort() wrappers. These often aim at adding classes and attributes to an error condition in a structured way:
stop_my_class <- function(message) {
abort(message, class = "my_class")
}
Input checking functions. An input checker is typically passed an input and an argument name. It throws an error if the input doesn't conform to expectations:
check_string <- function(x, arg = "x") {
if (!is_string(x)) {
cli::cli_abort("{.arg {arg}} must be a string.")
}
}
As written, these helpers report errors as coming from themselves (e.g. check_string()) rather than from the user-facing functions that call them. To fix this, let abort() know about the function it is throwing the error for by passing the corresponding function environment as the call argument:
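check_string <- function(x, arg = "x", call = caller_env()) {
  if (!is_string(x)) {
    cli::cli_abort("{.arg {arg}} must be a string.", call = call)
  }
}

With call = caller_env() as the default, errors thrown from check_string() are reported as coming from its caller.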
Sysxplore explores DevOps, Cloud, and Linux topics in a straightforward way, making complex concepts easy to grasp. Our goal is to deliver technical information and make it enjoyable to learn.