02-AREAS

02-AREAS

175 bookmarks
Newest
REST API in R with plumber
REST API in R with plumber
API and R Nowadays, it’s pretty much expected that software comes with an HTTP API interface. Every programming language out there offers a way to expose APIs or make GET/POST/PUT requests, including R. In this post, I’ll show you how to create an API using the plumber package. Plus, I’ll give you tips on how to make it more production ready - I’ll tackle scalability, statelessness, caching, and load balancing. You’ll even see how to consume your API with other tools like python, curl, and the R own httr package.
Nowadays, it’s pretty much expected that software comes with an HTTP API interface. Every programming language out there offers a way to expose APIs or make GET/POST/PUT requests, including R. In this post, I’ll show you how to create an API using the plumber package. Plus, I’ll give you tips on how to make it more production ready - I’ll tackle scalability, statelessness, caching, and load balancing. You’ll even see how to consume your API with other tools like python, curl, and the R own httr package
# When an API is started it might take some time to initialize # this function stops the main execution and wait until # plumber API is ready to take queries. wait_for_api <- function(log_path, timeout = 60, check_every = 1) { times <- timeout / check_every for(i in seq_len(times)) { Sys.sleep(check_every) if(any(grepl(readLines(log_path), pattern = "Running plumber API"))) { return(invisible()) } } stop("Waiting timed!") }
Oh, in some examples I am using redis. So, before you dive in, make sure to fire up a simple redis server. At the end of the script, I’ll be turning redis off, so you don’t want to be using it for anything else at the same time. I just want to remind you that this code isn’t meant to be run on a production server.
redis is launched in a background, , so you might want to wait a little bit to make sure it’s fully up and running before moving on.
wait_for_redis <- function(timeout = 60, check_every = 1) { times <- timeout / check_every for(i in seq_len(times)) { Sys.sleep(check_every) status <- suppressWarnings(system2("redis-cli", "PING", stdout = TRUE, stderr = TRUE) == "PONG") if(status) { return(invisible()) } } stop("Redis waiting timed!") }
First off, let’s talk about logging. I try to log as much as possible, especially in critical areas like database accesses, and interactions with other systems. This way, if there’s an issue in the future (and trust me, there will be), I should be able to diagnose the problem just by looking at the logs alone. Logging is like “print debugging” (putting print(“I am here”), print(“I am here 2”) everywhere), but done ahead of time. I always try to think about what information might be needed to make a correct diagnosis, so logging variable values is a must. The logger and glue packages are your best friends in that area.
Next, it might also be useful to add a unique request identifier ((I am doing that in setuuid filter)) to be able to track it across the whole pipeline (since a single request might be passed across many functions). You might also want to add some other identifiers, such as MACHINE_ID - your API might be deployed on many machines, so it could be helpful for diagnosing if the problem is associated with a specific instance or if it’s a global issue.
In general you shouldn’t worry too much about the size of the logs. Even if you generate ~10KB per request, it will take 100000 requests to generate 1GB. And for the plumber API, 100000 requests generated in a short time is A LOT. In such scenario you should look into other languages. And if you have that many requests, you probably have a budget for storing those logs:)
It might also be a good idea to setup some automatic system to monitor those logs (e.g. Amazon CloudWatch if you are on AWS). In my example I would definitely monitor Error when reading key from cache string. That would give me an indication of any ongoing problems with API cache.
Speaking of cache, you might use it to save a lot of resources. Caching is a very broad topic with many pitfalls (what to cache, stale cache, etc) so I won’t spend too much time on it, but you might want to read at least a little bit about it. In my example, I am using redis key-value store, which allows me to save the result for a given request, and if there is another requests that asks for the same data, I can read it from redis much faster.
Note that you could use memoise package to achieve similar thing using R only. However, redis might be useful when you are using multiple workers. Then, one cached request becomes available for all other R processes. But if you need to deploy just one process, memoise is fine, and it does not introduce another dependency - which is always a plus.
info <- function(req, ...) { do.call( log_info, c( list("MachineId: {MACHINE_ID}, ReqId: {req$request_id}"), list(...), .sep = ", " ), envir = parent.frame(1) ) }
#* Log some information about the incoming request #* https://www.rplumber.io/articles/routing-and-input.html - this is a must read! #* @filter setuuid function(req) { req$request_id <- UUIDgenerate(n = 1) plumber::forward() }
#* Log some information about the incoming request #* @filter logger function(req) { if(!grepl(req$PATH_INFO, pattern = "PATH_INFO")) { info( req, "REQUEST_METHOD: {req$REQUEST_METHOD}", "PATH_INFO: {req$PATH_INFO}", "HTTP_USER_AGENT: {req$HTTP_USER_AGENT}", "REMOTE_ADDR: {req$REMOTE_ADDR}" ) } plumber::forward() }
To run the API in background, one additional file is needed. Here I am creating it using a simple bash script.
library(plumber) library(optparse) library(uuid) library(logger) MACHINE_ID <- "MAIN_1" PORT_NUMBER <- 8761 log_level(logger::TRACE) pr("tmp/api_v1.R") %>% pr_run(port = PORT_NUMBER)
·zstat.pl·
REST API in R with plumber
SPA Mode | Remix
SPA Mode | Remix
From the beginning, Remix's opinion has always been that you own your server architecture. This is why Remix is built on top of the Web Fetch API and can run on any modern runtime via built-in or community-provided adapters. While we believe that having a server provides the best UX/Performance/SEO/etc. for most apps, it is also undeniable that there exist plenty of valid use cases for a Single Page Application in the real world:
SPA Mode is basically what you'd get if you had your own React Router + Vite setup using createBrowserRouter/RouterProvider, but along with some extra Remix goodies: File-based routing (or config-based via routes()) Automatic route-based code-splitting via route.lazy <Link prefetch> support to eagerly prefetch route modules <head> management via Remix <Meta>/<Links> APIs SPA Mode tells Remix that you do not plan on running a Remix server at runtime and that you wish to generate a static index.html file at build time and you will only use Client Data APIs for data loading and mutations. The index.html is generated from the HydrateFallback component in your root.tsx route. The initial "render" to generate the index.html will not include any routes deeper than root. This ensures that the index.html file can be served/hydrated for paths beyond / (i.e., /about) if you configure your CDN/server to do so.
·remix.run·
SPA Mode | Remix
Smarter Single Page Application with a REST API
Smarter Single Page Application with a REST API
How can you build a Single Page Application with a REST API that doesn't have a ton of business logic in the client? Use Hypermedia!
When the Browser is the client consuming HTML, it understands how to render HTML. HTML has a specification. The browser understands how to handle a <form> tag or a <button>. It was driven by the HTML at runtime.
How can you build a smarter Single Page Application with a REST API? The concepts have been since the beginning of the web, yet have somehow lost their way in modern REST API that drives a Single Page Application or Mobile Applications. Here’s how to guide clients based on state by moving more information from design time to runtime.
State If you’re developing more than a CRUD application, you’re likely going to be driven by the state of the system. Apps that have Task Based UIs (hint: go read my post on Decomposing CRUD to Task Based UIs) are guiding users down a path of actions they can perform based on the state of the system. The example throughout this post is the concept of a Product in a warehouse. If we have a tasks that let’s someone mark a Product as no longer being available for sale or it being available for sale, these tasks can be driven by the state of the Product. If the given UI task is “Mark as Available” then the Product must be currently unavailable and we have a quantity on hand that’s greater than zero
History of Clients Taking a step back a bit, web apps were developed initially with just plain HTML (over 20 years ago for me). In its most basic form, a static HTML page contained a <form> that the browser rendered for the user to fill out and submit. The form’s action would point to a URI usually to a script, often written in Perl, in the cgi-bin folder on the webserver. The script would take the form data (sent via POST from the browser) and insert it into a database, send an email, or whatever the required behavior was. As web apps progressed, instead of the HTML being in a static file, it was dynamically created by the server. But it was still just plain HTML. The browser was the client. HTML was the content it’s consuming.
Modern Clients As web apps progressed with AJAX (XMLHttpRequest) instead of using HTML forms, Javascript was used to send the HTTP request. The browsers turned more into the Host of the application which was written in Javascript. Now, Javascript is the client. JSON is the content it’s consuming.
Runtime vs Design Time When the Browser is the client consuming HTML, it understands how to render HTML. HTML has a specification. The browser understands how to handle a <form> tag or a <button>. It was driven by the HTML at runtime.
In modern SPAs consuming JSON, the data itself is unstructured. Each client has to be created uniquely based on the content it receives. This has to be developed at design time when creating the javascript client.
When developing a SPA, you may leverage something like OpenAPI to generate code to use in the SPA/clients to make the HTTP calls to the server. But you must understand as a developer, at design time (when developing) when to make a call to the server.
To use my earlier example of making a product available for sale, if you were developing a server-side rendered HTML web app, you wouldn’t return the form apart of the HTML if the product couldn’t be made available. You would do this because on the server you have the state of the product (fetched from the database). If you’re creating a SPA, you’re likely putting that same logic in your client so you can conditionally show UI elements. It wouldn’t be a great experience for the user to be able to perform an action, then see an error message because the server/api threw a 400 because the product is not in a state to allow it to be available.
Hypermedia Hypermedia is what is used in HTML to tell the Browser what it can do. As I mentioned earlier, a <form> is a hypermedia control.
HTTP APIs The vast majority of modern HTTP APIs serving JSON, do not provide any information in the content (JSON) about what actions or other resources the consuming client (SPA) can take. Meaning, we provide no information at runtime. All of that has to be figured out at design time.
You will still need to know (via OpenAPI) at design time, all the information about the routes you will be calling, and their results, however, you can now have the server return JSON that can guide the client based on state.
·codeopinion.com·
Smarter Single Page Application with a REST API
Free Website Speed Test
Free Website Speed Test
Measure the speed and Core Web Vitals of your website. Find out how to make your website load faster and rank well in Google.
·debugbear.com·
Free Website Speed Test
Rectangling
Rectangling
Rectangling is the art and craft of taking a deeply nested list (often sourced from wild caught JSON or XML) and taming it into a tidy data set of rows and columns. This vignette introduces you to the main rectangling tools provided by tidyr: `unnest_longer()`, `unnest_wider()`, and `hoist()`.
·tidyr.tidyverse.org·
Rectangling
How to Wrangle JSON Data in R with jsonlite, purr and dplyr - Robot Wealth
How to Wrangle JSON Data in R with jsonlite, purr and dplyr - Robot Wealth
Working with modern APIs you will often have to wrangle with data in JSON format. This article presents some tools and recipes for working with JSON data with R in the tidyverse. We’ll use purrr::map functions to extract and transform our JSON data. And we’ll provide intuitive examples of the cross-overs and differences between purrr ... Read more
·robotwealth.com·
How to Wrangle JSON Data in R with jsonlite, purr and dplyr - Robot Wealth
R - JSON Files
R - JSON Files
R - JSON Files - JSON file stores data as text in human-readable format. Json stands for JavaScript Object Notation. R can read JSON files using the rjson package.
·tutorialspoint.com·
R - JSON Files