3MW (Automate Anything With R & GitHub Actions)
GitHub Actions is an incredible tool to automate any calculation you want. But it can feel daunting.
Networking best practices | Cloud Run Documentation | Google Cloud
Getting Started with TypeSpec For REST APIs
TypeSpec is a language and toolset developed by Microsoft for defining data models and service APIs. It provides a structured way to describe the shape and behavior of data and services, ensuring consistency and reducing errors in API development. With TypeSpec, you can generate code, documentation, and other artifacts from your API definitions, making it easier to maintain and evolve your services. Microsoft uses TypeSpec internally to define APIs for various products and services, including Azure.
TypeSpec is used to define the interface of your API, which clients will use to interact with resources provided by your service. This includes specifying the operations, request and response models, and error handling mechanisms. The actual API logic is implemented in the backend service, which processes the requests and communicates with the database.
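For a flavor of the language, here is a small sketch modeled on the TypeSpec getting-started examples (the Widget model and routes are illustrative, not from this article):

import "@typespec/http";

using TypeSpec.Http;

model Widget {
  id: string;
  weight: int32;
}

@route("/widgets")
interface Widgets {
  @get list(): Widget[];
  @get read(@path id: string): Widget;
  @post create(@body body: Widget): Widget;
}

From a definition like this, the TypeSpec tooling can emit an OpenAPI spec, which is what downstream code and documentation generators consume.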
How to Reverse Engineer APIs: The Benefits and Tools
Leverage API reverse engineering to boost interoperability and identify security flaws in a program. Learn about reverse engineering APIs.
Best Practices of Reverse Engineering an API - Apriorit
Learn everything about reverse engineering an API, from benefits for your software to real-life scenarios from our experts.
Refactoring notes
I worked on a refactor of an R package at work the other day. Here are some notes I took after doing the work. This IS NOT a best practices post - it’s just a collection of thoughts.
For context, the package is an API client.
It made sense to break the work for any given exported function into the following components, as applicable depending on the endpoint being handled (some endpoints needed just a few lines of code, so those functions were left unchanged):
Reverse Engineering REST APIs: The Easy Way (2/12)
mirai - Plumber Integration
Crafting Intelligent User Experiences: A Deep Dive into OpenAI Assistants API
Elevate, Enhance, and Empower your apps with Assistants APIs and Tools
What’s an OpenAI Assistant?
Think of it as software glue that lets you combine agent-like capabilities in your applications to carry out tasks expressed as natural-language instructions to an Assistant.
Able to understand instructions, it can leverage OpenAI’s SOTA models and tools to carry out tasks. With the stateful Assistants API, you can create Assistants within your application, giving you access to three types of supported tools: Code Interpreter, Retrieval, and Function calling [5].
At its core are a few concepts and components that interact cogently to enable agent-like capabilities.
Assistants API, concepts, components, and tools
Unfortunately, the OpenAI documentation falls short in explaining these components in finer detail and showing how they work together. Randy Michak of Empowerment AI does a fine job of dissecting these core components and illustrating their flow and data interactions [7]. Inspired by Michak, I mildly modified Figure 4, showing the dynamic interaction and data flow among Assistants API components.
To get started with Assistants, the OpenAI guide stipulates four simple steps to use the Assistants API to glue together these core components for coordination [8].
Step 1: Create an Assistant to declare a custom model and provide instructions for the Assistant. This helps the Assistant select the appropriate supported tool to employ.
Step 2: Create a Thread, a stateful session for the Assistant to retrieve messages from and add Assistant messages to.
Step 3: Use the Thread as a conversational session to add messages for the Assistant to consume.
Step 4: Run the Assistant on a newly added Thread message to trigger responses. This run is the Assistant’s asynchronous runtime environment.
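A minimal sketch of these four steps in R with httr2, against the beta Assistants endpoints (the openai() helper, model name, and instructions are assumptions for illustration, not from the guide):

library(httr2)

# Helper: a pre-authenticated request to the OpenAI API (assumption:
# the beta header "OpenAI-Beta: assistants=v1" is required).
openai <- function(path) {
  request("https://api.openai.com/v1") |>
    req_url_path_append(path) |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_headers("OpenAI-Beta" = "assistants=v1")
}

# Step 1: create an Assistant with a model, instructions, and tools
assistant <- openai("assistants") |>
  req_body_json(list(
    model = "gpt-4-turbo",
    instructions = "Answer questions using the uploaded PDFs.",
    tools = list(list(type = "retrieval"))
  )) |>
  req_perform() |>
  resp_body_json()

# Step 2: create a Thread (a stateful session)
thread <- openai("threads") |>
  req_method("POST") |>
  req_perform() |>
  resp_body_json()

# Step 3: add a user Message to the Thread
openai(paste0("threads/", thread$id, "/messages")) |>
  req_body_json(list(role = "user", content = "Summarize the documents.")) |>
  req_perform()

# Step 4: Run the Assistant on the Thread; runs are asynchronous,
# so a real app would poll the run's status until it completes.
run <- openai(paste0("threads/", thread$id, "/runs")) |>
  req_body_json(list(assistant_id = assistant$id)) |>
  req_perform() |>
  resp_body_json()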
How does it all work together?
Let’s methodically walk through a simple example where we want to accomplish the following:
Integrate the Assistants API, using the Retrieval tool, to a) upload a couple of PDF documents and b) use an Assistant to query their contents. Consider this a mini Retrieval Augmented Generation (RAG) application.
Use File objects to upload the PDF files so that the Assistant can access them.
Create and employ the Assistant, Thread, Message, and Run objects to query the uploaded PDF documents.
Coordinate all these objects to interact as part of my application.
Step 1: Create File objects as our knowledge base
Upload your PDFs to the retrieval database using a File object. The Assistants API breaks them into chunks and saves them as indexes and vector embeddings. When you ask a question, the retriever finds the best matches and helps the Assistant give you a detailed answer, just like a big RAG retriever.
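A sketch of this upload step in R, reusing the hypothetical openai() helper from the earlier sketch (the filename is a placeholder):

file <- openai("files") |>
  req_body_multipart(
    purpose = "assistants",
    file = curl::form_file("handbook.pdf")
  ) |>
  req_perform() |>
  resp_body_json()

# file$id can then be handed to the Assistant as part of its knowledge base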
Step 2: Create an Assistant object.
To use an Assistant and conduct tasks, first create an Assistant object. Supply the Assistant with a model, instructional behavior, tools to use, and file IDs for its knowledge base as parameters.
REST API in R with plumber
API and R
Nowadays, it’s pretty much expected that software comes with an HTTP API interface. Every programming language out there offers a way to expose APIs or make GET/POST/PUT requests, including R. In this post, I’ll show you how to create an API using the plumber package. Plus, I’ll give you tips on how to make it more production-ready - I’ll tackle scalability, statelessness, caching, and load balancing. You’ll even see how to consume your API with other tools like Python, curl, and R’s own httr package.
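Before the production tips, a minimal plumber endpoint for context (this is the stock echo example from the plumber docs, not code from this post; the runner script below assumes the API file lives at tmp/api_v1.R):

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}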
# When an API is started it might take some time to initialize.
# This function stops the main execution and waits until
# the plumber API is ready to take queries.
wait_for_api <- function(log_path, timeout = 60, check_every = 1) {
  times <- timeout / check_every
  for (i in seq_len(times)) {
    Sys.sleep(check_every)
    if (any(grepl(readLines(log_path), pattern = "Running plumber API"))) {
      return(invisible())
    }
  }
  stop("Waiting timed out!")
}
Oh, in some examples I am using redis. So, before you dive in, make sure to fire up a simple redis server. At the end of the script, I’ll be turning redis off, so you don’t want to be using it for anything else at the same time. I just want to remind you that this code isn’t meant to be run on a production server.
redis is launched in the background, so you might want to wait a little bit to make sure it’s fully up and running before moving on.
wait_for_redis <- function(timeout = 60, check_every = 1) {
  times <- timeout / check_every
  for (i in seq_len(times)) {
    Sys.sleep(check_every)
    status <- suppressWarnings(
      system2("redis-cli", "PING", stdout = TRUE, stderr = TRUE) == "PONG"
    )
    if (status) {
      return(invisible())
    }
  }
  stop("Redis waiting timed out!")
}
First off, let’s talk about logging. I try to log as much as possible, especially in critical areas like database accesses and interactions with other systems. This way, if there’s an issue in the future (and trust me, there will be), I should be able to diagnose the problem by looking at the logs alone. Logging is like "print debugging" (putting print("I am here"), print("I am here 2") everywhere), but done ahead of time. I always try to think about what information might be needed to make a correct diagnosis, so logging variable values is a must. The logger and glue packages are your best friends in that area.
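For instance, a quick sketch of glue-style interpolation with logger (the variables are made up):

library(logger)

user_id <- 42
attempts <- 3

# logger interpolates {expressions} glue-style by default
log_info("Fetching profile for user {user_id}")
log_error("DB connection failed after {attempts} attempts")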
Next, it might also be useful to add a unique request identifier (I am doing that in the setuuid filter) to be able to track it across the whole pipeline (since a single request might be passed across many functions). You might also want to add some other identifiers, such as MACHINE_ID - your API might be deployed on many machines, so it could be helpful to diagnose whether a problem is associated with a specific instance or is a global issue.
In general, you shouldn’t worry too much about the size of the logs. Even if you generate ~10KB per request, it will take 100,000 requests to generate 1GB. And for a plumber API, 100,000 requests generated in a short time is A LOT - in such a scenario you should look into other languages. And if you have that many requests, you probably have a budget for storing those logs :)
It might also be a good idea to set up some automatic system to monitor those logs (e.g. Amazon CloudWatch if you are on AWS). In my example I would definitely monitor for the "Error when reading key from cache" string. That would give me an indication of any ongoing problems with the API cache.
Speaking of cache, you might use it to save a lot of resources. Caching is a very broad topic with many pitfalls (what to cache, stale caches, etc.) so I won’t spend too much time on it, but you might want to read at least a little bit about it. In my example, I am using the redis key-value store, which allows me to save the result for a given request; if another request asks for the same data, I can read it from redis much faster.
Note that you could use the memoise package to achieve a similar thing using R only. However, redis might be useful when you are using multiple workers: one cached request becomes available to all other R processes. But if you need to deploy just one process, memoise is fine, and it does not introduce another dependency - which is always a plus.
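A sketch of the in-process alternative (function names are illustrative):

library(memoise)

slow_lookup <- function(id) {
  Sys.sleep(2) # stand-in for an expensive query or API call
  id * 2
}

fast_lookup <- memoise(slow_lookup)
fast_lookup(1) # ~2 s: computed, then cached in this R process
fast_lookup(1) # instant: served from the in-process cache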
info <- function(req, ...) {
  do.call(
    log_info,
    c(
      list("MachineId: {MACHINE_ID}, ReqId: {req$request_id}"),
      list(...),
      .sep = ", "
    ),
    envir = parent.frame(1)
  )
}
#* Assign a unique identifier to the incoming request
#* https://www.rplumber.io/articles/routing-and-input.html - this is a must read!
#* @filter setuuid
function(req) {
  req$request_id <- UUIDgenerate(n = 1)
  plumber::forward()
}
#* Log some information about the incoming request
#* @filter logger
function(req) {
  if (!grepl(req$PATH_INFO, pattern = "PATH_INFO")) {
    info(
      req,
      "REQUEST_METHOD: {req$REQUEST_METHOD}",
      "PATH_INFO: {req$PATH_INFO}",
      "HTTP_USER_AGENT: {req$HTTP_USER_AGENT}",
      "REMOTE_ADDR: {req$REMOTE_ADDR}"
    )
  }
  plumber::forward()
}
To run the API in the background, one additional file is needed. Here I am creating it using a simple bash script.
library(plumber)
library(optparse)
library(uuid)
library(logger)

MACHINE_ID <- "MAIN_1"
PORT_NUMBER <- 8761

log_level(logger::TRACE)

pr("tmp/api_v1.R") %>%
  pr_run(port = PORT_NUMBER)
notes on a {plumber} todo backend
tl;dr I implemented a todo backend in R with plumber. Use Rocker with PPAs for fast container builds. Dirk Eddelbuettel demonstrates how at this link. todo backend Todo-Backend is “a shared example to showcase backend tech stacks”, inspired by the front-end todomvc. It’s a good way to get a sense for how you might implement similar functionality in other languages. I’d read some other posts on setting up plumber apis so I decided to give it a shot.
Overview - express-rate-limit
Rate Limiting with NGINX
What is OAuth 2.0 and How does it Work?
Discover what OAuth 2.0 is, how it works, and its importance in securing and managing access to online resources. Learn about its key grants and benefits.
OAuth 2.0 — OAuth
What is an Access Token - OAuth 2.0
API Oopsies 101: The R Way - Best Practices for Error Handling
JSON Schema validation | Godspeed Docs
The Framework provides request and response schema validation
Request schema validation
We have the ability to define inputs and their types in our request schema, such as path parameters, query parameters, and request body. This allows the framework to validate whether the API has received the specified inputs in the expected types.
Whenever an API is triggered, AJV (Another JSON Schema Validator) verifies the request schema against the provided inputs. If the defined schema matches the inputs, it allows the workflow to execute. Otherwise, it throws an error with a status code of 400 and a descriptive message indicating where the schema validation failed.
Response schema validation
Just like request schema validation, there's also response schema validation in place. In this process, the framework checks the response type, validates the properties of the response, and ensures they align with the specified types.
The process of response schema validation involves storing the response schema, enabling the workflow to execute, and checking the response body along with its properties for validation.
Response schema validation includes two cases:
Failure in Workflow Execution
Successful Workflow Execution but Fails in Response Schema Validation
If response schema validation fails, the API returns a 500 Internal Server Error.
In the case of failed request schema validation, the APIs respond with a status of 400 and a message indicating a "bad request." Conversely, if the response schema validation encounters an issue, the APIs return a status of 500 along with an "Internal Server Error" message.
Event with response and request schema validation
http.post./helloworld:
  fn: helloworld
  params:
    - name: path_params
      in: path
      required: true
      schema:
        type: string
    - name: query_params
      in: query
      required: true
      schema:
        type: string
  body:
    content:
      application/json:
        schema:
          type: object
          required: [name]
          properties:
            name:
              type: string
  responses:
    200:
      content:
        application/json:
          schema:
            type: object
            required: [name]
            properties:
              name:
                type: string
Design Principles | Godspeed Docs
Three fundamental abstractions
Schema driven data validation
We follow the Swagger spec as a standard to validate the schema of events, whether incoming or outgoing (HTTP), without the developer having to write any code. In the case of database API calls, the datastore plugin validates the arguments based on the DB model specified in Prisma format. The plugins for HTTP APIs or datastores offer validation for third-party API requests and responses, datastore queries, and incoming events, based on the Swagger spec or DB schema. For more intricate validation scenarios, such as conditional validation based on attributes like subject, object, environment, or payload, developers can incorporate these rules into the application logic as part of middleware or workflows.
Unified datastore model and API
The unified model configuration and CRUD API, which covers popular SQL and NoSQL stores including Elasticgraph (a unique ORM over Elasticsearch), offers standardized interfaces to various types of datastores. Each integration adapts to the nature of the data store. The Prisma and Elasticgraph plugins provided by Godspeed expose the native functions of the underlying client, giving developers the freedom to use the universal syntax or native queries.
Authentication
Authentication helps to identify who the user is and generate their access tokens or JWTs for authorized access to the resources of the application. The framework gives developers full freedom to set up any kind of authentication. For example, they can set up simple auth using the microservice's internal datastore, or they can invoke an IAM service like ORY Kratos or Okta, or an in-house service. They can also add OAuth2 authentication using different providers like Google, Microsoft, Apple, and GitHub via pre-built plugins, or import and customize an existing HTTP plugin like Express by adding PassportJS middleware.
Authorization
Authorization is key to security, for multi-tenant and a variety of other use cases. The framework allows a neat, clean, and low-code syntax for fine-grained authorization, at the event level or the workflow's task level, when querying a database or another API. Developers define authorization rules for each event or workflow task using straightforward configurations for JWT validation or RBAC/ABAC. For more complex use cases, for example where they query a policy engine and dynamically compute the permissions, they can write workflows or native functions to access the datasources, compute the rules on the fly, and patch the outcome of that function into a task's authz parameter. These rules encompass not only access to API endpoints but also fine-grained data access within datastores, at the table, row, and column level. The framework allows seamless integration with third-party authorization services or ACL databases via the datasource abstraction.
Autogenerated Swagger spec
Following the principles of Schema Driven Development, the event spec of the microservice can be used to auto-generate the Swagger spec for HTTP APIs exposed by this service. The framework provides autogenerated Swagger documentation via the CLI.
Autogenerated CRUD API
The framework provides autogenerated CRUD APIs from a database model written in Prisma format. Generated APIs can be extended by developers as per their needs. We are planning to auto-generate GraphQL and gRPC APIs as well, and may release a developer bounty for the same soon.
Environment variables and configurations
The framework promotes setting up environment variables in a pre-defined YAML file, though the developer can also provide them by other means, via a .env file or by setting them manually. Further configurations are to be written in the /config folder. These variables are accessible in:
Other configuration files
Datasource, event source and event definitions
Workflows
Log redaction
The framework allows the developer to specify keys that may carry sensitive information and should never get published in logs by mistake. There is a centralized check for such keys before a log is printed.
Telemetry autoinstrumentation using OTEL
Godspeed allows a developer to add auto-instrumentation which publishes logs, traces, and APM information in the OTEL standard format, supported by all major observability backends. The APM export captures not just RAM and CPU information per node/pod/service, but also the latency of incoming API calls, with broken-down spans giving the breakup of latency across calls to datastores or external APIs. This helps to find the exact bottlenecks. Further, the logs and traces/spans are correlated to find out exactly where an error happened in a request spanning multiple microservices, each calling multiple datasources and doing internal computation. Developers can also add custom logging, span creation, and BPM metrics at the task level - for example, new user registration, failed login attempts, etc.
An API Definition As The Truth In The API Contract by The API Evangelist
Making sense of the technology, business, and politics of APIs that impact all stages of our physical and digital worlds.
API Reconnaissance (Passive Recon)
When we are gathering information about an API we use two different methods:
inurl:"/api/v1" site:microsoft.com
Reverse Engineer an API using MITMWEB and POSTMAN and create a Swagger file (crAPI)
Many times when we are trying to pentest an API we might not get access to the Swagger file or the documentation of the API. Today we will try to create the Swagger file using mitmweb and Postman.
Man in the Middle Proxy (mitmweb)
Run mitmweb through our command line in Kali.
As we can see, it starts to listen on port 8080 for http/https traffic; we can make sure it’s running by navigating to the web interface, which is localhost at port 8081.
Then we proxy our traffic through the Burp Suite proxy port 8080, because we already have mitmweb listening on this port (make sure Burp is closed).
Then we stop the capture and use mitmproxy2swagger to analyse it.
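A sketch of that analysis step (flag names as I recall them from the mitmproxy2swagger README; file names and the API prefix are placeholders):

# First pass: detect endpoints and write a skeleton spec
mitmproxy2swagger -i flows.mitm -o spec.yml -p http://crapi.local

# Edit spec.yml to un-ignore the paths you care about, then re-run
mitmproxy2swagger -i flows.mitm -o spec.yml -p http://crapi.local --examples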
Comlink | Superface
Overview
Prompt Caching in the API | OpenAI
Offering automatic discounts on inputs that the model has recently seen
How I would do auth
A quick blog on how I would implement auth for my applications.
First, if the application is for devs and I need something very quick, I would just use GitHub OAuth. Done in 10 minutes.
Now to the main part - how would I implement password-based auth? The minimum for me would be password with 2FA using authenticator apps. Passkeys aren’t widespread enough and I just find magic-links annoying.
Always implement rate-limiting, even if it’s something very basic!
Session management
Database sessions, 100%. I really, really don’t like JWTs and they shouldn’t be used as sessions the majority of the time.
Assuming I only have to deal with authenticated sessions, my preferred approach is a 30-day expiration where the expiration gets extended every time the session is used. This ensures active users stay authenticated while inactive users are signed out.
Registration
Hot take - I think it’s fine for apps to share whether an email exists in their system or not. If the email is already taken, just tell the user that they already have an account. Significantly better UX for minimal security loss. Don’t use emails for auth if you don’t like that.
Anyway, something more important than preventing user enumeration is checking passwords against previous leaks. The haveibeenpwned.com API is probably the best option for this. This will reduce the effectiveness of credential stuffing attacks, where an attacker targets accounts using passwords leaked from other websites.
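As a sketch of how that check can work (in R with httr2 and digest, purely for illustration): the Pwned Passwords range endpoint takes the first 5 hex characters of the password's SHA-1 hash, so the password itself never leaves the machine.

library(httr2)

is_pwned <- function(password) {
  hash <- toupper(digest::digest(password, algo = "sha1", serialize = FALSE))
  prefix <- substr(hash, 1, 5)
  suffix <- substr(hash, 6, 40)
  # Response is lines of "HASH_SUFFIX:COUNT" for every leaked hash
  # sharing the 5-character prefix (k-anonymity)
  body <- request("https://api.pwnedpasswords.com/range") |>
    req_url_path_append(prefix) |>
    req_perform() |>
    resp_body_string()
  any(startsWith(strsplit(body, "\r\n")[[1]], suffix))
}

is_pwned("password123") # TRUE: it appears in known breaches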
Passwords are hashed with either Argon2id or scrypt - they’re both good enough. Bcrypt is ok but it unfortunately has an input limit of about 72 bytes.
Rate limiting will be set to around 1 attempt per second per IP address. Captchas if I start to get spam.
Email verification
I would also check if the email starts or ends with a space just to make sure the user didn’t mistype it.
I personally prefer OTPs for email verification over links, but both work fine. For OTPs, a basic throttling like 5-10 attempts per hour per account should be good enough. The code will be valid for 10, maybe 15 minutes. For verification links, I’d set the expiration to 2 hours.
Here are some ways I would generate those OTPs:
// Option 1: 8 base32 characters, 40 bits of entropy
// (uses "crypto/rand" and "encoding/base32").
// I might use a custom character set to remove 1, I, 0, and O.
bytes := make([]byte, 5)
rand.Read(bytes)
otp := base32.StdEncoding.EncodeToString(bytes)

// Option 2: 8 digits, entropy equivalent to ~26 bits
// (uses "crypto/rand", "encoding/binary", and "fmt").
// The modulo introduces a tiny bias; see RFC 4226 for why this is fine.
bytes := make([]byte, 4)
rand.Read(bytes)
num := int(binary.BigEndian.Uint32(bytes) % 100000000)
otp := fmt.Sprintf("%08d", num)
First of all, I wouldn’t bother with those 100-character-long regexes. Here’s the only email regex you’ll ever need:
^.+@.+\..+$
Shiny in production with the prodverse - prodverse
Smarter Single Page Application with a REST API
How can you build a Single Page Application with a REST API that doesn't have a ton of business logic in the client? Use Hypermedia!
How can you build a smarter Single Page Application with a REST API? The concepts have been around since the beginning of the web, yet they have somehow lost their way in the modern REST APIs that drive Single Page Applications and mobile applications. Here’s how to guide clients based on state by moving more information from design time to runtime.
State
If you’re developing more than a CRUD application, you’re likely going to be driven by the state of the system. Apps that have Task Based UIs (hint: go read my post on Decomposing CRUD to Task Based UIs) are guiding users down a path of actions they can perform based on the state of the system.
The example throughout this post is the concept of a Product in a warehouse. If we have tasks that let someone mark a Product as no longer available for sale, or as available for sale again, these tasks can be driven by the state of the Product.
If the given UI task is "Mark as Available", then the Product must currently be unavailable and the quantity on hand must be greater than zero.
History of Clients
Taking a step back a bit, web apps were developed initially with just plain HTML (over 20 years ago for me). In its most basic form, a static HTML page contained a <form> that the browser rendered for the user to fill out and submit. The form’s action would point to a URI usually to a script, often written in Perl, in the cgi-bin folder on the webserver. The script would take the form data (sent via POST from the browser) and insert it into a database, send an email, or whatever the required behavior was.
As web apps progressed, instead of the HTML being in a static file, it was dynamically created by the server. But it was still just plain HTML.
The browser was the client. HTML was the content it was consuming.
Modern Clients
As web apps progressed with AJAX (XMLHttpRequest), Javascript was used to send the HTTP request instead of HTML forms. The browser turned more into the host of the application, which was written in Javascript.
Now, Javascript is the client. JSON is the content it’s consuming.
Runtime vs Design Time
When the Browser is the client consuming HTML, it understands how to render HTML. HTML has a specification. The browser understands how to handle a <form> tag or a <button>. It was driven by the HTML at runtime.
In modern SPAs consuming JSON, the data itself is unstructured. Each client has to be created uniquely based on the content it receives. This has to be developed at design time when creating the javascript client.
When developing a SPA, you may leverage something like OpenAPI to generate code to use in the SPA/clients to make the HTTP calls to the server. But you must understand as a developer, at design time (when developing) when to make a call to the server.
To use my earlier example of making a product available for sale: if you were developing a server-side rendered HTML web app, you wouldn’t return the form as part of the HTML if the product couldn’t be made available. You can do this because on the server you have the state of the product (fetched from the database). If you’re creating a SPA, you’re likely putting that same logic in your client so you can conditionally show UI elements. It wouldn’t be a great experience for the user to perform an action and then see an error message because the server/API threw a 400, since the product was not in a state that allowed it to become available.
Hypermedia
Hypermedia is what is used in HTML to tell the Browser what it can do. As I mentioned earlier, a <form> is a hypermedia control.
HTTP APIs
The vast majority of modern HTTP APIs serving JSON do not provide any information in the content (JSON) about what actions or other resources the consuming client (SPA) can take. Meaning, we provide no information at runtime; all of it has to be figured out at design time.
You will still need to know (via OpenAPI) at design time, all the information about the routes you will be calling, and their results, however, you can now have the server return JSON that can guide the client based on state.
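A sketch of what such a state-aware response might look like (the field names and shape are illustrative, not a standard):

{
  "id": "product-123",
  "quantityOnHand": 7,
  "available": false,
  "actions": [
    {
      "rel": "mark-available",
      "method": "POST",
      "href": "/products/123/available"
    }
  ]
}

If the product were already available, or had zero quantity on hand, the server would simply omit the mark-available action, and the SPA could hide the button without re-implementing that business rule client-side.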
JSON files & tidy data | The Byrd Lab
My lab investigates how blood pressure can be treated more effectively. Much of that work involves the painstaking development of new concepts and research methods to move forward the state of the art. For example, our work on urinary extracellular vesicles’ mRNA as an ex vivo assay of the ligand-activated transcription factor activity of mineralocorticoid receptors is challenging, fun, and rewarding. With a lot of work from Andrea Berrido and Pradeep Gunasekaran in my lab, we have been moving the ball forward on several key projects on that front.