autodb: Automatic Database Normalisation for Data Frames
Automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Using it to design your actual database is not advised.) Originally inspired by the 'AutoNormalize' library for 'Python' by 'Alteryx' (<https://github.com/alteryx/autonormalize>), with various changes and improvements. It offers automatic discovery of functional or approximate dependencies, normalisation based on those dependencies, and plotting of the resulting "database" via 'Graphviz', with options to exclude some attributes at discovery time or to remove discovered dependencies at normalisation time.
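A minimal sketch of how this workflow might look, assuming the package exposes `discover()`, `autodb()`, and `gv()` as its dependency-discovery, normalisation, and plotting entry points (check the package documentation for exact signatures):

```r
library(autodb)

# A toy data frame with redundancy: Title determines Author and Year
books <- data.frame(
  Title  = c("Dune", "Dune", "Hyperion"),
  Author = c("Herbert", "Herbert", "Simmons"),
  Year   = c(1965, 1965, 1989),
  Format = c("paperback", "hardcover", "paperback")
)

# Discover functional dependencies (accuracy = 1 requests exact,
# not approximate, dependencies)
deps <- discover(books, accuracy = 1)

# Normalise the data frame into a third-normal-form "database"
db <- autodb(books)

# Generate Graphviz DOT code for the resulting schema
cat(gv(db))
```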
Learn how to implement strong authentication and SSO in Shiny apps with Descope. This guide covers integrating both OIDC and SAML with Posit Connect for seamless login.
To facilitate parsing HTTP requests and creating appropriate responses, this package provides two classes that handle much of the housekeeping involved in working with HTTP exchanges. The infrastructure builds upon the Rook specification and is thus well suited to be combined with 'httpuv'-based web servers.
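A rough sketch of how such request/response classes might be used inside an 'httpuv' handler. The `Request$new()` constructor wrapping the Rook environment, the `respond()` method yielding the paired response, and `as_list()` converting back for the server are based on the description above; treat them as assumptions and consult the package docs for the exact API:

```r
library(reqres)
library(httpuv)

startServer("127.0.0.1", 8080, list(
  call = function(rook) {
    # Wrap the raw Rook environment in a Request object
    req <- Request$new(rook)

    # Get the Response paired with this request and fill it in
    res <- req$respond()
    res$status <- 200L
    res$type <- "text/plain"
    res$body <- paste("You requested", req$path)

    # Convert back to the list format httpuv expects
    res$as_list()
  }
))
```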
The rcrd class extends vctr. A rcrd is composed of one or more fields,
which must be vectors of the same length. It is designed specifically for
classes that can naturally be decomposed into multiple vectors of the same
length, like POSIXlt, but where the organisation should be considered
an implementation detail invisible to the user (unlike a data.frame).
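For illustration, a minimal record class built with vctrs' `new_rcrd()` constructor and `field()` accessor; the `latlon` class itself is just an example, not part of vctrs:

```r
library(vctrs)

# A simple record type: a latitude/longitude pair stored as two
# parallel fields but presented to the user as a single vector
latlon <- function(lat, lon) {
  new_rcrd(list(lat = lat, lon = lon), class = "latlon")
}

# A format method hides the field structure from the user
format.latlon <- function(x, ...) {
  paste0("(", field(x, "lat"), ", ", field(x, "lon"), ")")
}

x <- latlon(c(32.7, 55.3), c(-117.1, 12.4))
x
#> <latlon[2]>
#> [1] (32.7, -117.1) (55.3, 12.4)
```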
Guest Blog: Reproducible Data Pipelines In R With {targets} - ESIP
Reproducibility is a huge challenge in science, especially as datasets grow larger and workflows become more complex. Enter targets, an R package that helps you build reproducible data pipelines.
A data workflow is the series of steps that turn raw data into something meaningful — think downloading, cleaning, analyzing and visualizing. You might already do this in R with a mix of scripts and notebooks. Some steps in your data workflow may also be manual and require no coding, such as data processing in Excel or uploading model output data to OneDrive.
A data pipeline, on the other hand, is an automated version of that workflow. It ensures that every step happens in order, that only the necessary steps are rerun when data changes, and that the results are reproducible every time. A well-structured pipeline means anyone revisiting the analysis, including your future self, can rerun, verify and build on the work without extra effort or missing pieces.
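For example, a minimal `_targets.R` pipeline might look like this; the data path and the `clean()` and `make_plot()` functions are placeholders for your own workflow:

```r
# _targets.R
library(targets)
tar_option_set(packages = c("readr", "dplyr", "ggplot2"))

# Hypothetical user-defined functions, e.g. kept in R/functions.R
source("R/functions.R")

list(
  # Track the raw file itself so edits to it invalidate downstream targets
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(raw_data, readr::read_csv(raw_file)),
  tar_target(clean_data, clean(raw_data)),
  tar_target(final_plot, make_plot(clean_data))
)
```

Running `targets::tar_make()` then executes the steps in dependency order and skips any target whose upstream code and data have not changed.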
Provides tools for implementing Retrieval-Augmented Generation (RAG) workflows with Large Language Models (LLMs). Includes functions for document processing, text chunking, embedding generation, storage management, and content retrieval. Supports various document types and embedding providers (Ollama, OpenAI), with DuckDB as the default storage backend. Integrates with the ellmer package to equip chat objects with retrieval capabilities. Designed to offer both sensible defaults and customization options with transparent access to intermediate outputs.
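To illustrate the chunk-and-retrieve idea in self-contained form, here is a toy term-overlap retriever in base R. This is a conceptual sketch only: real packages replace the scoring with LLM embeddings and a database backend such as DuckDB, and none of the functions below belong to this package's API:

```r
# Split a document into fixed-size word chunks
chunk_text <- function(text, chunk_size = 20) {
  words <- strsplit(text, "\\s+")[[1]]
  starts <- seq(1, length(words), by = chunk_size)
  vapply(starts, function(i) {
    paste(words[i:min(i + chunk_size - 1, length(words))], collapse = " ")
  }, character(1))
}

# Score chunks by word overlap with the query (a crude stand-in for
# cosine similarity over embedding vectors)
retrieve <- function(chunks, query, top_k = 2) {
  q <- tolower(strsplit(query, "\\s+")[[1]])
  scores <- vapply(chunks, function(ch) {
    sum(tolower(strsplit(ch, "\\s+")[[1]]) %in% q)
  }, numeric(1))
  chunks[order(scores, decreasing = TRUE)][seq_len(top_k)]
}

doc <- "R is a language for statistical computing. It has strong
        support for data frames and visualisation. Packages extend
        R with tools for modelling, reporting and machine learning."
chunks <- chunk_text(doc, chunk_size = 10)
retrieve(chunks, "tools for machine learning")
```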
A flexible, feature-rich yet lightweight logging
framework based on R6 classes. It supports hierarchical loggers,
custom log levels, arbitrary data fields in log events, and logging to
plaintext, JSON, (rotating) files, and memory buffers. For extra
appenders that support logging to databases, email and push
notifications, see the package lgr.app.
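A brief sketch of typical usage, showing a hierarchical logger name, a JSON file appender, and a custom data field on a log event (see the lgr vignettes for the full API):

```r
library(lgr)

# Hierarchical loggers: "app/db" inherits settings from "app"
lg <- get_logger("app/db")
lg$set_threshold("debug")

# Log to a JSON file in addition to the console
tf <- tempfile(fileext = ".jsonl")
lg$add_appender(AppenderJson$new(file = tf), name = "json")

# Arbitrary data fields can be attached to individual log events
lg$info("query finished", rows = 42L, elapsed = 0.17)

readLines(tf)
```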