02-AREAS

2277 bookmarks
Data Pipeline Design Patterns - #1. Data flow patterns
Data pipelines built (and added on to) without a solid foundation will suffer from poor efficiency, slow development speed, long times to triage production issues, and poor testability. What if your data pipelines are elegant and enable you to deliver features quickly? An easy-to-maintain and extendable data pipeline significantly increases developer morale, stakeholder trust, and the business bottom line! Using the correct design pattern will increase feature delivery speed and developer value (allowing devs to do more in less time), decrease toil during pipeline failures, and build trust with stakeholders. This post goes over the most commonly used data flow design patterns, what they do, when to use them, and, more importantly, when not to use them. By the end of this post, you will have an overview of the typical data flow patterns and be able to choose the right one for your use case.
·startdataengineering.com·
Advanced Tidyverse
Use piped workflows for efficient data cleaning and visualization.
·sesync-ci.github.io·
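As a rough illustration of the piped-workflow idea described in this lesson (the data and steps below are generic and not taken from the linked material), a dplyr chain can clean, summarise, and plot in one pass:

library(dplyr)
library(ggplot2)

# Clean and summarise in a single piped workflow:
# drop incomplete rows, group, and compute one summary row per group.
cyl_summary <- mtcars %>%
  filter(!is.na(mpg)) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n(), .groups = "drop") %>%
  arrange(desc(mean_mpg))

# Pipe the summary straight into a visualization.
cyl_summary %>%
  ggplot(aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean MPG")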
Summarizing and Querying Data from Excel Spreadsheets Using eparse and a Large Language Model
Editor's Note: This post was written by Chris Pappalardo, a Senior Director at Alvarez & Marsal, a leading global professional services firm. The standard processes for building with LLMs work well for documents that contain mostly text, but do not work as well for documents that contain tabular data (like spreadsheets). We wrote about our latest thinking on Q&A over CSVs on the blog a couple of weeks ago, and we loved reading Chris's exploration of working with CSVs and LangChain using agents and chains.
·blog.langchain.dev·
Ploomber AI Editor
Create custom Streamlit and Shiny R apps effortlessly with AI assistance. Design, code, and deploy data apps in minutes.
·editor.ploomber.io·
Add Authentication and SSO to Your Shiny App
Learn how to implement strong authentication and SSO in Shiny apps with Descope. This guide integrates both OIDC and SAML with Posit Connect for seamless login.
·descope.com·
Powerful Classes for HTTP Requests and Responses
In order to facilitate parsing of HTTP requests and creating appropriate responses, this package provides two classes that handle much of the housekeeping involved in working with HTTP exchanges. The infrastructure builds upon the Rook specification and is thus well suited to be combined with httpuv-based web servers.
·reqres.data-imaginist.com·
rcrd (record) S3 class — new_rcrd
The rcrd class extends vctr. An rcrd is composed of one or more fields, which must be vectors of the same length. It is designed specifically for classes that can naturally be decomposed into multiple vectors of the same length, like POSIXlt, but where the organisation should be considered an implementation detail invisible to the user (unlike a data.frame).
·vctrs.r-lib.org·
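A minimal sketch of how new_rcrd() can back such a class (the "interval" example, its field names, and its format method are invented for illustration):

library(vctrs)

# A toy "interval" record: two parallel fields of equal length,
# stored together but presented to the user as a single vector.
interval <- function(start, end) {
  new_rcrd(list(start = start, end = end), class = "interval")
}

# A format method so each element prints as one value,
# hiding the underlying fields from the user.
format.interval <- function(x, ...) {
  paste0("[", field(x, "start"), ", ", field(x, "end"), "]")
}

x <- interval(c(1, 5), c(3, 9))
vec_size(x)      # 2: one element per pair of field values
field(x, "end")  # fields remain accessible internally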
Guest Blog: Reproducible Data Pipelines In R With {targets} - ESIP
Reproducibility is a huge challenge in science, especially as datasets grow larger and workflows become more complex. Enter targets — an R package that helps make complex data workflows reproducible.
A data workflow is the series of steps that turn raw data into something meaningful — think downloading, cleaning, analyzing and visualizing. You might already do this in R with a mix of scripts and notebooks. Some steps in your data workflow may also be manual and require no coding, such as data processing in Excel or uploading model output data to OneDrive. A data pipeline, on the other hand, is an automated version of that workflow. It ensures that every step happens in order, reruns only the necessary steps when data changes, and guarantees that the results are reproducible every time. A well-structured pipeline ensures that anyone revisiting the analysis — including your future self — can rerun, verify and build on the work without extra effort or missing pieces.
·esipfed.org·
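As a hedged sketch of what such a pipeline looks like with targets (the file names and steps below are hypothetical, not taken from the post), the pipeline is declared in a _targets.R file:

# _targets.R
library(targets)
tar_option_set(packages = c("dplyr", "ggplot2"))

list(
  # Track the raw file so downstream targets rerun only when it changes.
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_file)),
  tar_target(clean_data, dplyr::filter(raw_data, !is.na(value))),
  tar_target(summary_tbl, dplyr::count(clean_data, group)),
  tar_target(summary_plot,
             ggplot2::ggplot(summary_tbl, ggplot2::aes(group, n)) +
               ggplot2::geom_col())
)

Running tar_make() then builds every step in order and skips any target whose code and upstream data have not changed.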
PostgreSQL Foreign Key
In this tutorial, you will learn about PostgreSQL foreign key and how to add foreign keys to tables using foreign key constraints.
The following illustrates the foreign key constraint syntax:
[CONSTRAINT fk_name]
FOREIGN KEY(fk_columns)
REFERENCES parent_table(parent_key_columns)
[ON DELETE delete_action]
[ON UPDATE update_action]
In this syntax:
First, specify the name for the foreign key constraint after the CONSTRAINT keyword. The CONSTRAINT clause is optional; if you omit it, PostgreSQL will assign an auto-generated name.
Second, specify one or more foreign key columns in parentheses after the FOREIGN KEY keywords.
Third, specify the parent table and parent key columns referenced by the foreign key columns in the REFERENCES clause.
Finally, specify the desired delete and update actions in the ON DELETE and ON UPDATE clauses.
Since the primary key is rarely updated, the ON UPDATE action is infrequently used in practice. We’ll focus on the ON DELETE action.
PostgreSQL supports the following actions: SET NULL, SET DEFAULT, RESTRICT, NO ACTION, and CASCADE.
·neon.tech·
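As a concrete sketch of the ON DELETE action, run here from R through DBI (the connection details and table names are placeholders):

library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "mydb")  # placeholder connection

dbExecute(con, "
  CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name        TEXT NOT NULL
  )")

dbExecute(con, "
  CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    CONSTRAINT fk_customer
      FOREIGN KEY (customer_id)
      REFERENCES customers (customer_id)
      ON DELETE CASCADE
  )")

# With ON DELETE CASCADE, removing a customer also removes that customer's orders.
dbExecute(con, "DELETE FROM customers WHERE customer_id = 1")

dbDisconnect(con)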
PostgreSQL Copy Table: A Step-by-Step Guide
In this tutorial, you will learn how to copy an existing table to a new one using various PostgreSQL copy table statements.
To copy a table completely, including both table structure and data, you use the following statement:
CREATE TABLE new_table AS TABLE existing_table;
·neon.tech·
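A short sketch of that statement issued from R via DBI (the table names and connection are placeholders; the WITH NO DATA variant copies only the structure):

library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "mydb")  # placeholder connection

# Copy both structure and data into a new table.
dbExecute(con, "CREATE TABLE contacts_backup AS TABLE contacts")

# Copy the structure only, without rows, by adding WITH NO DATA.
dbExecute(con, "CREATE TABLE contacts_empty AS TABLE contacts WITH NO DATA")

dbDisconnect(con)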
PostgreSQL Temporary Table
You will learn about the PostgreSQL temporary table and how to manage it using the CREATE TEMP TABLE and DROP TABLE statements.
When to use temporary tables:
Isolation of data: Since temporary tables are session-specific, different sessions or transactions can use the same table name for temporary tables without causing a conflict. This allows you to isolate data for a specific task or session.
Intermediate storage: Temporary tables can be useful for storing the intermediate results of a complex query. For example, you can break down a complex query into multiple simpler ones and use temporary tables as intermediate storage for the partial results.
Transaction scope: Temporary tables can also be useful if you want to store intermediate results within a transaction. In this case, the temporary tables will be visible only to that transaction.
·neon.tech·
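An illustrative sketch of a temporary table used as intermediate storage, again driven from R with DBI (the table and column names are invented):

library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "mydb")  # placeholder connection

# The temporary table exists only for this session, so the name
# cannot clash with temporary tables in other sessions.
dbExecute(con, "
  CREATE TEMP TABLE top_customers AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
  ORDER BY total DESC
  LIMIT 100")

# Reuse the intermediate result in follow-up queries within the same session.
dbGetQuery(con, "SELECT count(*) AS n FROM top_customers")

# Dropped automatically when the session ends, or explicitly:
dbExecute(con, "DROP TABLE top_customers")

dbDisconnect(con)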