Search Data Engineering

Yes, Postgres can do session vars - but should you use them?

Animated by some comments / complaints about Postgres’ missing user variables story on a Reddit post about PostgreSQL pain points in the real world - I thought I’d elaborate a bit on session vars - which is indeed a little-known Postgres functionality. Although this “alley” has existed for ages...

The obvious and better-known SQL way to keep some transient state is via temp tables! They give some nice data type guarantees, performance, and editor happiness, to name a few benefits. But - don’t use them for high-frequency use cases! A few temp tables per second might already be too much, and a disaster might be waiting to happen, because CREATE TEMP TABLE actually writes into system catalogs behind the scenes, which might not be directly obvious. Cases of violent misuse - think frequent, short-lived temp tables with a lot of columns, plus unoptimized and overloaded autovacuum together with long-running queries - can lead to extreme catalog bloat (mostly on pg_attribute) and unnecessary IO for each session start / relcache filling / query planning. It’s also hard to recover from without some full locking, so for critical high-velocity DBs it might be a good idea to revoke temp table privileges altogether - for app / mortal users at least (not possible for superusers).

The 2nd most obvious way to keep some DB-side session state around would probably be to use more persistent normal tables, right? Already better than temp tables as there is no danger of bloating the system catalog, right? NO. Pushing transient data through WAL (including replicas and backup systems) is pretty bad and pointless, and only to be recommended for tiny use cases. In the Postgres world, exactly for these kinds of transient use cases, special UNLOGGED tables should be used! They can relieve the IO pressure on the system / whole cluster considerably. One of course just needs to account for the semi-persistent nature - and the fact that they won’t be private anymore. That means using RLS in case of secret data, or just using some random enough keys to avoid collisions.
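A minimal sketch of the two suggestions above, assuming PostgreSQL 13 or later (for the built-in gen_random_uuid()). The table name session_state, its columns, and the database name appdb are hypothetical and do not come from the article.

```sql
-- Transient state in an UNLOGGED table: writes skip the WAL, but the table
-- is truncated after a crash and its contents are not available on standbys.
CREATE UNLOGGED TABLE session_state (
    session_key uuid PRIMARY KEY DEFAULT gen_random_uuid(),  -- random key to avoid collisions
    payload     jsonb NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Store some state and hand the random key back to the caller.
INSERT INTO session_state (payload)
VALUES ('{"step": 1}')
RETURNING session_key;

-- Clean up old rows explicitly, since nothing drops them at session end.
DELETE FROM session_state
WHERE created_at < now() - interval '1 hour';

-- Stop ordinary users from creating temp tables in this database.
-- The TEMPORARY privilege is granted to PUBLIC by default, so it has to be
-- revoked from PUBLIC; superusers bypass the check.
REVOKE TEMPORARY ON DATABASE appdb FROM PUBLIC;
```

Because the table is shared, every reader has to filter by its own session_key; row-level security would be the stricter option when the payload is sensitive.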

PostgreSQL

·kmoppel.github.io·Feb 10, 2026

Dbml editor

·dbml-editor.alswl.com·Sep 30, 2025

How to make data pipelines idempotent

Unable to find practical examples of idempotent data pipelines? Then, this post is for you. In this post, we go over a technique that you can use to make your data pipelines professional and data reprocessing a breeze.

·startdataengineering.com·May 10, 2025

Data Pipeline Design Patterns - #1. Data flow patterns

Data pipelines built (and added on to) without a solid foundation will suffer from poor efficiency, slow development speed, long times to triage production issues, and hard testability. What if your data pipelines are elegant and enable you to deliver features quickly? An easy-to-maintain and extendable data pipeline significantly increases developer morale, stakeholder trust, and the business bottom line! Using the correct design pattern will increase feature delivery speed and developer value (allowing devs to do more in less time), decrease toil during pipeline failures, and build trust with stakeholders. This post goes over the most commonly used data flow design patterns, what they do, when to use them, and, more importantly, when not to use them. By the end of this post, you will have an overview of the typical data flow patterns and be able to choose the right one for your use case.

·startdataengineering.com·May 5, 2025

Load data from a REST API | dlt Docs

How to extract data from a REST API using dlt's REST API source

·dlthub.com·Apr 11, 2025

A Look at PostgreSQL User-defined Data Types

This tutorial shows you how to create a PostgreSQL user-defined data type using the CREATE DOMAIN and CREATE TYPE statements.
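As a rough illustration of the two statements named above (not taken from the tutorial itself), the sketch below defines one domain and one composite type. The names us_postal_code, address and film_summary are hypothetical.

```sql
-- A domain is an existing type plus constraints that travel with it.
CREATE DOMAIN us_postal_code AS text
    CHECK (VALUE ~ '^[0-9]{5}$');

CREATE TABLE address (
    address_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    postal_code us_postal_code NOT NULL
);

INSERT INTO address (postal_code) VALUES ('90210');  -- accepted
-- INSERT INTO address (postal_code) VALUES ('9021'); -- would violate the domain's CHECK

-- CREATE TYPE ... AS (...) defines a composite type, for example as a
-- function's return shape.
CREATE TYPE film_summary AS (
    film_id      integer,
    title        text,
    release_year smallint
);
```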

Databases

·neon.tech·Mar 28, 2025

PostgreSQL Generated Columns

In this tutorial, you will learn about PostgreSQL generated columns whose values are automatically calculated from other columns.

In PostgreSQL, a generated column is a special type of column whose values are automatically calculated based on expressions or values from other columns. A generated column is referred to as a computed column in SQL Server or a virtual column in Oracle.

There are two kinds of generated columns: Stored: A stored generated column is calculated when it is inserted or updated and occupies storage space. Virtual: A virtual generated column is computed when it is read and does not occupy storage space.

A virtual generated column is like a view, whereas a stored generated column is similar to a materialized view. Unlike a materialized view, PostgreSQL automatically updates data for stored generated columns.

PostgreSQL versions before 18 implement only stored generated columns; PostgreSQL 18 added virtual generated columns.
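A small sketch of a stored generated column, assuming PostgreSQL 12 or later and writing the STORED keyword explicitly. The table line_item and its columns are hypothetical.

```sql
CREATE TABLE line_item (
    line_item_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    unit_price   numeric(10,2) NOT NULL,
    quantity     integer NOT NULL,
    -- Computed on INSERT and UPDATE, then stored on disk like a normal column.
    total        numeric(12,2) GENERATED ALWAYS AS (unit_price * quantity) STORED
);

INSERT INTO line_item (unit_price, quantity) VALUES (9.99, 3);

SELECT unit_price, quantity, total FROM line_item;  -- total is 29.97

-- Updating an input column recomputes the generated value automatically.
UPDATE line_item SET quantity = 4;
SELECT total FROM line_item;  -- total is now 39.96
```

A generated column cannot be written to directly; an INSERT or UPDATE that supplies a value for total (other than DEFAULT) is rejected.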

Databases #database #postgres #sql #tutorial #documentation #docs

·neon.tech·Mar 28, 2025

PostgreSQL Sequences

In this tutorial, you will learn about the PostgreSQL sequences and how to use a sequence object to generate a sequence of numbers.

In PostgreSQL, a sequence is a database object that allows you to generate a sequence of unique integers. Typically, you use a sequence to generate a unique identifier for a primary key in a table. Additionally, you can use a sequence to generate unique numbers across tables. To create a new sequence, you use the CREATE SEQUENCE statement.

Listing all sequences in a database

To list all sequences in the current database, you use the following query: SELECT relname sequence_name FROM pg_class WHERE relkind = 'S';
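For illustration, a short sketch of creating and using a sequence. The name order_number_seq and its options are hypothetical, and the commented values assume a fresh sequence used from a single session.

```sql
-- Start at 100 and step by 5.
CREATE SEQUENCE order_number_seq
    START WITH 100
    INCREMENT BY 5;

SELECT nextval('order_number_seq');  -- 100
SELECT nextval('order_number_seq');  -- 105
SELECT currval('order_number_seq');  -- 105, the last value this session obtained

-- The catalog query from the tutorial now lists the new sequence.
SELECT relname AS sequence_name
FROM pg_class
WHERE relkind = 'S';

DROP SEQUENCE order_number_seq;
```

Values handed out by nextval are never reused, even if the surrounding transaction rolls back, so gaps in the numbering are expected.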

Databases #database #postgres #sql #tutorial

·neon.tech·Mar 28, 2025

PostgreSQL Identity Column

This tutorial shows you how to use the GENERATED AS IDENTITY constraint to create the PostgreSQL identity column for a table.

PostgreSQL version 10 introduced a new constraint GENERATED AS IDENTITY that allows you to automatically assign a unique number to a column.

The GENERATED AS IDENTITY constraint is the SQL standard-conforming variant of the good old SERIAL column.

The following illustrates the syntax of the GENERATED AS IDENTITY constraint: column_name type GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY[ ( sequence_option ) ]

In this syntax: The type can be SMALLINT, INT, or BIGINT. The GENERATED ALWAYS instructs PostgreSQL to always generate a value for the identity column. If you attempt to insert (or update) values into the GENERATED ALWAYS AS IDENTITY column, PostgreSQL will issue an error. The GENERATED BY DEFAULT instructs PostgreSQL to generate a value for the identity column. However, if you supply a value for insert or update, PostgreSQL will use that value to insert into the identity column instead of using the system-generated value.

PostgreSQL allows a table to have more than one identity column. Like the SERIAL, the GENERATED AS IDENTITY constraint also uses the SEQUENCE object internally.

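If you insert an explicit value into a GENERATED ALWAYS AS IDENTITY column, PostgreSQL issues an error. To fix the error, you can use the OVERRIDING SYSTEM VALUE clause as follows: INSERT INTO color (color_id, color_name) OVERRIDING SYSTEM VALUE VALUES(2, 'Green');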

Alternatively, you can use GENERATED BY DEFAULT AS IDENTITY instead.

Because the GENERATED AS IDENTITY constraint uses the SEQUENCE object, you can specify the sequence options for the system-generated values.

For example, you can specify the starting value and the increment as follows: DROP TABLE color; CREATE TABLE color ( color_id INT GENERATED BY DEFAULT AS IDENTITY (START WITH 10 INCREMENT BY 10), color_name VARCHAR NOT NULL );

In this example, the system-generated value for the color_id column starts with 10 and the increment value is also 10.

Databases #database #postgres #sql #dev #docs

·neon.tech·Mar 28, 2025

6 Ways Gen AI is improving Data Modelling

GDPR and your right to be deleted

#generative-ai #ai #data-engineering #data-science #data-model

·datapro.news·Dec 31, 2024

mgramin/awesome-db-tools: Everything that makes working with databases easier

Everything that makes working with databases easier - mgramin/awesome-db-tools

Databases #database #dev #github #awesome #data #sql #nosql #data-engineering #tool

·github.com·Dec 28, 2024

Declarative vs Versioned Workflows | Atlas | Manage your database schema as code

This section introduces two types of workflows that are supported by Atlas

Databases

·atlasgo.io·Dec 28, 2024

Schema Change Management Tools

Here's a brief history of database schema migration and how modern, open-source solutions can be used so both Devs and Ops can work less and accomplish more.

Databases #database #data-engineering #data-model #dev #article #tool #sql

·dzone.com·Dec 28, 2024

AskYourDatabase - Chat with database and get insights using AI without writing SQL.

Chat with database using AI.

Databases

·askyourdatabase.com·Dec 5, 2024

AI Database Design Flowchart Generator

Unlock efficient database design with our AI-powered Database Design Flowchart Generator! Experience fast, accurate, and intuitive creation of complex database schemas. Save time, reduce errors, and streamline your workflow — start designing smarter today!

Databases

·taskade.com·Dec 5, 2024

The “Database as Code” Manifesto

Treat your database as Code

Databases

·database-as-code.org·Nov 12, 2024

Schema-driven development in 2021 - 99designs

Schema-driven development is an important concept to know in 2021. What exactly is schema-driven development? What are the benefits of schema-driven development? We will explore the answers to these questions in this article.

·99designs.com·Oct 12, 2024

Sequel

Converse with your database using natural language

Databases

·sequel.sh·Sep 24, 2024

Building data-centric apps with a reactive relational database

We're exploring an approach to simplifying app development: storing all application and UI state in a client-side reactive relational database that provides a structured dataflow model.

Databases

·riffle.systems·May 6, 2024

tSQLt - Database Unit Testing for SQL Server

Database Unit Testing for SQL Server

·tsqlt.org·Jul 28, 2023

Understanding Data and Metadata - Role and Key Differences

Explore the intricacies of data and metadata, their key differences and the importance of metadata management tools such as dbForge Documenter.

·blog.devart.com·Jun 19, 2023

SQL Server "Codify" Function

This function will jump-start the process of converting long descriptions into meaningful abbreviations. It's great for creating "Code" columns in lookup tables.

Databases

·nolongerset.com·May 11, 2023

Data Quality Rules: The Definitive Guide to Getting Started — Data Quality Pro

The reality is that all organisations possess data quality rules, but they’re typically scattered widely across the organisation with no thought to standardisation, governance and re-use. The following resources will help your organisation buck that trend and adopt data quality rules management habits a

Databases

·dataqualitypro.com·Mar 29, 2023

Data Model Design & Best Practices: Part 1

Without the Data Model and tools like Talend, data can completely fail to provide business value, or worse impede its success through inaccuracy, misuse, or misunderstanding.

#data-engineering #data-model

·talend.com·Sep 21, 2022

Data Modeling - Relational Databases (SQL) vs Data Lake (File Based) - Confessions of a Data Guy

Data Modeling is a topic that never goes away. Sometimes I do reminisce about the good ol’ days of Kimball-style data models, it was so simple, straightforward, just the same thing for years. Then Big Data happened, Spark happened. Things just changed. There is a lot of new content coming out around Data Lakes and […]

#data-engineering #data-model

·confessionsofadataguy.com·Sep 21, 2022

Modernize your apps with new innovations across SQL Server 2022 and Azure SQL - Events

Get to know SQL Server 2022 and Azure SQL with continued performance and security innovation

Databases

·docs.microsoft.com·Jul 30, 2022

Add or connect a database with WSL

Learn how to set up MySQL, MongoDB, PostgreSQL, SQLite, Microsoft SQL Server, or Redis on the Windows Subsystem for Linux.

Databases

·docs.microsoft.com·Jul 12, 2022

How to Create a Handy SQL Server Backup Database Script

Learn how to back up your databases regularly, whether with a full or differential backup, by creating handy SQL Server database backup scripts in this tutorial!

Databases

·adamtheautomator.com·Jun 4, 2022

The “Database as Code” Manifesto

·gramin.pro·Mar 9, 2022

Community Guide to PostgreSQL GUI Tools - PostgreSQL wiki

Databases

·wiki.postgresql.org·Feb 25, 2022
