Data Engineering

88 bookmarks
Yes, Postgres can do session vars - but should you use them?
Prompted by some comments / complaints about Postgres’ missing user-variables story on a Reddit post about real-world PostgreSQL pain points, I thought I’d elaborate a bit on session vars, which are indeed a little-known Postgres feature, although this “alley” has existed for ages...
The obvious and better-known SQL way to keep some transient state is via temp tables! They offer some nice benefits: data type guarantees, performance, and editor happiness, to name a few. But don’t use them for high-frequency use cases! Even a few temp tables per second might already be too much, and a disaster might be waiting to happen, because CREATE TEMP TABLE actually writes into the system catalogs behind the scenes, which might not be directly obvious. In cases of violent misuse (think frequent, short-lived temp tables with many columns, plus untuned and overloaded autovacuum together with long-running queries), this can lead to extreme catalog bloat (mostly in pg_attribute) and unnecessary IO on every session start, relcache fill, and query planning. It’s also hard to recover from without some heavy, full-table locking, so for critical high-velocity DBs it might be a good idea to revoke temp table privileges altogether, at least for app / mortal users (this isn’t possible for superusers).
The 2nd most obvious way to keep some DB-side session state around would probably be to use more persistent, normal tables, right? That’s already better than temp tables, as there’s no danger of bloating the system catalog, right? NO. Pushing transient data through the WAL (and on to replicas and backup systems) is pretty bad and pointless, and only to be recommended for tiny use cases. In the Postgres world, special UNLOGGED tables should be used exactly for these kinds of transient use cases! They can relieve the IO pressure on the system / whole cluster considerably. One of course needs to account for their semi-persistent nature, and for the fact that the data won’t be private to a session anymore, meaning RLS for secret data, or at least keys random enough to avoid collisions between sessions.
·kmoppel.github.io·
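The last point of that excerpt, keeping transient per-session state in a shared table and scoping the rows by random-enough keys, can be sketched as below. This is a minimal illustration under stated assumptions: the `session_state` table and the helper names are hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the snippet runs without a server. SQLite has no UNLOGGED tables, so on Postgres the DDL would read `CREATE UNLOGGED TABLE`, with `%s` placeholders in a driver such as psycopg.

```python
import secrets
import sqlite3

# Hypothetical shared state table. On Postgres this would be
# CREATE UNLOGGED TABLE session_state (...); SQLite has no UNLOGGED keyword.
DDL = """
CREATE TABLE IF NOT EXISTS session_state (
    session_key TEXT NOT NULL,
    name        TEXT NOT NULL,
    value       TEXT,
    PRIMARY KEY (session_key, name)
)
"""


def new_session_key() -> str:
    """Return a 128-bit random hex key; collisions between sessions are practically impossible."""
    return secrets.token_hex(16)


def set_var(conn: sqlite3.Connection, session_key: str, name: str, value: str) -> None:
    """Insert or overwrite one variable for one session (upsert on the composite key)."""
    conn.execute(
        "INSERT INTO session_state (session_key, name, value) VALUES (?, ?, ?) "
        "ON CONFLICT (session_key, name) DO UPDATE SET value = excluded.value",
        (session_key, name, value),
    )


def get_var(conn: sqlite3.Connection, session_key: str, name: str):
    """Return the variable's value for this session, or None if it is unset."""
    row = conn.execute(
        "SELECT value FROM session_state WHERE session_key = ? AND name = ?",
        (session_key, name),
    ).fetchone()
    return row[0] if row else None


def clear_session(conn: sqlite3.Connection, session_key: str) -> None:
    """Drop all state belonging to one session, e.g. on logout or disconnect."""
    conn.execute("DELETE FROM session_state WHERE session_key = ?", (session_key,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    alice, bob = new_session_key(), new_session_key()
    set_var(conn, alice, "user_id", "42")
    set_var(conn, bob, "user_id", "7")
    print(get_var(conn, alice, "user_id"), get_var(conn, bob, "user_id"))  # 42 7
```

Because every read and write is filtered by the session key, two sessions sharing the table never see each other's values by accident; for genuinely secret data, RLS as mentioned above is the stronger guard.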
Dbml editor
·dbml-editor.alswl.com·
How to make data pipelines idempotent
Unable to find practical examples of idempotent data pipelines? Then, this post is for you. In this post, we go over a technique that you can use to make your data pipelines professional and data reprocessing a breeze.
·startdataengineering.com·
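The excerpt doesn't name its technique, so the following shows only one common way to get idempotency, assumed here: overwrite-by-partition (often called delete-write), where each run first deletes the slice of data it owns and then re-inserts it inside a single transaction. Re-running a day then yields the same table instead of duplicates. The `daily_sales` table and the `load_partition` helper are hypothetical, and `sqlite3` stands in for the warehouse.

```python
import sqlite3
from datetime import date
from typing import Iterable, Tuple

DDL = """
CREATE TABLE IF NOT EXISTS daily_sales (
    sale_date TEXT    NOT NULL,
    store_id  INTEGER NOT NULL,
    amount    INTEGER NOT NULL
)
"""


def load_partition(conn: sqlite3.Connection, run_date: date,
                   rows: Iterable[Tuple[int, int]]) -> None:
    """Idempotently (re)load the partition for run_date.

    DELETE and INSERT run in one transaction: the connection context manager
    commits on success and rolls back on error, so a failed run leaves the
    previous version of the partition intact.
    """
    day = run_date.isoformat()
    with conn:
        conn.execute("DELETE FROM daily_sales WHERE sale_date = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_sales (sale_date, store_id, amount) VALUES (?, ?, ?)",
            [(day, store_id, amount) for store_id, amount in rows],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    for _ in range(3):  # re-running the same day is harmless
        load_partition(conn, date(2024, 1, 1), [(1, 100), (2, 250)])
    print(conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0])  # 2
```

Running a day twice, or re-running it after fixing upstream data, leaves exactly one copy of that day's rows, while other days are untouched.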
Data Pipeline Design Patterns - #1. Data flow patterns
Data pipelines built (and added on to) without a solid foundation will suffer from poor efficiency, slow development speed, long times to triage production issues, and poor testability. What if your data pipelines were elegant and enabled you to deliver features quickly? An easy-to-maintain and extendable data pipeline significantly increases developer morale, stakeholder trust, and the business bottom line! Using the correct design pattern will increase feature delivery speed and developer value (allowing devs to do more in less time), decrease toil during pipeline failures, and build trust with stakeholders. This post goes over the most commonly used data flow design patterns, what they do, when to use them, and, more importantly, when not to use them. By the end of this post, you will have an overview of the typical data flow patterns and be able to choose the right one for your use case.
·startdataengineering.com·
PostgreSQL Generated Columns
In this tutorial, you will learn about PostgreSQL generated columns whose values are automatically calculated from other columns.
In PostgreSQL, a generated column is a special type of column whose values are automatically calculated from an expression over other columns. A generated column is called a computed column in SQL Server and a virtual column in Oracle.
There are two kinds of generated columns:
Stored: a stored generated column is calculated when the row is inserted or updated, and occupies storage space.
Virtual: a virtual generated column is computed when it is read, and does not occupy storage space.
A virtual generated column is like a view, whereas a stored generated column is similar to a materialized view. Unlike a materialized view, however, a stored generated column is kept up to date by PostgreSQL automatically.
PostgreSQL versions before 18 implement only stored generated columns; virtual generated columns were added in PostgreSQL 18.
·neon.tech·
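To make the stored/virtual distinction concrete, here is a minimal sketch using Python's built-in `sqlite3`, chosen only because it runs without a server. SQLite (3.31 and later) supports both `STORED` and `VIRTUAL` generated columns, and PostgreSQL 12+ accepts the same `GENERATED ALWAYS AS (expr) STORED` form. The `order_line` table and its columns are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE order_line (
    id          INTEGER PRIMARY KEY,
    price_cents INTEGER NOT NULL,
    qty         INTEGER NOT NULL,
    -- STORED: computed on INSERT/UPDATE and written to disk
    total_cents INTEGER GENERATED ALWAYS AS (price_cents * qty) STORED,
    -- VIRTUAL: computed on every read, occupies no storage
    is_bulk     INTEGER GENERATED ALWAYS AS (qty >= 10) VIRTUAL
)
"""


def demo() -> list:
    """Insert a row, update an input column, and return the generated values before/after."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    # Only the base columns are supplied; the generated ones are derived.
    conn.execute("INSERT INTO order_line (price_cents, qty) VALUES (250, 4)")
    before = conn.execute("SELECT total_cents, is_bulk FROM order_line").fetchone()
    # Updating an input column refreshes the generated values automatically.
    conn.execute("UPDATE order_line SET qty = 12")
    after = conn.execute("SELECT total_cents, is_bulk FROM order_line").fetchone()
    conn.close()
    return [before, after]


if __name__ == "__main__":
    print(demo())  # [(1000, 0), (3000, 1)]
```

No trigger or application code keeps `total_cents` in sync; the database recomputes it on every write, which is the "automatically updates" behaviour described above.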
PostgreSQL Sequences
In this tutorial, you will learn about the PostgreSQL sequences and how to use a sequence object to generate a sequence of numbers.
In PostgreSQL, a sequence is a database object that allows you to generate a sequence of unique integers. Typically, you use a sequence to generate a unique identifier for a primary key in a table. Additionally, you can use a sequence to generate unique numbers across tables. To create a new sequence, you use the CREATE SEQUENCE statement.
Listing all sequences in a database
To list all sequences in the current database, use the following query:
SELECT relname AS sequence_name FROM pg_class WHERE relkind = 'S';
·neon.tech·
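The behaviour of a sequence object (a start value, an increment, `nextval` handing out the next number, `currval` returning the last number handed out) can be modelled with a small in-memory toy. This is a hypothetical illustration, not how PostgreSQL implements sequences: it ignores transactions, caching, negative increments and `CYCLE`, and the `ToySequence` name is made up.

```python
from typing import Optional


class ToySequence:
    """In-memory stand-in for CREATE SEQUENCE ... START WITH s INCREMENT BY i MAXVALUE m.

    Ascending only; no CYCLE, CACHE, or transactional behaviour.
    """

    def __init__(self, start: int = 1, increment: int = 1,
                 maxvalue: Optional[int] = None) -> None:
        if increment <= 0:
            raise ValueError("this toy only models ascending sequences")
        self.start = start
        self.increment = increment
        self.maxvalue = maxvalue
        self._last: Optional[int] = None  # last value handed out, like currval

    def nextval(self) -> int:
        """Advance the sequence and return the new value."""
        candidate = self.start if self._last is None else self._last + self.increment
        if self.maxvalue is not None and candidate > self.maxvalue:
            raise OverflowError("sequence reached its maximum value")
        self._last = candidate
        return candidate

    def currval(self) -> int:
        """Return the value most recently returned by nextval."""
        if self._last is None:
            raise LookupError("currval is not yet defined: call nextval first")
        return self._last


if __name__ == "__main__":
    # One sequence shared by several tables still yields ids unique across all of them.
    ids = ToySequence(start=100, increment=10)
    print(ids.nextval(), ids.nextval(), ids.currval())  # 100 110 110
```

As in PostgreSQL, a value once handed out by `nextval` is never given back, so gaps in the resulting ids are normal.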
PostgreSQL Identity Column
This tutorial shows you how to use the GENERATED AS IDENTITY constraint to create the PostgreSQL identity column for a table.
PostgreSQL version 10 introduced a new constraint GENERATED AS IDENTITY that allows you to automatically assign a unique number to a column.
The GENERATED AS IDENTITY constraint is the SQL standard-conforming variant of the good old SERIAL column.
The following illustrates the syntax of the GENERATED AS IDENTITY constraint:
column_name type GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_option ) ]
In this syntax:
The type can be SMALLINT, INT, or BIGINT.
GENERATED ALWAYS instructs PostgreSQL to always generate a value for the identity column. If you attempt to insert (or update) a value in a GENERATED ALWAYS AS IDENTITY column, PostgreSQL will raise an error.
GENERATED BY DEFAULT also instructs PostgreSQL to generate a value for the identity column. However, if you supply a value on insert or update, PostgreSQL will use that value instead of the system-generated one.
PostgreSQL allows a table to have more than one identity column. Like SERIAL, the GENERATED AS IDENTITY constraint uses a SEQUENCE object internally.
To insert an explicit value into a GENERATED ALWAYS identity column without triggering that error, you can use the OVERRIDING SYSTEM VALUE clause:
INSERT INTO color (color_id, color_name) OVERRIDING SYSTEM VALUE VALUES (2, 'Green');
Alternatively, define the column as GENERATED BY DEFAULT AS IDENTITY.
Because the GENERATED AS IDENTITY constraint uses the SEQUENCE object, you can specify the sequence options for the system-generated values.
For example, you can specify the starting value and the increment as follows:
DROP TABLE color;
CREATE TABLE color (
    color_id INT GENERATED BY DEFAULT AS IDENTITY (START WITH 10 INCREMENT BY 10),
    color_name VARCHAR NOT NULL
);
In this example, the system-generated value for the color_id column starts with 10 and the increment value is also 10.
·neon.tech·
AI Database Design Flowchart Generator
Unlock efficient database design with our AI-powered Database Design Flowchart Generator! Experience fast, accurate, and intuitive creation of complex database schemas. Save time, reduce errors, and streamline your workflow — start designing smarter today!
·taskade.com·
Schema-driven development in 2021 - 99designs
Schema-driven development is an important concept to know in 2021. What exactly is schema-driven development? What are the benefits of schema-driven development? We will explore the answers to these questions in this article.
·99designs.com·
Sequel
Converse with your database using natural language
·sequel.sh·
Understanding Data and Metadata - Role and Key Differences
Explore the intricacies of data and metadata, their key differences and the importance of metadata management tools such as dbForge Documenter.
·blog.devart.com·
SQL Server "Codify" Function
This function will jump-start the process of converting long descriptions into meaningful abbreviations. It's great for creating "Code" columns in lookup tables.
·nolongerset.com·
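The article's actual T-SQL isn't reproduced in this excerpt, so the following is only a hypothetical Python heuristic in the same spirit: drop filler words, take initials for multi-word descriptions (or a prefix for single words), upper-case the result, and append a counter when two descriptions collide, so the codes can serve as unique values in a lookup table's Code column. The stop-word list, the 6-character limit, and the function names are all assumptions.

```python
import re
from typing import Dict, Iterable

# Assumed filler words that carry no meaning in an abbreviation.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "or"}


def codify(description: str, max_len: int = 6) -> str:
    """Abbreviate one description: initials for several words, a prefix for one."""
    words = [w for w in re.findall(r"[A-Za-z0-9]+", description)
             if w.lower() not in STOP_WORDS]
    if not words:
        return ""
    if len(words) == 1:
        code = words[0][:max_len]
    else:
        code = "".join(w[0] for w in words)[:max_len]
    return code.upper()


def codify_unique(descriptions: Iterable[str], max_len: int = 6) -> Dict[str, str]:
    """Codify each description, appending 2, 3, ... to resolve collisions."""
    used = set()
    codes: Dict[str, str] = {}
    for desc in descriptions:
        base = codify(desc, max_len)
        code, n = base, 2
        while code in used:
            suffix = str(n)
            # Trim the base so the suffixed code still fits within max_len.
            code = base[: max_len - len(suffix)] + suffix
            n += 1
        used.add(code)
        codes[desc] = code
    return codes


if __name__ == "__main__":
    print(codify_unique(["Accounts Payable", "Accounts Pending", "Warehouse"]))
    # {'Accounts Payable': 'AP', 'Accounts Pending': 'AP2', 'Warehouse': 'WAREHO'}
```

Generated codes are a starting point; a human would still review them before they land in a lookup table.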
Data Quality Rules: The Definitive Guide to Getting Started — Data Quality Pro
The reality is that all organisations possess data quality rules, but they’re typically scattered widely across the organisation with no thought to standardisation, governance and re-use. The following resources will help your organisation buck that trend and adopt data quality rules management habits a...
·dataqualitypro.com·
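The excerpt's core idea, keeping data quality rules in one standardised, reusable catalogue instead of scattered across the organisation, can be sketched as a list of named rules that any dataset can be checked against. This is a minimal, hypothetical illustration: the `Rule` type, the rule names and the sample columns are invented for the example and are not taken from Data Quality Pro.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """One named, reusable data quality rule applied to a single column."""
    name: str
    column: str
    check: Callable[[Any], bool]


# A shared rule catalogue: defined once, reused by every pipeline that loads customers.
CUSTOMER_RULES: List[Rule] = [
    Rule("customer_id_not_null", "customer_id", lambda v: v is not None),
    Rule("email_contains_at", "email", lambda v: isinstance(v, str) and "@" in v),
    Rule("age_in_range", "age", lambda v: v is None or 0 <= v <= 130),
]


def evaluate(rules: List[Rule], records: List[Record]) -> List[Tuple[int, str]]:
    """Return (row index, rule name) for every rule a record violates."""
    failures: List[Tuple[int, str]] = []
    for i, record in enumerate(records):
        for rule in rules:
            # Missing columns are passed to the check as None.
            if not rule.check(record.get(rule.column)):
                failures.append((i, rule.name))
    return failures


if __name__ == "__main__":
    rows = [
        {"customer_id": 1, "email": "a@example.com", "age": 34},
        {"customer_id": None, "email": "not-an-email", "age": 200},
    ]
    print(evaluate(CUSTOMER_RULES, rows))
    # [(1, 'customer_id_not_null'), (1, 'email_contains_at'), (1, 'age_in_range')]
```

Keeping rules as named data rather than ad hoc `if` statements is what makes them reviewable and reusable across teams, which is the governance point the excerpt makes.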
Data Modeling - Relational Databases (SQL) vs Data Lake (File Based) - Confessions of a Data Guy
Data Modeling is a topic that never goes away. Sometimes I do reminisce about the good ol’ days of Kimball-style data models; it was so simple and straightforward, just the same thing for years. Then Big Data happened, Spark happened. Things just changed. There is a lot of new content coming out around Data Lakes and […]
·confessionsofadataguy.com·
Add or connect a database with WSL
Learn how to set up MySQL, MongoDB, PostgreSQL, SQLite, Microsoft SQL Server, or Redis on the Windows Subsystem for Linux.
·docs.microsoft.com·