Data Engineering

87 bookmarks
How to make data pipelines idempotent
Unable to find practical examples of idempotent data pipelines? Then this post is for you. In this post, we go over a technique you can use to make your data pipelines idempotent, so that re-running them produces the same result, which makes data reprocessing a breeze.
·startdataengineering.com·
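The excerpt does not show the post's own technique, so the following is only a minimal sketch of one common way to get idempotency, the delete-write pattern: before loading a run's output, delete whatever an earlier run wrote for the same partition, inside a single transaction. It uses Python's built-in sqlite3 module so it runs without a warehouse; the daily_sales table, its columns, and the load_partition helper are hypothetical.

```python
import sqlite3


def load_partition(conn: sqlite3.Connection, run_date: str, rows: list[tuple[str, int]]) -> None:
    """Reload one run_date partition with the delete-write pattern (hypothetical helper).

    Deleting the partition before inserting means a re-run for the same
    run_date replaces that day's rows instead of appending duplicates.
    """
    with conn:  # single transaction: commit both statements or roll back both
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (run_date, product, amount) VALUES (?, ?, ?)",
            [(run_date, product, amount) for product, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (run_date TEXT, product TEXT, amount INTEGER)")

rows = [("widget", 3), ("gadget", 5)]
load_partition(conn, "2024-01-01", rows)
load_partition(conn, "2024-01-01", rows)  # re-run, e.g. a retry or backfill

row_count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM daily_sales").fetchone()
print(row_count, total)  # 2 8
```

Re-running load_partition for the same run_date leaves the table in the same state, which is what makes retries and backfills safe. The transaction matters: without it, a crash between the DELETE and the INSERT would leave the partition empty.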
Data Pipeline Design Patterns - #1. Data flow patterns
Data pipelines built (and added on to) without a solid foundation suffer from poor efficiency, slow development, long production-issue triage times, and poor testability. What if your data pipelines were elegant and let you deliver features quickly? An easy-to-maintain, extensible data pipeline significantly increases developer morale, stakeholder trust, and the business bottom line. Using the correct design pattern increases feature delivery speed and developer value (letting developers do more in less time), reduces toil during pipeline failures, and builds trust with stakeholders. This post goes over the most commonly used data flow design patterns, what they do, when to use them, and, more importantly, when not to use them. By the end of this post, you will have an overview of the typical data flow patterns and be able to choose the right one for your use case.
·startdataengineering.com·
PostgreSQL Generated Columns
In this tutorial, you will learn about PostgreSQL generated columns whose values are automatically calculated from other columns.
In PostgreSQL, a generated column is a special type of column whose values are automatically calculated from an expression over other columns of the same row. A generated column is called a computed column in SQL Server and a virtual column in Oracle.
There are two kinds of generated columns:
Stored: computed when the row is inserted or updated; it occupies storage space.
Virtual: computed when the column is read; it occupies no storage space.
A virtual generated column is like a view, whereas a stored generated column is similar to a materialized view. Unlike a materialized view, which must be refreshed explicitly with REFRESH MATERIALIZED VIEW, a stored generated column is updated automatically by PostgreSQL whenever its row is inserted or updated.
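Up to PostgreSQL 17, only stored generated columns are implemented; PostgreSQL 18 added virtual generated columns and made virtual the default when neither STORED nor VIRTUAL is specified.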
·neon.tech·
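SQLite (3.31.0 and later) accepts the same GENERATED ALWAYS AS (expression) STORED clause as PostgreSQL, so Python's built-in sqlite3 module can illustrate stored generated-column behavior without a PostgreSQL server. This is a sketch under that assumption, running on SQLite rather than PostgreSQL; the line_items table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite 3.31.0+ accepts the clause PostgreSQL uses for a stored generated
# column: GENERATED ALWAYS AS (expression) STORED.
conn.execute(
    """
    CREATE TABLE line_items (
        quantity   INTEGER NOT NULL,
        unit_price INTEGER NOT NULL,
        total      INTEGER GENERATED ALWAYS AS (quantity * unit_price) STORED
    )
    """
)

conn.execute("INSERT INTO line_items (quantity, unit_price) VALUES (2, 3)")
first_total = conn.execute("SELECT total FROM line_items").fetchone()[0]
print(first_total)  # 6: computed on insert

conn.execute("UPDATE line_items SET quantity = 4")
updated_total = conn.execute("SELECT total FROM line_items").fetchone()[0]
print(updated_total)  # 12: recomputed on update, no refresh step needed

# Supplying a value for the generated column itself is rejected.
try:
    conn.execute("INSERT INTO line_items (quantity, unit_price, total) VALUES (1, 1, 99)")
    rejected = False
except sqlite3.Error:
    rejected = True
print(rejected)  # True
```

The update step is the practical difference from a materialized view: the stored value tracks its inputs on every write, with no separate refresh.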
PostgreSQL Sequences
In this tutorial, you will learn about the PostgreSQL sequences and how to use a sequence object to generate a sequence of numbers.
In PostgreSQL, a sequence is a database object that allows you to generate a sequence of unique integers. Typically, you use a sequence to generate a unique identifier for a primary key in a table. Additionally, you can use a sequence to generate unique numbers across tables. To create a new sequence, you use the CREATE SEQUENCE statement.
Listing all sequences in a database
To list all sequences in the current database, use the following query:
SELECT relname AS sequence_name FROM pg_class WHERE relkind = 'S';
·neon.tech·
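To make the START WITH / INCREMENT BY semantics and cross-table uniqueness concrete, here is a toy, in-process Python model of a sequence's nextval behavior. It is an illustration only, not how PostgreSQL implements sequences; ToySequence and the order/invoice tables are hypothetical, and the model ignores sessions, caching, MAXVALUE and CYCLE.

```python
import threading


class ToySequence:
    """In-process stand-in for CREATE SEQUENCE ... START WITH start INCREMENT BY increment.

    Not PostgreSQL code: it ignores sessions, caching, MAXVALUE and CYCLE.
    """

    def __init__(self, start: int = 1, increment: int = 1) -> None:
        self._next = start
        self._increment = increment
        self._lock = threading.Lock()  # concurrent callers never receive the same value

    def nextval(self) -> int:
        with self._lock:
            value = self._next
            self._next += self._increment
            return value


# One sequence shared by two hypothetical tables gives ids unique across both.
shared_ids = ToySequence()
order_ids = [shared_ids.nextval() for _ in range(2)]
invoice_ids = [shared_ids.nextval() for _ in range(2)]
print(order_ids, invoice_ids)  # [1, 2] [3, 4]
```

One real-world difference worth knowing: in PostgreSQL, nextval is never rolled back, so a value drawn by an aborted transaction is skipped and sequences can have gaps.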
PostgreSQL Identity Column
This tutorial shows you how to use the GENERATED AS IDENTITY constraint to create the PostgreSQL identity column for a table.
PostgreSQL version 10 introduced a new constraint GENERATED AS IDENTITY that allows you to automatically assign a unique number to a column.
The GENERATED AS IDENTITY constraint is the SQL standard-conforming variant of the good old SERIAL column.
The following illustrates the syntax of the GENERATED AS IDENTITY constraint:
column_name type GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_option ) ]
In this syntax:
The type can be SMALLINT, INT, or BIGINT.
GENERATED ALWAYS instructs PostgreSQL to always generate a value for the identity column. If you attempt to insert (or update) a value in a GENERATED ALWAYS AS IDENTITY column, PostgreSQL issues an error.
GENERATED BY DEFAULT also instructs PostgreSQL to generate a value for the identity column. However, if you supply a value in an INSERT or UPDATE, PostgreSQL uses that value instead of the system-generated one.
PostgreSQL allows a table to have more than one identity column. Like the SERIAL, the GENERATED AS IDENTITY constraint also uses the SEQUENCE object internally.
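For example, if the color_id column of a color table is GENERATED ALWAYS AS IDENTITY, inserting an explicit color_id raises an error. To insert your own value anyway, add the OVERRIDING SYSTEM VALUE clause:
INSERT INTO color (color_id, color_name) OVERRIDING SYSTEM VALUE VALUES (2, 'Green');
Alternatively, define the column as GENERATED BY DEFAULT AS IDENTITY.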
Because the GENERATED AS IDENTITY constraint uses the SEQUENCE object, you can specify the sequence options for the system-generated values.
For example, you can specify the starting value and the increment as follows:
DROP TABLE color;
CREATE TABLE color (
    color_id INT GENERATED BY DEFAULT AS IDENTITY (START WITH 10 INCREMENT BY 10),
    color_name VARCHAR NOT NULL
);
In this example, the system-generated value for the color_id column starts with 10 and the increment value is also 10.
·neon.tech·
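The identity-column rules in this entry (GENERATED ALWAYS vs GENERATED BY DEFAULT, OVERRIDING SYSTEM VALUE, and the START WITH / INCREMENT BY options) can be summarized as a toy Python model. ToyIdentityColumn and its methods are hypothetical and capture only the decision logic for inserts, not PostgreSQL's implementation; the error message is illustrative, not PostgreSQL's exact text.

```python
from typing import Optional


class ToyIdentityColumn:
    """Toy model of PostgreSQL identity-column insert rules (not PostgreSQL code).

    always=True models GENERATED ALWAYS AS IDENTITY;
    always=False models GENERATED BY DEFAULT AS IDENTITY.
    start and increment model (START WITH ... INCREMENT BY ...).
    """

    def __init__(self, always: bool, start: int = 1, increment: int = 1) -> None:
        self.always = always
        self._next = start  # state of the backing sequence
        self._increment = increment

    def _nextval(self) -> int:
        value = self._next
        self._next += self._increment
        return value

    def value_for_insert(self, supplied: Optional[int] = None,
                         overriding_system_value: bool = False) -> int:
        if supplied is None:
            return self._nextval()  # no value supplied: draw from the sequence
        if self.always and not overriding_system_value:
            # Illustrative message, not PostgreSQL's exact error text.
            raise ValueError("explicit value for a GENERATED ALWAYS identity column")
        return supplied  # explicit value is used; the sequence is not advanced


# color_id INT GENERATED ALWAYS AS IDENTITY (START WITH 10 INCREMENT BY 10)
always_id = ToyIdentityColumn(always=True, start=10, increment=10)
generated = [always_id.value_for_insert() for _ in range(3)]
print(generated)  # [10, 20, 30]

try:
    always_id.value_for_insert(2)
    always_rejected = False
except ValueError:
    always_rejected = True
print(always_rejected)  # True

overridden = always_id.value_for_insert(2, overriding_system_value=True)
print(overridden)  # 2

# color_id INT GENERATED BY DEFAULT AS IDENTITY
by_default_id = ToyIdentityColumn(always=False)
explicit = by_default_id.value_for_insert(1)
next_default = by_default_id.value_for_insert()
print(explicit, next_default)  # 1 1
```

The last two calls show a real pitfall with GENERATED BY DEFAULT: an explicitly supplied value does not advance the backing sequence, so a later system-generated value can duplicate it and violate a primary key. After loading explicit ids, resynchronize the sequence, for example with setval.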
AI Database Design Flowchart Generator
Unlock efficient database design with our AI-powered Database Design Flowchart Generator! Experience fast, accurate, and intuitive creation of complex database schemas. Save time, reduce errors, and streamline your workflow — start designing smarter today!
·taskade.com·
Schema-driven development in 2021 - 99designs
Schema-driven development is an important concept to know in 2021. What exactly is schema-driven development? What are the benefits of schema-driven development? We will explore the answers to these questions in this article.
·99designs.com·
Sequel
Converse with your database using natural language
·sequel.sh·
Understanding Data and Metadata - Role and Key Differences
Explore the intricacies of data and metadata, their key differences and the importance of metadata management tools such as dbForge Documenter.
·blog.devart.com·
SQL Server "Codify" Function
This function will jump-start the process of converting long descriptions into meaningful abbreviations. It's great for creating "Code" columns in lookup tables.
·nolongerset.com·
Data Quality Rules: The Definitive Guide to Getting Started — Data Quality Pro
The reality is that all organisations possess data quality rules, but they’re typically scattered widely across the organisation with no thought to standardisation, governance and re-use. The following resources will help your organisation buck that trend and adopt data quality rules management habits…
·dataqualitypro.com·
Data Modeling - Relational Databases (SQL) vs Data Lake (File Based) - Confessions of a Data Guy
Data modeling is a topic that never goes away. Sometimes I do reminisce about the good ol’ days of Kimball-style data models; they were so simple and straightforward, just the same thing for years. Then Big Data happened, Spark happened. Things just changed. There is a lot of new content coming out around Data Lakes and […]
·confessionsofadataguy.com·
Add or connect a database with WSL
Learn how to set up MySQL, MongoDB, PostgreSQL, SQLite, Microsoft SQL Server, or Redis on the Windows Subsystem for Linux.
·docs.microsoft.com·