Introduction to designing tables using dedicated SQL pool.
A hash distributed table distributes rows based on the value in the distribution column. A hash distributed table is designed to achieve high performance for queries on large tables.
A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated tables since joins on replicated tables don't require data movement. Replication requires extra storage, though, and isn't practical for large tables.
A round-robin table distributes table rows evenly across all distributions. The rows are distributed randomly. Loading data into a round-robin table is fast. Keep in mind that queries can require more data movement than the other distribution methods.
Staging Use round-robin for the staging table.
Dimension Use replicated for smaller tables. If tables are too large to store on each Compute node, use hash-distributed.
Fact Use hash-distribution with clustered columnstore index. Performance improves when two hash tables are joined on the same distribution column.
By default, dedicated SQL pool stores a table as a clustered columnstore index. This form of data storage achieves high data compression and query performance on large tables.
A heap table can be especially useful for loading transient data, such as a staging table which is transformed into a final table.
An external table points to data located in Azure Storage blob or Azure Data Lake Store. When used with the CREATE TABLE AS SELECT statement, selecting from an external table imports data into dedicated SQL pool.
A temporary table only exists for the duration of the session. You can use a temporary table to prevent other users from seeing temporary results and also to reduce the need for cleanup.