DP-203 References

Create tumbling window triggers - Azure Data Factory & Azure Synapse
Learn how to create a trigger in Azure Data Factory or Azure Synapse Analytics that runs a pipeline on a tumbling window.
Tumbling window triggers are a type of trigger that fires at a periodic time interval from a specified start time, while retaining state. Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. A tumbling window trigger has a one-to-one relationship with a pipeline and can only reference a singular pipeline.
·docs.microsoft.com·
Create event-based triggers - Azure Data Factory & Azure Synapse
Learn how to create a trigger in Azure Data Factory or Azure Synapse Analytics that runs a pipeline in response to an event.
Data integration scenarios often require customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory and Synapse pipelines natively integrate with Azure Event Grid, which lets you trigger pipelines on such events.
·docs.microsoft.com·
Understand inputs for Azure Stream Analytics
This article describes the concept of inputs in an Azure Stream Analytics job, comparing streaming inputs to reference data inputs.
Azure Blob storage, Azure Data Lake Storage Gen2, and Azure SQL Database are currently supported as input sources for reference data.
Event Hubs, IoT Hub, Azure Data Lake Storage Gen2 and Blob storage are supported as data stream input sources.
A data stream is an unbounded sequence of events over time. Stream Analytics jobs must include at least one data stream input.
Reference data is either completely static or changes slowly. It is typically used to perform correlation and lookups.
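The distinction is easy to see in a query. Here is a minimal Stream Analytics sketch, assuming a hypothetical Event Hubs stream input aliased as telemetry and a Blob-backed reference input aliased as DeviceCatalog; joins against reference data need no temporal condition, unlike stream-to-stream joins.

```sql
-- Stream input 'telemetry' joined to reference input 'DeviceCatalog'
-- (both names are hypothetical aliases defined on the job's inputs).
SELECT
    t.DeviceId,
    d.DeviceName,
    t.Temperature
INTO [alerts]
FROM [telemetry] t
JOIN [DeviceCatalog] d
    ON t.DeviceId = d.DeviceId
WHERE t.Temperature > 90
```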
·docs.microsoft.com·
Choose a real-time and stream processing solution on Azure
Learn about how to choose the right real-time analytics and streaming processing technology to build your application on Azure.
Built-in temporal operators, such as windowed aggregates, temporal joins, and temporal analytic functions.
Native Azure input and output adapters.
Support for slowly changing reference data (also known as lookup tables), including joining with geospatial reference data for geofencing.
Integrated solutions, such as anomaly detection.
Multiple time windows in the same query.
Ability to compose multiple temporal operators in arbitrary sequences.
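As an illustration of the windowed aggregates mentioned above, here is a minimal Stream Analytics query, assuming a hypothetical input named telemetry with an EventTime field; TumblingWindow produces exactly the fixed-size, non-overlapping, contiguous intervals described in the tumbling window trigger entry earlier.

```sql
-- Count events per device over fixed, non-overlapping 5-minute windows.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
FROM [telemetry] TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(minute, 5)
```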
·docs.microsoft.com·
Conditional split transformation in mapping data flow - Azure Data Factory & Azure Synapse
Split data into different streams using the conditional split transformation in a mapping data flow in Azure Data Factory or Synapse Analytics.
The conditional split transformation routes data rows to different streams based on matching conditions. The conditional split transformation is similar to a CASE decision structure in a programming language.
When disjoint is false, each row is routed only to the first condition it matches; when disjoint is true, rows are routed to every matching condition.
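Since the article itself compares the transformation to a CASE decision structure, a rough T-SQL analogy (hypothetical dbo.Orders table) shows the disjoint = false behavior: each row lands in the first stream whose condition it matches.

```sql
-- Each row is labeled by the FIRST condition it matches,
-- mirroring a conditional split with disjoint = false.
SELECT
    OrderId,
    CASE
        WHEN Amount >= 1000 THEN 'large'
        WHEN Amount >= 100  THEN 'medium'
        ELSE 'small'   -- the default stream for unmatched rows
    END AS TargetStream
FROM dbo.Orders;
```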
·docs.microsoft.com·
Flatten transformation in mapping data flow - Azure Data Factory & Azure Synapse
Denormalize hierarchical data using the flatten transformation in Azure Data Factory and Synapse Analytics pipelines.
Use the flatten transformation to take array values inside hierarchical structures such as JSON and unroll them into individual rows. This process is known as denormalization.
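The flatten transformation itself is configured in the mapping data flow UI, but the same denormalization idea can be sketched in T-SQL with OPENJSON, shown here purely as an analogy with made-up data.

```sql
-- Unroll the "orders" array so each element becomes its own row,
-- analogous to the flatten transformation's "unroll by" setting.
DECLARE @json nvarchar(max) = N'{
  "customer": "Contoso",
  "orders": [ { "sku": "A1", "qty": 2 }, { "sku": "B7", "qty": 5 } ]
}';

SELECT
    JSON_VALUE(@json, '$.customer') AS Customer,
    o.sku,
    o.qty
FROM OPENJSON(@json, '$.orders')
     WITH (sku varchar(10) '$.sku', qty int '$.qty') AS o;
```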
·docs.microsoft.com·
6 Practical Data Protection Features in SQL Server (Pros & Cons)
Here's how we use the data protection features within SQL Server to protect confidential data and make it available to only those authorized to see it.
Adding a dynamic data mask to a column in SQL Server masks out part of the information in that column. This is useful if an employee needs to see only part of an ID number, or part of a phone number for verification purposes.
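A minimal T-SQL sketch of that scenario, assuming a hypothetical dbo.Employees table and VerificationClerk principal: the mask reveals only the last four digits of the phone number to anyone lacking the UNMASK permission.

```sql
-- Mask the phone column: show a fixed prefix and only the last 4 digits.
ALTER TABLE dbo.Employees
ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XXX-",4)');

-- Principals without UNMASK see masked values; grant it where needed.
GRANT UNMASK TO VerificationClerk;
```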
Transparent data encryption is a feature that encrypts any data being saved to disk. If any data in a table is updated, it is transparently encrypted on save; when the data is read back, SQL Server decrypts it for you.
This satisfies the regulatory requirement that any data “at rest” be encrypted: all data within the database is encrypted.
Encrypted Columns in SQL Server are columns within a table that have been encrypted to hide whatever sensitive data the column contains. This is a good way to both hide sensitive data like a social security number or a date of birth and have the data encrypted “at rest”. To read the data, special permissions are needed to access the necessary keys.
The data is encrypted so this satisfies any sort of regulatory requirement of “encrypting data at rest”.
Always Encrypted is useful if the people working in the database are not always authorized to see the data inside it (dates of birth, SSNs, salaries).
The Always Encrypted feature even prevents those who manage the database from accessing or decrypting sensitive data, while still allowing end users to read and interact with the same data. In this case, a web or desktop application is set up to encrypt or decrypt the data without the SQL Server being able to read the data.
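For reference, a column-definition sketch (hypothetical table; assumes a column master key and a column encryption key named CEK_Auto1 were already provisioned, typically via SSMS or PowerShell):

```sql
CREATE TABLE dbo.EmployeeSalaries (
    EmployeeId int PRIMARY KEY,
    -- Randomized encryption is stronger but disallows equality lookups
    -- on the column; deterministic encryption would permit them.
    Salary money ENCRYPTED WITH (
        COLUMN_ENCRYPTION_KEY = CEK_Auto1,
        ENCRYPTION_TYPE = RANDOMIZED,
        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
    )
);
```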
·corebts.com·
Introducing data virtualization with PolyBase - SQL Server
PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources such as Hadoop and Azure blob storage.
PolyBase enables your SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, Hadoop clusters, and Cosmos DB without separately installing client connection software. You can also use the generic ODBC connector to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join data from external sources to relational tables in an instance of SQL Server.
A key use case for data virtualization with the PolyBase feature is to allow the data to stay in its original location and format. You can virtualize the external data through the SQL Server instance, so that it can be queried in place like any other table in SQL Server. This process minimizes the need for ETL processes for data movement.
Query data stored in Hadoop from a SQL Server instance or PDW.
Query data stored in Azure blob storage.
Import data from Hadoop, Azure blob storage, or Azure Data Lake Store.
Export data to Hadoop, Azure blob storage, or Azure Data Lake Store.
Integrate with BI tools.
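A minimal sketch of the virtualization flow described above, with hypothetical names; credential setup is omitted and the exact WITH options vary by SQL Server version.

```sql
-- External data source over Azure Blob Storage (credentials omitted).
CREATE EXTERNAL DATA SOURCE BlobSource
WITH (LOCATION = 'wasbs://data@mystorageaccount.blob.core.windows.net');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- The external table is queried in place; no data is copied in.
CREATE EXTERNAL TABLE dbo.ExternalSales (SaleId int, Amount money)
WITH (LOCATION = '/sales/', DATA_SOURCE = BlobSource,
      FILE_FORMAT = CsvFormat);

-- Join external data to a local relational table with plain T-SQL.
SELECT s.SaleId, s.Amount, r.Region
FROM dbo.ExternalSales AS s
JOIN dbo.SalesRegions AS r ON s.SaleId = r.SaleId;
```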
·docs.microsoft.com·
Parent-Child Dimensions
Learn about parent-child hierarchies, which are hierarchies in a standard dimension that contain a parent attribute.
In this dimension table, the ParentOrganizationKey column has a foreign key relationship with the OrganizationKey primary key column. In other words, each record in this table can be related through a parent-child relationship with another record in the table. This kind of self-join is generally used to represent organization entity data, such as the management structure of employees in a department.
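A recursive CTE is the usual way to traverse such a self-join. This sketch uses the article's OrganizationKey and ParentOrganizationKey columns, plus an assumed OrganizationName column for readability.

```sql
WITH OrgTree AS (
    -- Anchor: the root rows, which have no parent.
    SELECT OrganizationKey, ParentOrganizationKey, OrganizationName,
           0 AS Depth
    FROM dbo.DimOrganization
    WHERE ParentOrganizationKey IS NULL
    UNION ALL
    -- Recursive step: attach each child to its parent.
    SELECT o.OrganizationKey, o.ParentOrganizationKey, o.OrganizationName,
           t.Depth + 1
    FROM dbo.DimOrganization AS o
    JOIN OrgTree AS t ON o.ParentOrganizationKey = t.OrganizationKey
)
SELECT * FROM OrgTree ORDER BY Depth;
```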
·docs.microsoft.com·
Best practices for using Azure Data Lake Storage Gen2
Learn how to optimize performance, reduce costs, and secure your Data Lake Storage Gen2 enabled Azure Storage account.
The network connectivity between your source data and your storage account can sometimes be a bottleneck. When your source data is on premises, consider using a dedicated link with Azure ExpressRoute. If your source data is in Azure, performance is best when the data is in the same Azure region as your Data Lake Storage Gen2 enabled account.
·docs.microsoft.com·
Difference between Clustered and Non-clustered index - GeeksforGeeks
A clustered index physically reorders the table's rows to match the index, so a table can have only one. A non-clustered index is a separate structure whose logical order does not match the physical order of the rows on disk.
Lookups via a clustered index are generally faster, since the leaf level of the index is the data itself; a non-clustered index requires an extra lookup from the index entry to the underlying row. A clustered index also needs no storage beyond the table, while a non-clustered index consumes additional space for its own structure.
·geeksforgeeks.org·
Clustered and nonclustered indexes described - SQL Server
An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.
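In DDL terms (hypothetical dbo.Orders table): a table gets at most one clustered index, because it defines the physical order of the rows, while nonclustered indexes are separate B-trees that point back at the rows.

```sql
-- The clustered index: the data rows themselves are ordered by OrderId.
CREATE CLUSTERED INDEX IX_Orders_OrderId
    ON dbo.Orders (OrderId);

-- A nonclustered index: a separate structure with pointers to the rows.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId);
```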
·docs.microsoft.com·
Columnstore indexes: Overview - SQL Server
Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
A nonclustered columnstore index and a clustered columnstore index function the same way. The difference is that a nonclustered index is a secondary index created on a rowstore table, whereas a clustered columnstore index is the primary physical storage for the entire table.
Use a clustered columnstore index to store fact tables and large dimension tables for data warehousing workloads. This method improves query performance and data compression by up to 10 times.
Use a nonclustered columnstore index to perform analysis in real time on an OLTP workload.
Rowstore indexes perform best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values. Use rowstore indexes with transactional workloads because they tend to require mostly table seeks instead of table scans.
Columnstore indexes give high performance gains for analytic queries that scan large amounts of data, especially on large tables.
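Both flavors in DDL, against hypothetical tables:

```sql
-- Clustered columnstore: becomes the primary storage for the fact table.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;

-- Nonclustered columnstore: a secondary index on a rowstore OLTP table,
-- enabling real-time analytics alongside transactional work.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
    ON dbo.Orders (OrderId, CustomerId, Amount);
```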
·docs.microsoft.com·
Auto-failover groups overview & best practices - Azure SQL Database
Auto-failover groups let you manage geo-replication and automatic or coordinated failover of a group of databases on a server, for both single and pooled databases in Azure SQL Database.
The auto-failover groups feature lets you manage the replication and failover of some or all databases on a logical server, or of all user databases in a managed instance, to another Azure region. This article focuses on using the feature with Azure SQL Database, along with some best practices.
By default, a failover group is configured with an automatic failover policy. The system triggers a geo-failover after the failure is detected and the grace period has expired.
Planned failover performs full data synchronization between primary and secondary databases before the secondary switches to the primary role.
You can initiate a geo-failover manually at any time regardless of the automatic failover configuration. During an outage that impacts the primary, if automatic failover policy is not configured, a manual failover is required to promote the secondary to the primary role.
Unplanned or forced failover immediately switches the secondary to the primary role without waiting for recent changes to propagate from the primary. This operation may result in data loss.
By default, failover of the read-only listener is disabled, which ensures that the performance of the primary is not impacted when the secondary is offline. However, it also means read-only sessions will not be able to connect until the secondary is recovered.
Because the data is replicated to the secondary database using asynchronous replication, an automatic geo-failover may result in data loss. You can customize the automatic failover policy to reflect your application’s tolerance to data loss. By configuring GracePeriodWithDataLossHours, you can control how long the system waits before initiating a forced failover, which may result in data loss.
·docs.microsoft.com·
Active geo-replication - Azure SQL Database
Use active geo-replication to create readable secondary databases of individual databases in Azure SQL Database in the same or different regions.
Active geo-replication is a feature that lets you create a continuously synchronized, readable secondary database for a primary database. The readable secondary may be in the same Azure region as the primary or, more commonly, in a different region. These readable secondary databases are also known as geo-secondaries or geo-replicas.
Unplanned geo-failover: Unplanned, or forced, geo-failover immediately switches the geo-secondary to the primary role without any synchronization with the primary. Any transactions committed on the primary but not yet replicated to the secondary are lost. This operation is designed as a recovery method during outages when the primary is not accessible but database availability must be quickly restored. When the original primary is back online, it will automatically be reconnected, reseeded using the current primary's data, and become the new geo-secondary.
Planned geo-failover: Planned geo-failover switches the roles of the primary and geo-secondary databases after completing full data synchronization. A planned failover does not result in data loss.
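Both failover types can be initiated with T-SQL, run against the master database of the geo-secondary's server (the database name here is hypothetical):

```sql
-- Planned failover: full synchronization first, no data loss.
ALTER DATABASE [MyDb] FAILOVER;

-- Unplanned/forced failover: immediate role switch; recent
-- transactions not yet replicated may be lost.
ALTER DATABASE [MyDb] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```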
·docs.microsoft.com·
Introduction to Azure Storage - Cloud storage on Azure
The Azure Storage platform is Microsoft's cloud storage solution. Azure Storage provides highly available, secure, durable, massively scalable, and redundant storage for data objects in the cloud. Learn about the services available in Azure Storage and how you can use them in your applications, services, or enterprise solutions.
·docs.microsoft.com·
Transparent data encryption (TDE) - SQL Server
Learn about transparent data encryption, which encrypts SQL Server, Azure SQL Database, and Azure Synapse Analytics data, known as encrypting data at rest.
Transparent data encryption (TDE) encrypts SQL Server, Azure SQL Database, and Azure Synapse Analytics data files. This encryption is known as encrypting data at rest.
One solution is to encrypt sensitive data in a database and use a certificate to protect the keys that encrypt the data. This solution prevents anyone without the keys from using the data. But you must plan this kind of protection in advance.
TDE protects data at rest, which is the data and log files.
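The classic setup sequence, sketched with illustrative names:

```sql
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';

USE MyDatabase;
-- The database encryption key (DEK) is protected by the certificate.
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;

ALTER DATABASE MyDatabase SET ENCRYPTION ON;
```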
·docs.microsoft.com·
Choose a batch processing technology - Azure Architecture Center
Compare technology choices for big data batch processing in Azure, including key selection criteria and a capability matrix.
Azure Synapse is a distributed system designed to perform analytics on large volumes of data. It supports massively parallel processing (MPP), which makes it suitable for running high-performance analytics. Consider Azure Synapse when you have large amounts of data (more than 1 TB) and are running an analytics workload that will benefit from parallelism.
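To benefit from that parallelism, large Synapse fact tables are typically hash-distributed; a minimal sketch with hypothetical names:

```sql
-- Dedicated SQL pool: hash-distribute the fact table on a join key so
-- work is spread evenly across the pool's 60 distributions.
CREATE TABLE dbo.FactSales (
    SaleId     bigint NOT NULL,
    CustomerId int    NOT NULL,
    Amount     money  NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerId),
      CLUSTERED COLUMNSTORE INDEX);
```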
Azure Databricks is an Apache Spark-based analytics platform. You can think of it as "Spark as a service." It's the easiest way to use Spark on the Azure platform.
Languages: R, Python, Java, Scala, Spark SQL.
Fast cluster start times, autotermination, autoscaling.
Manages the Spark cluster for you.
Built-in integration with Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Synapse, and other services. See Data Sources.
User authentication with Azure Active Directory.
Web-based notebooks for collaboration and data exploration.
·docs.microsoft.com·
Transparent data encryption - Azure SQL Database & SQL Managed Instance & Azure Synapse Analytics
An overview of transparent data encryption for Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. The document covers its benefits and the options for configuration, which include service-managed transparent data encryption and Bring Your Own Key.
Transparent data encryption (TDE) helps protect Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics against the threat of malicious offline activity by encrypting data at rest. It performs real-time encryption and decryption of the database, associated backups, and transaction log files at rest without requiring changes to the application. By default, TDE is enabled for all newly deployed Azure SQL Databases and must be manually enabled for older databases of Azure SQL Database.
·docs.microsoft.com·
Temporal Tables - SQL Server
System-versioned temporal tables provide built-in support for querying the data stored in a table as it existed at any point in time.
A system-versioned temporal table is a type of user table designed to keep a full history of data changes, allowing easy point-in-time analysis.
Every temporal table has two explicitly defined columns, each with a datetime2 data type. These columns are referred to as period columns.
In addition to these period columns, a temporal table also contains a reference to another table with a mirrored schema, called the history table. The system uses the history table to automatically store the previous version of the row each time a row in the temporal table gets updated or deleted.
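A minimal sketch of such a table and a point-in-time query (table and column names are hypothetical):

```sql
CREATE TABLE dbo.Employee (
    EmployeeId int PRIMARY KEY,
    Salary     money,
    -- The two required period columns, both datetime2.
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

-- Query the data as it existed at a given instant; SQL Server reads
-- the current table and the history table as needed.
SELECT * FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2024-01-01';
```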
·docs.microsoft.com·