
Effortless Data Pipelines: Simplify Transformations with Snowflake Dynamic Tables


TL;DR: Dynamic tables are tables with a built-in syncing capability. In the dynamic table definition you specify the source tables, what data to sync, and how often to sync it; after that, Snowflake takes care of the data syncing process automatically.

Dynamic tables are used to transform data in a reliable, automated way.
You define the source table, write a SQL query to select what data to sync, and set the sync frequency. Snowflake takes care of the rest. When the underlying data on which the dynamic table is based changes, the table is updated to reflect those changes. These updates are automated and referred to as a “refresh”.

The source of a dynamic table can be multiple tables and even another dynamic table.

Main benefits of dynamic tables:

  • Declarative programming: You can define the end state using declarative SQL and let Snowflake handle the pipeline management.

  • Easy switching: Transition seamlessly from batch to streaming with a single ALTER DYNAMIC TABLE command (sketched below). You control how often data is refreshed in your pipeline, which helps balance cost and data freshness.
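
As a minimal sketch of that switch (assuming a dynamic table named orders, the name used in later examples; target lag itself is explained further below), moving from a batch-style refresh to near-real-time is a single statement:

-- Batch-style: content may lag the base tables by up to 8 hours
ALTER DYNAMIC TABLE orders SET TARGET_LAG = '8 hours';

-- Near-streaming: keep content within 1 minute of the base tables
ALTER DYNAMIC TABLE orders SET TARGET_LAG = '1 minute';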

Note: A dynamic table’s content is based on the query, and it can’t be modified using DML operations.

How dynamic tables work

Dynamic tables rely on change tracking on their base tables to notice changes. If change tracking is not already enabled on the base tables, Snowflake automatically attempts to enable it. For this to succeed, the user creating the dynamic table must have the OWNERSHIP privilege on all underlying objects/tables.

Change tracking
Specifies whether change tracking is enabled on the table. Enabling it adds a pair of hidden columns to the source table and begins storing change-tracking metadata in those columns. These columns consume a small amount of storage.

The change-tracking metadata can be queried using the CHANGES clause for SELECT statements, or by creating and querying one or more streams on the table.
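
As an illustration, here is a minimal sketch of the CHANGES clause (assuming change tracking is already enabled on the products table used below, and an arbitrary one-hour window):

-- Rows of products that changed during the last hour
SELECT *
FROM products
  CHANGES (INFORMATION => DEFAULT)
  AT (TIMESTAMP => DATEADD(hour, -1, CURRENT_TIMESTAMP()));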

You can enable change tracking either while creating the table or using the ALTER TABLE command:

CREATE OR REPLACE TABLE products
...
CHANGE_TRACKING = TRUE
;

or

ALTER TABLE products SET CHANGE_TRACKING = TRUE;

To check whether change tracking is enabled on the base objects, use the SHOW TABLES, SHOW VIEWS, etc. commands on them and inspect the change_tracking column.
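
For example (a sketch, reusing the products table from above):

SHOW TABLES LIKE 'products';
-- Inspect the change_tracking column in the output: ON or OFF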

Syncing data between the source tables and the dynamic table operates in one of two ways:

  • Incremental refresh: This automated process analyzes the dynamic table’s query and calculates changes since the last refresh (syncing). It then merges these changes into the table.
  • Full refresh: When the automated process can’t perform an incremental refresh, it conducts a full refresh: completely replacing the data.

Note: A dynamic table’s refresh mode is determined at creation time and is immutable afterward.
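
The refresh mode a dynamic table actually ended up with can be checked with SHOW DYNAMIC TABLES (a sketch, assuming the orders dynamic table used in this post):

SHOW DYNAMIC TABLES LIKE 'orders';
-- Inspect the refresh_mode column in the output (e.g. INCREMENTAL or FULL)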

Target lag:
You can specify that the data should be synced either as soon as the source table changes (within a minute) or a certain amount of time after the source table changes. This setting is called “target lag”. Target lag is the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables. By setting the target lag, it is possible to optimize cost efficiency. For example, suppose the target lag is set to 2 hours: in the first 30 minutes, 10 new rows are inserted into the base table, and 30 minutes after that, another 30 rows are inserted. Instead of performing the syncing process twice, once per change, it is performed only once, within 2 hours of the first change. Note that this is different from cron scheduling: a 2-hour target lag does not mean syncing happens every 2 hours; rather, it means syncing happens within 2 hours of the first change.

The example below sets the orders dynamic table’s target lag to two hours:

ALTER DYNAMIC TABLE orders SET TARGET_LAG = '2 hour';


Creating a dynamic table

CREATE OR REPLACE DYNAMIC TABLE orders
  TARGET_LAG = '30 minutes'      -- refresh within 30 minutes of base table changes
  WAREHOUSE = warehouse_xl       -- warehouse that runs the refreshes
  REFRESH_MODE = INCREMENTAL     -- merge only the changes since the last refresh
  AS
    SELECT order_id, order_source
    FROM staging_table
    WHERE order_origin = 'America';

Dynamic tables performance

Dynamic tables are intentionally designed to be simple: easy to create, use, and manage.

Note:
When queried, dynamic tables perform similarly to regular Snowflake tables.
The more complex the query, the more time and resources it takes.
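
For example, the dynamic table created above can be queried like any regular table (the order_source value here is a made-up illustration):

SELECT COUNT(*)
FROM orders
WHERE order_source = 'web';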

Access control

To have full access (create, use, alter, monitor, etc.) to dynamic tables, you must have either the OWNERSHIP privilege or specific privileges based on the use case (a sample set of grants is sketched at the end of this section):

  • to create

    • CREATE DYNAMIC TABLE privilege on the schema in which you plan to create the dynamic table
    • SELECT privilege on the existing tables and views that you plan to query for the new dynamic table
    • USAGE privilege on the database, schema, and warehouse that you plan to use for the new dynamic table
  • to query

    • USAGE privilege on the database and schema that contain the dynamic table, and on the warehouse used to query it
    • SELECT privilege on the dynamic table being queried
  • to alter

    • OPERATE privilege on the dynamic table
  • to view a dynamic table’s metadata

    • MONITOR privilege on the dynamic table
      Note: The OPERATE privilege allows monitoring metadata, but it also allows altering the dynamic table.

Note:
Only a user with the OWNERSHIP privilege on a dynamic table can drop that dynamic table.
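
As a minimal sketch of the grants involved in the “to create” case above (mydb, myschema, and the transformer role are hypothetical names; the warehouse and staging table are reused from the earlier example):

GRANT USAGE ON DATABASE mydb TO ROLE transformer;
GRANT USAGE ON SCHEMA mydb.myschema TO ROLE transformer;
GRANT USAGE ON WAREHOUSE warehouse_xl TO ROLE transformer;
GRANT CREATE DYNAMIC TABLE ON SCHEMA mydb.myschema TO ROLE transformer;
GRANT SELECT ON TABLE mydb.myschema.staging_table TO ROLE transformer;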

Cost

Compute cost

There are two compute costs associated with dynamic tables: virtual warehouses and Cloud Services compute.

  • Virtual warehouses:
    Dynamic tables require virtual warehouses to refresh, that is, to run queries against base objects when they are initialized and refreshed, including both scheduled and manual refreshes. These operations use compute resources, which consume credits.

  • Cloud Services compute:
    Dynamic tables also require Cloud Services compute to identify changes in underlying base objects and whether the virtual warehouse needs to be invoked. If no changes are identified, virtual warehouse credits aren’t consumed since there’s no new data to refresh.

Note:
There may be instances where changes in the base objects are filtered out by the dynamic table’s query. Even though the dynamic table is not updated, because the changes did not pass the WHERE/HAVING clause, virtual warehouse credits are still consumed: the dynamic table undergoes a refresh to determine whether the changes are applicable.
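
A refresh can also be triggered manually; it consumes warehouse credits just like a scheduled one (a sketch, using the orders dynamic table from above):

ALTER DYNAMIC TABLE orders REFRESH;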

Storage cost

Dynamic tables require storage to store the materialized results. As with regular tables, you may incur additional storage costs for Time Travel, Fail-safe storage, and the cloning feature.

Limitations

  • A dynamic table’s content is based on the query, and it can’t be modified using DML operations.
  • You can’t truncate data from a dynamic table.
  • Dynamic tables can become stale if they are not refreshed within the MAX_DATA_EXTENSION_TIME_IN_DAYS period of the input tables. Once stale, dynamic tables must be recreated to resume refreshes.

Dynamic Table vs Streams and Tasks

It is possible to set up a data transformation pipeline with Streams and Tasks, but setting up Dynamic Tables is much easier (a sketch of an equivalent Streams-and-Tasks pipeline follows the comparison below).

Dynamic Table vs Stream & Task:

  • Approach: A dynamic table is declarative: you define the end result you want and Snowflake takes care of its implementation. Streams & Tasks are imperative: you define how to transform the data step by step.
  • Triggering: A dynamic table syncs dynamically: syncing takes place when the data changes on the base/source table. With Streams & Tasks, you define a schedule for executing the code that transforms the data (scheduling the task).
  • Building blocks: A dynamic table cannot use stored procedures and other tasks. Streams & Tasks can use stored procedures and other tasks.
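
For contrast, here is a rough sketch of a comparable pipeline built imperatively with a stream and a task (table and warehouse names are reused from the earlier examples; here orders would be a regular table rather than a dynamic one, and the 30-minute schedule is an assumption):

-- Stream captures changes on the staging table
CREATE OR REPLACE STREAM staging_stream ON TABLE staging_table;

-- Task runs every 30 minutes, only when the stream has captured changes
CREATE OR REPLACE TASK sync_orders
  WAREHOUSE = warehouse_xl
  SCHEDULE = '30 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('STAGING_STREAM')
AS
  INSERT INTO orders (order_id, order_source)
    SELECT order_id, order_source
    FROM staging_stream
    WHERE order_origin = 'America';

-- Tasks are created suspended; resume to start the schedule
ALTER TASK sync_orders RESUME;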
