
Effortless Data Pipelines: Simplify Transformations with Snowflake Dynamic Tables


TL;DR: Dynamic tables are tables with built-in syncing capability. In the dynamic table definition you specify the source tables, what data to sync, and how often to sync; after that, Snowflake takes care of the data syncing process automatically.

Dynamic tables are used to transform data in a reliable, automated way.
You define the source table, write a SQL query to select what data to sync, and set the sync frequency. Snowflake takes care of the rest. When the underlying data that the dynamic table is based on changes, the table is updated to reflect those changes. These updates are automated and referred to as a “refresh”.

The source of a dynamic table can be multiple tables, and even another dynamic table.

Main benefits of dynamic tables:

  • Declarative programming: You define the end state using declarative SQL and let Snowflake handle the pipeline management.

  • Easy switching: Transition seamlessly from batch to streaming with a single ALTER DYNAMIC TABLE command (see the example after this list). You control how often data is refreshed in your pipeline, which helps balance cost and data freshness.
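
For example, a pipeline can be moved from batch-like freshness to near-real-time with one statement. A minimal sketch, assuming the orders dynamic table used in the examples below:

-- Lower the target lag to switch from batch-like to near-streaming refreshes.
ALTER DYNAMIC TABLE orders SET TARGET_LAG = '1 minute';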

Note: A dynamic table’s content is based on the query, and it can’t be modified using DML operations.

How dynamic tables work

Dynamic tables rely on change tracking on their base tables to detect changes. If change tracking is not already enabled on the base tables, Snowflake automatically attempts to enable it. The user creating the dynamic table must have the OWNERSHIP privilege on all underlying objects/tables to enable change tracking on them.

Change tracking
The CHANGE_TRACKING parameter specifies whether to enable change tracking on a table. Enabling it adds a pair of hidden columns to the table and begins storing change-tracking metadata in those columns. The columns consume a small amount of storage.

The change-tracking metadata can be queried using the CHANGES clause for SELECT statements, or by creating and querying one or more streams on the table.
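
As a minimal sketch, assuming change tracking is already enabled on products, the CHANGES clause can query the delta over a time window:

-- Rows that changed in the last hour, read via the CHANGES clause.
SELECT *
FROM products
  CHANGES (INFORMATION => DEFAULT)
  AT (TIMESTAMP => DATEADD(hour, -1, CURRENT_TIMESTAMP()));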

You can enable change tracking either while creating the table or with the ALTER TABLE command:

CREATE OR REPLACE TABLE products
...
CHANGE_TRACKING = TRUE
;

or

ALTER TABLE products SET CHANGE_TRACKING = TRUE;

To check whether change tracking is enabled on the base objects, use the SHOW TABLES, SHOW VIEWS, etc. commands on them and inspect the change_tracking column.
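
For example:

SHOW TABLES LIKE 'products';
-- Inspect the "change_tracking" column in the output: ON or OFF.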

Syncing data between the source tables and a dynamic table operates in one of two ways:

  • Incremental refresh: This automated process analyzes the dynamic table’s query and calculates the changes since the last refresh (sync). It then merges these changes into the table.
  • Full refresh: When the automated process can’t perform an incremental refresh, it conducts a full refresh, completely replacing the table’s data.

Note: A dynamic table’s refresh mode is determined at creation time and is immutable afterward.
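
To see which mode was chosen for an existing dynamic table, you can inspect the output of SHOW DYNAMIC TABLES (the orders table name is hypothetical):

SHOW DYNAMIC TABLES LIKE 'orders';
-- The "refresh_mode" column shows FULL or INCREMENTAL;
-- "refresh_mode_reason" explains why an incremental refresh wasn't chosen.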

Target lag:
You can specify that data should be synced as soon as the source table changes (within a minute), or within a certain time after the source table changes. This setting is called “target lag”: the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables. Setting the target lag makes it possible to optimize cost efficiency.

For example, suppose the target lag is set to 2 hours: 10 new rows are inserted into the base table within the first 30 minutes, and 30 more rows after another 30 minutes. Instead of performing the sync twice, once for each change, the sync is performed only once, within 2 hours of the first change. Note that this is different from cron scheduling: a 2-hour target lag does not mean syncing happens every 2 hours; rather, it means syncing happens within 2 hours of the first change.

The example below sets the orders dynamic table’s target lag to two hours:

ALTER DYNAMIC TABLE orders SET TARGET_LAG = '2 hours';


Creating a dynamic table

CREATE OR REPLACE DYNAMIC TABLE orders
  TARGET_LAG = '30 minutes'       -- refresh within 30 minutes of base table changes
  WAREHOUSE = warehouse_xl        -- warehouse that runs the refresh queries
  REFRESH_MODE = INCREMENTAL      -- enforce incremental refreshes
  AS
    SELECT order_id, order_source
    FROM staging_table
    WHERE order_origin = 'America';
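
Once created, the table refreshes automatically according to its target lag; a refresh can also be triggered manually when needed:

-- Trigger an immediate, manual refresh of the dynamic table.
ALTER DYNAMIC TABLE orders REFRESH;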

Dynamic tables performance

Dynamic tables are intentionally designed to be simple: easy to create, use, and manage.

Note:
When queried, dynamic tables perform similarly to regular Snowflake tables.
The more complex the query, the more time and resources it takes.

Access control

To have full access (create, use, alter, monitor, etc.) on dynamic tables, you must have either the OWNERSHIP privilege or specific privileges based on the use case (a GRANT sketch follows the notes below):

  • to create

    • CREATE DYNAMIC TABLE privilege on the schema in which you plan to create the dynamic table
    • SELECT privilege on the existing tables and views that you plan to query for the new dynamic table
    • USAGE privilege on the database, schema, and warehouse that you plan to use for the new dynamic table
  • to query

    • USAGE privilege on the database and schema that contain the dynamic table, and on the warehouse used to query it
    • SELECT privilege on the dynamic table being queried
  • to alter

    • OPERATE privilege on the dynamic table
  • to view a dynamic table’s metadata

    • MONITOR privilege on the dynamic table
      Note: The OPERATE privilege allows monitoring metadata, but it also allows altering the dynamic table.

Note:
Only a user with the OWNERSHIP privilege on the dynamic table can drop it.
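
As a minimal sketch of the privileges for the “create” case, assuming hypothetical names (database mydb, schema mydb.staging, warehouse warehouse_xl, and role transformer_role):

-- All object and role names here are hypothetical; adjust to your environment.
GRANT USAGE ON DATABASE mydb TO ROLE transformer_role;
GRANT USAGE ON SCHEMA mydb.staging TO ROLE transformer_role;
GRANT USAGE ON WAREHOUSE warehouse_xl TO ROLE transformer_role;
GRANT CREATE DYNAMIC TABLE ON SCHEMA mydb.staging TO ROLE transformer_role;
GRANT SELECT ON TABLE mydb.staging.staging_table TO ROLE transformer_role;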

Cost

Compute cost

There are two compute costs associated with dynamic tables: virtual warehouses and Cloud Services compute.

  • Virtual warehouses:
    Dynamic tables require virtual warehouses to refresh: that is, to run queries against the base objects when the dynamic tables are initialized and refreshed, including both scheduled and manual refreshes. These operations use compute resources, which consume credits.

  • Cloud Services compute:
    Dynamic tables also require Cloud Services compute to identify changes in the underlying base objects and to determine whether the virtual warehouse needs to be invoked. If no changes are identified, virtual warehouse credits aren’t consumed, since there’s no new data to refresh.

Note:
There may be instances where changes in the base objects are filtered out by the dynamic table’s query. Even though the dynamic table is not updated, because the changes don’t pass the query’s WHERE/HAVING clauses, virtual warehouse credits are still consumed: the dynamic table undergoes a refresh to determine whether the changes are applicable.
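
To see what each refresh actually did, the refresh history can be inspected. A sketch using Snowflake’s DYNAMIC_TABLE_REFRESH_HISTORY table function (column names per its documented output):

-- Recent refreshes across dynamic tables: what ran, why, and when.
SELECT name, refresh_action, refresh_trigger, state, data_timestamp
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
ORDER BY data_timestamp DESC;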

Storage Cost

Dynamic tables require storage for their materialized results. As with regular tables, you may incur additional storage costs for Time Travel, Fail-safe storage, and cloning.

Limitations

  • A dynamic table’s content is based on the query, and it can’t be modified using DML operations.
  • You can’t truncate data from a dynamic table.
  • Dynamic tables can become stale if they are not refreshed within the MAX_DATA_EXTENSION_TIME_IN_DAYS period of the input tables. Once stale, a dynamic table must be recreated to resume refreshes.
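
As a quick illustration of the first two limitations, using the hypothetical orders dynamic table from earlier:

-- Both statements fail on a dynamic table; its content is driven solely by its query.
DELETE FROM orders WHERE order_id = 1;  -- DML is not supported
TRUNCATE TABLE orders;                  -- truncation is not supported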

Dynamic Table vs Streams and Tasks

It is possible to set up a data transformation pipeline with Streams and Tasks, but setting up Dynamic Tables is much easier.

| Dynamic Table | Stream & Task |
| --- | --- |
| A declarative approach: you define the end result you want, and Snowflake takes care of its implementation. | An imperative approach: you define how to transform the data step by step. |
| Dynamic syncing: syncing takes place when the data changes on the base/source table. | You define a schedule for executing the code that transforms the data (scheduling the task). |
| Cannot use stored procedures and other tasks. | Can use stored procedures and other tasks. |

