
Effortless Data Pipelines: Simplify Transformations with Snowflake Dynamic Tables


TL;DR: Dynamic tables are tables with built-in syncing. In the dynamic table definition you declare the source tables, what data to sync, and how often to sync it; after that, Snowflake takes care of the syncing process automatically.

Dynamic tables are used to transform data in a reliable, automated way.
You define the source tables, write a SQL query to select what data to sync, and set the sync frequency. Snowflake takes care of the rest. When the underlying data on which the dynamic table is based changes, the table is updated to reflect those changes. These updates are automated and referred to as a “refresh”.

A dynamic table’s source can be multiple tables, and even another dynamic table.

Main benefits of dynamic tables:

  • Declarative programming: You can define the end state using declarative SQL and let Snowflake handle the pipeline management.

  • Easy switching: Transition seamlessly from batch to streaming with a single ALTER DYNAMIC TABLE command. You control how often data is refreshed in your pipeline, which helps balance cost and data freshness.

Note: A dynamic table’s content is based on the query, and it can’t be modified using DML operations.

How dynamic tables work

Dynamic tables rely on change tracking on their base tables to notice changes. If change tracking is not already enabled on the base tables, Snowflake automatically attempts to enable it. The user creating the dynamic table must have the OWNERSHIP privilege on all underlying objects/tables to enable change tracking on them.

Change tracking
The CHANGE_TRACKING parameter specifies whether change tracking is enabled on a table. Enabling it adds a pair of hidden columns to the source table and begins storing change-tracking metadata in those columns. The columns consume a small amount of storage.

The change-tracking metadata can be queried using the CHANGES clause for SELECT statements, or by creating and querying one or more streams on the table.
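For example, a minimal sketch (assuming change tracking is already enabled on the products table from the example below) that reads the change-tracking metadata with the CHANGES clause:

-- All changes to products over the last 10 minutes
SELECT *
FROM products
  CHANGES (INFORMATION => DEFAULT)
  AT (OFFSET => -60*10);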

You can enable change tracking either while creating the table or with the ALTER TABLE command:

CREATE OR REPLACE TABLE products
...
CHANGE_TRACKING = TRUE
;

or

ALTER TABLE products SET CHANGE_TRACKING = TRUE;

To check whether change tracking is enabled on base tables, run SHOW TABLES, SHOW VIEWS, etc. on the base objects and inspect the change_tracking column.
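A minimal sketch, again assuming the products table:

SHOW TABLES LIKE 'products';

-- The change_tracking column of the SHOW output reads ON or OFF
SELECT "name", "change_tracking"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));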

Syncing data between the source tables and a dynamic table operates in one of two ways:

  • Incremental refresh: This automated process analyzes the dynamic table’s query and calculates changes since the last refresh (syncing). It then merges these changes into the table.
  • Full refresh: When the automated process can’t perform an incremental refresh, it conducts a full refresh: completely replacing the data.

Note: A dynamic table’s refresh mode is determined at creation time and is immutable afterward.
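To see which refresh mode a dynamic table ended up with, one option is a sketch like this (assuming the orders dynamic table used in the examples below):

SHOW DYNAMIC TABLES LIKE 'orders';
-- inspect the refresh_mode column in the output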

Target lag:
You can specify either that the data should be synced as soon as the source table changes (within about a minute) or a certain amount of time after the source table changes. This setting is called ‘target lag’: the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables.

Setting a target lag makes it possible to optimize cost efficiency. For example, suppose the target lag is set to 2 hours: in the first 30 minutes 10 new rows are inserted into the base table, and after another 30 minutes another 30 rows are inserted. Instead of performing the syncing process twice, once for each change, it is performed only once, within 2 hours of the first change. Note that this is different from cron scheduling: a 2-hour target lag does not mean syncing happens every 2 hours, but rather that syncing happens within 2 hours of the first change.

The example below sets the orders dynamic table’s target lag to two hours:

ALTER DYNAMIC TABLE orders SET TARGET_LAG = '2 hours';
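When dynamic tables are chained together (recall that a dynamic table’s source can be another dynamic table), the lag can also be defined relative to the consumers rather than as a fixed duration. A minimal sketch:

-- Refresh this table only as often as the dynamic tables
-- that read from it require
ALTER DYNAMIC TABLE orders SET TARGET_LAG = DOWNSTREAM;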

Note:
It is also possible to set up a data transformation pipeline with Streams and Tasks, but dynamic tables are much easier to set up; see the comparison at the end of this post.

Creating dynamic table

CREATE OR REPLACE DYNAMIC TABLE orders
  TARGET_LAG = '30 minutes'
  WAREHOUSE = warehouse_xl
  REFRESH_MODE = INCREMENTAL
  AS
    SELECT order_id, order_source
    FROM staging_table
    WHERE order_origin = 'America';
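Once created, a dynamic table is queried like any other table, and a refresh can also be triggered manually instead of waiting for the target lag. A minimal sketch:

SELECT COUNT(*) FROM orders;

-- Trigger an out-of-schedule refresh manually
ALTER DYNAMIC TABLE orders REFRESH;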

Dynamic tables performance

Dynamic tables are intentionally designed to be simple: easy to create, use, and manage.

Note:
When queried, dynamic tables perform similarly to regular Snowflake tables.
The more complex the query, the more time and resources it takes.

Access control

To have full access (create, use, alter, monitor, etc.) to dynamic tables you must have either the OWNERSHIP privilege or specific privileges based on the use case (a grant sketch follows this list):

  • to create

    • CREATE DYNAMIC TABLE privilege on the schema in which you plan to create the dynamic table
    • SELECT privilege on the existing tables and views that you plan to query for the new dynamic table
    • USAGE privilege on the database, schema, and warehouse that you plan to use for the new dynamic table
  • to query

    • USAGE privilege on the database and schema that contain the dynamic table, and on the warehouse used to query it
    • SELECT privilege on the dynamic table being queried
  • to alter

    • OPERATE privilege on the dynamic table
  • to view a dynamic table’s metadata

    • MONITOR privilege on the dynamic table
      Note: The OPERATE privilege allows monitoring metadata, but it also allows altering the dynamic table.
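A minimal sketch of the corresponding grants, assuming a hypothetical transformer role and a hypothetical mydb.staging namespace around the earlier example objects:

-- Privileges needed to create the dynamic table
GRANT USAGE ON DATABASE mydb TO ROLE transformer;
GRANT USAGE ON SCHEMA mydb.staging TO ROLE transformer;
GRANT USAGE ON WAREHOUSE warehouse_xl TO ROLE transformer;
GRANT CREATE DYNAMIC TABLE ON SCHEMA mydb.staging TO ROLE transformer;
GRANT SELECT ON TABLE mydb.staging.staging_table TO ROLE transformer;

-- Privileges needed to query and monitor it afterwards
GRANT SELECT ON DYNAMIC TABLE mydb.staging.orders TO ROLE transformer;
GRANT MONITOR ON DYNAMIC TABLE mydb.staging.orders TO ROLE transformer;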

Note:
Only a user with OWNERSHIP privilege on the dynamic table can drop that dynamic table.

Cost

Compute cost

There are two compute costs associated with dynamic tables: virtual warehouses and Cloud Services compute.

  • Virtual warehouses:
    Dynamic tables require virtual warehouses to refresh: that is, to run queries against base objects when they are initialized and refreshed, including both scheduled and manual refreshes. These operations use compute resources, which consume credits.

  • Cloud Services compute:
    Dynamic tables also require Cloud Services compute to identify changes in underlying base objects and whether the virtual warehouse needs to be invoked. If no changes are identified, virtual warehouse credits aren’t consumed since there’s no new data to refresh.

Note:
There may be instances where changes in the base objects are filtered out by the dynamic table’s query. Even though the dynamic table is not updated, since the changes don’t pass the WHERE/HAVING clauses, virtual warehouse credits are still consumed because the dynamic table undergoes a refresh to determine whether the changes are applicable.
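To see what each refresh actually did, one option is the DYNAMIC_TABLE_REFRESH_HISTORY information schema table function; a minimal sketch, assuming the orders dynamic table:

-- Recent refreshes for orders: whether each one was incremental
-- or full, and whether it found any new data at all
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(NAME => 'orders'));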

Storage cost

Dynamic tables require storage for the materialized results. As with regular tables, you may incur additional storage costs for Time Travel, Fail-safe storage, and cloning.

Limitations

  • A dynamic table’s content is based on the query, and it can’t be modified using DML operations.
  • You can’t truncate data from a dynamic table.
  • Dynamic tables can become stale if they are not refreshed within the MAX_DATA_EXTENSION_TIME_IN_DAYS period of the input tables. Once stale, dynamic tables must be recreated to resume refreshes.

Dynamic Table vs Streams and Tasks

It is possible to set up data transformation pipeline with Streams and Tasks, but setting up Dynamic Tables is much easier compared to using Streams and Tasks.

| Dynamic Table | Stream & Task |
| --- | --- |
| A declarative approach: you define the end result you want and Snowflake takes care of its implementation. | An imperative approach: you define how to transform the data step by step. |
| Dynamic syncing: syncing takes place when the data changes on the base/source table. | You define a schedule for executing the code that transforms the data (scheduling the task). |
| Cannot use stored procedures and other tasks. | Can use stored procedures and other tasks. |
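For comparison, a rough sketch of what the earlier orders pipeline could look like with Streams and Tasks (hypothetical names, and only one of several possible designs; an INSERT-only flow that ignores updates and deletes):

-- Capture changes on the source table
CREATE OR REPLACE STREAM orders_stream ON TABLE staging_table;

-- A task that runs on a schedule, but only when the stream has data;
-- here orders is a regular target table, not a dynamic table
CREATE OR REPLACE TASK orders_task
  WAREHOUSE = warehouse_xl
  SCHEDULE = '30 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders
    SELECT order_id, order_source
    FROM orders_stream
    WHERE order_origin = 'America';

-- Tasks are created suspended; resume to start the schedule
ALTER TASK orders_task RESUME;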
