Skip to main content

dbt Projects on Snowflake

dbt Projects on Snowflake

Now, it is possible to store dbt projects in Snowflake. In this blogpost, we will explore this new feature.

A dbt project is a set of related files that contains dbt_project.yml, profiles.yml and asset files like models.
Snowflake allows storing dbt project as a schema object. The files that belong to one dbt project is grouped together and treated as one unit (schema level object). This feature is called “dbt Projects on Snowflake”.

How to Create dbt projects on Snowflake

Step 1: Upload Source Files

The dbt project source files can be in any one of the following locations:

  • An internal named stage
  • A Git repository stage
  • An existing dbt project object
  • A workspace for dbt on Snowflake

Extra information regarding some of the above locations

A Git repository stage:
Snowflake supports cloning remote Git repositories. The Git repository clone in Snowflake acts as a local Git repository with a full clone of the remote repository, including branches, tags, and commits. One benefit of having Git repository clone in Snowflake is you can easily import repository files into Snowflake functions and stored procedures. See more details on this Snowflake page.

A workspace on Snowflake:
There are two types of private workspaces on Snowflake: local and remote Git repository-integrated.

  • In local workspace, all the files are local to Snowflake. Files are stored only in Snowflake.
  • In remote Git repository integrated workspace, workspace essentially becomes an editor with a clone of remote repository. Files are stored both in remote Git repository and Snowflake. (The difference between Git integrated workspace and Git repository stage is that former has editor.) Git integrated workspace has UI comments that enable git functionality like fetching, pushing etc.

Connecting to Remote Git Repositories

To connect to Remote Git Repositories below Snowflake objects/settings are necessary:

  • An API integration
  • If a repository is private, a secret
  • A network rule
  • An external access integration that references the network rule

Step 2: Deploy (Create) dbt Project

After copying the source files to Snowflake, they can be deployed as a dbt project in a schema. Here the word “deploying” means storing/saving the source files in Snowflake schema as a single unit. It does not mean running the dbt project.

Example deployment of dbt project when the source files are in Snowflake Workspaces.

Note: Snowflake workspaces have UI components that allow deployment.

CREATE DBT PROJECT some_db.xyz_schema.my_dbt_project
  FROM 'snow://workspace/user$.public."My DBT Project"/versions/live/'

Each time a dbt project is deployed, Snowflake creates a version specification starting at Version$1. Each time dbt project is altered, version is updated by one.

Executing/Running dbt Project

Once dbt Project is deployed/saved in schema, it can be executed. It is possible to execute using sql statements.

EXECUTE DBT PROJECT my_database.my_schema.my_dbt_project args='run --target prod';

It is also possible to wrap above command with Snowflake Procedures and Tasks. Tasks can be scheduled to create CI/CD pipeline.

No-Auto Updates

When you edit source code either on the workspaces or git repository, the changes are not reflected on the dbt Project. Remember dbt Project is a copy of the source files attached to specific schema. To make your changes apply to the dbt project, you must deploy it again (which create a new version of the dbt project).

Pros and Cons of dbt Project on Snowflake

It is difficult to find clear benefit of dbt Project on Snowflake. Only one potential benefit might exist:

Security: keeping the dbt project source code within snowflake where actual data is stored can be argued to be more secure. In this case, source files should be edited in a local workspaces. If collaboration is needed, shared workspaces should be used.

However, Git hosting services like GitHub, AWS Code Commit and others provide equally high security for storing the source code. Then the argument might switch to “managing two platforms instead of one introduces more friction and potential risk due to user/configuration error”. In reality, however, most companies already use multiple platforms in their tech stack. Besides using Snowflake local workspaces also means not being able to use editor extensions or command line tools like sqlfluff.

While scheduling and combining the dbt project execution with other actions on Snowflake using tasks might be considered an advantage, but it is not necessarily unique. Most Git hosting platforms have CI/CD features that support scheduled execution. So, this cannot be counted as a clear advantage. As for the combining the execution of the other actions such as stored procedures with dbt project execution, it can also be achieved using dbt on-run-start and on-run-end hooks.
Besides, in dbt project on Snowflake, (does not matter whether using local workspaces, Git integrated workspaces or Git repository stage), when a source files are altered, dbt Project needs to be (re)deployed. It is a friction.

Conclusion

In rare cases, having a requirement of keeping the source code where the data resides, it might be clear or only option. Other than that, dbt Project on Snowflake does not provide clear advantage over existing solutions.

Comments

Popular posts from this blog

Introduction to SQLFluff: How to make your SQL code clean and error-free

Image by Jake Aldridge from Pixabay You know oftentimes, the cause of runtime or compile errors and hours of debugging agony is all due to simply a missing semicolon. Have you ever had such experience? If you had, you are not alone. There are two ways to avoid these unfortunate situations: either become a perfect developer who never makes mistakes, or use helpful tools such as linters that can catch these errors early on. I am nowhere near being a perfect developer who never makes a mistake. In fact, I'm probably the opposite of a perfect developer, so even if I wanted to, I wouldn’t be able to teach you how to become a perfect developer. But what I can teach you is using linters. A Wikipedia defines a linter as a "static code analysis tool used to flag programming errors, bugs, stylistic errors and suspicious constructs." If you're not convinced yet on using linters, consider this scenario: in a large project with multiple members, different people tend to ...

Git squash merge explained

There are many ways to integrate changes in git: regular / normal git merge, git squash merge, git rebase etc. This article explains git squash merge by comparing it to regular merge. Let’s use below example: In the repository with default main branch, after two commits, a new feature branch is created. Some work happened in feature branch. feature branch now has 2 commits that it shares with main branch, and three exclusive commits (exists only in feature branch). In the meantime, others worked on main branch and added two new commits (exists only in main branch). git log output of the main branch: c72d4a9 ( HEAD - > main ) fourth commit on main 2c3dd61 third commit on main 0c2eec3 second commit on main 9b968e8 first commit on main git log output of the feature branch: 786650f ( HEAD - > feature ) third commit on feature 21cbaf1 second commit on feature 677bc7f first commit on feature 0c2eec3 second commit on main 9b968e8 first commit on mai...

How To Use KeePassXC Cli

There are similarly named programs: KeePass, KeePassX and KeePassXC (many of which are each others’ forks). Program Condition KeePass primarily for Windows. KeePassX no longer actively maintained. KeePassXC actively maintained and runs natively on Linux, macOS and Windows . Note: GUI version of the KeePassXC has more features than cli version. GUI version has variety of shortcuts as well. Regarding how to use GUI version of the KeePassXC, visit Getting Started Guide . Below features are available only in GUI version. Setting “Name” and “Description” fields of passwords database. Nesting Groups. Creating entry attributes ( open issue ). Adding Timed One-Time Passwords (TOTP). Adding entry with the same title as existing entry. KeePassXC stores all the passwords in passwords database. A passwords database (hereafter referred to as database) is an (encrypted) binary file. It can have any or no extension, but the .kdbx extension is commonly used. The ...