Skip to main content

Snowflake Load History vs Copy History: 7 differences

data Image by Icons8_team from Pixabay

Tracking data loads in Snowflake is crucial to maintaining data health and performance. Load History and Copy History are features that provide valuable information about past data loads. Understanding these features can help you efficiently troubleshoot, audit, and analyze performance.

You might be wondering why two functions exist to achieve the same goal, what are the differences, which one I am supposed to use and when? In this article we will provide you with all the answers. So, let's learn what are the differences and when to use which!

Load History vs Copy History: 7 differences

Differences 1 and 2: views vs table function and Account Usage vs information Schema

Here things get little confusing, bare with me, there are two Load History views, a view that belongs to Information Schema and a view that belongs to Account Usage schema. As for Copy History, there are Copy History table function of Information schema and a Copy History view of Account Usage schema.

Information Schema Account Usage
Load History View Load History View
Copy History Table Function Copy History View

Load History of Information Schema example:

use database db_1;

select table_name, last_load_time
  from information_schema.load_history
  where schema_name = current_schema() and
  table_name='TABLE_1';

Load History of Account Usage example:

use database db_1;

select table_name, last_load_time
  from snowflake.account_usage.load_history
  where schema_name = current_schema() and
  table_name='TABLE_1';

Copy History view example:

select table_name, last_load_time
from snowflake.account_usage.copy_history
  where schema_name = current_schema() and
  table_name='TABLE_1';
;

Pay attention the way you query Copy History view and Load History views is almost identical.

Copy History table function example:

select *
from table(
    information_schema.copy_history(
        TABLE_NAME=>'TABLE_1',
        START_TIME=> DATEADD(hours, -1, CURRENT_TIMESTAMP())
        )
    )
;

Difference 3: query syntax and the number of tables that can be specified

If you haven't already noticed, let me clarify that using both Load History and Copy History views you can query load history of more than one table. But with Copy history table function, you are limited to querying load history of a single table at a time. Table name in Copy History table function is required.

Example:

Below query only returns load history of Table_1.

select *
from table(
    information_schema.copy_history(
        TABLE_NAME=>'TABLE_1',
        START_TIME=> DATEADD(hours, -1, CURRENT_TIMESTAMP())
        )
    )
;

Below query returns load history of (up to) 10 tables.

USE DATABASE database_a;

SELECT table_name, last_load_time
  FROM information_schema.load_history
  ORDER BY last_load_time DESC
  LIMIT 10;

Difference 4: Latency

Copy History and Load History views of Account Usage have latency between the latest changes and when those changes are reflected in these views. To be more precises, Copy History view of Account Usage has up to 120 minute latency and Load History view of Account Usage has up to 90 minute latency in most of the cases. But latency might be up to two days if both of the following conditions are met:

  • Fewer than 32 DML statements have been added to the given table since it was last updated in COPY_HISTORY/LOAD_HISTORY.

  • Fewer than 100 rows have been added to the given table since it was last updated in COPY_HISTORY/LOAD_HISTORY.

As for Load History view of Information Schema and Copy History table function there is no latency.

Source Latency of Data
Copy History view usually 120 minutes
(up to 2 days)
Load History view of Account Usage usually 90 minutes
(up to 2 days)
Load History view of Information Schema No latency
Copy History table function No latency

If you need latest data with no latency, you had better use either Load History view of Information Schema or Copy History table function.

Difference 5: Retention Period

Load History view of Information Schema and Copy History table function retains historic data for 14 days. Copy History view and Load History view of Account Usage retain historic data for 1 year (365 days).

Source Data Retention
Copy History view 365 days
Load History view of Account Usage 365 days
Load History view of Information Schema 14 days
Copy History table function 14 days

Difference 6: Tracking the history of data loaded using Snowpipe

Load History views (of both Information Schema and Account Usage) do NOT return the history of data loaded using Snowpipe, it only returns the history of data lated using COPY INTO command. While both Copy History table function and view return the history of data loaded using Snowpipe and Copy INTO command.

Snowpipe is used in continuous loading.

Snowpipe enables loading data from files as soon as they’re available in a stage.

Source Load history type
Copy History view COPY INTO
and Snowpipe
Load History view of Account Usage COPY INTO only
Load History view of Information Schema COPY INTO only
Copy History table function COPY INTO
and Snowpipe

Difference 7: the maximum number of rows returned

Load History of Information Schema returns an upper limit of 10,000 rows. Copy History table function, Copy History view and Load History of Account Usage do not have this limit.


Summary

As you can see all have distinct features. To quickly grasp the differences among them, you can refer to the below table that summaries all the features of the Load History, Copy History views and Copy History table function.

Differences Load History Copy History Load History Copy History
View or Table function view view view table function
Schema they belong to Account Usage Account Usage Information Schema Information Schema
Multiple tables
can be queried
Yes Yes Yes No
Data Retention 365 days 365 days 14 days 14 days
Latency usually 90 minutes
(up to 2 days)
usually 120 minutes
(up to 2 days)
No latency No latency
Load history type COPY INTO only COPY INTO
and Snowpipe
COPY INTO only COPY INTO
and Snowpipe
Maximum number of rows returned No limit No limit 10,000 No limit
  • I would recommend using Copy History view if you need load history data that is older than 2 days.
  • Use Copy history table function if you need load history data of a single table with no latency.
  • Use Load History view of Information schema if you need to query load history data of multiple tables with no latency.

Comments

Popular posts from this blog

脱初心者! Git ワークフローを理解して開発効率アップ

Git – チーム開発に必須のバージョン管理システムですが、その真価を発揮するにはワークフローの理解が欠かせません。 色々な人は Git の使い方を良く知っていますが、Git を仕事やワークフローに統合する方法を余り良く知らない人もいます。本記事では、Git をワークフローに組み込むことで、開発プロセスがどのように効率化され、チーム全体のパフォーマンスが向上するのかを解説します。Centralized Workflow から Forking Workflow まで、代表的な 9 つのワークフローの特徴を分かりやすく紹介します。それぞれのメリット・デメリット、そして最適なユースケースを理解することで、あなたのプロジェクトに最適なワークフローを選択し、開発をスムーズに進めましょう! Centralized Workflow Feature branching/GitHub Flow Trunk Based Flow Git Feature Flow Git Flow Enhanced Git Flow One Flow GitLab Flow Forking Workflow 分かりやすくするために、同じコンセプトを説明するに一つ以上の図を使った場合があります。 Centralized Workflow 説明: 集中化ワークフローではプロジェクトにおけるすべての変更の単一の入力箇所として中央リポジトリを使用します。デフォルトの開発用ブランチは main と呼ばれ、すべての変更がこのブランチにコミットされます。 集中化ワークフローでは main 以外のブランチは不要です。チームメンバー全員がひとつのブランチで作業し、変更を直接中央リポジトリにプッシュします。 メリット: SVN のような集中型バージョン管理システムから移行する小規模チームに最適。 デメリット: お互いのコードが邪魔になり (お互いの変更を上書きするように)、プロダクション環境にバグをい入れる可能性が高くて、複数のメンバいるチームでこのフローを使いにくい。 地図: graph TD; A[Central Repository] -->|Clone| B1[Developer A's Local Repo] A --...

Introduction to SQLFluff: How to make your SQL code clean and error-free

Image by Jake Aldridge from Pixabay You know oftentimes, the cause of runtime or compile errors and hours of debugging agony is all due to simply a missing semicolon. Have you ever had such experience? If you had, you are not alone. There are two ways to avoid these unfortunate situations: either become a perfect developer who never makes mistakes, or use helpful tools such as linters that can catch these errors early on. I am nowhere near being a perfect developer who never makes a mistake. In fact, I'm probably the opposite of a perfect developer, so even if I wanted to, I wouldn’t be able to teach you how to become a perfect developer. But what I can teach you is using linters. A Wikipedia defines a linter as a "static code analysis tool used to flag programming errors, bugs, stylistic errors and suspicious constructs." If you're not convinced yet on using linters, consider this scenario: in a large project with multiple members, different people tend to ...

WinMerge のセットアップと使う方

WinMerge は、Windows 用のオープン ソースの差分およびマージ ツールです。WinMerge は、フォルダーとファイルの両方を比較し、違いを理解して扱いやすい視覚的なテキスト形式で表示します。この記事でWinMerge のセットアップと使う方を教えます。 source: https://winmerge.org WinMerge をダウンロード WinMerge のウェブサイト に行って、「WinMerge-2.16.44-x64-Setup.exe」ボタンを押し、WinMerge 2.16 をダウンロードしてください。 WinMerge をインストール ダウンロードされたソフトウェアをクリックし、ポップアップ画面で「Next」を押してください 「Languages」部分をスクロールダウンし、「Japanese menus and dialogs」を選択し、「Next」ボタンを押してください ターミナル等からも WinMerge をアクセス出来ようにする為に「Add WinMerge folder to your system path」オプションを選択し、希望によって他のオプション選択してください 「Enable Explorer context menu Integration」オプションを選択したら、フォルダ/ファイルを右キリックし、コンテクストメニューから WinMerge を開くようになります。 「Install」ボタンを押し、「Next」ボタンを押し、その後、「Finish」ボタンを押してください 言語を日本語にする もし WinMerge の言語が日本語じゃなくて、英語なら、「Edit」タブから「Options」を押してください。 ポップアップ画面で右側の下にある「Languages」と言うドロップダウンメニューから日本語を選択し、「OK」ボタンを押してください WinMerge を使う方 「ファイル」タッブから「開く」を押し 参照ボタンを押し、比較したいフォルダ・ファイルを指定 比較したいフォルダを指定する方法: ポップアップ画面から対象のフォルダーを選択し、「Open」を押してくだい 何も選択しないで、「Open」を押してくだい 右側下にある「比較」ボタンを押し ...