TL;DR: Cortex Fine-tuning is a fully managed service that lets you fine-tune popular LLMs using your data, all within Snowflake.
While large language models (LLMs) are revolutionizing various fields, their "out-of-the-box" capabilities might not always align perfectly with your specific needs. This is where fine-tuning comes into play. As explained in this article, this feature lets you take a base LLM and customize it to excel in your particular domain. Here is a brief summary of why you might want to leverage Snowflake's fine-tuning capabilities:
- Unlocking Domain Expertise: Pre-trained LLMs are trained on massive, general datasets. Fine-tuning allows you to build upon this foundation and train the LLM further using data specific to your field, such as legal documents, medical records, or financial data. This empowers the LLM to understand complex terminology and patterns unique to your domain, leading to more accurate and relevant results.
- Better than Prompt Engineering: While prompt engineering is a powerful technique for guiding LLM responses, it relies on the LLM having access to the necessary information within its pre-trained knowledge. If the LLM lacks the relevant domain-specific knowledge, even the most carefully crafted prompts may not yield accurate or satisfactory results.
- Cost-Effectiveness: Developing an LLM from the ground up is a resource-intensive endeavor, demanding significant expertise, time, and computational power. Fine-tuning offers a more practical and cost-effective solution. It allows you to capitalize on the capabilities of existing pre-trained models, customizing them to align with your specific requirements.
This article provides a comprehensive overview of fine-tuning LLMs in Snowflake, from understanding the basics of LLMs and fine-tuning to the practical steps involved in using this powerful feature.
Fine-tuning LLMs in Snowflake
A large language model (LLM) is an AI model that understands natural language and generates text in response.
Fine-tuning means modifying or customizing something to better suit your needs. Snowflake offers a feature called Cortex Fine-tuning that allows you to further train large language models (called base models) to better suit a specific task.
To greatly oversimplify what LLMs are and how they work: an LLM is a collection of algorithms that, during training, learn correlations between words. LLMs are trained on lots of data. After training, an LLM responds to your questions based on what it has seen in its training data and the correlations between words. LLMs compose their answer one word at a time; every time they add a new word to the answer, they calculate the probability of which word might come after the current sequence of words. For example, say the LLM is forming an answer to your question, "What is something that nowadays everyone owns?". It has formed the beginning of a sentence, "Nowadays everyone seems to own ...". To finish the sentence, it checks which words it usually saw together with "nowadays", "everyone", and "owns" in its training data. Based on that data, the common continuations might be "a smartphone" with a probability of 80%, "a car" with a probability of 60%, and "a house" with a probability of 5%. Since "a smartphone" has the highest probability, the LLM chooses it and answers "Nowadays everyone seems to own a smartphone".
Imagine this LLM was trained only on data available up until the 1900s, meaning it has no idea about smartphones or even cars. In that case, the same LLM would answer the same question, "What is something that nowadays everyone owns?", with "Nowadays everyone seems to own a wood stove", because in its training data it had seen "wood stove" used with words like "everyone" and "owns", so it associated "wood stove" with those words. Or, if the LLM had limited training data about (Microsoft) Windows and Apple (the company), it might associate these words with a physical structure and a fruit rather than an operating system and a tech company.
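To make the word-selection step concrete, here is a toy SQL illustration of the idea: a list of candidate next words with the example probabilities from above, from which the highest-probability candidate is picked. The candidates and numbers are just the illustrative ones used in this article, not output from any real model.
-- Toy illustration only: candidate continuations and the example
-- probabilities from the text above; the "model" simply picks the
-- most probable one.
WITH candidates AS (
    SELECT * FROM VALUES
        ('a smartphone', 0.80),
        ('a car',        0.60),
        ('a house',      0.05)
    AS t(next_words, probability)
)
SELECT next_words        -- returns 'a smartphone'
FROM candidates
ORDER BY probability DESC
LIMIT 1;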
Snowflake docs state:
If you don’t want the high cost of training a large model from scratch but need better latency and results than you’re getting from prompt engineering or even retrieval augmented generation (RAG) methods, fine-tuning an existing large model is an option.
What it means is that
- developing a large language model is very expensive and takes a lot of time & expertise
- the LLMs available for use (open-source, commercial, etc.) may not suit your needs completely, i.e., they may not be trained enough on certain domain-specific fields. Most likely these LLMs were not trained on your private data (data that belongs to your company), so they answer your questions based on public data, not your internal data.
As a middle ground, you can use these LLMs as a base model and further train/customize them to fit your needs better.
Fine-tuning allows you to use examples to adjust the behavior of the model and improve the model’s knowledge of domain-specific tasks.
Snowflake doesn't reveal much about the inner workings of the fine-tuning process.
You fine-tune a model in Snowflake by providing example prompts and answers.
Note: If Snowflake decides to remove a base model from its platform for whatever reason, your fine-tuned model will no longer work.
Models available to fine-tune
- mistral-7b by Mistral AI
- mixtral-8x7b by Mistral AI
- llama3-8b by Meta
- llama3-70b by Meta
The "b" in the model names is the number of parameters (in billions) used in training these models.
The number of parameters typically correlates with the model's capacity to learn complex patterns. A model with more parameters can often capture more intricate relationships in data, leading to better performance on various tasks, but a higher number of parameters also means higher resource utilization.
The main takeaway is that a model with more parameters usually performs better but requires more computational power to run.
Fine-tuning the base model
You can fine-tune a base model in Snowflake by choosing a base model and feeding it prompt and response pairs.
You need a table or view with one column containing the prompts and another column containing the completions (answers) to those prompts. Ideally the column names should be "prompt" and "completion"; if they are not, you can use aliases in the query, as in the sketch below.
Note: The table or view can have more than two columns, but only two of them will be used to fine-tune the model.
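For example, a minimal training dataset could look like the following. The table and column names here are the ones used in the later example; the sample row is hypothetical, and the aliases map the columns to the expected "prompt" and "completion" names.
-- Training table whose column names differ from the expected
-- "prompt" / "completion", so the query aliases them.
CREATE OR REPLACE TABLE training_dataset (
    question VARCHAR,  -- the prompt shown to the model
    answer   VARCHAR   -- the expected completion
);

INSERT INTO training_dataset VALUES
    ('What is the core principle of our ABC company?',
     'Our core principle is putting customers first.');

-- Query to be passed as training data to the fine-tuning function:
SELECT question AS prompt, answer AS completion
FROM training_dataset;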
Snowflake provides SNOWFLAKE.CORTEX.FINETUNE function for fine-tuning models.
Fine-tuning a model is a time-consuming process. You start the job and Snowflake runs it to completion; after creating the fine-tuning job you can close your browser tab.
Example:
SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',          -- action: create a new fine-tuning job
    'my_super_llama',  -- name of the fine-tuned model to create
    'llama3-8b',       -- base model to fine-tune
    'SELECT question AS prompt, answer AS completion FROM training_dataset',  -- training data query
    'SELECT prompt, response AS completion FROM validation_dataset'           -- optional validation data query
);
The above command returns the ID of the fine-tuning job.
You can list your fine-tuning jobs with the SNOWFLAKE.CORTEX.FINETUNE('SHOW') command.
To check the status of a specific fine-tuning job, use the SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', '<finetune_job_id>') command.
If you change your mind and want to abort a fine-tuning job, use the SNOWFLAKE.CORTEX.FINETUNE('CANCEL', '<finetune_job_id>') command.
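A quick sketch of these job-management calls; the job ID below is a hypothetical placeholder for the value returned by the CREATE call.
-- List all fine-tuning jobs
SELECT SNOWFLAKE.CORTEX.FINETUNE('SHOW');

-- Check the progress of one job ('ft_abc123' is a hypothetical ID)
SELECT SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', 'ft_abc123');

-- Abort the job if needed
SELECT SNOWFLAKE.CORTEX.FINETUNE('CANCEL', 'ft_abc123');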
Keep in mind that there is a context window limitation. The context window is the number of tokens an LLM can process at a time; in simple terms, it is the length of text the LLM can handle.
Below is the breakdown of the context window for each base model:
Model | Context Window | Input Context (prompt) | Output Context (completion)
---|---|---|---
mistral-7b | 32k | 28k | 4k
llama3-8b | 8k | 6k | 2k
mixtral-8x7b | 32k | 28k | 4k
llama3-70b | 8k | 6k | 2k
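Since a token is roughly four characters (see the Cost section below), you can do a rough sanity check that your training prompts fit the input context. The sketch below reuses the training_dataset table from earlier and the 6k-token prompt limit of llama3-8b from the table above.
-- Flag rows whose prompt likely exceeds llama3-8b's ~6k-token input limit,
-- using the ~4 characters per token approximation.
SELECT
    question,
    CEIL(LENGTH(question) / 4) AS approx_prompt_tokens
FROM training_dataset
WHERE CEIL(LENGTH(question) / 4) > 6000;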
Using the fine-tuned model
After successfully fine-tuning the base model, you can use it like any other LLM in Snowflake via the SNOWFLAKE.CORTEX.COMPLETE function.
Syntax:
SNOWFLAKE.CORTEX.COMPLETE(
<model_name>, <prompt_or_history> [ , <options> ] );
Example:
SELECT SNOWFLAKE.CORTEX.COMPLETE('my_super_llama',
'What is the core principle of our ABC company?');
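You can also apply the fine-tuned model to a whole column of prompts at once; the customer_questions table below is a hypothetical example.
-- Hypothetical table of questions answered in batch with the fine-tuned model.
SELECT
    question,
    SNOWFLAKE.CORTEX.COMPLETE('my_super_llama', question) AS answer
FROM customer_questions;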
Necessary Privileges
- USAGE on the database that the training (and validation) data are queried from.
- OWNERSHIP or (CREATE MODEL and USAGE) on the schema that the model is saved to.
- The SNOWFLAKE.CORTEX_USER database role (required to call Cortex functions).
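A sketch of what the corresponding grants might look like; the database, schema, and role names are hypothetical placeholders, and your exact privilege setup may differ.
-- Hypothetical database, schema, and role names.
GRANT USAGE ON DATABASE training_db TO ROLE finetune_role;        -- read the training/validation data
GRANT USAGE ON DATABASE models_db TO ROLE finetune_role;
GRANT USAGE ON SCHEMA models_db.models TO ROLE finetune_role;     -- schema where the model is saved
GRANT CREATE MODEL ON SCHEMA models_db.models TO ROLE finetune_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE finetune_role;  -- access to Cortex functions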
Cost
The Snowflake Cortex Fine-tuning function incurs compute cost based on the number of tokens used in training.
A token is the smallest unit of text processed by the Snowflake Cortex Fine-tuning function, approximately equal to four characters of text.
There are also storage and warehouse (compute) costs for storing data, and for running any SQL commands.
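Using that approximation, you can get a rough idea of how many training tokens (the main cost driver) your dataset contains. The sketch below reuses the hypothetical training_dataset table from earlier.
-- Very rough estimate of training tokens, based on the
-- ~4 characters per token approximation mentioned above.
SELECT
    SUM(CEIL((LENGTH(question) + LENGTH(answer)) / 4)) AS approx_training_tokens
FROM training_dataset;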
Fine-tuning vs RAG
The purpose of fine-tuning is to improve an LLM's response accuracy. There is another method to achieve the same goal, called Retrieval-Augmented Generation (RAG). RAG is an LLM optimization method introduced by Meta in 2020.
So what is the difference between the two, in simple words? Fine-tuning is retraining the base model on a specific domain. We can think of it as making the base model smarter with regard to a specific domain, such as medicine or philosophy, by enabling it to discover patterns and associations in a more specific field.
RAG means providing relevant information to an LLM along with the prompt. When a user submits a prompt to an LLM set up with RAG, the process unfolds as follows (a minimal SQL sketch appears at the end of this section):
- Relevant information is retrieved from a database or the internet, depending on the setup.
- The retrieved information is added (augmented) to the user's query and submitted to the LLM as a prompt.
- The LLM generates a response using the user's query and the retrieved information.
We can think of this as giving a medical book to a fairly smart person and asking them to answer our questions using the book and their general knowledge.
RAG requires a more complex setup and is more resource intensive to run, since it queries the data source every time. Also, since the retrieved information is added to the user's query as part of the prompt, the context window limitation can become an issue.
As for fine-tuning, it is similar to training the person in a specific field. Using the above example, it would be like sending the fairly smart person to medical school and, after they graduate, asking them to answer our questions using their learned knowledge.
Fine-tuning a base model requires a lot of compute resources, but once training is finished, running the fine-tuned LLM is less resource intensive than running a RAG LLM.
Generally speaking, a RAG LLM produces more accurate answers. Another advantage is that if the RAG LLM is hooked up to the internet or a frequently updated database, it produces more up-to-date answers, while a fine-tuned LLM answers solely based on the data it saw during training.
The best part is that these two optimization methods can be used together, making the base model smarter and providing it with up-to-date, relevant information.
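Here is the minimal RAG sketch mentioned above, assuming a hypothetical doc_chunks table with a precomputed embedding column; SNOWFLAKE.CORTEX.EMBED_TEXT_768 and VECTOR_COSINE_SIMILARITY handle retrieval, and the fine-tuned model from earlier handles generation. This is an illustrative sketch, not a production setup.
-- Hypothetical doc_chunks table: chunk_text VARCHAR, chunk_embedding VECTOR(FLOAT, 768).
-- 1. Retrieve the most relevant chunk, 2. augment the question with it, 3. generate the answer.
WITH best_chunk AS (
    SELECT chunk_text
    FROM doc_chunks
    ORDER BY VECTOR_COSINE_SIMILARITY(
        chunk_embedding,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
            'What is the core principle of our ABC company?')
    ) DESC
    LIMIT 1
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'my_super_llama',
    'Answer using this context: ' || chunk_text ||
    ' Question: What is the core principle of our ABC company?'
) AS rag_answer
FROM best_chunk;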
You can learn more about differences between fine-tuning and RAG in below articles:
- RAG vs. fine-tuning by Ivan Belcic and Cole Stryker, IBM
- RAG vs. fine-tuning: Choosing the right method for your LLM by Superannotate.com
If you want to know how to set up and use a RAG LLM on Snowflake, you can find more information on this page.