Introduction to SQLFluff: How to make your SQL code clean and error-free

happy man Image by Jake Aldridge from Pixabay

You know oftentimes, the cause of runtime or compile errors and hours of debugging agony is all due to simply a missing semicolon. Have you ever had such experience? If you had, you are not alone. There are two ways to avoid these unfortunate situations: either become a perfect developer who never makes mistakes, or use helpful tools such as linters that can catch these errors early on.

I am nowhere near being a perfect developer who never makes a mistake. In fact, I'm probably the opposite of a perfect developer, so even if I wanted to, I wouldn’t be able to teach you how to become a perfect developer. But what I can teach you is using linters. A Wikipedia defines a linter as a "static code analysis tool used to flag programming errors, bugs, stylistic errors and suspicious constructs."

If you're not convinced yet on using linters, consider this scenario: in a large project with multiple members, different people tend to write code in different formats. Differing formats often make code reviews difficult and cause inconsistency in source code. It would be easier and more consistent for everyone if all team members wrote the code in the same format. A linter can help you with this issue too. Hopefully, now you see the value of using linters.

There are linters for almost every programming language. In many cases there is more than one good linters for a particular language. However, when it comes to SQL, as far as I know, there are fewer options. But it can't be a reason to feel discouraged though, SQLFull, the most popular linter for SQL, is an excellent tool. In this article I will explain what it is and how to use it.

SQLFLuff

SQLFluff is the most popular linter for SQL language. It touts itself as being a "SQL linter for humans". It helps not only with catching syntax errors but also fixing them. It supports multiple dialects of SQL language such as PostgreSQL, MySQL and even Snowflake.
You can find the official documentation here: Docs.

Installation

SQLFluff can be installed as

a VSCode extension
pre-commit hook
command line tool
CI/CD pipeline tool

Easier and less error-prone way of running SQLFluff is as a command line tool. We will also discuss using it as a pre-commit hook at a later section.

Installing SQLFluff as a command line tool

Note: to install SQLFluff, you need Python and pip (or any other package manager such as poetry, pipenv). This example uses pip.

Install SQLFluff by running the following command on the terminal:

pip install sqlfluff

Check the installation by running: (expected output is something like this: 3.2.4)

sqlfluff version
# 3.2.4

Usage

SQLFluff has 3 main commands: lint : this command shows the syntax errors format : formats the code (mainly it fixes new line, indentation and white spaces) fix: fixes the syntax errors such as missing comma, semicolon, new line etc.

To lint a file named test.sql, run:

sqlfluff lint test.sql --dialect snowflake

The results looks like below:

== [test.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '  '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   1 | LT13 | Files must not begin with newlines or whitespace.
                       | [layout.start_of_file]
L:   1 | P:  11 | LT01 | Expected only single space before binary operator '+'.
                       | Found '  '. [layout.spacing]
L:   1 | P:  14 | LT01 | Expected only single space before naked identifier.
                       | Found '  '. [layout.spacing]
L:   1 | P:  27 | LT01 | Unnecessary trailing whitespace at end of file.
                       | [layout.spacing]
L:   1 | P:  27 | LT12 | Files must end with a single trailing newline.
                       | [layout.end_of_file]
All Finished 📜 🎉!

Output shows test.sql file has only formatting error.

Understanding the linting output:

Linting output shows the L (line number) and P (position) that caused the error, and the rule index that threw the error, in this case LT01. After the rule index there is a short explanation as for what is the problem. This LT01 rule concerns with layout spacing and checks for inappropriate spacing such as excessive whitespace, trailing whitespace at the end of a line and also the wrong spacing between elements on the line.
You can find all the rules and their definitions here: Rule Index.

SQLFluff comes with default rules. You can quickly check those rules in command line by running:

sqlfluff rules

It is possible to specify the rules we want to check against or exclude certain rules.

Specifying the rules:

sqlfluff lint test.sql --rules LT02,LT12,CP01,ST06,LT09,LT01

Excluding rules:

sqlfluff fix test.sql --exclude-rules LT01

SQLFluff may not recognize some keywords certain SQl dialects uses. When it happens it throws parsing error. Workaround of this problem is to tell SQLFluff to ignore parsing errors.

sqlfluff fix test.sql --ignore parsing

Other command line configurations can be find here: CLI Config.

Config File

Configuring SQLFluff using cli may be little tedious, instead we can use config files. SQLFluff comes with default configurations. However, based on your preference, you can overwrite default configurations.

SQLFluff supports following config file types:

setup.cfg
tox.ini
pep8.ini
.sqlfluff
pyproject.toml

In below example we change 7 configuration rules. Note: The rest of the rules stay the same as default.

.sqlfluff file

[sqlfluff]

dialect = snowflake

exclude_rules = ST06

# Number of passes to run before admitting defeat
runaway_limit = 2

large_file_skip_byte_limit = 2000000

[sqlfluff:layout:type:comma]
line_position = leading

The same configuration in TOML file format:

[tool.sqlfluff]

dialect = snowflake

exclude_rules = ST06

runaway_limit = 2

large_file_skip_byte_limit = 2000000

[tool.sqlfluff.layout.type.comma]
line_position = leading

Explanation of the above configuration:

dialect = snowflake sets the dialect of the SQLFluff to Snowflake syntax. exclude_rules = ST06 excludes the rule ST06. Rule ST06 orders select targets in ascending complexity. For instance, it puts * above other select targets like below:

select
    *,
    a,
    b,
    row_number() over (partition by id order by date) as y
from my_table;

However, I don't want to change the order of the columns, so I'm disabling the ST06 rule.

runaway_limit = 2 specifies the number of times the SQLFluff tries to find errors/improvements in a given block of code before moving on. large_file_skip_byte_limit = 2000000 instructs SQLFluff to ignore the file if the sql file size is larger than 2000000 bytes (2MB). line_position = leading it puts the commas at the beginning of each line. Note, very large files can make the SQLFluff parser effectively hang.

Result of line_position = leading rule:

select
    a
    ,b
    ,c
from my_table;

Pre-Commit

Git is an indispensable tool for every developer. It helps with version control and collaboration. Git can also call some functions while carrying out its usual job. For example, before committing changes to a repository, git can call certain functions, these functions are called pre-commit hooks. A tool called pre-commit can help with setting up and configuring these functions (pre-commit hooks). We can set-up pre-commit hooks to run SQLFluff automatically whenever we try to commit our changes. If SQLFluff finds error in our sql files, it throws error and Git prevents the changes from being committed to the repository.

Setting up pre-commit

pip install pre-commit

# check installation
pre-commit --version
# pre-commit v4.0.1

create a file named .pre-commit-config.yaml. in your repository's root directory. Inside the file write below:

repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: stable_version
    hooks:
      - id: sqlfluff-lint
      - id: sqlfluff-fix

pre-commit uses SQLFluff configurations specified in .sqlfluff file. But If you don't want to use .sqlfluff file for whatever reason, you can also write the configurations directly on .pre-commit-config.yaml file as below.

repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: stable_version
    hooks:
      - id: sqlfluff-lint
        args:
            - '--dialect'
            - 'snowflake'
            - '--exclude-rules'
            - 'ST06'
            - '--runaway-limit'
            - '2'
            - '--layout-type'
            - 'comma'
            - '--line-position'
            - 'leading'

      - id: sqlfluff-fix
            args:
            - '--dialect'
            - 'snowflake'
            - '--exclude-rules'
            - 'ST06'
            - '--runaway-limit'
            - '2'
            - '--layout-type'
            - 'comma'
            - '--line-position'
            - 'leading'

Run pre-commit install to set up the git hook scripts. Now, whenever you run git commit, SQLFluff linting and fixing commands will run before the commit. If SQLFluff finds no errors, commit is made, but if it finds errors commit is not made. SQlFluff outputs the errors to the terminal and fixes them as much as it can. After that you need to run git add and git commit commands once again.
You can find more info on using SQLFluff with pre-commit here: Pre-commit docs.

Cherry Peeked

Search This Blog