Best Practice: SQL Formatting
Data and analytics should be here to guide us in our daily decisions, in order to do that though, data needs to be clearly defined, trusted, and easily digestible for everyone. Here is a brief overview on our best practices for formatting SQL.
Structured Query Language (SQL) is a programming language that is used to manage relational databases and perform various operations on the data in them. It’s used to modify database tables and index structures, add update and delete rows of data, retrieve subsets of data and more.
The Importance Of Standardising
- Codes should always be peer reviewed before release. Clearly formatted scripts cut down time required to comprehend someone’s work.
- It helps spot and eliminate human errors, hence increasing accuracy and trust in your data solutions.
- It facilitates smooth and efficient handover. Project evolves, people move on. Well-documented scripts minimise disruptions.
- It spreads knowledge. Commenting on a data ‘trick’ costs little but could be greatly beneficial to others. Sometimes even pros can learn from other pros!
Our Best Practices
- Write all SQL keywords and functions in capital letters. It helps to identify immediately the main components of the query.
- Align top level clauses (SELECT, FROM, WHERE, HAVING, GROUP BY) with river blanks.
- Do not write more than one of the above clauses in a single line.
- Keep the tabulation consistent throughout the query.
- When selecting multiple columns, put each in a new line starting with spaces, comma, space then column name.
- Use ‘AS’ for column alias.
- Prefix column selection with table alias, e.g. t.user_id
Common Table Expression (CTE)
- Use CTEs instead of nested subquery.
- Indent CTEs based on river of blanks with the “WITH” statement.
- Specify the column names in the outermost query, rather than using “*”. SELECT * is acceptable for WHERE NOT EXISTS clauses or intermediate queries (eg: CTEs).
Structure and Comments
- Use comments to explain where code is complicated or not obvious or involves special scenarios or business logic.
- Avoid empty lines with only spaces.
- Use meaningful CTE names, e.g. registrations instead of table1.
- Use meaningful column alias, especially with aggregate functions.
- When possible, use common prefixes and suffixes. Examples:
On long lines (eg: CASE WHEN) use a logical point for wrapping the line manually. Some options:
a) Align the WHEN keywords
b) Split on operator
c) One line per parameter to a function
How 173tech Can Help
We are a data analytics consultancy who use SQL and python everyday. If you need an extra hand in optimising your data pipeline, why not get in touch and see if we can help?