SQL Table Aliases are Identifiers

A surprising thing to me working day-to-day is how often I still see queries that treat SQL table aliases purely as typing-saving shortcuts.

I often see code (whether it’s in a programming language like Go or Javascript, or in a database view definition or stored procedure) using queries that treat table aliases purely as a mechanism to reduce typing. For instance:

    users u
    memberships m
    ON u.id = m.user_id
    groups g
    ON g.id = m.group_id
    u.created_at < now() - INTERVAL '300 days'
    AND g.features && array['billing_suspended] 

Leaving aside other problems in this query1, the use of short table aliases makes this query much harder to read, particularly as the query gets longer. The short aliases make the query easier to write, at the cost of making the query harder to read.

Optimizing for ease-of-writing over ease-of-reading is always, 100% the wrong tradeoff, and is a huge part of why codebases decay over time. Code is read much more frequently than it is written, so the writer must work harder to make the code easy to read.

In this case you may think the example query is so simple that the short aliases don’t hurt readability much. While this is to an extent true, the problem is that developers are often in a hurry, and when they want to make a small change (or what the developer hopes fervently is a small change), they prefer following the existing code pattern and prefer not to go back and clean up code.

How would this example query look like with meaningful table aliases? It might look like this:

    users billing_suspended_users
    billing_suspended_users.created_at < :now - INTERVAL '300 days'
                memberships.group_id = groups.id
                AND groups.features && array['billing_suspended']
            memberships.user_id = billing_suspended_users.id

Notice that I haven’t bothered giving memberships and groups table aliases: table aliases are valuable only when they help clarify how the table is being used, and not merely to shorten the name.

  1. Some other obvious problems in the original query are:

    • The use of now() hardcoded in the SQL. Current time references make it quite annoying to construct test data. Test data insertion for queries referencing the wallclock time directly is also forced to reference the wallclock time directly, and is prone to flake. The query should ask for a clock instead.

    • The joins on membership and groups are being used not because those tables have data that’s interesting to the consumer, but rather as filters on the set of users to return. For readability, it’s best to join tables only if they have data the consumer wants, and to prefer EXISTS clauses for filtering.