Django select_related, to use or not to use; that is the question!

Django select_related, to use or not to use; that is the question!

Sooner or later, every developer will face the need to optimize the application. It is not the simplest one and can often be carried out in several different ways. Today, we will look at a function that will often speed uploading a page written with Django - select_related.

How does select_related work?


The principle of operation of the select_related function is relatively simple - it allows you to reduce the number of queries to the database by combining many individual queries into one complex. The page loading time is shortened. I will discuss this issue in a practical example.

Example

We are setting up an online store offering house designs. Each house has many fields referring to other objects, such as the garage, roof, basement, author, type of heating, etc., which in this case will be ForeignKey fields). The user of our website gives one of the parameters of the dream house and clicks "search," and the query is sent to the database:

houses = House.objects.filter(param=value)

then we need the information to complete the table with the basic parameters of the houses found, e.g.:

roof = house.roof.angle (kąt nachylenia dachu) 
heating_type = house.heating_type.name (rodzaj ogrzewania) 
garage = house.garage.area() (powierzchnia garażu)

If we were lucky and one object appears in the search results, the page will refresh in the blink of an eye. Remember, however, that under cover of the above Python functions, there are 4 database queries.

SELECT ••• FROM app_name_house WHERE "house"."param" = 'value'

At this point, we have a house list containing one house object, however its fields: roof, heating_type, and the garage only contain the id information of these objects in the corresponding table; let's assume they are X, Y, Z, respectively. Three queries are sent from the houses in the houses list:

roof:SELECT ••• FROM app_name_roof WHERE "roof"."id" = X
heating_type:SELECT ••• FROM app_name_heating WHERE "heating_type"."id" = Y
garage: SELECT ••• FROM app_name_garage WHERE "model_name"."id" = Z

It does not look very dangerous; after all, what are four queries for today's data carriers? However, we need to look at this issue from a broader perspective. We cannot assume that the result will contain one object; we will operate on the list of objects much more often, for example, one hundred, one thousand, 10,000 or more, and this is where - remaining in the construction topics - “stairs” appear. Assuming that the list of results includes 1000 houses, and we need three parameters for each of them, the number of queries will increase to 3001, which will noticeably slow down the page. The exact value depends, among other things, on the "size" of the objects. It may be surprising that in the discussed example, the type of hard disk used in the computer does not have a major impact on the course of the operation, which I checked on 3 types of disks:

  • HDD (5400 rpm) (read / write: sequential (MB/s): 76/70, access time (ms): 18.946ms / 5.210ms)
  • SSD I (read / write: sequential (MB/s): 482/486, access time (ms): 0.158ms / 0.043ms)
  • SSD II (read / write: sequential (MB/s): 2400/1020, access time (ms): 0.066ms / 0.037ms)

HDD

SSD I

SSD II

Number of inquiries

3001

Page load time (s)

64.87

63.49

63.18

Time of query execution (s)

0.32

0.30

0.29

I used the Django Debug Toolbar for time measurement

Using select_related

Probably many people's intuition tells them that something is wrong here; modern computers can't perform such simple operations for several dozen seconds. This is, in fact, the case; it's a perfect example of using the select_related function. Let's make a query to the database again, this time using the mentioned function

houses = House.objects.select_related('roof', 'heating_type', 'garage').filter(param=value)

It turns out that the number of queries to the database has decreased to ... one, which looks like this:

SELECT ••• FROM "app_name_house" INNER JOIN "app_name_furniture" ON
("app_name_house"."roof_id" = "app_name_roof"."id")
INNER JOIN "app_test_heatingtype" ON
("app_name_house"."heating_type_id" = "app_name_heatingtype"."id")
INNER JOIN "app_name_room" ON
("app_name_house"."garage_id" = "app_name_garage"."id")
WHERE "app_name_house"."param" = 'value'

The page loading time has also decreased significantly:

 

HDD

SSD I

SSD II

Number of inquiries

1

Page load time (s)

0.00248

0.00264

0.00251

Time of query execution (s)

0.001

0.001

0.001

How not to use select_related

Looking at the table above, you may be tempted to use the select_related function always and everywhere or, for example, create a list of ForeignKey fields of a given object and pass it as an argument to the function. Unfortunately, it is not that simple and will likely produce the opposite effect to what is expected. It may turn out that despite the same number of inquiries, we will extract a lot of data that we do not need now, which will eventually slow down the page. To illustrate this phenomenon, I decided to create 10,000 House model objects, add a text box to the 10 models referenced by this model, then fill it with one page of "Lorem ipsum" and pull it out with one query. The differences in the operation time are presented in the table below.

Number of inqueries

1

Number of retrieved objects

10000

The number of ForeignKey fields

13

Using fields of ForeignKey

0

3

The fields specified in the select_related function

0

3

8

14

3

8

13

Loading time after filling in the text fields (s)

0.42

1.17

2.41

3.05

1.66

2.67

3.33

What is worth remembering?

  • We use the select_related function only when we perform operations on ForeignKey fields.
  • As an argument of the function, we give only those fields on which we plan to perform any operations.
  • Skillful use of select_related can significantly reduce the page loading time ...
  • … and improper use will significantly extend it. :)

Photo by Kevin Ku from Pexels