How It Works: Magento 2 Partial Index

How It Works: Magento 2 Partial Index

Anyone who has ever used Magento 1 or Magento 2 knows about such feature as index. It is pretty important and useful since everything that is displayed on the frontend is related to index. Indexing by itself is a pretty long process that is why partial index has been available almost since the first version of Magento. Partial index means that we index not the whole document but only the part that has been modified.

In Magento 2 EE there are available 2 indexing modes: Update on Save and Update by Schedule. You can configure each mode under Tools -> System -> Index Management. Each of them has its own advantages and disadvantages. The main advantage of the Update on Save mode is that it lets you index data upon saving the document. In other words, once you save a product in a category it becomes instantly available on the frontend with all the changes that you have applied. However, the main disadvantage is that it incredibly increases the time for each operation to complete. That is why, to speed up the process there has been introduced the Update by Schedule mode. This feature lets you index data in the background so there is no delay when you try to save any document. Everything is performed asynchronously. The only minus of this method is that it may really take a while until the cron job starts indexing.

In this article we are going to describe how partial index works. Let’s assume that our default mode is Update by Schedule, so, if not indicated otherwise, the below description will concern exactly this method.

There is no special place in the Magento code where you can find partial index entities values. The whole logic of partial index is performed in a database – via MySQL triggers if to be more specific. For instance, the catalog_product_entity table contains 3 triggers for the following events: AFTER INSERT, AFTER UPDATE, AFTER DELETE.

Let’s check the trigger for the AFTER INSERT event:

As seen from the above code the trigger creates a record in the *_cl tables about a new entity. Let’s check one of those *_cl tables. They are all identical, so, what is true for one of them is also true for the others. Each table contains 2 fields: version_id and entity_id. The version_id field indicates the current changes version number, while the entity_id field shows the ids of the entities which need to be indexed.

When indexing is started by the cron job, the version_id values in the *_cl tables are compared to the records from the mview_state table which contains the information about index versions and index status. On the code side the logic is controlled by the Mview module which is a part of the Magento framework.

The cron job calls the \Magento\Framework\Mview\View::update method that calls the required index. Let’s have a look at this method a bit closer:

For each index there is created a separate Mview class object that is responsible for index update.

Let’s see what the variables from the above code stand for:

  • $currentVersionId – indicates the current version of the version_id field in the *_cl table;
  • $lastVersionId – is the last version of the corresponding index, taken from the mview_statetable;
  • $ids – are the ids of the entities that need to be indexed.

After that, each index becomes partially re-indexed in its own way. Now let’s see if it is possible to add custom triggers to MySQL tables and perform custom partial indexing. Let’s check the \Magento\Framework\Mview\View::subscribe method:

The first line verifies whether partial indexing is enabled for the Mview index. Next, there is created a respective *_cl table. After that the cycle goes through the list of all subscribers and creates triggers in the MySQL table. To add a custom view that will track for changes in your Mview index tables you just need to create an mview.xml file in the module directory.

Let’s describe one by one what each argument is responsible for. As an example, let’s take a code sample from the module Catalog-Permission:

This code creates the catalog_category_product_cl table which is subscribed for changes in catalog_category_entity and catalog_category_entity_int tables’ data. The data from entity_column – entity_id will be transmitted to the *_cl table. This will result in the triggers of the following type:

For catalog_category_entity:

Re-indexing will be performed by the object of the class field (see the $action variable in the above code). But you might have noticed that we haven’t used the subscription_model field.  This attribute is recorded in the <table> field and it indicates the class which is responsible for creating triggers and their syntax. Certain triggers might not be as simple as described above. In such cases one needs to use a custom model that is inherited from the default one. Here is an example of such a model from Magento 2 EE:

As you see, the body of the trigger differs from the trigger which we created the first time.

From the above example you may see how simple and elegant a partial index is. You can redefine existing subscriptions without any problems (there were such cases with the EE version when the developers simply forgot about the Staging module) as well as define your own subscriptions to work with custom indexes.