A Look at IPC in Postgres: The SI Message Queue

Date: 2021-05-29

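In PostgreSQL, every backend process keeps its own private caches of shared catalog data. For example, each process has its own cache for the tuples of a given system catalog (for the RelCache, what is cached is a RelationData structure). Because tuples of the same system catalog may be cached by several processes at once, whenever a tuple in one of those caches is deleted or updated, the other processes must be told to bring their caches back in sync. PostgreSQL does this by recording the tuples that have become invalid and passing that information between processes as SI messages (that is, through a shared message queue). A process that receives an invalidation message then removes the invalid tuple (or RelationData structure) from its own cache.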

1. Overview of Invalidation Messages (Invalid Message)

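The system currently supports six kinds of invalidation messages:
The first invalidates a single tuple in a given catcache;
The second invalidates all catcache entries of a given system catalog;
The third invalidates the RelationData structure in the relcache for a given logical relation;
The fourth invalidates the SMGR entry for a given physical relation (when a table's physical location changes, SMGR must be told to close the table's file);
The fifth invalidates the mapped-relation mapping for a given database;
The sixth invalidates a saved snapshot.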

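Broadly speaking, the list moves from fine-grained invalidation (a single cached tuple) toward invalidation with a wider reach.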

PostgreSQL stores an invalidation message in the following union:

typedef union
{
    int8 id; /* type field --- must be first */
    SharedInvalCatcacheMsg cc;
    SharedInvalCatalogMsg cat;
    SharedInvalRelcacheMsg rc;
    SharedInvalSmgrMsg sm;
    SharedInvalRelmapMsg rm;
    SharedInvalSnapshotMsg sn;
} SharedInvalidationMessage;

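Here, id takes the following values:

0 or a positive number means a single CatCache tuple;

-1 means an entire CatCache (all catcache entries of a system catalog);

-2 means RelCache;

-3 means SMGR;

-4 means the mapped-relation mapping;

-5 means Snapshot.

When id is 0 or positive, it is also the number of the CatCache that produced the invalidation message.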
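To make the role of the id field concrete, here is a minimal sketch of a tagged union that is dispatched on its first byte. The Demo* types and demo_msg_kind() are hypothetical stand-ins with assumed field layouts; they are not the real PostgreSQL definitions, only an illustration of why the type field must come first in every member.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for two message structs (assumed layouts,
 * not copied from PostgreSQL). Each one starts with the id byte. */
typedef struct { int8_t id; uint32_t dbId; uint32_t hashValue; } DemoCatcacheMsg;
typedef struct { int8_t id; uint32_t dbId; uint32_t relId; } DemoRelcacheMsg;

typedef union {
    int8_t          id;   /* type field, shared by every member */
    DemoCatcacheMsg cc;
    DemoRelcacheMsg rc;
} DemoInvalMsg;

/* Map the id field to a readable kind, following the list above. */
const char *demo_msg_kind(const DemoInvalMsg *msg)
{
    if (msg->id >= 0)
        return "catcache tuple";   /* id doubles as the catcache number */

    switch (msg->id) {
        case -1: return "whole catalog";
        case -2: return "relcache";
        case -3: return "smgr";
        case -4: return "relmap";
        case -5: return "snapshot";
        default: return "unknown";
    }
}
```

Because every member begins with the same one-byte field, a receiver can read id first and only then decide which member of the union to interpret.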

For details, see the comment in the header file:

src/include/storage/sinval.h
 * * invalidate a specific tuple in a specific catcache
 * * invalidate all catcache entries from a given system catalog
 * * invalidate a relcache entry for a specific logical relation
 * * invalidate an smgr cache entry for a specific physical relation
 * * invalidate the mapped-relation mapping for a given database
 * * invalidate any saved snapshot that might be used to scan a given relation

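A process registers invalidation messages by calling CacheInvalidateHeapTuple(), which involves the following steps:

1) Register the SysCache invalidation message.

2) If the update/delete was applied to a pg_class tuple, its relfilenode or reltablespace may have changed, meaning the table's physical location has changed and other processes must be told to close the corresponding SMGR. In that case, first set relationid and databaseid, then register the SMGR invalidation message; otherwise go to step 3.

3) If the update/delete was applied to a pg_attribute or pg_index tuple, set relationid and databaseid; otherwise return.

4) Register the RelCache invalidation message (if any).

5) At transaction end, register the mapped-relation mapping and snapshot invalidation messages (if any).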
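The branching in steps 1 to 4 can be sketched as a small decision function. This follows the steps as the article describes them, not the actual source of CacheInvalidateHeapTuple(); DemoCatalog, DemoRegistered and demo_register_for_tuple() are hypothetical names, and the real code identifies catalogs by relation OID rather than by an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical catalog tags; assumed for illustration only. */
typedef enum {
    CAT_PG_CLASS,
    CAT_PG_ATTRIBUTE,
    CAT_PG_INDEX,
    CAT_OTHER
} DemoCatalog;

/* Which kinds of invalidation were registered for one tuple change. */
typedef struct {
    bool syscache;
    bool smgr;
    bool relcache;
} DemoRegistered;

DemoRegistered demo_register_for_tuple(DemoCatalog catalog)
{
    DemoRegistered r = { false, false, false };

    /* Step 1: the syscache invalidation is always registered. */
    r.syscache = true;

    if (catalog == CAT_PG_CLASS) {
        /* Step 2: the physical location may have changed,
         * so other processes must close their SMGR entry. */
        r.smgr = true;
    } else if (catalog != CAT_PG_ATTRIBUTE && catalog != CAT_PG_INDEX) {
        /* Step 3: no relation-level state is affected; stop here. */
        return r;
    }

    /* Step 4: register the relcache invalidation. */
    r.relcache = true;
    return r;
}
```

Under these assumptions, a change to a pg_class tuple registers all three kinds, a change to pg_attribute or pg_index registers the syscache and relcache kinds, and a change to any other catalog registers only the syscache kind.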

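When a tuple is deleted or updated, the old tuple is still considered valid for the remaining steps of the same SQL command; the change takes effect only when the next command starts or the transaction commits. At a command boundary the old tuple becomes invalid and the new tuple becomes valid. For this reason heap_delete or heap_update cannot simply flush the cache on the spot. Even if they did, a later request within the same command could load the tuple back into the cache.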

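The correct approach is therefore to keep an invalidation list that records the delete/update operations on tuples. Once the transaction completes, the invalidation messages generated during the transaction are broadcast based on that list, and other processes read them from the SI message queue and flush their own caches. When a subtransaction commits, its invalidation messages only need to be handed to the parent transaction; the top-level transaction does the final broadcast.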
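The deferral described above can be sketched with a per-transaction pending list. This is a simplified model under stated assumptions: plain ints stand in for invalidation messages, the list has a small fixed capacity, and DemoXact, demo_record() and demo_commit() are hypothetical names rather than PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 16   /* assumed capacity for the sketch */

/* One transaction level and the messages it has collected so far. */
typedef struct DemoXact {
    struct DemoXact *parent;              /* NULL for the top level */
    int              pending[DEMO_MAX_PENDING];
    size_t           npending;
} DemoXact;

/* Record a message instead of flushing caches immediately.
 * Returns 0 on success, -1 if the list is full. */
int demo_record(DemoXact *x, int msg)
{
    if (x->npending >= DEMO_MAX_PENDING)
        return -1;
    x->pending[x->npending++] = msg;
    return 0;
}

/* Commit one level. A subtransaction hands its messages to its
 * parent and broadcasts nothing; the top level copies its messages
 * to broadcast_out. Returns the number of messages broadcast.
 * Messages that do not fit are dropped in this sketch. */
size_t demo_commit(DemoXact *x, int *broadcast_out, size_t out_cap)
{
    size_t sent = 0;

    if (x->parent != NULL) {
        for (size_t i = 0; i < x->npending; i++)
            (void) demo_record(x->parent, x->pending[i]);
    } else {
        for (size_t i = 0; i < x->npending && sent < out_cap; i++)
            broadcast_out[sent++] = x->pending[i];
    }
    x->npending = 0;
    return sent;
}
```

Committing a subtransaction therefore produces no broadcast at all; its messages only become visible to other processes when the top-level transaction commits.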

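Note that if the change affects the structure of a system catalog, the pg_internal.init file must also be reloaded, because that file records the structure of all system catalogs.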

2. The Big Picture of SI Messages

The relevant functions are listed up front so that the names look familiar later on:

CreateSharedInvalidationState()
/* Create and initialize the SI message buffer */

SharedInvalBackendInit()
/* During its startup, each backend initializes its per-backend invalidation state in the SI message buffer, procState[MaxBackends] */

CleanupInvalidationState()
/* Run at each backend's shutdown through on_shmem_exit() to clear that backend's procState[i] */

SICleanupQueue()
/* Remove messages that have been consumed by all active backends
 * Possible side effects of this routine include marking one or more
 * backends as "reset" in the array, and sending PROCSIG_CATCHUP_INTERRUPT
 * to some backend that seems to be getting too far behind. We signal at
 * most one backend at a time, for reasons explained at the top of the file. */

SendSharedInvalidMessages()
/* Add shared-cache-invalidation message(s) to the global SI message queue. */

The overall workflow of the SI message queue is then roughly as follows:
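As a rough illustration of that workflow, here is a minimal single-process sketch of a circular message queue with one read position per backend. It is an assumption-laden model, not the PostgreSQL implementation: ints stand in for messages, the queue holds only 8 entries, there is no locking and no PROCSIG_CATCHUP_INTERRUPT signaling, and DemoSIQueue and the demo_* functions and their fields are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_QUEUE_SIZE 8   /* assumed tiny size for illustration */
#define DEMO_BACKENDS   2

typedef struct {
    int  buffer[DEMO_QUEUE_SIZE];    /* circular message buffer */
    int  minMsgNum;                  /* oldest message still needed */
    int  maxMsgNum;                  /* number of the next message written */
    int  nextMsgNum[DEMO_BACKENDS];  /* per-backend read position */
    bool resetState[DEMO_BACKENDS];  /* backend fell too far behind */
} DemoSIQueue;

void demo_init(DemoSIQueue *q)
{
    q->minMsgNum = q->maxMsgNum = 0;
    for (int i = 0; i < DEMO_BACKENDS; i++) {
        q->nextMsgNum[i] = 0;
        q->resetState[i] = false;
    }
}

/* Drop messages every non-reset backend has read; returns queue length. */
int demo_cleanup(DemoSIQueue *q)
{
    int min = q->maxMsgNum;
    for (int i = 0; i < DEMO_BACKENDS; i++)
        if (!q->resetState[i] && q->nextMsgNum[i] < min)
            min = q->nextMsgNum[i];
    q->minMsgNum = min;
    return q->maxMsgNum - q->minMsgNum;
}

/* Append one message; if the queue stays full after cleanup,
 * mark the slowest readers as reset so their slots can be reused. */
void demo_send(DemoSIQueue *q, int msg)
{
    if (q->maxMsgNum - q->minMsgNum >= DEMO_QUEUE_SIZE &&
        demo_cleanup(q) >= DEMO_QUEUE_SIZE) {
        for (int i = 0; i < DEMO_BACKENDS; i++)
            if (!q->resetState[i] && q->nextMsgNum[i] == q->minMsgNum)
                q->resetState[i] = true;
        demo_cleanup(q);
    }
    q->buffer[q->maxMsgNum % DEMO_QUEUE_SIZE] = msg;
    q->maxMsgNum++;
}

/* Read the next message for one backend.
 * Returns 1 = got a message, 0 = nothing new,
 * -1 = backend was reset and must discard all of its caches. */
int demo_get(DemoSIQueue *q, int backend, int *msg)
{
    if (q->resetState[backend]) {
        q->resetState[backend] = false;
        q->nextMsgNum[backend] = q->maxMsgNum;
        return -1;
    }
    if (q->nextMsgNum[backend] >= q->maxMsgNum)
        return 0;
    *msg = q->buffer[q->nextMsgNum[backend] % DEMO_QUEUE_SIZE];
    q->nextMsgNum[backend]++;
    return 1;
}
```

In this model a backend that keeps up sees every message in order, while a backend that never reads is eventually marked as reset; on its next read it gets -1 and is expected to throw away all cached state instead of replaying the messages it missed.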

When reposting, please credit the source: https://www.heiqu.com/wpjgxx.html
