Walkthrough and Analysis of an AlwaysOn Failover Caused by a Disk I/O Fault (2)

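AlwaysOn uses sp_server_diagnostics to check the health of the availability group, continuously collecting diagnostic information. The results returned by sp_server_diagnostics are compared against the availability group's FailureConditionLevel setting to determine whether the conditions for a failover are met. Once they are met, the availability group fails over to a new availability replica.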
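To see the values this comparison relies on, the current settings can be read from the sys.availability_groups catalog view. This is a minimal sketch, assuming it is run on an instance that hosts a replica of the availability group:

```sql
-- List each availability group with the two settings discussed in this article.
-- health_check_timeout is reported in milliseconds.
SELECT name,
       failure_condition_level,
       health_check_timeout
FROM sys.availability_groups;
```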

 

3.2 HealthCheckTimeout

The HealthCheckTimeout setting is used to specify the length of time, in milliseconds, that the SQL Server resource DLL should wait for information returned by the sp_server_diagnostics stored procedure before reporting the AlwaysOn Failover Cluster Instance (FCI) as unresponsive. Changes that are made to the timeout settings are effective immediately and do not require a restart of the SQL Server resource.

The resource DLL determines the responsiveness of the SQL instance using a health check timeout. The HealthCheckTimeout property defines how long the resource DLL should wait for the sp_server_diagnostics stored procedure before it reports the SQL instance as unresponsive to the WSFC service.

The following items describe how this property affects timeout and repeat interval settings:

The resource DLL calls the sp_server_diagnostics stored procedure and sets the repeat interval to one-third of the HealthCheckTimeout setting.

If the sp_server_diagnostics stored procedure is slow or is not returning information, the resource DLL will wait for the interval specified by HealthCheckTimeout before it reports to the WSFC service that the SQL instance is unresponsive.

If the dedicated connection is lost, the resource DLL will retry the connection to the SQL instance for the interval specified by HealthCheckTimeout before it reports to the WSFC service that the SQL instance is unresponsive.
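As a worked example of the one-third rule: with HealthCheckTimeout set to 15,000 milliseconds, the repeat interval would be 5 seconds. The call below only illustrates what such a polling call looks like when run by hand; the resource DLL issues its own call over its dedicated connection.

```sql
-- 15000 ms / 3 = 5000 ms = 5 seconds.
-- @repeat_interval is given in seconds. With a repeat interval, the procedure
-- keeps returning result sets until the query is cancelled.
EXEC sp_server_diagnostics @repeat_interval = 5;
```

Running the procedure without the parameter returns a single set of results and then exits, which is usually more convenient for a one-off check.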

 

3.3 FailureConditionLevel

The SQL Server Database Engine resource DLL determines whether the detected health status is a condition for failure using the FailureConditionLevel property. The FailureConditionLevel property defines which detected health statuses cause restarts or failovers.

Review sp_server_diagnostics (Transact-SQL), as this system stored procedure plays an important role in the failure condition levels.

Level 0: No automatic failover or restart

Indicates that no failover or restart will be triggered automatically on any failure conditions. This level is for system maintenance purposes only.

Level 1: Failover or restart on server down

Indicates that a server restart or failover will be triggered if the following condition is raised:

- SQL Server service is down.

Level 2: Failover or restart on server unresponsive

Indicates that a server restart or failover will be triggered if any of the following conditions are raised:

- SQL Server service is down.
- SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).

Level 3: Failover or restart on critical server errors

Indicates that a server restart or failover will be triggered if any of the following conditions are raised:

- SQL Server service is down.
- SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).
- System stored procedure sp_server_diagnostics returns ‘system error’.

Level 4: Failover or restart on moderate server errors

Indicates that a server restart or failover will be triggered if any of the following conditions are raised:

- SQL Server service is down.
- SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).
- System stored procedure sp_server_diagnostics returns ‘system error’.
- System stored procedure sp_server_diagnostics returns ‘resource error’.

Level 5: Failover or restart on any qualified failure conditions

Indicates that a server restart or failover will be triggered if any of the following conditions are raised:

- SQL Server service is down.
- SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).
- System stored procedure sp_server_diagnostics returns ‘system error’.
- System stored procedure sp_server_diagnostics returns ‘resource error’.
- System stored procedure sp_server_diagnostics returns ‘query_processing error’.
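The component states that levels 3 to 5 react to can also be inspected by hand. The sketch below captures one run of sp_server_diagnostics into a temporary table and shows the state of the system, resource, and query_processing components. The table name #diag is hypothetical; the column list is assumed to match the procedure's documented result set.

```sql
-- Temporary table shaped like the sp_server_diagnostics result set.
CREATE TABLE #diag (
    create_time    datetime,
    component_type sysname,
    component_name sysname,
    state          int,
    state_desc     sysname,
    data           varchar(max)
);

-- Without @repeat_interval the procedure runs once and returns.
INSERT INTO #diag
EXEC sp_server_diagnostics;

-- Show only the components named in the failure condition levels.
SELECT component_name, state_desc
FROM #diag
WHERE component_name IN (N'system', N'resource', N'query_processing');

DROP TABLE #diag;
```

A state_desc of error on one of these components corresponds to the ‘system error’, ‘resource error’, and ‘query_processing error’ conditions listed for levels 3, 4, and 5.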

 

 

3.4 Changing the settings with SQL

The following example sets the HealthCheckTimeout option to 15,000 milliseconds (15 seconds).

ALTER SERVER CONFIGURATION SET FAILOVER CLUSTER PROPERTY HealthCheckTimeout = 15000;
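The failure condition level can be changed in a similar way. The following is a sketch under stated assumptions: the first statement applies to a failover cluster instance, while the last two are the availability group counterparts, where AG1 is a hypothetical group name and the values 3 and 30000 are illustrative only.

```sql
-- Failover cluster instance: set the failure condition level.
ALTER SERVER CONFIGURATION SET FAILOVER CLUSTER PROPERTY FailureConditionLevel = 3;

-- Availability group (run on the primary replica); AG1 is a placeholder name.
ALTER AVAILABILITY GROUP [AG1] SET (FAILURE_CONDITION_LEVEL = 3);
ALTER AVAILABILITY GROUP [AG1] SET (HEALTH_CHECK_TIMEOUT = 30000);  -- milliseconds
```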
