6、通过代码我们可以看到drop table调用row_drop_table_for_mysql函数时,在row_mysql_lock_data_dictionary(trx);位置对元数据加了排它锁
const char* name,
trx_t* trx,
bool drop_db,
bool nonatomic,
dict_table_t* handler)
{
dberr_t err;
dict_foreign_t* foreign;
dict_table_t* table = NULL;
char* filepath = NULL;
char* tablename = NULL;
bool locked_dictionary = false;
pars_info_t* info = NULL;
mem_heap_t* heap = NULL;
bool is_intrinsic_temp_table = false;
DBUG_ENTER("row_drop_table_for_mysql");
DBUG_PRINT("row_drop_table_for_mysql", ("table: '%s'", name));
ut_a(name != NULL);
/* Serialize data dictionary operations with dictionary mutex:
no deadlocks can occur then in these operations /
trx->op_info = "dropping table";
if (handler != NULL && dict_table_is_intrinsic(handler)) {
table = handler;
is_intrinsic_temp_table = true;
}
if (table == NULL) {
if (trx->dict_operation_lock_mode != RW_X_LATCH) {
/ Prevent foreign key checks etc. while we are
dropping the table */
row_mysql_lock_data_dictionary(trx);
locked_dictionary = true;
nonatomic = true;
}
7、以Main Thread为例,在调用srv_master_evict_from_table_cache函数获取X-lock on RW-latch时被阻塞
Make room in the table cache by evicting an unused table.
@return number of tables evicted. /
static
ulint
srv_master_evict_from_table_cache(
/==============================/
ulint pct_check) /!< in: max percent to check */
{
ulint n_tables_evicted = 0;
rw_lock_x_lock(dict_operation_lock);
dict_mutex_enter_for_mysql();
n_tables_evicted = dict_make_room_in_cache(
innobase_get_table_cache_size(), pct_check);
dict_mutex_exit_for_mysql();
rw_lock_x_unlock(dict_operation_lock);
return(n_tables_evicted);
}
8、查看dict_operation_lock的注释,create drop table操作时需要X锁,而一些其他后台线程,比如Main Thread检查dict cache时,也需要获取dict_operation_lock的X锁,因此被阻塞
table create, drop, etc. reserve this in X-mode; implicit or
backround operations purge, rollback, foreign key checks reserve this
in S-mode; we cannot trust that MySQL protects implicit or background
operations a table drop since MySQL does not know of them; therefore
we need this; NOTE: a transaction which reserves this must keep book
on the mode in trx_t::dict_operation_lock_mode /
rw_lock_t dict_operation_lock;
9、期间用户线程由于获取不到锁而处于挂起状态,当无法立刻获得锁时,会调用row_mysql_handle_errors将错误码传到上层进行处理
Handles user errors and lock waits detected by the database engine.
@return true if it was a lock wait and we should continue running the
query thread and in that case the thr is ALREADY in the running state. /
bool
row_mysql_handle_errors
下面是简化后的用户线程调用栈信息:
Thread 29 (Thread 0x7f5d001ef700 (LWP 8159)): #0 pthread_cond_wait@@GLIBC_2.3.2 #1 wait #2 os_event::wait_low #3 os_event_wait_low #4 lock_wait_suspend_thread #5 row_mysql_handle_errors #6 row_search_mvcc #7 ha_innobase::index_read #8 handler::ha_index_read_map #9 handler::read_range_first #10 handler::multi_range_read_next #11 QUICK_RANGE_SELECT::get_next #12 rr_quick #13 mysql_update #14 Sql_cmd_update::try_single_table_update #15 Sql_cmd_update::execute #16 mysql_execute_command #17 mysql_parse #18 dispatch_command #19 do_command #20 handle_connection #21 pfs_spawn_thread #22 start_thread #23 clone10、page_cleaner后台线程没有抓到明显的阻塞关系,只看到如下正常的调用栈信息
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 os_event::timed_wait
#2 os_event::wait_time_low
#3 os_event_wait_time_low
#4 pc_sleep_if_needed
#5 buf_flush_page_cleaner_coordinator
#6 start_thread
#7 clone
Thread 54 (Thread 0x7f5c7f614700 (LWP 39288)):
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
#1 wait
#2 os_event::wait_low
#3 os_event_wait_low
#4 buf_flush_page_cleaner_worker
#5 start_thread
#6 clone 【结论&解决思路】
drop table引起的MySQL 短暂hang死的问题,是由于drop 一张使用AHI空间较大的表时,调用执行AHI的清理动作,会消耗较长时间,执行期间长时间持有dict_operation_lock的X锁,阻塞了其他后台线程和用户线程;
drop table执行结束锁释放,MySQL积压的用户线程集中运行,出现了并发线程和连接数瞬间上升的现象。
规避问题的方法,可以考虑在drop table前关闭AHI。