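Besides sampling, MVCC can also make cardinality statistics inaccurate. For example, transaction A starts before transaction B and has not yet committed, and transaction B deletes some rows. Under REPEATABLE READ, transaction A can still read the deleted rows, so each of those rows currently exists in at least two versions, one of which is marked as deleted.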
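To make the sampling inaccuracy concrete, here is a minimal Python sketch of sampling-based cardinality estimation. It is a simplified model, not InnoDB's actual algorithm: it assumes the index is split into fixed-size pages, picks a few pages at random, averages the number of distinct keys per sampled page, and scales that by the page count. The names `build_pages` and `estimate_cardinality` and the page size of 100 are hypothetical.

```python
import random


def build_pages(keys, page_size):
    """Split sorted index keys into fixed-size pages (a toy stand-in for index leaf pages)."""
    ordered = sorted(keys)
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


def estimate_cardinality(pages, sample_size, rng):
    """Average the distinct-key count of a few sampled pages, then scale by the page count."""
    sampled = rng.sample(pages, sample_size)
    avg_distinct = sum(len(set(page)) for page in sampled) / sample_size
    return avg_distinct * len(pages)


rng = random.Random(42)

# Unique keys: every page holds 100 distinct values, so any sample gives the exact answer.
unique_pages = build_pages(range(1, 10001), page_size=100)
unique_estimate = estimate_cardinality(unique_pages, sample_size=20, rng=rng)

# Skewed keys: 5000 rows share key 0 and 5000 rows are unique (true cardinality 5001).
# The estimate now depends on which pages happen to be sampled.
skewed_keys = [0] * 5000 + list(range(1, 5001))
skewed_pages = build_pages(skewed_keys, page_size=100)
skewed_estimate = estimate_cardinality(skewed_pages, sample_size=20, rng=rng)

print(unique_estimate, skewed_estimate)
```

With evenly distributed keys the estimate is exact, while with skewed keys it can land anywhere between 100 and 10000 depending on the pages drawn, which is the kind of error that sampling introduces.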
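For the primary key, the estimate is based directly on the table's row count. For that row count, the optimizer uses the Rows value reported by show table status like 't'.
To trigger a recalculation of index statistics manually: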
-- Recompute index statistics
mysql> analyze table t;

How sorting affects index selection
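Before the MySQL walkthrough, a small simulation may help show the trade-off this section is about. It is a toy model in Python, not the optimizer's cost model: it assumes the same data as the test table below (100000 rows with id = a = b) and counts how many rows each access path has to examine for the query that filters on both a and b, orders by b, and takes one row. The function names are hypothetical.

```python
# Toy model of table t: 100000 rows stored as (id, a, b) with id = a = b.
rows = [(i, i, i) for i in range(1, 100001)]


def matches(row):
    """Full WHERE clause: a between 1 and 1000 and b between 50000 and 100000."""
    _id, a, b = row
    return 1 <= a <= 1000 and 50000 <= b <= 100000


def scan_via_index_a(table):
    """Range-scan index a, filter on b, then sort the survivors by b (a filesort)."""
    examined = [r for r in table if 1 <= r[1] <= 1000]
    survivors = sorted((r for r in examined if matches(r)), key=lambda r: r[2])
    return len(examined), survivors[:1]


def scan_via_index_b(table):
    """Walk index b in order (no sort needed) and stop at the first row that also matches a."""
    examined = 0
    for r in sorted(table, key=lambda r: r[2]):
        if r[2] < 50000:
            continue  # a real index would seek straight to the start of the range
        if r[2] > 100000:
            break
        examined += 1
        if matches(r):
            return examined, [r]
    return examined, []


examined_a, result_a = scan_via_index_a(rows)
examined_b, result_b = scan_via_index_b(rows)
print(examined_a, examined_b)  # prints "1000 50001"
```

In this model, index b avoids the sort but only pays off if a matching row turns up early. Because no row satisfies both conditions here, it walks its whole range, while index a examines 1000 rows and sorts an empty result.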
-- Create the table
mysql> CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`)
) ENGINE=InnoDB;

-- Define a stored procedure that generates the test data
-- (switch the delimiter so the semicolons inside the body do not end the statement)
mysql> delimiter ;;
CREATE PROCEDURE idata ()
BEGIN
  DECLARE i INT;
  SET i = 1;
  WHILE (i <= 100000) DO
    INSERT INTO t VALUES (i, i, i);
    SET i = i + 1;
  END WHILE;
END;;
delimiter ;

-- Run the stored procedure to insert the test data
mysql> CALL idata ();

-- Check the execution plan: the index on column a is used
mysql> explain select * from t where a between 10000 and 20000;
+----+-------------+-------+-------+---------------+-----+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+-----+---------+------+-------+-----------------------+
|  1 | SIMPLE      | t     | range | a             | a   | 5       | NULL | 10000 | Using index condition |
+----+-------------+-------+-------+---------------+-----+---------+------+-------+-----------------------+

-- The result must be sorted by b. Index b has to scan more rows, but it is already ordered by b.
-- Weighing rows scanned against the cost of sorting, the optimizer judges index b cheaper and picks it.
mysql> explain select * from t where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;
+----+-------------+-------+-------+---------------+-----+---------+------+-------+------------------------------------+
| id | select_type | table | type  | possible_keys | key | key_len | ref  | rows  | Extra                              |
+----+-------------+-------+-------+---------------+-----+---------+------+-------+------------------------------------+
|  1 | SIMPLE      | t     | range | a,b           | b   | 5       | NULL | 50128 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+-----+---------+------+-------+------------------------------------+

-- Option 1: use force index to make the query use index a and override the optimizer's poor choice.
-- Not recommended: it is not portable, and the statement must be changed whenever the index is renamed.
mysql> explain select * from t force index(a) where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;
+----+-------------+-------+-------+---------------+-----+---------+------+------+----------------------------------------------------+
| id | select_type | table | type  | possible_keys | key | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+-------+---------------+-----+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | t     | range | a             | a   | 5       | NULL |  999 | Using index condition; Using where; Using filesort |
+----+-------------+-------+-------+---------------+-----+---------+------+------+----------------------------------------------------+

-- Option 2: steer MySQL toward the index we want. With order by b, a, index b no longer provides
-- the required order on its own, so the optimizer also has to account for the cost of sorting by a.
mysql> explain select * from t where (a between 1 and 1000) and (b between 50000 and 100000) order by b,a limit 1;
+----+-------------+-------+-------+---------------+-----+---------+------+------+----------------------------------------------------+
| id | select_type | table | type  | possible_keys | key | key_len | ref  | rows | Extra                                              |
+----+-------------+-------+-------+---------------+-----+---------+------+------+----------------------------------------------------+
|  1 | SIMPLE      | t     | range | a,b           | a   | 5       | NULL |  999 | Using index condition; Using where; Using filesort |
+----+-------------+-------+-------+---------------+-----+---------+------+------+----------------------------------------------------+

-- Option 3: in some cases we can create a more suitable index for the optimizer to choose, or drop the misused one
ALTER TABLE `t`
  DROP INDEX `a`,
  DROP INDEX `b`,
  ADD INDEX `ab` (`a`,`b`);

Index optimization

Index selectivity

Index selectivity = cardinality / total rows
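-- Index selectivity of column xxx in table t
select count(distinct xxx)/count(id) from t;

Index selectivity is the ratio of distinct index values (the cardinality) to the number of rows in the table. It measures how well an index filters rows. Selectivity ranges from 0 to 1, and the higher it is, the more valuable the index.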
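As a rough illustration of the ratio, the following Python sketch computes selectivity for an in-memory column. The `selectivity` helper is hypothetical. It mirrors the SQL above by skipping NULLs (`None`) when counting distinct values while still counting every row in the denominator.

```python
def selectivity(values):
    """Return distinct non-NULL values divided by total rows, or 0.0 for an empty column."""
    total = len(values)
    if total == 0:
        return 0.0
    distinct = len({v for v in values if v is not None})
    return distinct / total


# A unique column: every row has its own value, so selectivity is 1.0.
unique_column = list(range(1, 100001))

# A two-valued flag column: only 2 distinct values across 100000 rows.
flag_column = [i % 2 for i in range(100000)]

print(selectivity(unique_column))  # prints 1.0
print(selectivity(flag_column))    # prints 2e-05
```

A column like the flag above is a poor candidate for a single-column index, because each lookup still matches about half of the table.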