最近网站总是出问题,因为play服务总是跑着跑着就死了,于是经理尝试把play跑在我的MySQL这两台服务器上(因为这两台服务器的资源很空闲),可是没想到才跑了半天,就把服务器的128G内存耗尽,服务器无法正常使用,输入任何命令都报错,无法分配内存,reboot都不可以,只能去机房强制关机了。
我这里一两台,主主复制的mysql:
192.10.0.143
192.10.0.144
通过keepalived映射出来了vip:192.10.0.145,目前vip在144上。
重启的是143.
启动之后,mysql服务成功开启了,可是主从状态报错,sql进程状态为NO,如下:
MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.144
Master_User: info_syncer
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000823
Read_Master_Log_Pos: 60049919
Relay_Log_File: mysql-relay-bin.000047
Relay_Log_Pos: 268334387
Relay_Master_Log_File: mysql-bin.000822
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 268334100
Relay_Log_Space: 535066509
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1594
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
1 row in set (0.00 sec)
ERROR: No query specified
报错原因:
从上面红体字可以知道,由于从库的异常关机,导致接收的主库的二进制日志崩溃,进而导致从库的relay日志损坏,sql进程无法读取,导致从库的sql进程状态为:NO。
问题解决:
MariaDB [(none)]> stop slave ;
Query OK, 0 rows affected (0.00 sec)
解决方法一:
找到,第一行记录了当前正在执行的log-relay文件名
找到该文件的下一个文件
使用mysqlbinlog查看该文件,在#98这行有Rotate to log-bin.000004 pos: 4等信息,这就是目前slave停止的位置 ,或者
在slave上重新指定同步位置,重新执行:
change master to
master_host='1.1.1.1',
master_user='repl',
master_password='111111',
master_port=3306,
master_log_file='log-bin.000004',
master_log_pos=4;
然后启动slave,
start slave ;
解决方法二:
stop slave之后,重新reset slave;
查看slave状态,正常了。。。。。
reset slave到底做了什么??
RESET SLAVE
官方的解释如下
1)RESET SLAVE makes the slave forget its replication position in the master's binary log. This statement is meant to be used for a clean start: It clears the master info and relay log info repositories, deletes all the relay log files, and starts a new relay log file. It also resets to 0 the replication delay specified with the MASTER_DELAY option to CHANGE MASTER TO. To use RESET SLAVE, the slave replication threads must be stopped (use STOP SLAVE if necessary).
2)RESET SLAVE does not change any replication connection parameters such as master host, master port, master user, or master password, which are retained in memory. This means that START SLAVE can be issued without requiring a CHANGE MASTER TO statement following
reset slave
其实,它是直接删除master.info和relay-log.info文件,并删除所有的relay log,然后重新生成一个新的relay log,即使relay log中还有SQL没有被SQL线程apply完。