HBase Error Notes and Fixes

1. HBase runs into all sorts of problems during operation. Most of them can be resolved by adjusting configuration files; modifying the source code is also an option.

When the concurrency on HBase rises, it frequently hits the "Too Many Open Files" error. The log looks like this:

2012-06-01 16:05:22,776 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2012-06-01 16:05:22,776 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_3790131629645188816_18192

2012-06-01 16:13:01,966 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-299035636445663861_7843 file=/hbase/SendReport/83908b7af3d5e3529e61b870a16f02dc/data/17703aa901934b39bd3b2e2d18c671b4.9a84770c805c78d2ff19ceff6fecb972
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1695)
     at java.io.DataInputStream.readBoolean(DataInputStream.java:242)
     at org.apache.hadoop.hbase.io.Reference.readFields(Reference.java:116)
     at org.apache.hadoop.hbase.io.Reference.read(Reference.java:149)
     at org.apache.hadoop.hbase.regionserver.StoreFile.<init>(StoreFile.java:216)
     at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:282)
     at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2510)
     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:449)
     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3228)
     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3176)
     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:331)
     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:107)
     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
     at java.lang.Thread.run(Thread.java:722)

Cause and fix: the default limit on open files in Linux is usually 1024. Running ulimit -n 65535 raises it immediately, but the change does not survive a reboot. To make it persistent, use one of the following three methods (a quick way to verify the effective limit is sketched after the list):

1. Add the line ulimit -SHn 65535 to /etc/rc.local
2. Add the line ulimit -SHn 65535 to /etc/profile
3. Append the following two lines to /etc/security/limits.conf:
* soft nofile 65535
* hard nofile 65535
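
A minimal sketch for checking that the new limit has actually taken effect for the running RegionServer. The use of jps and the process name HRegionServer are assumptions about a typical deployment; adjust to your environment:

  # limit for the shell that will launch HBase
  ulimit -n

  # limits actually applied to the running RegionServer process
  # (assumes jps is on the PATH and the process is named HRegionServer)
  RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
  grep "open files" /proc/$RS_PID/limits

  # rough count of file descriptors the RegionServer currently holds
  ls /proc/$RS_PID/fd | wc -l

If the "Max open files" line still shows 1024, the HBase processes need to be restarted from a session that has picked up the new limit.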

          

2. HDFS writes involve two timeout settings: dfs.socket.timeout and dfs.datanode.socket.write.timeout. Some sources suggest that raising only the latter, dfs.datanode.socket.write.timeout, is enough, but the error reported here is in fact a READ_TIMEOUT. The corresponding default values are:

  // Timeouts for communicating with DataNode for streaming writes/reads
  public static int READ_TIMEOUT = 60 * 1000;            // this is the value that was exceeded
  public static int READ_TIMEOUT_EXTENSION = 3 * 1000;
  public static int WRITE_TIMEOUT = 8 * 60 * 1000;
  public static int WRITE_TIMEOUT_EXTENSION = 5 * 1000;  // for write pipeline


Log:

  11/10/12 10:50:44 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_8540857362443890085_4343699470 java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.*.*.*:14707 remote=/*.*.*.24:80010]
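
A minimal sketch of raising the two timeouts named above, assuming they are set in hdfs-site.xml (the same properties can also be placed in hbase-site.xml so the HBase-side DFS client picks them up). The concrete values 180000 and 600000 are assumptions for illustration, not recommendations from the original report:

  <!-- hdfs-site.xml (or hbase-site.xml for the HBase-side DFS client) -->
  <property>
    <name>dfs.socket.timeout</name>
    <value>180000</value>
    <!-- read timeout in ms; default 60000, the READ_TIMEOUT that was exceeded -->
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>600000</value>
    <!-- write timeout in ms; default 480000 -->
  </property>

These settings only take effect after the DataNodes and the HBase processes that embed the DFS client are restarted.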
