[HDFS] IOException: Got error, status message , ack with firstBadLink as xxx.xx.xx.xx:50010

2016-08-25

Key log:

[2016-08-24T02:00:32.362+08:00] [ERROR] hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java 980) [Thread-0] : Failed to close inode 22083674
java.io.IOException: Got error, status message , ack with firstBadLink as xxx.xxx.xxx.xxx:50010
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1334)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
at org.apache.hadoop.hdfs.DFSOutputStream

Cause: at the time, the DataNode (DN) had run up against the reserved-space limit on each of its disks:

[2016-08-24T02:00:31.494+08:00] [INFO] server.datanode.DataNode.writeBlock(DataXceiver.java 837) [DataXceiver for client DFSClient_NONMAPREDUCE_103164656_1 at /172.16.183.26:29414 [Receiving block BP-964157621-172.16.172.17-1470393241028:blk_1094132102_20397583]] : opWriteBlock BP-964157621-172.16.172.17-1470393241028:blk_1094132102_20397583 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=123444224 B) is less than the block size (=134217728 B).
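
The rejection comes from the DataNode's per-volume space check when it picks a volume for the new replica: usable space is roughly capacity minus DFS-used minus the configured reservation (dfs.datanode.du.reserved), and a volume is only eligible if that usable space can hold a full block. Here the best volume had only 123444224 B (~117.7 MB) left above the reserve, less than the 134217728 B (128 MB) block size. A minimal sketch of that arithmetic follows; the class and helper names are illustrative, not the actual FsVolumeImpl code.

// Simplified sketch (not the real Hadoop code) of the per-volume check that
// produced the DiskOutOfSpaceException above.
public class VolumeSpaceCheckSketch {

    static final long BLOCK_SIZE = 134217728L;   // 128 MB, dfs.blocksize
    static final long RESERVED   = 100L << 30;   // 100 GB, dfs.datanode.du.reserved (per this post)

    /** Hypothetical helper: space usable for new blocks on one volume. */
    static long available(long capacity, long dfsUsed) {
        return capacity - dfsUsed - RESERVED;
    }

    public static void main(String[] args) {
        // Numbers from the DataNode log: the best volume had ~117.7 MB usable,
        // which is less than one 128 MB block, so the write was refused.
        long mostAvailable = 123444224L;
        if (mostAvailable < BLOCK_SIZE) {
            System.out.printf("Out of space: most available (=%d B) < block size (=%d B)%n",
                    mostAvailable, BLOCK_SIZE);
        }
    }
}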

The underlying cause is that the balancer does not move data off these disks fast enough. One possible fix, which I have not tested: allow each disk to write at most 128 MB beyond its reserved threshold (the reservation is 100 GB). In other words, once a disk is nominally full it may still accept one more write, but by no more than one block's worth; that should be enough to avoid this kind of failure. A sketch of the idea follows.
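
The sketch below contrasts the current "must fit a whole block above the reserve" rule with the proposed relaxation that lets a volume dip into the reserve by at most one block. This is an untested idea, not an existing Hadoop switch, and every name here is hypothetical.

// Untested idea, sketched only: accept a write that eats at most one block
// (128 MB) of the 100 GB reservation, so the first write after the volume is
// "nominally full" still succeeds.
public class ReservedOvershootSketch {

    static final long BLOCK_SIZE    = 134217728L;  // 128 MB
    static final long RESERVED      = 100L << 30;  // 100 GB
    static final long MAX_OVERSHOOT = BLOCK_SIZE;  // allow at most one block into the reserve

    /** Current behavior: writable only if the space above the reserve holds a block. */
    static boolean writableToday(long capacity, long dfsUsed) {
        return capacity - dfsUsed - RESERVED >= BLOCK_SIZE;
    }

    /** Proposed behavior: also accept if the write eats at most one block of the reserve. */
    static boolean writableWithOvershoot(long capacity, long dfsUsed) {
        return capacity - dfsUsed - (RESERVED - MAX_OVERSHOOT) >= BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long capacity = 4L << 40;                          // hypothetical 4 TB volume
        long dfsUsed  = capacity - RESERVED - 123444224L;  // leaves ~117.7 MB above the reserve, as in the log
        System.out.println("today:          " + writableToday(capacity, dfsUsed));          // false
        System.out.println("with overshoot: " + writableWithOvershoot(capacity, dfsUsed));  // true
    }
}

Note that the relaxed check only passes while the volume has not yet dipped into the reserve, so any single volume can overshoot the reservation by at most one block before it stops accepting new replicas.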