Broken pipe from XXXX:9101

Hi!

We’re on Community Edition 4.5.1, and we frequently get the following error from Couchbase: {"msg":"write tcp xxxx:9101: broken pipe from xxxx:9101 - cause: write tcp xxxx:9101: broken pipe from xxxx:9101","code":5000}. We’re not sure what’s going on.

From previous posts, it seems this is caused by the indexer crashing. However, our load is light (index RAM usage peaks at ~60%), and indexer.log doesn’t show any errors.

The query log shows the following errors:

2018-01-30T12:04:43.118+09:00 [Error] [GsiScanClient:"xxxx:9101"] ScanAll(46975b54-93a2-428c-a1df-72b485d8fdb1) request transport failed `write tcp 100.74.26.71:9101: broken pipe`
2018-01-30T12:04:43.128+09:00 [Error] [GsiScanClient:"xxxx:9101"] req(46975b54-93a2-428c-a1df-72b485d8fdb1) connection "100.74.26.71:52032" closed `EOF`
2018-01-30T12:04:43.128+09:00 [Error] [GsiScanClient:"xxxx:9101"] ScanAll(46975b54-93a2-428c-a1df-72b485d8fdb1) response failed `EOF`
2018-01-30T12:04:43.128+09:00 [Error] [GsiScanClient:"xxxx:9101"] ScanAll(46975b54-93a2-428c-a1df-72b485d8fdb1) request transport failed `write tcp 100.74.26.71:9101: broken pipe`

Around the same time, indexer.log shows only:

2018-01-30T12:04:43.134+09:00 [Info] ForestDBSlice::Commit SliceId 0 IndexInstId 12450542224799567472 FlushTime 14.921µs CommitTime 351.966µs TotalFlushTime 1h19m50.070289861s TotalCommitTime 14m33.625755521s
2018-01-30T12:04:43.134+09:00 [Info] ForestDBSlice::OpenSnapshot SliceId 0 IndexInstId 12450542224799567472 Creating New Snapshot SnapshotInfo: seqnos: 23222027, 0, 10520 committed:true
2018-01-30T12:04:43.134+09:00 [Info] StorageMgr::handleCreateSnapshot Added New Snapshot Index: 12450542224799567472 PartitionId: 0 SliceId: 0 Crc64: 15723203040488526349 (SnapshotInfo: seqnos: 23222027, 0, 10520 committed:true) SnapCreateDur 11.023955ms SnapOpenDur 36.188µs
2018-01-30T12:04:43.333+09:00 [Warn] StorageMgr::handleCreateSnapshot Skipped Creating New Snapshot for Index 12450542224799567472 PartitionId 0 SliceId 0. No New Mutations. IsDirty false
2018-01-30T12:04:43.533+09:00 [Warn] StorageMgr::handleCreateSnapshot Skipped Creating New Snapshot for Index 12450542224799567472 PartitionId 0 SliceId 0. No New Mutations. IsDirty false
2018-01-30T12:04:43.733+09:00 [Warn] StorageMgr::handleCreateSnapshot Skipped Creating New Snapshot for Index 12450542224799567472 PartitionId 0 SliceId 0. No New Mutations. IsDirty false
2018-01-30T12:04:43.933+09:00 [Warn] StorageMgr::handleCreateSnapshot Skipped Creating New Snapshot for Index 12450542224799567472 PartitionId 0 SliceId 0. No New Mutations. IsDirty false

Can anyone help?

@xingjia.zhang, please check the log for earlier timestamps as well. If the indexer crashes, the connection pool between the query service and the indexer is cleaned up lazily, so errors caused by stale connections can appear some time after the crash.
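To illustrate why the error can lag behind the crash: a pool that cleans up lazily keeps handing out idle connections until a write on one of them fails, and only then discards it. The sketch below is a minimal Python model of that behaviour, assuming a Linux host; it is not how Couchbase’s GSI client is actually implemented. `LazyPool` is a hypothetical name, and a local `socket.socketpair()` stands in for a query-to-indexer connection whose far end has gone away.

```python
import socket


class LazyPool:
    """Hypothetical pool that only notices a dead peer when it writes to it."""

    def __init__(self):
        self.idle = []

    def put(self, conn):
        self.idle.append(conn)

    def send(self, payload):
        """Send payload on the first healthy idle connection.

        Returns how many stale connections were discarded along the way.
        """
        discarded = 0
        while self.idle:
            conn = self.idle.pop(0)
            try:
                conn.sendall(payload)
            except (BrokenPipeError, ConnectionResetError):
                # The peer is gone (e.g. the indexer restarted). This is the
                # moment "broken pipe" surfaces, possibly long after the crash.
                conn.close()
                discarded += 1
                continue
            self.idle.append(conn)  # healthy: return it to the pool
            return discarded
        raise ConnectionError("no healthy connections left in the pool")


# Two pooled connections; the far end of the first one disappears.
stale_client, stale_server = socket.socketpair()
good_client, good_server = socket.socketpair()
stale_server.close()  # simulate the indexer side going away

pool = LazyPool()
pool.put(stale_client)
pool.put(good_client)

discarded = pool.send(b"scan request")  # hits the stale connection first
received = good_server.recv(1024)
print(discarded, received)
```

The pool has no idea the first connection is dead until it tries to use it, which is why checking only the log lines around the query error can miss the real cause.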

Hi @deepkaran.salooja, thank you for replying!

We checked indexer.log for the whole day, but there are no errors. The only warnings come from StorageMgr::handleCreateSnapshot, which I assume aren’t relevant in this case?

We’re now getting the broken pipe error more frequently (even though the load is quite light), but the only errors we can find are in query.log. Are there any other logs we can check?

Are you also checking the indexer logs for “panic”, “sigabrt”, etc.? The other possibility is that your network is dropping connections.
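One way to run that check, while also following the advice to look at timestamps before the query error, is a small script that scans indexer.log lines for crash markers. This is a minimal Python sketch under a few assumptions: `find_crash_markers` is a hypothetical helper, the marker list is just “panic” and “sigabrt” matched case-insensitively, and timestamped lines begin with an ISO-8601 timestamp as in the excerpts above. Go panic output and stack traces usually lack that prefix, so such lines inherit the timestamp of the last timestamped line.

```python
from datetime import datetime

# Assumed crash markers (taken from the thread); extend as needed.
CRASH_MARKERS = ("panic", "sigabrt")


def find_crash_markers(lines, before=None):
    """Return (timestamp, line) pairs for log lines that mention a crash marker.

    Lines without a leading timestamp inherit the most recent one seen
    (None if none has been seen yet). If `before` is a timezone-aware
    datetime, hits known to be logged after it are dropped.
    """
    hits = []
    last_ts = None
    for line in lines:
        first_field = line.split(" ", 1)[0]
        try:
            last_ts = datetime.fromisoformat(first_field)
        except ValueError:
            pass  # no timestamp on this line; keep the previous one
        if not any(marker in line.lower() for marker in CRASH_MARKERS):
            continue
        if before is not None and last_ts is not None and last_ts > before:
            continue
        hits.append((last_ts, line.rstrip("\n")))
    return hits


# Made-up input for illustration only; not real indexer output.
sample = [
    "2018-01-30T12:00:00.000+09:00 [Info] example entry",
    "panic: example crash message",
    "goroutine 1 [running]:",
    "2018-01-30T12:05:00.000+09:00 [Info] SIGABRT example after the error",
]
query_error_at = datetime.fromisoformat("2018-01-30T12:04:43.118+09:00")
hits = find_crash_markers(sample, before=query_error_at)
for ts, line in hits:
    print(ts, line)
```

For a real file, you would pass an open file object (e.g. `open(path)`) in place of `sample`; the path depends on your installation.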

Just checked: there’s no panic/sigabrt in the indexer logs either.
At the moment, we suspect network connection issues because of the spikes we’ve observed. Thanks!
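If the network is the suspect, a simple first check is to repeatedly open TCP connections from the query node to the indexer’s port 9101, record failures and slow connects, and line them up against the observed spikes. This is a minimal Python sketch; `probe` and its parameters are hypothetical, and the host is a placeholder you supply. A successful connect only shows that the port was reachable at that instant; it cannot detect an already established connection being dropped later.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0, interval=1.0):
    """Open and close `attempts` TCP connections.

    Returns a list of (ok, detail) pairs: detail is the connect time in
    seconds on success, or the error's repr on failure.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:  # refused, timed out, unreachable, ...
            results.append((False, repr(exc)))
        if i < attempts - 1:
            time.sleep(interval)
    return results


# Local demo: a listener on an ephemeral port stands in for the indexer.
# Against a real cluster you might call probe("<indexer-host>", 9101, attempts=60),
# where "<indexer-host>" is a placeholder.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]
results = probe("127.0.0.1", port, attempts=3, interval=0.1)
for ok, detail in results:
    print("ok" if ok else "FAIL", detail)
listener.close()
```

Running this periodically and logging failures with timestamps gives you something to compare directly against the query.log errors.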